tilelang.profiler.bench module
Profiler and convert-to-torch utilities.
- tilelang.profiler.bench.do_bench(fn: Callable, warmup: float = 25, rep: float = 100, _n_warmup: int = 0, _n_repeat: int = 0, grad_to_none: Optional[List[torch.Tensor]] = None, quantiles: Optional[List[float]] = None, fast_flush: bool = True, return_mode: Literal['min', 'max', 'mean', 'median'] = 'mean') → Union[float, List[float]]
Benchmarks the runtime of a PyTorch function.
This function handles L2 cache flushing between runs for consistent timing, automatic warmup and repeat count calculation, optional gradient clearing for backward passes, and multiple measurement modes (mean, median, min, max).
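To make the "automatic warmup and repeat count calculation" and the measurement modes more concrete, here is a minimal plain-Python sketch. It is not tilelang's implementation: `iteration_count` and `aggregate` are hypothetical helpers, and the rule of dividing a time budget by an estimated per-call runtime is an assumption about how such a count could be derived.

```python
import statistics
from typing import List


def iteration_count(budget_ms: float, estimate_ms: float, override: int = 0) -> int:
    """Turn a time budget into an iteration count (assumed rule, never below 1).

    A positive `override` wins, playing the role that `_n_warmup` and
    `_n_repeat` are documented to play for `do_bench`.
    """
    if override > 0:
        return override
    return max(1, int(budget_ms / estimate_ms))


def aggregate(times_ms: List[float], return_mode: str = "mean") -> float:
    """Reduce per-run timings to one value using one of the four modes."""
    reducers = {
        "min": min,
        "max": max,
        "mean": statistics.fmean,
        "median": statistics.median,
    }
    return reducers[return_mode](times_ms)
```

Under these assumptions, a 25 ms warmup budget and an estimated 0.5 ms per call would give 50 warmup iterations, and a call slower than the budget would still run once.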
- Parameters:
fn – Function to benchmark
warmup – Target warmup time in milliseconds
rep – Target number of repetitions
_n_warmup – Override for number of warmup iterations
_n_repeat – Override for number of timing iterations
grad_to_none – Tensors whose gradients should be cleared between runs
quantiles – Optional performance percentiles to compute
fast_flush – Whether to use faster L2 cache flushing
return_mode – How to aggregate timing results ('mean', 'median', 'min', 'max')
- Returns:
Aggregated runtime in milliseconds
- Return type:
Union[float, List[float]]
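A short usage sketch may help tie the parameters together. It assumes that `fn` is called with no arguments and that a CUDA device is available; `benchmark_matmul` and the matrix size are hypothetical choices made for illustration, and only the `do_bench` keyword arguments come from the signature above.

```python
def benchmark_matmul(size: int = 1024):
    """Time a square float16 matmul with do_bench (requires a CUDA GPU)."""
    # Imported lazily so that defining this helper does not require a GPU stack.
    import torch
    from tilelang.profiler.bench import do_bench

    a = torch.randn(size, size, device="cuda", dtype=torch.float16)
    b = torch.randn(size, size, device="cuda", dtype=torch.float16)

    # Assumption: do_bench invokes `fn` as a zero-argument callable.
    def run():
        return a @ b

    # Defaults from the signature: warmup=25, rep=100, return_mode='mean'.
    mean_ms = do_bench(run)
    # Same workload, aggregated with the median instead of the mean.
    median_ms = do_bench(run, return_mode="median")
    return mean_ms, median_ms
```

Calling `benchmark_matmul()` on a machine with a GPU should return two runtimes in milliseconds. Passing `quantiles` instead of relying on `return_mode` presumably yields the `List[float]` branch of the return annotation, though the exact behavior is not stated here.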