tilelang.profiler.bench

Profiler and benchmarking utilities for PyTorch functions.

Attributes

`IS_CUDA`
`device`
`Event`

Classes

suppress_stdout_stderr

Context manager to suppress stdout and stderr output.

Functions

do_bench(fn[, warmup, rep, _n_warmup, _n_repeat, ...])

Benchmark the runtime of a PyTorch function with L2 cache management.

Module Contents

class tilelang.profiler.bench.suppress_stdout_stderr

Context manager to suppress stdout and stderr output.

Source: https://github.com/deepseek-ai/DeepGEMM/blob/main/deep_gemm/testing/bench.py

__enter__()

__exit__(*_)
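A context manager of this kind typically works at the file-descriptor level, so it also silences output that native (C/C++/CUDA) code writes directly to fds 1 and 2. The sketch below illustrates that technique; the class name `SuppressOutput` is hypothetical, and the sketch is not the tilelang or DeepGEMM implementation, only an assumed equivalent built on `os.dup`/`os.dup2`.

```python
import os
import sys


class SuppressOutput:
    """Hypothetical sketch: silence stdout/stderr at the file-descriptor level.

    Redirecting fds 1 and 2 (not just the sys.stdout object) also hides
    output written by native extensions that bypass Python's I/O layer.
    """

    def __enter__(self):
        # Flush Python-level buffers so earlier output is not swallowed.
        sys.stdout.flush()
        sys.stderr.flush()
        self._null_fd = os.open(os.devnull, os.O_RDWR)
        # Keep duplicates of the original fds so they can be restored later.
        self._saved_fds = [os.dup(1), os.dup(2)]
        # Point fds 1 and 2 at the null device.
        os.dup2(self._null_fd, 1)
        os.dup2(self._null_fd, 2)
        return self

    def __exit__(self, *_):
        # Flush anything buffered while suppressed (it goes to the null device).
        sys.stdout.flush()
        sys.stderr.flush()
        # Restore the original fds, then close the temporary ones.
        os.dup2(self._saved_fds[0], 1)
        os.dup2(self._saved_fds[1], 2)
        for fd in (*self._saved_fds, self._null_fd):
            os.close(fd)


with SuppressOutput():
    print("not shown")
    os.write(1, b"not shown either\n")
```

Restoring from saved duplicates, rather than reopening the terminal, keeps the original destination intact even when stdout was a pipe or file.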

tilelang.profiler.bench.IS_CUDA

tilelang.profiler.bench.device = 'cuda:0'

tilelang.profiler.bench.Event

tilelang.profiler.bench.do_bench(fn, warmup=25, rep=100, _n_warmup=0, _n_repeat=0, quantiles=None, fast_flush=True, backend='event', return_mode='mean')

Benchmark the runtime of a PyTorch function with L2 cache management.

This function provides accurate GPU kernel timing by:
- Clearing the L2 cache between runs for consistent measurements
- Auto-calculating warmup and repeat counts based on kernel runtime
- Supporting multiple profiling backends (CUDA events or CUPTI)
- Offering flexible result aggregation (mean/median/min/max/quantiles)
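The auto-calculation of warmup and repeat counts can be sketched as dividing each time budget (`warmup`, `rep`, in milliseconds) by an estimated per-call runtime. This is an assumption modeled on the heuristic in Triton's `triton.testing.do_bench`, not a confirmed copy of tilelang's code; the helper name `resolve_iterations` and its `estimate_ms` argument are hypothetical. Positive `n_warmup`/`n_repeat` values stand in for the documented `_n_warmup`/`_n_repeat` overrides (0 = auto).

```python
def resolve_iterations(estimate_ms, warmup_ms=25.0, rep_ms=100.0,
                       n_warmup=0, n_repeat=0):
    """Hypothetical helper: derive warmup and benchmark iteration counts.

    estimate_ms is an assumed per-call runtime measured from a few trial runs.
    """
    if estimate_ms <= 0:
        raise ValueError("estimate_ms must be positive")
    # Fit as many calls into each time budget as possible, but at least one.
    auto_warmup = max(1, int(warmup_ms / estimate_ms))
    auto_repeat = max(1, int(rep_ms / estimate_ms))
    # A positive manual count overrides the time-based estimate.
    warmup_iters = n_warmup if n_warmup > 0 else auto_warmup
    repeat_iters = n_repeat if n_repeat > 0 else auto_repeat
    return warmup_iters, repeat_iters


print(resolve_iterations(0.5))               # (50, 200)
print(resolve_iterations(0.5, n_repeat=10))  # (50, 10)
```

With the defaults, a 0.5 ms kernel gets 50 warmup and 200 timed calls, while a kernel slower than the whole budget still runs once in each phase.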

Parameters:

fn (Callable) – Function to benchmark
warmup (float) – Target warmup time in milliseconds (default: 25)
rep (float) – Target total benchmark time in milliseconds (default: 100)
_n_warmup (int) – Manual override for warmup iterations (default: 0 = auto)
_n_repeat (int) – Manual override for benchmark iterations (default: 0 = auto)
quantiles (list[float] | None) – Quantiles of the runtime distribution to compute, as fractions in [0, 1] (e.g., [0.5, 0.95]) (default: None)
fast_flush (bool) – Flush the L2 cache using a faster int32 buffer instead of an int8 buffer (default: True)
backend (Literal['event', 'cupti']) – Profiler backend: “event” (CUDA events) or “cupti” (default: “event”)
return_mode (Literal['min', 'max', 'mean', 'median']) – Result aggregation method: “mean”, “median”, “min”, or “max” (default: “mean”)

Returns:

Runtime in milliseconds (float), or a list of quantile values if quantiles is specified

Return type:

float | list[float]
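The return contract (a single float selected by `return_mode`, or a list when `quantiles` is given) can be illustrated on plain per-iteration timings. This is a minimal sketch, assuming linear interpolation between sorted samples for quantiles (the default method of `torch.quantile`); the helper name `aggregate_times` is hypothetical and not part of tilelang.

```python
import statistics


def _quantile(sorted_vals, q):
    # Linear interpolation between the two nearest ranks.
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * frac


def aggregate_times(times_ms, quantiles=None, return_mode="mean"):
    """Hypothetical helper: reduce per-iteration timings (ms) to a result."""
    if not times_ms:
        raise ValueError("times_ms must be non-empty")
    if quantiles is not None:
        if any(not 0.0 <= q <= 1.0 for q in quantiles):
            raise ValueError("quantiles must lie in [0, 1]")
        ordered = sorted(times_ms)
        return [_quantile(ordered, q) for q in quantiles]
    reducers = {
        "min": min,
        "max": max,
        "mean": statistics.mean,
        "median": statistics.median,
    }
    if return_mode not in reducers:
        raise ValueError(f"unknown return_mode: {return_mode!r}")
    return float(reducers[return_mode](times_ms))


samples = [1.0, 2.0, 3.0, 4.0]
print(aggregate_times(samples))                        # 2.5
print(aggregate_times(samples, quantiles=[0.5, 1.0]))  # [2.5, 4.0]
```

Returning a list whenever `quantiles` is set keeps the result shape predictable for callers that request several percentiles at once.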