tilelang.profiler

The profiler and convert-to-torch utilities.

Classes

Profiler

A profiler class for benchmarking and validating kernel implementations.

Package Contents

class tilelang.profiler.Profiler

A profiler class for benchmarking and validating kernel implementations.

params

List of kernel parameters defining the input/output specifications

result_idx

Indices indicating which parameters are output tensors

supply_type

Type of tensor supply to use (e.g., random, zeros, etc.)

adapter

Optional kernel adapter for interfacing with different backends

params: List[tilelang.engine.param.KernelParam]
result_idx: List[int]
supply_type: tilelang.utils.tensor.TensorSupplyType
adapter: tilelang.jit.adapter.BaseKernelAdapter | None = None
__post_init__()

Initialize tensor supply after dataclass initialization
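
The post-initialization hook can be illustrated with a minimal standard-library sketch. SketchProfiler and its string-valued fields are hypothetical stand-ins, not the tilelang types; the sketch only shows how a dataclass derives a supply function from supply_type once the generated __init__ has run.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class SketchProfiler:
    """Hypothetical stand-in; this is not tilelang.profiler.Profiler."""

    params: List[str]      # placeholder for kernel parameter specs
    result_idx: List[int]  # indices of the parameters that are outputs
    supply_type: str       # placeholder for a tensor supply kind
    # Derived field: excluded from __init__, filled in by __post_init__.
    supply: Callable[[str], List[float]] = field(init=False)

    def __post_init__(self) -> None:
        # Called automatically right after the generated __init__.
        if self.supply_type == "zeros":
            self.supply = lambda name: [0.0, 0.0]
        else:
            self.supply = lambda name: [1.0, 1.0]


p = SketchProfiler(params=["A", "B", "C"], result_idx=[2], supply_type="zeros")
print(p.supply("A"))
```

Deriving the supply in __post_init__ keeps the constructor signature limited to the declared fields while still guaranteeing the derived attribute exists on every instance.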

with_default_adapter(adapter)
Parameters:

adapter (tilelang.jit.adapter.BaseKernelAdapter)

Return type:

Profiler

assert_allclose(reference_program, input_tensors=None, atol=0.01, rtol=0.01, max_mismatched_ratio=0.01)

Validates kernel output against a reference implementation.

Parameters:
  • reference_program (Callable) – Reference implementation to compare against

  • input_tensors (Optional[List[torch.Tensor]]) – Optional pre-generated input tensors

  • atol (float) – Absolute tolerance for comparison

  • rtol (float) – Relative tolerance for comparison

  • max_mismatched_ratio (float) – Maximum allowed ratio of mismatched elements
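
How the three tolerances could interact is sketched below in plain Python. The element-wise criterion |actual - expected| <= atol + rtol * |expected| is the convention used by NumPy and PyTorch and is an assumption here; this reference does not state the exact rule tilelang applies. The names mismatch_ratio and sketch_assert_allclose are hypothetical.

```python
from typing import Sequence


def mismatch_ratio(actual: Sequence[float], expected: Sequence[float],
                   atol: float = 0.01, rtol: float = 0.01) -> float:
    """Fraction of elements violating |a - e| <= atol + rtol * |e| (assumed rule)."""
    if len(actual) != len(expected):
        raise ValueError("length mismatch")
    if len(actual) == 0:
        return 0.0
    bad = sum(1 for a, e in zip(actual, expected)
              if abs(a - e) > atol + rtol * abs(e))
    return bad / len(actual)


def sketch_assert_allclose(actual: Sequence[float], expected: Sequence[float],
                           atol: float = 0.01, rtol: float = 0.01,
                           max_mismatched_ratio: float = 0.01) -> None:
    """Pass as long as the share of out-of-tolerance elements stays within budget."""
    ratio = mismatch_ratio(actual, expected, atol, rtol)
    assert ratio <= max_mismatched_ratio, f"{ratio:.2%} of elements mismatch"


expected = [1.0] * 200
actual = list(expected)
actual[0] = 2.0  # 1 of 200 elements is off, which is within the 1% budget
sketch_assert_allclose(actual, expected)
```

Allowing a small mismatch budget is useful for low-precision kernels, where a handful of outliers should not fail an otherwise correct result.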

manual_assert_close(reference_program, input_tensors=None, manual_check_prog=None)

Validates kernel output against a reference implementation using a caller-supplied check.

Parameters:
  • reference_program (Callable) – Reference implementation to compare against

  • input_tensors (Optional[List[torch.Tensor]]) – Optional pre-generated input tensors

  • manual_check_prog (Callable) – Caller-supplied function that performs the comparison
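
A custom check might look like the sketch below. The calling convention is an assumption: this reference does not say which arguments manual_check_prog receives, so the sketch assumes it is handed the kernel outputs and the reference outputs as two lists. The name check_max_abs_error and its limit parameter are hypothetical.

```python
from typing import List, Sequence


def check_max_abs_error(kernel_outs: List[Sequence[float]],
                        ref_outs: List[Sequence[float]],
                        limit: float = 1e-3) -> None:
    """Hypothetical custom check: bound the worst-case absolute error per output."""
    assert len(kernel_outs) == len(ref_outs), "output count differs"
    for i, (out, ref) in enumerate(zip(kernel_outs, ref_outs)):
        # default=0.0 covers empty outputs, where there is no error to measure.
        worst = max((abs(a - b) for a, b in zip(out, ref)), default=0.0)
        assert worst <= limit, f"output {i}: max abs error {worst} > {limit}"


# One output with two elements; the largest deviation is 0.0005.
check_max_abs_error([[1.0, 2.0]], [[1.0, 2.0005]])
```

A function of this shape suits cases where a fixed tolerance pair is too coarse, for example when each output needs its own error bound.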

assert_consistent(repeat=10)

Checks for kernel consistency across multiple runs.

Parameters:

repeat (int) – Number of times to repeat the consistency check
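
The idea behind a consistency check can be sketched as follows: run the same computation repeatedly and require every result to match the first one. This is an illustration with plain Python lists and exact equality, both of which are assumptions; sketch_assert_consistent is a hypothetical name, and how tilelang compares the runs is not stated here.

```python
from typing import Callable, List


def sketch_assert_consistent(run: Callable[[], List[float]], repeat: int = 10) -> None:
    """Re-run `run` and require each result to equal the first one."""
    baseline = run()
    for i in range(repeat):
        out = run()
        assert out == baseline, f"run {i} differs from the first run"


def deterministic() -> List[float]:
    # Same inputs, same arithmetic, so every call returns the same list.
    return [x * 0.5 for x in range(4)]


sketch_assert_consistent(deterministic, repeat=10)
```

Checks of this kind are typically used to surface race conditions or reads of uninitialized memory, which tend to show up as run-to-run differences.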

run_once(func=None)
Parameters:

func (Optional[Callable])

determine_profiler(func=None)

Determines which profiler backend to use based on function type.

Parameters:
  • func (Optional[Callable]) – Function to be profiled

Returns:

The determined profiler type (“torch” or “tvm”)

Return type:

str

do_bench(func=None, warmup=25, rep=100, n_warmup=1, n_repeat=1, input_tensors=None)

Benchmarks the execution time of a given function.

Parameters:
  • func (Optional[Callable]) – Function to benchmark (uses adapter if None)

  • warmup (int) – Warmup time in milliseconds

  • rep (int) – Number of repetitions for timing

  • n_warmup (int) – Number of warmup iterations

  • n_repeat (int) – Number of timing iterations

  • input_tensors (Optional[List[torch.Tensor]]) – Optional pre-generated input tensors

Returns:

Average execution time in milliseconds

Return type:

float
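
The warmup-then-measure pattern behind the n_warmup and n_repeat parameters can be sketched with the standard library. This is a CPU wall-clock illustration only: sketch_bench is a hypothetical helper, and timing real GPU kernels additionally requires device synchronization, which the sketch omits.

```python
import time
from typing import Callable


def sketch_bench(fn: Callable[[], object], n_warmup: int = 1, n_repeat: int = 1) -> float:
    """Return the average wall-clock time of `fn` in milliseconds."""
    # Warmup runs absorb one-time costs (caches, lazy initialization).
    for _ in range(n_warmup):
        fn()
    start = time.perf_counter()
    for _ in range(n_repeat):
        fn()
    elapsed_s = time.perf_counter() - start
    # Convert seconds to milliseconds and average over the timed runs.
    return elapsed_s * 1e3 / n_repeat


avg_ms = sketch_bench(lambda: sum(range(10_000)), n_warmup=2, n_repeat=5)
print(f"average: {avg_ms:.4f} ms")
```

Excluding the warmup runs from the measurement keeps one-time setup costs from inflating the reported average.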

property func
__call__(*args, **kwds)
Parameters:
  • args (Any)

  • kwds (Any)

Return type:

Any