tilelang.autotuner.tuner¶

The auto-tune module for tilelang programs.

This module provides functionality for auto-tuning tilelang programs, including JIT compilation and performance optimization through configuration search.

Attributes¶

logger

Exceptions¶

TimeoutException

Common base class for all non-exit exceptions.

Classes¶

AutoTuner

Auto-tuner for tilelang programs.

Functions¶

`timeout_handler`(signum, frame)
`run_with_timeout`(func, timeout, *args, **kwargs)
`get_available_cpu_count`()	Gets the number of CPU cores available to the current process.
`autotune`([func, warmup, rep, timeout, supply_type, ...])	Auto-tuning decorator for TileLang functions.

Module Contents¶

exception tilelang.autotuner.tuner.TimeoutException¶

Bases: Exception

Common base class for all non-exit exceptions.

tilelang.autotuner.tuner.timeout_handler(signum, frame)¶

tilelang.autotuner.tuner.run_with_timeout(func, timeout, *args, **kwargs)¶
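This page does not describe how run_with_timeout enforces its limit. The signature timeout_handler(signum, frame) has the shape of a POSIX signal handler, so one plausible reading is an alarm-based timeout. The sketch below is a minimal, Unix-only illustration of that pattern under this assumption; it is not the module's actual implementation, and the use of signal.setitimer (to allow fractional seconds) is a choice made for the example.

```python
import signal
import time


# Stand-in definitions that mirror the documented names; not tilelang's code.
class TimeoutException(Exception):
    """Raised when the wrapped call exceeds its time budget."""


def timeout_handler(signum, frame):
    # Invoked by the OS when the timer expires; abort the running call.
    raise TimeoutException("call timed out")


def run_with_timeout(func, timeout, *args, **kwargs):
    """Call func(*args, **kwargs), raising TimeoutException after `timeout` seconds."""
    previous = signal.signal(signal.SIGALRM, timeout_handler)
    signal.setitimer(signal.ITIMER_REAL, timeout)  # arm the timer
    try:
        return func(*args, **kwargs)
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)  # disarm the timer
        signal.signal(signal.SIGALRM, previous)  # restore the old handler


# A fast call returns normally; a slow call is interrupted.
fast_result = run_with_timeout(lambda a, b=0: a + b, 1.0, 2, b=3)
try:
    run_with_timeout(time.sleep, 0.05, 1.0)
    timed_out = False
except TimeoutException:
    timed_out = True
```

Signal-based timeouts of this kind only work in the main thread, and SIGALRM is not available on Windows.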

tilelang.autotuner.tuner.logger¶

tilelang.autotuner.tuner.get_available_cpu_count()¶

Gets the number of CPU cores available to the current process.

Return type: int
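The count available to a process can be smaller than the machine total when a CPU affinity mask applies (for example under taskset or a cpuset). Whether tilelang computes the value this way is not stated here; the sketch below shows one common standard-library approach, with the hypothetical name available_cpu_count.

```python
import os


def available_cpu_count() -> int:
    """Return the number of CPU cores the current process may run on."""
    try:
        # Linux: respects the affinity mask of the current process.
        return len(os.sched_getaffinity(0))
    except AttributeError:
        # Platforms without sched_getaffinity: fall back to the machine total.
        return os.cpu_count() or 1


n_cores = available_cpu_count()
```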

class tilelang.autotuner.tuner.AutoTuner(fn, configs)¶

Auto-tuner for tilelang programs.

This class handles the auto-tuning process by testing different configurations and finding the optimal parameters for program execution.

Parameters:

fn (Callable) – The function to be auto-tuned.
configs – List of configurations to try during auto-tuning.

compile_args¶

profile_args¶

cache_dir: pathlib.Path¶

fn¶

configs¶

ref_latency_cache = None¶

jit_input_tensors = None¶

ref_input_tensors = None¶

jit_compile = None¶

classmethod from_kernel(kernel, configs)¶

Create an AutoTuner instance from a kernel function.

Parameters:

kernel (Callable) – The kernel function to auto-tune.
configs – List of configurations to try.

Returns:

A new AutoTuner instance.

Return type:

AutoTuner

set_compile_args(out_idx=None, target='auto', execution_backend='cython', target_host=None, verbose=False, pass_configs=None)¶

Set compilation arguments for the auto-tuner.

Parameters:

out_idx (Union[List[int], int, None]) – List of output tensor indices.
target (Literal['auto', 'cuda', 'hip']) – Target platform.
execution_backend (Literal['dlpack', 'ctypes', 'cython']) – Execution backend to use for kernel execution.
target_host (Union[str, tvm.target.Target]) – Target host for cross-compilation.
verbose (bool) – Whether to enable verbose output.
pass_configs (Optional[Dict[str, Any]]) – Additional keyword arguments to pass to the Compiler PassContext.

Returns:

Self for method chaining.

Return type:

AutoTuner

set_profile_args(warmup=25, rep=100, timeout=30, supply_type=tilelang.TensorSupplyType.Auto, ref_prog=None, supply_prog=None, rtol=0.01, atol=0.01, max_mismatched_ratio=0.01, skip_check=False, manual_check_prog=None, cache_input_tensors=False)¶

Set profiling arguments for the auto-tuner.

Parameters:

supply_type (tilelang.TensorSupplyType) – Type of tensor supply mechanism. Ignored if supply_prog is provided.
ref_prog (Callable) – Reference program for validation.
supply_prog (Callable) – Supply program for input tensors.
rtol (float) – Relative tolerance for validation.
atol (float) – Absolute tolerance for validation.
max_mismatched_ratio (float) – Maximum allowed mismatch ratio.
skip_check (bool) – Whether to skip validation.
manual_check_prog (Callable) – Manual check program for validation.
cache_input_tensors (bool) – Whether to cache input tensors.
warmup (int) – Number of warmup iterations.
rep (int) – Number of repetitions for timing.
timeout (int) – Maximum time per configuration.

Returns:

Self for method chaining.

Return type:

AutoTuner
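The interaction of rtol, atol, and max_mismatched_ratio is not spelled out on this page. A minimal sketch of one plausible reading, assuming the element-wise criterion used by numpy.isclose (|actual - expected| <= atol + rtol * |expected|) and a cap on the fraction of elements that may violate it. The helper names mismatch_ratio and passes_check are hypothetical.

```python
from typing import Sequence


def mismatch_ratio(actual: Sequence[float], expected: Sequence[float],
                   rtol: float = 0.01, atol: float = 0.01) -> float:
    """Fraction of elements that fall outside the combined tolerance."""
    if len(actual) != len(expected) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    mismatched = sum(
        1 for a, e in zip(actual, expected)
        if abs(a - e) > atol + rtol * abs(e)
    )
    return mismatched / len(actual)


def passes_check(actual, expected, rtol=0.01, atol=0.01,
                 max_mismatched_ratio=0.01) -> bool:
    """Accept the output if few enough elements are out of tolerance."""
    return mismatch_ratio(actual, expected, rtol, atol) <= max_mismatched_ratio


expected = [1.0] * 200
one_bad = [2.0] + [1.005] * 199   # 1 of 200 elements out of tolerance
four_bad = [2.0] * 4 + [1.0] * 196  # 4 of 200 elements out of tolerance
```

With the default limits, one_bad is accepted (ratio 0.005) and four_bad is rejected (ratio 0.02).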

set_kernel_parameters(parameters)¶

Parameters: parameters (Tuple[str, ...])

generate_cache_key(parameters)¶

Generate a cache key for the auto-tuning process.

Parameters: parameters (Dict[str, Any])
Return type: Optional[tilelang.autotuner.param.AutotuneResult]
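How the key is derived is not described on this page. A minimal sketch of one common approach, assuming the key should be stable across runs and independent of dictionary ordering: serialize the parameters deterministically and hash the result. The name stable_cache_key and the choice of JSON plus SHA-256 are illustrative assumptions, not tilelang's scheme.

```python
import hashlib
import json
from typing import Any, Dict


def stable_cache_key(parameters: Dict[str, Any]) -> str:
    """Hash a parameter dict into a hex digest that ignores key order."""
    # sort_keys makes the serialization order-independent;
    # default=str covers values that JSON cannot encode natively.
    payload = json.dumps(parameters, sort_keys=True, default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


key_a = stable_cache_key({"block_M": 128, "threads": 256})
key_b = stable_cache_key({"threads": 256, "block_M": 128})
key_c = stable_cache_key({"block_M": 64, "threads": 256})
```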

run(warmup=25, rep=100, timeout=30)¶

Run the auto-tuning process.

Parameters:

warmup (int) – Number of warmup iterations.
rep (int) – Number of repetitions for timing.
timeout (int) – Maximum time per configuration.

Returns:

Results of the auto-tuning process.

Return type:

AutotuneResult
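To illustrate what warmup and rep typically control in a search of this kind, here is a minimal, pure-Python sketch of a tuning loop: each configuration is built, run a few times untimed, then timed over several repetitions, and the lowest average latency wins. It is a toy under stated assumptions (the names toy_tune and make_sleep_kernel are hypothetical) and omits the timeout, validation, and caching that AutoTuner exposes.

```python
import time
from typing import Any, Callable, Dict, List, Tuple


def toy_tune(make_kernel: Callable[..., Callable[[], Any]],
             configs: List[Dict[str, Any]],
             warmup: int = 2, rep: int = 5) -> Tuple[Dict[str, Any], float]:
    """Return (best_config, best_average_latency_in_seconds)."""
    if not configs:
        raise ValueError("configs must not be empty")
    best_config, best_latency = None, float("inf")
    for config in configs:
        kernel = make_kernel(**config)
        for _ in range(warmup):  # untimed runs to reach a steady state
            kernel()
        start = time.perf_counter()
        for _ in range(rep):  # timed runs
            kernel()
        latency = (time.perf_counter() - start) / rep
        if latency < best_latency:
            best_config, best_latency = config, latency
    return best_config, best_latency


def make_sleep_kernel(delay: float) -> Callable[[], None]:
    # Stand-in "kernel" whose cost is controlled by the config.
    return lambda: time.sleep(delay)


best, latency = toy_tune(make_sleep_kernel, [{"delay": 0.02}, {"delay": 0.0}])
```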

__call__()¶

Make the AutoTuner callable, running the auto-tuning process.

Returns: Results of the auto-tuning process.
Return type: AutotuneResult
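Because set_compile_args and set_profile_args return the tuner itself, a tuning run can be written as a single chain. The sketch below uses only the methods and keyword names documented above; the wrapper tune_kernel, its tuner_cls parameter, and the argument values (such as out_idx=[-1]) are placeholders chosen for illustration.

```python
from typing import Any, Callable, Dict, List, Optional


def tune_kernel(kernel: Callable, configs: List[Dict[str, Any]],
                tuner_cls: Optional[type] = None) -> Any:
    """Configure and run an auto-tuning session in one chained expression."""
    if tuner_cls is None:
        # Imported lazily so the sketch can be exercised without tilelang installed.
        from tilelang.autotuner.tuner import AutoTuner as tuner_cls
    tuner = (
        tuner_cls.from_kernel(kernel=kernel, configs=configs)
        .set_compile_args(out_idx=[-1], target="auto")  # placeholder values
        .set_profile_args(warmup=25, rep=100, timeout=30, skip_check=True)
    )
    # run() returns the result of the auto-tuning process (AutotuneResult).
    return tuner.run()
```

The tuner_cls parameter exists only so that the chain can be tested with a stand-in class; in normal use it is left as None.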

tilelang.autotuner.tuner.autotune(func=None, *, configs, warmup=25, rep=100, timeout=100, supply_type=tilelang.TensorSupplyType.Auto, ref_prog=None, supply_prog=None, rtol=0.01, atol=0.01, max_mismatched_ratio=0.01, skip_check=False, manual_check_prog=None, cache_input_tensors=False)¶

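Auto-tuning decorator for TileLang functions.

Because configs is a required keyword-only argument in the signature above, the decorator is applied with arguments (e.g., @autotune(configs=...)):

It profiles the given configurations to find the best-performing parameters for the decorated function.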

Tips:

If you want to skip the auto-tuning process, you can override the tunable parameters by passing them explicitly when calling the function.

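```python
if enable_autotune:
    # Tunable parameters are left unset, so the auto-tuner searches for them.
    kernel = flashattn(batch, heads, seq_len, dim, is_causal)
else:
    # Tunable parameters are passed explicitly, which skips the search.
    kernel = flashattn(
        batch, heads, seq_len, dim, is_causal, groups=groups,
        block_M=128, block_N=128, num_stages=2, threads=256)
```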

Parameters:

func_or_out_idx (Any, optional) – If using @tilelang.jit(…) to configure, this is the out_idx parameter. If using @tilelang.jit directly on a function, this argument is implicitly the function to be decorated (and out_idx will be None).
configs (Dict or Callable) – Configuration space to explore during auto-tuning.
warmup (int, optional) – Number of warmup iterations before timing.
rep (int, optional) – Number of repetitions for timing measurements.
timeout (int, optional) – Maximum time allowed per configuration.
target (Union[str, Target], optional) – Compilation target for TVM (e.g., “cuda”, “llvm”). Defaults to “auto”.
target_host (Union[str, Target], optional) – Target host for cross-compilation. Defaults to None.
execution_backend (Literal["dlpack", "ctypes", "cython"], optional) – Backend for kernel execution and argument passing. Defaults to “cython”.
verbose (bool, optional) – Enables verbose logging during compilation. Defaults to False.
pass_configs (Optional[Dict[str, Any]], optional) – Configurations for TVM’s pass context. Defaults to None.
debug_root_path (Optional[str], optional) – Directory to save compiled kernel source for debugging. Defaults to None.
func (Union[Callable[tilelang.jit.param._P, tilelang.jit.param._RProg], tvm.tir.PrimFunc, None])
supply_type (tilelang.TensorSupplyType)
ref_prog (Callable)
supply_prog (Callable)
rtol (float)
atol (float)
max_mismatched_ratio (float)
skip_check (bool)
manual_check_prog (Callable)
cache_input_tensors (bool)

Returns:

Either a JIT-compiled wrapper around the input function, or a configured decorator instance that can then be applied to a function.

Return type:

Callable
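A configuration space is often easiest to express as the Cartesian product of a few candidate values. The sketch below builds such a space as a list of dicts keyed by the tunable parameter names that appear in the tip above (block_M, block_N, num_stages, threads). The helper build_configs, the candidate values, and the list-of-dicts shape are assumptions based on the "list of configurations" wording used for AutoTuner; check the expected format of configs before relying on it.

```python
from itertools import product
from typing import Any, Dict, List


def build_configs(space: Dict[str, List[Any]]) -> List[Dict[str, Any]]:
    """Expand {name: [candidates]} into one dict per combination."""
    names = list(space)
    return [
        dict(zip(names, values))
        for values in product(*(space[name] for name in names))
    ]


# Illustrative candidate values only.
search_space = {
    "block_M": [64, 128],
    "block_N": [64, 128],
    "num_stages": [1, 2],
    "threads": [128, 256],
}
configs = build_configs(search_space)
```

Here configs holds 2 × 2 × 2 × 2 = 16 entries. Because every combination is compiled and profiled, the search cost grows with the product of the candidate counts.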