tilelang.autotuner.tuner

The auto-tune module for tilelang programs.

This module provides functionality for auto-tuning tilelang programs, including JIT compilation and performance optimization through configuration search.

Attributes

logger

Exceptions

TimeoutException

Exception signalling that a timed call did not finish within its time limit.

Classes

AutoTuner

Auto-tuner for tilelang programs.

AutoTuneImpl

Callable wrapper that holds a JIT implementation together with the auto-tuning settings.

Functions

timeout_handler(signum, frame)

run_with_timeout(func, timeout, *args, **kwargs)

get_available_cpu_count()

Gets the number of CPU cores available to the current process.

autotune([func, warmup, rep, timeout, supply_type, ...])

Decorator that auto-tunes a TileLang function over a set of configurations.

Module Contents

exception tilelang.autotuner.tuner.TimeoutException

Bases: Exception

Exception signalling that a timed call did not finish within its time limit.

tilelang.autotuner.tuner.timeout_handler(signum, frame)
tilelang.autotuner.tuner.run_with_timeout(func, timeout, *args, **kwargs)
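
timeout_handler takes (signum, frame), the argument pair that Python passes to signal handlers, which suggests a signal-based timeout. The following is a minimal sketch of that pattern, not the tilelang source: it assumes a POSIX system, a call made from the main thread, and the use of signal.setitimer (an assumption chosen here so that fractional timeouts work). The three names are local stand-ins that mirror the signatures listed above.

```python
import signal
import time


class TimeoutException(Exception):
    """Stand-in: raised when the guarded call exceeds its time budget."""


def timeout_handler(signum, frame):
    # Signal handlers receive the signal number and the interrupted stack frame.
    raise TimeoutException("operation timed out")


def run_with_timeout(func, timeout, *args, **kwargs):
    """Call func(*args, **kwargs); raise TimeoutException after `timeout` seconds."""
    previous = signal.signal(signal.SIGALRM, timeout_handler)
    signal.setitimer(signal.ITIMER_REAL, timeout)  # arm the timer
    try:
        return func(*args, **kwargs)
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)      # disarm the timer
        signal.signal(signal.SIGALRM, previous)      # restore the old handler


try:
    run_with_timeout(time.sleep, 0.05, 1.0)  # sleeps longer than the budget
except TimeoutException as exc:
    print("timed out:", exc)
```

A signal-based timeout can only interrupt code running in the main thread, so a tuner built this way would need a different mechanism (for example a worker process) to bound calls made elsewhere.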
tilelang.autotuner.tuner.logger
tilelang.autotuner.tuner.get_available_cpu_count()

Gets the number of CPU cores available to the current process.

Return type:

int
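
The number of cores available to a process can be smaller than the machine-wide core count, for example when a CPU affinity mask is set. The sketch below shows one common way to compute it with the standard library; the helper name available_cpu_count is hypothetical, and whether tilelang uses this exact approach is an assumption.

```python
import os


def available_cpu_count() -> int:
    """Return the number of CPUs the current process may be scheduled on."""
    try:
        # Where supported (e.g. Linux), this honours the CPU affinity mask,
        # such as one set with taskset.
        return len(os.sched_getaffinity(0))
    except AttributeError:
        # Platforms without sched_getaffinity: fall back to the machine-wide count.
        return os.cpu_count() or 1


print(available_cpu_count())
```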

class tilelang.autotuner.tuner.AutoTuner(fn, configs)

Auto-tuner for tilelang programs.

This class handles the auto-tuning process by testing different configurations and finding the optimal parameters for program execution.

Parameters:
  • fn (Callable) – The function to be auto-tuned.

  • configs – List of configurations to try during auto-tuning.
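
A configuration list is often built as the Cartesian product of a few candidate values per parameter. The sketch below assumes that each configuration is a dict mapping tunable parameter names to values; the helper make_configs is hypothetical, and the parameter names are borrowed from the flashattn tip in the autotune section.

```python
import itertools


def make_configs(space):
    """Expand {name: [candidates, ...]} into a list of {name: value} dicts."""
    names = list(space)
    return [
        dict(zip(names, values))
        for values in itertools.product(*(space[name] for name in names))
    ]


configs = make_configs({
    "block_M": [64, 128],
    "block_N": [64, 128],
    "num_stages": [2, 3],
    "threads": [128, 256],
})
print(len(configs))  # 16
print(configs[0])    # {'block_M': 64, 'block_N': 64, 'num_stages': 2, 'threads': 128}
```

The search cost grows multiplicatively with each added parameter, so keeping the candidate lists short matters more than it may first appear.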

compile_args
profile_args
cache_dir: pathlib.Path
fn
configs
ref_latency_cache = None
jit_input_tensors = None
ref_input_tensors = None
jit_compile = None

classmethod from_kernel(kernel, configs)

Create an AutoTuner instance from a kernel function.

Parameters:
  • kernel (Callable) – The kernel function to auto-tune.

  • configs – List of configurations to try.

Returns:

A new AutoTuner instance.

Return type:

AutoTuner

set_compile_args(out_idx=None, target='auto', execution_backend='auto', target_host=None, verbose=False, pass_configs=None)

Set compilation arguments for the auto-tuner.

Parameters:
  • out_idx (list[int] | int | None) – List of output tensor indices.

  • target (Literal['auto', 'cuda', 'hip', 'metal']) – Target platform.

  • execution_backend (Literal['auto', 'tvm_ffi', 'ctypes', 'cython', 'nvrtc', 'torch']) – Execution backend to use for kernel execution.

  • target_host (str | tvm.target.Target) – Target host for cross-compilation.

  • verbose (bool) – Whether to enable verbose output.

  • pass_configs (dict[str, Any] | None) – Additional keyword arguments to pass to the Compiler PassContext.

Returns:

Self for method chaining.

Return type:

AutoTuner

set_profile_args(warmup=25, rep=100, timeout=30, supply_type=tilelang.TensorSupplyType.Auto, ref_prog=None, supply_prog=None, rtol=0.01, atol=0.01, max_mismatched_ratio=0.01, skip_check=False, manual_check_prog=None, cache_input_tensors=False)

Set profiling arguments for the auto-tuner.

Parameters:
  • supply_type (tilelang.TensorSupplyType) – Type of tensor supply mechanism. Ignored if supply_prog is provided.

  • ref_prog (Callable) – Reference program for validation.

  • supply_prog (Callable) – Supply program for input tensors.

  • rtol (float) – Relative tolerance for validation.

  • atol (float) – Absolute tolerance for validation.

  • max_mismatched_ratio (float) – Maximum allowed mismatch ratio.

  • skip_check (bool) – Whether to skip validation.

  • manual_check_prog (Callable) – Manual check program for validation.

  • cache_input_tensors (bool) – Whether to cache input tensors.

  • warmup (int) – Number of warmup iterations.

  • rep (int) – Number of repetitions for timing.

  • timeout (int) – Maximum time per configuration.

Returns:

Self for method chaining.

Return type:

AutoTuner
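
The three validation parameters rtol, atol, and max_mismatched_ratio can be read together as "an element mismatches when it falls outside the combined tolerance, and the check passes while the mismatching fraction stays small". The pure-Python sketch below illustrates that reading; the function passes_check is hypothetical, and the exact comparison rule tilelang applies is an assumption.

```python
def passes_check(actual, expected, rtol=0.01, atol=0.01, max_mismatched_ratio=0.01):
    """Return True if few enough elements fall outside atol + rtol * |expected|."""
    if len(actual) != len(expected):
        raise ValueError("actual and expected must have the same length")
    mismatched = sum(
        1 for a, e in zip(actual, expected) if abs(a - e) > atol + rtol * abs(e)
    )
    return mismatched <= max_mismatched_ratio * len(expected)


expected = [1.0] * 200
one_off = [1.5] + [1.005] * 199           # a single element is clearly wrong
three_off = [1.5, 1.5, 1.5] + [1.0] * 197  # three elements are clearly wrong
print(passes_check(one_off, expected))
print(passes_check(three_off, expected))
```

Allowing a small mismatching fraction tolerates isolated rounding outliers in low-precision kernels while still rejecting configurations that compute wrong results broadly.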

set_kernel_parameters(k_parameters, f_parameters)

Parameters:
  • k_parameters (tuple[str, Ellipsis])

  • f_parameters (dict[str, Any])

generate_cache_key(parameters, extra_parameters)

Generate a cache key for the auto-tuning process.

Parameters:
  • parameters (dict[str, Any])

  • extra_parameters (dict[str, Any])

Return type:

tilelang.autotuner.param.AutotuneResult | None

run(warmup=25, rep=100, timeout=30)

Run the auto-tuning process.

Parameters:
  • warmup (int) – Number of warmup iterations.

  • rep (int) – Number of repetitions for timing.

  • timeout (int) – Maximum time per configuration.

Returns:

Results of the auto-tuning process.

Return type:

AutotuneResult

__call__()

Make the AutoTuner callable, running the auto-tuning process.

Returns:

Results of the auto-tuning process.

Return type:

AutotuneResult
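
Since set_compile_args and set_profile_args return self, the methods documented above can be chained. The sketch below uses only the signatures listed in this section; the wrapper tune is hypothetical, out_idx=[-1] assumes the last kernel argument is the output, and kernel is assumed to accept the keys of each configuration as keyword arguments. The import is placed inside the function so the snippet loads without tilelang installed.

```python
def tune(kernel, configs, ref_prog=None):
    """Hypothetical helper: run the AutoTuner over `configs` and return the result."""
    # Imported lazily so that this sketch can be loaded without tilelang installed.
    from tilelang.autotuner.tuner import AutoTuner

    tuner = (
        AutoTuner.from_kernel(kernel, configs)
        .set_compile_args(out_idx=[-1], target="auto")
        .set_profile_args(
            warmup=25,
            rep=100,
            timeout=30,
            ref_prog=ref_prog,
            # Without a reference program there is nothing to validate against.
            skip_check=ref_prog is None,
        )
    )
    # run() returns an AutotuneResult according to the reference above.
    return tuner.run(warmup=25, rep=100, timeout=30)
```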

class tilelang.autotuner.tuner.AutoTuneImpl

Bases: Generic[_P, _T]

Callable wrapper that holds a JIT implementation (jit_impl) together with the auto-tuning settings listed below. Calling an instance returns a tilelang.jit.kernel.JITKernel.

jit_impl: tilelang.jit.JITImpl
warmup: int = 25
rep: int = 100
timeout: int = 100
configs: dict | Callable = None
supply_type: tilelang.TensorSupplyType
ref_prog: Callable = None
supply_prog: Callable = None
rtol: float = 0.01
atol: float = 0.01
max_mismatched_ratio: float = 0.01
skip_check: bool = False
manual_check_prog: Callable = None
cache_input_tensors: bool = False

__post_init__()

get_tunner()

__call__(*args, **kwargs)

Parameters:
  • args (_P)

  • kwargs (_P)

Return type:

tilelang.jit.kernel.JITKernel

tilelang.autotuner.tuner.autotune(func=None, *, configs, warmup=25, rep=100, timeout=100, supply_type=tilelang.TensorSupplyType.Auto, ref_prog=None, supply_prog=None, rtol=0.01, atol=0.01, max_mismatched_ratio=0.01, skip_check=False, manual_check_prog=None, cache_input_tensors=False)

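Auto-tuning decorator for TileLang functions.

Because configs is a required keyword-only argument, the decorator is applied with arguments (e.g., @autotune(configs=...)). The decorated function is compiled and profiled for each candidate configuration, and the best-performing configuration is used.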

Tips:
  • To skip the auto-tuning process, override the tunable parameters by passing them explicitly when calling the decorated function.
    ```python
    if enable_autotune:
        # Tunable parameters are left unset, so the auto-tuner searches for them.
        kernel = flashattn(batch, heads, seq_len, dim, is_causal)
    else:
        # Tunable parameters are given explicitly, so no search is performed.
        kernel = flashattn(
            batch, heads, seq_len, dim, is_causal, groups=groups,
            block_M=128, block_N=128, num_stages=2, threads=256)
    ```
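
The override rule in the tip above can be illustrated with a toy decorator that is independent of tilelang. This is a sketch of the general idea only (time every configuration, keep the fastest, and bypass the search when the caller supplies all tunable parameters); toy_autotune and its rep argument are hypothetical, and the real autotune also compiles, validates, and caches kernels.

```python
import functools
import time


def toy_autotune(configs, rep=3):
    """Toy stand-in: time each config once per function and reuse the fastest."""
    tunable = set().union(*(cfg.keys() for cfg in configs))

    def decorator(fn):
        best = {}  # filled on the first tuned call; shared by all later calls

        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            # Explicit values for every tunable parameter bypass the search.
            if tunable.issubset(kwargs):
                return fn(*args, **kwargs)
            if not best:
                timings = []
                for cfg in configs:
                    start = time.perf_counter()
                    for _ in range(rep):
                        fn(*args, **{**kwargs, **cfg})
                    timings.append((time.perf_counter() - start, cfg))
                best.update(min(timings, key=lambda item: item[0])[1])
            return fn(*args, **{**kwargs, **best})

        wrapper.best_config = best
        return wrapper

    return decorator


@toy_autotune(configs=[{"block": 20}, {"block": 1}])
def work(x, block=8):
    time.sleep(0.001 * block)  # pretend that larger blocks are slower
    return x * 2


work(3)                 # first call runs the search
print(work.best_config)
work(3, block=5)        # explicit value: no search, block=5 is used as given
```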

Parameters:
  • func (Callable[_P, _T] | tvm.tir.PrimFunc | None) – The function to be auto-tuned. None when the decorator is called with arguments, as in @autotune(configs=...); the returned decorator is then applied to the function.

  • configs (Dict or Callable) – Configuration space to explore during auto-tuning.

  • warmup (int, optional) – Number of warmup iterations before timing.

  • rep (int, optional) – Number of repetitions for timing measurements.

  • timeout (int, optional) – Maximum time per configuration.

  • supply_type (tilelang.TensorSupplyType) – Type of tensor supply mechanism. Ignored if supply_prog is provided.

  • ref_prog (Callable) – Reference program for validation.

  • supply_prog (Callable) – Supply program for input tensors.

  • rtol (float) – Relative tolerance for validation.

  • atol (float) – Absolute tolerance for validation.

  • max_mismatched_ratio (float) – Maximum allowed mismatch ratio.

  • skip_check (bool) – Whether to skip validation.

  • manual_check_prog (Callable) – Manual check program for validation.

  • cache_input_tensors (bool) – Whether to cache input tensors.

The following compilation options are not part of the autotune signature shown above:

  • target (Union[str, Target], optional) – Compilation target for TVM (e.g., "cuda", "llvm"). Defaults to "auto".

  • target_host (Union[str, Target], optional) – Target host for cross-compilation. Defaults to None.

  • execution_backend (Literal["auto", "tvm_ffi", "ctypes", "cython", "nvrtc", "torch"], optional) – Backend for kernel execution and argument passing. Use "auto" to pick a sensible default per target (cuda->tvm_ffi, metal->torch, others->cython).

  • verbose (bool, optional) – Enables verbose logging during compilation. Defaults to False.

  • pass_configs (Optional[Dict[str, Any]], optional) – Configurations for TVM's pass context. Defaults to None.

  • debug_root_path (Optional[str], optional) – Directory to save compiled kernel source for debugging. Defaults to None.

Returns:

Either a JIT-compiled wrapper around the input function, or a configured decorator instance that can then be applied to a function.

Return type:

Callable