tilelang.autotuner.tuner

The auto-tune module for tilelang programs.

This module provides functionality for auto-tuning tilelang programs, including JIT compilation and performance optimization through configuration search.

Attributes

logger

Exceptions

TimeoutException

Raised when a call exceeds its time limit.

Classes

AutoTuner

Auto-tuner for tilelang programs.

Functions

timeout_handler(signum, frame)

run_with_timeout(func, timeout, *args, **kwargs)

get_available_cpu_count()

Gets the number of CPU cores available to the current process.

autotune([func, warmup, rep, timeout, supply_type, ...])

Auto-tuning decorator for TileLang functions.

Module Contents

exception tilelang.autotuner.tuner.TimeoutException

Bases: Exception

Raised when a call exceeds its time limit.

tilelang.autotuner.tuner.timeout_handler(signum, frame)
tilelang.autotuner.tuner.run_with_timeout(func, timeout, *args, **kwargs)
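
The signatures above match the usual signal-based timeout pattern. The following is a minimal sketch of that pattern, not tilelang's actual implementation: it assumes a Unix platform, the main thread, and a whole-second timeout, and the function bodies are assumptions inferred from the names alone.

```python
import signal


class TimeoutException(Exception):
    """Raised when the wrapped call exceeds its time budget."""


def timeout_handler(signum, frame):
    # Called by the interpreter when SIGALRM arrives; turns the signal into an exception.
    raise TimeoutException("operation timed out")


def run_with_timeout(func, timeout, *args, **kwargs):
    # Install the handler and arm the alarm (whole seconds, Unix only, main thread only).
    previous = signal.signal(signal.SIGALRM, timeout_handler)
    signal.alarm(timeout)
    try:
        return func(*args, **kwargs)
    finally:
        # Always disarm the alarm and restore the previous handler.
        signal.alarm(0)
        signal.signal(signal.SIGALRM, previous)
```

With this sketch, a call that finishes in time returns its result unchanged, and a call that overruns raises TimeoutException in the caller.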
tilelang.autotuner.tuner.logger
tilelang.autotuner.tuner.get_available_cpu_count()

Gets the number of CPU cores available to the current process.

Return type:

int
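
One common way to obtain this number is sketched below, using only the standard library. Whether tilelang uses the same calls is an assumption; the point is that the count available to a process can be smaller than the machine total, for example under taskset or a cpuset.

```python
import os


def get_available_cpu_count() -> int:
    # sched_getaffinity reports the CPUs this process is allowed to run on.
    # It is not available on every platform, hence the hasattr check.
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    # Fallback: total logical CPUs; os.cpu_count() may return None.
    return os.cpu_count() or 1
```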

class tilelang.autotuner.tuner.AutoTuner(fn, configs)

Auto-tuner for tilelang programs.

This class handles the auto-tuning process by testing different configurations and finding the optimal parameters for program execution.

Parameters:
  • fn (Callable) – The function to be auto-tuned.

  • configs – List of configurations to try during auto-tuning.

compile_args
profile_args
cache_dir: pathlib.Path
fn
configs
ref_latency_cache = None
jit_input_tensors = None
ref_input_tensors = None
jit_compile = None
classmethod from_kernel(kernel, configs)

Create an AutoTuner instance from a kernel function.

Parameters:
  • kernel (Callable) – The kernel function to auto-tune.

  • configs – List of configurations to try.

Returns:

A new AutoTuner instance.

Return type:

AutoTuner

set_compile_args(out_idx=None, target='auto', execution_backend='cython', target_host=None, verbose=False, pass_configs=None)

Set compilation arguments for the auto-tuner.

Parameters:
  • out_idx (Union[List[int], int, None]) – Index or list of indices of the output tensors.

  • target (Literal['auto', 'cuda', 'hip']) – Target platform.

  • execution_backend (Literal['dlpack', 'ctypes', 'cython']) – Execution backend to use for kernel execution.

  • target_host (Union[str, tvm.target.Target]) – Target host for cross-compilation.

  • verbose (bool) – Whether to enable verbose output.

  • pass_configs (Optional[Dict[str, Any]]) – Additional keyword arguments to pass to the Compiler PassContext.

Returns:

Self for method chaining.

Return type:

AutoTuner
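
Because the setter returns the tuner itself, configuration calls can be chained. The sketch below illustrates the pattern with a hypothetical stand-in class, TunerSketch, that only records its arguments. It is not tilelang's AutoTuner, and its setters accept only a subset of the parameters documented here.

```python
from typing import Any, Dict


class TunerSketch:
    """Hypothetical stand-in that records arguments; not tilelang's AutoTuner."""

    def __init__(self, fn, configs):
        self.fn = fn
        self.configs = configs
        self.compile_args: Dict[str, Any] = {}
        self.profile_args: Dict[str, Any] = {}

    def set_compile_args(self, out_idx=None, target="auto", execution_backend="cython"):
        self.compile_args = dict(
            out_idx=out_idx, target=target, execution_backend=execution_backend)
        return self  # returning self is what enables chaining

    def set_profile_args(self, warmup=25, rep=100, timeout=30):
        self.profile_args = dict(warmup=warmup, rep=rep, timeout=timeout)
        return self


# Each call returns the same object, so the configuration reads as one expression.
tuner = (
    TunerSketch(fn=lambda: None, configs=[{"block_M": 64}, {"block_M": 128}])
    .set_compile_args(out_idx=[-1], target="cuda")
    .set_profile_args(warmup=10, rep=50)
)
```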

set_profile_args(warmup=25, rep=100, timeout=30, supply_type=tilelang.TensorSupplyType.Auto, ref_prog=None, supply_prog=None, rtol=0.01, atol=0.01, max_mismatched_ratio=0.01, skip_check=False, manual_check_prog=None, cache_input_tensors=False)

Set profiling arguments for the auto-tuner.

Parameters:
  • supply_type (tilelang.TensorSupplyType) – Type of tensor supply mechanism. Ignored if supply_prog is provided.

  • ref_prog (Callable) – Reference program for validation.

  • supply_prog (Callable) – Supply program for input tensors.

  • rtol (float) – Relative tolerance for validation.

  • atol (float) – Absolute tolerance for validation.

  • max_mismatched_ratio (float) – Maximum allowed mismatch ratio.

  • skip_check (bool) – Whether to skip validation.

  • manual_check_prog (Callable) – Manual check program for validation.

  • cache_input_tensors (bool) – Whether to cache input tensors.

  • warmup (int) – Number of warmup iterations.

  • rep (int) – Number of repetitions for timing.

  • timeout (int) – Maximum time per configuration.

Returns:

Self for method chaining.

Return type:

AutoTuner
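
The three validation parameters rtol, atol, and max_mismatched_ratio can be read as an element-wise tolerance test plus a cap on the fraction of failing elements. The sketch below shows one plausible combination on plain Python sequences. The exact formula used by tilelang is an assumption here, and within_tolerance is a hypothetical helper name.

```python
from typing import Sequence


def within_tolerance(
    actual: Sequence[float],
    expected: Sequence[float],
    rtol: float = 0.01,
    atol: float = 0.01,
    max_mismatched_ratio: float = 0.01,
) -> bool:
    if len(actual) != len(expected):
        raise ValueError("sequences must have the same length")
    # An element mismatches when its error exceeds the combined absolute/relative bound.
    mismatched = sum(
        1 for a, e in zip(actual, expected) if abs(a - e) > atol + rtol * abs(e)
    )
    # The check passes as long as the share of mismatched elements stays within the cap.
    return mismatched <= max_mismatched_ratio * len(expected)
```

Under this reading, a result with a handful of outliers can still pass, which is useful for low-precision kernels where a few elements may round differently from the reference.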

set_kernel_parameters(parameters)
Parameters:

parameters (Tuple[str, ...])

generate_cache_key(parameters)

Generate a cache key for the auto-tuning process.

Parameters:

parameters (Dict[str, Any])

Return type:

Optional[tilelang.autotuner.param.AutotuneResult]
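
A cache key for a parameter dictionary is typically a stable hash of its contents. The sketch below shows that general technique with the standard library. It returns a hex string, and both the function name cache_key_sketch and the choice of JSON plus SHA-256 are assumptions, not a description of what generate_cache_key does internally.

```python
import hashlib
import json
from typing import Any, Dict


def cache_key_sketch(parameters: Dict[str, Any]) -> str:
    # sort_keys makes the key independent of dict insertion order;
    # default=str covers values that json cannot serialize natively.
    payload = json.dumps(parameters, sort_keys=True, default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```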

run(warmup=25, rep=100, timeout=30)

Run the auto-tuning process.

Parameters:
  • warmup (int) – Number of warmup iterations.

  • rep (int) – Number of repetitions for timing.

  • timeout (int) – Maximum time per configuration.

Returns:

Results of the auto-tuning process.

Return type:

AutotuneResult

__call__()

Make the AutoTuner callable, running the auto-tuning process.

Returns:

Results of the auto-tuning process.

Return type:

AutotuneResult
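
Conceptually, the tuning process builds a kernel for each configuration, times it with warmup and rep iterations, and keeps the fastest. The sketch below shows that loop in plain Python under simplifying assumptions: make_kernel is a hypothetical factory that returns a zero-argument callable, timing uses wall-clock time, and there is no timeout, caching, or validation.

```python
import time
from typing import Any, Callable, Dict, List, Tuple


def benchmark(fn: Callable[[], Any], warmup: int, rep: int) -> float:
    # Warmup runs are executed but not timed.
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(rep):
        fn()
    # Average latency per call, in seconds.
    return (time.perf_counter() - start) / rep


def tune(
    make_kernel: Callable[..., Callable[[], Any]],
    configs: List[Dict[str, Any]],
    warmup: int = 25,
    rep: int = 100,
) -> Tuple[Dict[str, Any], float]:
    best_config, best_latency = None, float("inf")
    for config in configs:
        try:
            kernel = make_kernel(**config)
            latency = benchmark(kernel, warmup, rep)
        except Exception:
            continue  # skip configurations that fail to build or run
        if latency < best_latency:
            best_config, best_latency = config, latency
    if best_config is None:
        raise RuntimeError("no configuration succeeded")
    return best_config, best_latency
```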

tilelang.autotuner.tuner.autotune(func=None, *, configs, warmup=25, rep=100, timeout=100, supply_type=tilelang.TensorSupplyType.Auto, ref_prog=None, supply_prog=None, rtol=0.01, atol=0.01, max_mismatched_ratio=0.01, skip_check=False, manual_check_prog=None, cache_input_tensors=False)

Auto-tuning decorator for TileLang functions.

Because configs is a required keyword-only argument, the decorator is applied with arguments (e.g., @autotune(configs=...)):

It returns a decorator that wraps the function for auto-tuning over the given configurations.

Tips:
  • To skip the auto-tuning process, pass the tunable parameters explicitly when calling the function.
    ```python
    if enable_autotune:
        kernel = flashattn(batch, heads, seq_len, dim, is_causal)
    else:
        kernel = flashattn(
            batch, heads, seq_len, dim, is_causal, groups=groups,
            block_M=128, block_N=128, num_stages=2, threads=256)
    ```

Parameters:
  • func (Union[Callable[tilelang.jit.param._P, tilelang.jit.param._RProg], tvm.tir.PrimFunc, None], optional) – The function to be auto-tuned. This is None when the decorator is configured with keyword arguments (e.g., @autotune(configs=...)), in which case a decorator is returned and then applied to the function.

  • configs (Dict or Callable) – Configuration space to explore during auto-tuning.

  • warmup (int, optional) – Number of warmup iterations before timing.

  • rep (int, optional) – Number of repetitions for timing measurements.

  • timeout (int, optional) – Maximum time per configuration.

  • target (Union[str, Target], optional) – Compilation target for TVM (e.g., “cuda”, “llvm”). Defaults to “auto”.

  • target_host (Union[str, Target], optional) – Target host for cross-compilation. Defaults to None.

  • execution_backend (Literal["dlpack", "ctypes", "cython"], optional) – Backend for kernel execution and argument passing. Defaults to “cython”.

  • verbose (bool, optional) – Enables verbose logging during compilation. Defaults to False.

  • pass_configs (Optional[Dict[str, Any]], optional) – Configurations for TVM’s pass context. Defaults to None.

  • debug_root_path (Optional[str], optional) – Directory to save compiled kernel source for debugging. Defaults to None.

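  • supply_type (tilelang.TensorSupplyType) – Type of tensor supply mechanism. Ignored if supply_prog is provided.

  • ref_prog (Callable) – Reference program for validation.

  • supply_prog (Callable) – Supply program for input tensors.

  • rtol (float) – Relative tolerance for validation.

  • atol (float) – Absolute tolerance for validation.

  • max_mismatched_ratio (float) – Maximum allowed mismatch ratio.

  • skip_check (bool) – Whether to skip validation.

  • manual_check_prog (Callable) – Manual check program for validation.

  • cache_input_tensors (bool) – Whether to cache input tensors.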

Returns:

Either an auto-tuned wrapper around the input function (when func is given), or a configured decorator that can then be applied to a function.

Return type:

Callable
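
The two return modes follow from the signature: func defaults to None and every other argument is keyword-only. The sketch below shows that decorator pattern in plain Python. The name autotune_sketch is hypothetical, configs is assumed to be a list of keyword dictionaries, and taking the first candidate is a placeholder for the real search.

```python
import functools


def autotune_sketch(func=None, *, configs):
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            # Collect every tunable parameter name that appears in the candidates.
            tunable = set()
            for config in configs:
                tunable.update(config)
            if tunable.issubset(kwargs):
                # The caller fixed every tunable parameter: nothing to search.
                return fn(*args, **kwargs)
            # Placeholder for the real search: take the first candidate,
            # letting explicitly passed keyword arguments override it.
            return fn(*args, **{**configs[0], **kwargs})

        return wrapper

    # With keyword arguments only, return the decorator; otherwise wrap func directly.
    return decorator if func is None else decorator(func)


@autotune_sketch(configs=[{"block_M": 64}, {"block_M": 128}])
def kernel(n, block_M):
    return (n, block_M)
```

Calling kernel(8) goes through the placeholder search, while kernel(8, block_M=256) bypasses it because the only tunable parameter was supplied by the caller.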