tilelang.autotuner package

Submodules

tilelang.autotuner.param module

Module contents

The auto-tune module for tilelang programs.

This module provides functionality for auto-tuning tilelang programs, including JIT compilation and performance optimization through configuration search.

class tilelang.autotuner.AutoTuner(fn: Callable, configs)

Bases: object

Auto-tuner for tilelang programs.

This class handles the auto-tuning process by testing different configurations and finding the optimal parameters for program execution.

Parameters:

fn – The function to be auto-tuned.
configs – List of configurations to try during auto-tuning.
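The search described here can be pictured as a loop that times each candidate configuration and keeps the fastest one. The following is a minimal sketch in plain Python, not tilelang's implementation; `benchmark`, `pick_best`, and `toy_workload` are hypothetical names introduced only for illustration.

```python
import time
from typing import Any, Callable, Dict, List, Tuple


def benchmark(fn: Callable[..., Any], config: Dict[str, Any],
              warmup: int = 2, rep: int = 5) -> float:
    """Return the mean wall-clock time (seconds) of fn(**config)."""
    for _ in range(warmup):  # warmup runs are not timed
        fn(**config)
    start = time.perf_counter()
    for _ in range(rep):
        fn(**config)
    return (time.perf_counter() - start) / rep


def pick_best(fn: Callable[..., Any],
              configs: List[Dict[str, Any]]) -> Tuple[Dict[str, Any], float]:
    """Time every configuration and return the fastest with its latency."""
    results = [(cfg, benchmark(fn, cfg)) for cfg in configs]
    return min(results, key=lambda item: item[1])


def toy_workload(block: int) -> int:
    # Hypothetical stand-in for a kernel: larger blocks mean fewer iterations.
    return sum(range(0, 200_000, block))


candidate_configs = [{"block": 1}, {"block": 8}, {"block": 64}]
best_config, best_latency = pick_best(toy_workload, candidate_configs)
```

A real tuner would also compile each candidate and guard each run with a timeout; the sketch keeps only the timing and selection steps.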

cache_dir: Path = PosixPath('/home/t-leiwang/.tilelang/cache/autotuner')

compile_args = CompileArgs(out_idx=None, execution_backend='cython', target='auto', target_host=None, verbose=False, pass_configs=None)

classmethod from_kernel(kernel: Callable, configs)

Create an AutoTuner instance from a kernel function.

Parameters:

kernel – The kernel function to auto-tune.
configs – List of configurations to try.

Returns:

A new AutoTuner instance.

Return type:

AutoTuner

generate_cache_key(parameters: Dict[str, Any]) → Optional[AutotuneResult]

Generate a cache key for the auto-tuning process.
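One common way to derive a cache key from a parameter dictionary is to serialize it deterministically and hash the result. The sketch below assumes JSON-serializable parameters and SHA-256; it only illustrates the idea, and tilelang's actual key derivation may differ. `make_cache_key` is a hypothetical helper.

```python
import hashlib
import json
from typing import Any, Dict


def make_cache_key(parameters: Dict[str, Any]) -> str:
    """Hash a parameter dict into a stable hex digest."""
    # sort_keys makes the serialization independent of insertion order;
    # default=str is a fallback for values JSON cannot encode directly.
    payload = json.dumps(parameters, sort_keys=True, default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


key_a = make_cache_key({"block_M": 128, "block_N": 64})
key_b = make_cache_key({"block_N": 64, "block_M": 128})
key_c = make_cache_key({"block_M": 64, "block_N": 64})
```

Here `key_a` and `key_b` are equal because only the key order differs, while `key_c` is different because a value changed.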

profile_args = ProfileArgs(warmup=25, rep=100, timeout=30, supply_type=<TensorSupplyType.Auto: 7>, ref_prog=None, supply_prog=None, rtol=0.01, atol=0.01, max_mismatched_ratio=0.01, skip_check=False, manual_check_prog=None, cache_input_tensors=True)

run(warmup: int = 25, rep: int = 100, timeout: int = 30)

Run the auto-tuning process.

Parameters:

warmup – Number of warmup iterations.
rep – Number of repetitions for timing.
timeout – Maximum time per configuration.

Returns:

Results of the auto-tuning process.

Return type:

AutotuneResult

set_compile_args(out_idx: Optional[Union[List[int], int]] = None, target: Literal['auto', 'cuda', 'hip'] = 'auto', execution_backend: Literal['dlpack', 'ctypes', 'cython'] = 'cython', target_host: Optional[Union[str, Target]] = None, verbose: bool = False, pass_configs: Optional[Dict[str, Any]] = None)

Set compilation arguments for the auto-tuner.

Parameters:

out_idx – Index, or list of indices, of the output tensors.
target – Target platform: 'auto', 'cuda', or 'hip'.
execution_backend – Execution backend to use for kernel execution.
target_host – Target host for cross-compilation.
verbose – Whether to enable verbose output.
pass_configs – Additional keyword arguments to pass to the Compiler PassContext.

Returns:

Self for method chaining.

Return type:

AutoTuner
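Because the setter returns the instance itself, configuration calls can be chained. The sketch below shows the pattern with a hypothetical stand-in class, `ToyTuner`, so that it runs without tilelang installed; it is not the real `AutoTuner` and mirrors only a few of its arguments.

```python
from typing import Any, Dict


class ToyTuner:
    """Hypothetical stand-in that mimics chainable setters."""

    def __init__(self) -> None:
        self.compile_args: Dict[str, Any] = {}
        self.profile_args: Dict[str, Any] = {}

    def set_compile_args(self, target: str = "auto",
                         verbose: bool = False) -> "ToyTuner":
        self.compile_args = {"target": target, "verbose": verbose}
        return self  # returning self is what enables chaining

    def set_profile_args(self, warmup: int = 25, rep: int = 100) -> "ToyTuner":
        self.profile_args = {"warmup": warmup, "rep": rep}
        return self


tuner = ToyTuner().set_compile_args(target="cuda").set_profile_args(rep=10)
```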

set_kernel_parameters(parameters: Tuple[str, ...])

set_profile_args(warmup: int = 25, rep: int = 100, timeout: int = 30, supply_type: TensorSupplyType = TensorSupplyType.Auto, ref_prog: Optional[Callable] = None, supply_prog: Optional[Callable] = None, rtol: float = 0.01, atol: float = 0.01, max_mismatched_ratio: float = 0.01, skip_check: bool = False, manual_check_prog: Optional[Callable] = None, cache_input_tensors: bool = False)

Set profiling arguments for the auto-tuner.

Parameters:

supply_type – Type of tensor supply mechanism. Ignored if supply_prog is provided.
ref_prog – Reference program for validation.
supply_prog – Supply program for input tensors.
rtol – Relative tolerance for validation.
atol – Absolute tolerance for validation.
max_mismatched_ratio – Maximum allowed mismatch ratio.
skip_check – Whether to skip validation.
manual_check_prog – Manual check program for validation.
cache_input_tensors – Whether to cache input tensors.
warmup – Number of warmup iterations.
rep – Number of repetitions for timing.
timeout – Maximum time per configuration.

Returns:

Self for method chaining.

Return type:

AutoTuner
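One plausible reading of the three tolerance parameters is this: an element counts as mismatched when it differs from the reference by more than `atol + rtol * |reference|` (the element-wise rule used by `numpy.isclose`), and validation passes when the fraction of mismatched elements does not exceed `max_mismatched_ratio`. The sketch below works under that assumption; tilelang's exact check may differ, and `outputs_match` is a hypothetical helper.

```python
from typing import Sequence


def outputs_match(actual: Sequence[float], reference: Sequence[float],
                  rtol: float = 0.01, atol: float = 0.01,
                  max_mismatched_ratio: float = 0.01) -> bool:
    """Return True if few enough elements fall outside the tolerance."""
    if len(actual) != len(reference):
        raise ValueError("actual and reference must have the same length")
    if not reference:
        return True  # nothing to compare
    mismatched = sum(
        1 for a, r in zip(actual, reference)
        if abs(a - r) > atol + rtol * abs(r)
    )
    return mismatched / len(reference) <= max_mismatched_ratio


reference = [1.0] * 200
close_enough = outputs_match([1.005] * 200, reference)           # all within tolerance
one_outlier = outputs_match([1.0] * 199 + [5.0], reference)      # 0.5% mismatched
too_many = outputs_match([1.0] * 190 + [5.0] * 10, reference)    # 5% mismatched
```

With the default thresholds, the first two comparisons pass and the third fails, because 5% of its elements are outside the tolerance.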

exception tilelang.autotuner.TimeoutException

Bases: Exception

tilelang.autotuner.autotune(func: Optional[Union[Callable[[_P], _RProg], PrimFunc]] = None, *, configs: Union[Dict, Callable], warmup: int = 25, rep: int = 100, timeout: int = 100, supply_type: TensorSupplyType = TensorSupplyType.Auto, ref_prog: Optional[Callable] = None, supply_prog: Optional[Callable] = None, rtol: float = 0.01, atol: float = 0.01, max_mismatched_ratio: float = 0.01, skip_check: bool = False, manual_check_prog: Optional[Callable] = None, cache_input_tensors: bool = False)

Auto-tuning decorator for TileLang functions.

Apply it with a configuration space, e.g., @autotune(configs=...). The remaining keyword arguments set the profiling options and have the same meaning as in AutoTuner.set_profile_args.

Tips:

If you want to skip the auto-tuning process, you can override the tunable parameters by passing them explicitly when calling the function.

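```python
if enable_autotune:
    # Tunable parameters omitted: the auto-tuner searches the configuration space.
    kernel = flashattn(batch, heads, seq_len, dim, is_causal)
else:
    # Tunable parameters passed explicitly: the search is skipped.
    kernel = flashattn(
        batch, heads, seq_len, dim, is_causal,
        groups=groups, block_M=128, block_N=128, num_stages=2, threads=256)
```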

Parameters:

func (Callable or PrimFunc, optional) – The function to be auto-tuned. When the decorator is written with arguments, as in @autotune(configs=...), this is None and the function is supplied when the returned decorator is applied.
configs (Dict or Callable) – Configuration space to explore during auto-tuning.
warmup (int, optional) – Number of warmup iterations before timing. Defaults to 25.
rep (int, optional) – Number of repetitions for timing measurements. Defaults to 100.
timeout (int, optional) – Maximum time per configuration. Defaults to 100.
target (Union[str, Target], optional) – Compilation target for TVM (e.g., “cuda”, “llvm”). Defaults to “auto”.
target_host (Union[str, Target], optional) – Target host for cross-compilation. Defaults to None.
execution_backend (Literal["dlpack", "ctypes", "cython"], optional) – Backend for kernel execution and argument passing. Defaults to “cython”.
verbose (bool, optional) – Enables verbose logging during compilation. Defaults to False.
pass_configs (Optional[Dict[str, Any]], optional) – Configurations for TVM’s pass context. Defaults to None.
debug_root_path (Optional[str], optional) – Directory to save compiled kernel source for debugging. Defaults to None.

Returns:

Either an auto-tuned wrapper around the input function, or a configured decorator that can then be applied to a function.

Return type:

Callable
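The tip in this entry (pass the tunable parameters explicitly to skip the search) can be illustrated with a toy decorator. This is a sketch in plain Python, not tilelang's implementation: `toy_autotune`, `build_kernel`, and the `cost` model are hypothetical, and the "search" ranks configurations with a made-up cost function instead of compiling and timing kernels.

```python
import functools
from typing import Any, Callable, Dict, List


def toy_autotune(configs: List[Dict[str, Any]],
                 cost: Callable[[Dict[str, Any]], float]):
    """Decorator factory: fill in missing tunables with the cheapest config."""
    tunable_names = set().union(*(cfg.keys() for cfg in configs))

    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            if tunable_names.issubset(kwargs):
                # Caller supplied every tunable parameter: skip the search.
                return fn(*args, **kwargs)
            wrapper.searches += 1
            best = min(configs, key=cost)
            # Explicit keyword arguments take precedence over the searched ones.
            return fn(*args, **{**best, **kwargs})

        wrapper.searches = 0
        return wrapper

    return decorator


@toy_autotune(
    configs=[{"block_M": 64, "threads": 128}, {"block_M": 128, "threads": 256}],
    cost=lambda cfg: 1.0 / cfg["block_M"],  # hypothetical cost model
)
def build_kernel(seq_len: int, block_M: int, threads: int) -> str:
    return f"kernel(seq_len={seq_len}, block_M={block_M}, threads={threads})"


tuned = build_kernel(1024)                            # runs the search
manual = build_kernel(1024, block_M=64, threads=128)  # skips the search
```

Only the first call triggers a search; the second call uses exactly the parameters it was given.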

tilelang.autotuner.get_available_cpu_count() → int

Gets the number of CPU cores available to the current process.
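The number of cores available to a process can be smaller than the number of cores in the machine, for example when a CPU affinity mask is set. A minimal sketch of one way to query it with the standard library follows; `available_cpu_count` is a hypothetical helper, and tilelang's function may be implemented differently.

```python
import os


def available_cpu_count() -> int:
    """Number of CPUs the current process may run on (at least 1)."""
    if hasattr(os, "sched_getaffinity"):
        # Respects the process's CPU affinity mask; not available on all platforms.
        return len(os.sched_getaffinity(0))
    # Fallback: total logical CPUs; os.cpu_count() may return None.
    return os.cpu_count() or 1


cpu_count = available_cpu_count()
```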

tilelang.autotuner.run_with_timeout(func, timeout, *args, **kwargs)

tilelang.autotuner.timeout_handler(signum, frame)
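The `(signum, frame)` signature is the one Python passes to signal handlers, which suggests a signal-based timeout. The sketch below shows that general technique with `SIGALRM`, under the assumption that it resembles what these two functions do; `ToyTimeout`, `_on_alarm`, and `call_with_timeout` are hypothetical names, and the approach works only on Unix and only in the main thread.

```python
import signal
import time


class ToyTimeout(Exception):
    """Raised when the wrapped call exceeds its time budget."""


def _on_alarm(signum, frame):
    # Signal handlers receive the signal number and the current stack frame.
    raise ToyTimeout("call timed out")


def call_with_timeout(func, timeout, *args, **kwargs):
    """Run func(*args, **kwargs), raising ToyTimeout after `timeout` seconds."""
    previous = signal.signal(signal.SIGALRM, _on_alarm)
    signal.alarm(timeout)  # whole seconds
    try:
        return func(*args, **kwargs)
    finally:
        signal.alarm(0)  # cancel any pending alarm
        if previous is not None:
            signal.signal(signal.SIGALRM, previous)  # restore the old handler


fast_result = call_with_timeout(lambda x: x * 2, 1, 21)

try:
    call_with_timeout(time.sleep, 1, 3)  # sleeps longer than the budget
    timed_out = False
except ToyTimeout:
    timed_out = True
```

Cancelling the alarm and restoring the previous handler in `finally` keeps a timeout in one call from leaking into later code.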