tilelang.autotuner package
Module contents
The auto-tune module for tilelang programs.
This module provides functionality for auto-tuning tilelang programs, including JIT compilation and performance optimization through configuration search.
- class tilelang.autotuner.AutoTuner(fn: Callable, configs)
Bases:
object
Auto-tuner for tilelang programs.
This class handles the auto-tuning process by testing different configurations and finding the optimal parameters for program execution.
- Parameters:
fn – The function to be auto-tuned.
configs – List of configurations to try during auto-tuning.
- classmethod from_kernel(kernel: Callable, configs)
Create an AutoTuner instance from a kernel function.
- Parameters:
kernel – The kernel function to auto-tune.
configs – List of configurations to try.
- Returns:
A new AutoTuner instance.
- Return type:
AutoTuner
- run(warmup: int = 25, rep: int = 100, timeout: int = 30)
Run the auto-tuning process.
- Parameters:
warmup – Number of warmup iterations.
rep – Number of repetitions for timing.
timeout – Maximum time per configuration.
- Returns:
Results of the auto-tuning process.
- Return type:
AutotuneResult
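The roles of warmup and rep can be illustrated with a minimal, library-independent sketch. It is not tilelang's implementation: `time_config`, `pick_best`, and `toy_workload` are hypothetical names, timing uses wall-clock time on the CPU, and the timeout parameter is not modeled.

```python
import time
from typing import Any, Callable, Dict, List, Tuple


def time_config(fn: Callable[..., Any], config: Dict[str, Any],
                warmup: int = 25, rep: int = 100) -> float:
    """Return the mean wall-clock time of fn(**config) in seconds."""
    for _ in range(warmup):  # untimed runs, so one-off startup costs are excluded
        fn(**config)
    start = time.perf_counter()
    for _ in range(rep):  # timed runs
        fn(**config)
    return (time.perf_counter() - start) / rep


def pick_best(fn: Callable[..., Any], configs: List[Dict[str, Any]],
              warmup: int = 25, rep: int = 100) -> Tuple[Dict[str, Any], float]:
    """Time every configuration and return the fastest one with its latency."""
    timings = [(time_config(fn, cfg, warmup, rep), cfg) for cfg in configs]
    best_latency, best_config = min(timings, key=lambda item: item[0])
    return best_config, best_latency


def toy_workload(chunk: int) -> int:
    # Hypothetical workload: sum 0..9999 in slices of `chunk` elements.
    data = range(10_000)
    return sum(sum(data[i:i + chunk]) for i in range(0, len(data), chunk))


configs = [{"chunk": 1}, {"chunk": 100}, {"chunk": 10_000}]
best_config, best_latency = pick_best(toy_workload, configs, warmup=2, rep=5)
print(best_config, best_latency)
```

Which configuration wins depends on the machine, so the sketch prints the result rather than assuming one.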
- set_compile_args(out_idx: Union[List[int], int] = -1, supply_type: TensorSupplyType = TensorSupplyType.Auto, ref_prog: Optional[Callable] = None, supply_prog: Optional[Callable] = None, rtol: float = 0.01, atol: float = 0.01, max_mismatched_ratio: float = 0.01, skip_check: bool = False, cache_input_tensors: bool = True, target: Literal['auto', 'cuda', 'hip'] = 'auto')
Set compilation arguments for the auto-tuner.
- Parameters:
out_idx – Index or list of indices of the output tensors.
supply_type – Type of tensor supply mechanism. Ignored if supply_prog is provided.
ref_prog – Reference program for validation.
supply_prog – Supply program for input tensors.
rtol – Relative tolerance for validation.
atol – Absolute tolerance for validation.
max_mismatched_ratio – Maximum allowed mismatch ratio.
skip_check – Whether to skip validation.
cache_input_tensors – Whether to cache input tensors.
target – Target platform ('auto', 'cuda', or 'hip').
- Returns:
Self for method chaining.
- Return type:
AutoTuner
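One plausible reading of how rtol, atol, and max_mismatched_ratio combine is sketched below. It assumes the common `|actual - expected| <= atol + rtol * |expected|` closeness rule (the one used by NumPy and PyTorch); this page does not state the exact formula tilelang applies, and `outputs_match` is a hypothetical helper working on plain Python floats.

```python
from typing import Sequence


def outputs_match(actual: Sequence[float], expected: Sequence[float],
                  rtol: float = 0.01, atol: float = 0.01,
                  max_mismatched_ratio: float = 0.01) -> bool:
    """Accept the output if few enough elements fall outside the tolerance."""
    if len(actual) != len(expected):
        return False
    if not expected:
        return True
    # Assumed rule: an element mismatches when its error exceeds atol + rtol * |expected|.
    mismatched = sum(
        1 for a, e in zip(actual, expected)
        if abs(a - e) > atol + rtol * abs(e)
    )
    return mismatched / len(expected) <= max_mismatched_ratio


expected = [1.0] * 200
one_bad = [2.0] + [1.0] * 199        # 1 of 200 elements wrong: ratio 0.005
three_bad = [2.0] * 3 + [1.0] * 197  # 3 of 200 elements wrong: ratio 0.015

print(outputs_match(one_bad, expected))    # True
print(outputs_match(three_bad, expected))  # False
```

With the default max_mismatched_ratio of 0.01, a handful of outliers in a large output is tolerated, while systematic errors are not.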
- class tilelang.autotuner.AutotuneResult(latency: float, config: dict, ref_latency: float, libcode: str, func: Callable, kernel: Callable)
Bases:
object
Results from auto-tuning process.
- latency
Best achieved execution latency.
- Type:
float
- config
Configuration that produced the best result.
- Type:
dict
- ref_latency
Reference implementation latency.
- Type:
float
- libcode
Generated library code.
- Type:
str
- func
Optimized function.
- Type:
Callable
- kernel
Compiled kernel function.
- Type:
Callable
- config: dict
- func: Callable
- kernel: Callable
- latency: float
- libcode: str
- ref_latency: float
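As a rough illustration of how the latency fields relate, the sketch below mirrors the documented fields in a stand-in dataclass and derives a speedup over the reference implementation. `ResultSketch` and `speedup` are hypothetical and not part of tilelang; the numbers are made up, and the latency unit is not stated on this page.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ResultSketch:
    """Stand-in with the same field names as the documented AutotuneResult."""
    latency: float
    config: dict
    ref_latency: float
    libcode: str
    func: Callable
    kernel: Callable


def speedup(result: ResultSketch) -> float:
    """How many times faster the tuned kernel is than the reference."""
    return result.ref_latency / result.latency


result = ResultSketch(
    latency=0.5,               # made-up value
    config={"block_M": 128},   # hypothetical configuration key
    ref_latency=1.5,           # made-up value
    libcode="",
    func=lambda *args: None,   # placeholder callables
    kernel=lambda *args: None,
)
print(speedup(result))  # 3.0
```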
- class tilelang.autotuner.CompileArgs(out_idx: Union[List[int], int] = -1, supply_type: TensorSupplyType = TensorSupplyType.Auto, ref_prog: Optional[Callable] = None, supply_prog: Optional[Callable] = None, rtol: float = 0.01, atol: float = 0.01, max_mismatched_ratio: float = 0.01, skip_check: bool = False, cache_input_tensors: bool = True, target: Literal['auto', 'cuda', 'hip'] = 'auto')
Bases:
object
Compile arguments for the auto-tuner.
- out_idx
Index or list of indices of the output tensors. Defaults to -1.
- Type:
Union[List[int], int]
- supply_type
Type of tensor supply mechanism. Defaults to TensorSupplyType.Auto.
- Type:
TensorSupplyType
- ref_prog
Reference program for correctness validation. Defaults to None.
- Type:
Callable
- supply_prog
Supply program for input tensors. Defaults to None.
- Type:
Callable
- rtol
Relative tolerance for output validation. Defaults to 0.01.
- Type:
float
- atol
Absolute tolerance for output validation. Defaults to 0.01.
- Type:
float
- max_mismatched_ratio
Maximum allowed ratio of mismatched elements. Defaults to 0.01.
- Type:
float
- skip_check
Whether to skip validation checks. Defaults to False.
- Type:
bool
- cache_input_tensors
Whether to cache input tensors for each compilation. Defaults to True.
- Type:
bool
- target
Target platform ('auto', 'cuda', or 'hip'). Defaults to 'auto'.
- Type:
Literal['auto', 'cuda', 'hip']
- atol: float = 0.01
- cache_input_tensors: bool = True
- max_mismatched_ratio: float = 0.01
- out_idx: Union[List[int], int] = -1
- ref_prog: Callable = None
- rtol: float = 0.01
- skip_check: bool = False
- supply_prog: Callable = None
- supply_type: TensorSupplyType = TensorSupplyType.Auto
- target: Literal['auto', 'cuda', 'hip'] = 'auto'
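Because out_idx accepts either a single int or a list, and defaults to -1, a consumer has to normalize it before use. The sketch below shows one way to do that, assuming that negative indices count from the end of the parameter list as in Python indexing. That interpretation is an assumption, not something this page states, and `normalize_out_idx` is a hypothetical helper.

```python
from typing import List, Union


def normalize_out_idx(out_idx: Union[List[int], int], num_params: int) -> List[int]:
    """Return out_idx as a list of non-negative parameter positions."""
    indices = [out_idx] if isinstance(out_idx, int) else list(out_idx)
    resolved = []
    for idx in indices:
        if not -num_params <= idx < num_params:
            raise IndexError(f"out_idx {idx} is out of range for {num_params} parameters")
        # Assumption: negative values wrap around like Python list indices.
        resolved.append(idx % num_params)
    return resolved


print(normalize_out_idx(-1, 3))       # [2]
print(normalize_out_idx([0, -1], 4))  # [0, 3]
```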
- class tilelang.autotuner.JITContext(out_idx: List[int], ref_prog: Callable, supply_prog: Callable, rtol: float, atol: float, max_mismatched_ratio: float, skip_check: bool, cache_input_tensors: bool, kernel: JITKernel, supply_type: TensorSupplyType, target: Literal['cuda', 'hip'])
Bases:
object
Context object for Just-In-Time compilation settings.
- out_idx
List of output tensor indices.
- Type:
List[int]
- ref_prog
Reference program for correctness validation.
- Type:
Callable
- supply_prog
Supply program for input tensors.
- Type:
Callable
- rtol
Relative tolerance for output validation.
- Type:
float
- atol
Absolute tolerance for output validation.
- Type:
float
- max_mismatched_ratio
Maximum allowed ratio of mismatched elements.
- Type:
float
- skip_check
Whether to skip validation checks.
- Type:
bool
- cache_input_tensors
Whether to cache input tensors for each compilation.
- Type:
bool
- kernel
JITKernel instance for performance measurement.
- Type:
JITKernel
- supply_type
Type of tensor supply mechanism.
- Type:
TensorSupplyType
- target
Target platform ('cuda' or 'hip').
- Type:
Literal['cuda', 'hip']
- atol: float
- cache_input_tensors: bool
- max_mismatched_ratio: float
- out_idx: List[int]
- ref_prog: Callable
- rtol: float
- skip_check: bool
- supply_prog: Callable
- supply_type: TensorSupplyType
- target: Literal['cuda', 'hip']
- tilelang.autotuner.autotune(configs: Any, warmup: int = 25, rep: int = 100, timeout: int = 100) -> AutotuneResult
Decorator for auto-tuning tilelang programs.
- Parameters:
configs – Configuration space to explore during auto-tuning.
warmup – Number of warmup iterations before timing.
rep – Number of repetitions for timing measurements.
timeout – Maximum time (in seconds) allowed for each configuration.
- Returns:
Decorated function that performs auto-tuning.
- Return type:
Callable
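A configuration space is often written as a set of candidate values per tunable parameter and then expanded into a list of concrete configurations. The sketch below shows that expansion with `itertools.product`. The parameter names (`block_M`, `block_N`, `num_stages`) and the `build_configs` helper are hypothetical, and representing each configuration as a dict is an assumption based on the `config: dict` field of AutotuneResult.

```python
import itertools
from typing import Any, Dict, List


def build_configs(space: Dict[str, List[Any]]) -> List[Dict[str, Any]]:
    """Expand per-parameter candidate lists into every combination."""
    keys = list(space)
    return [
        dict(zip(keys, values))
        for values in itertools.product(*(space[key] for key in keys))
    ]


# Hypothetical tunable parameters and candidate values.
space = {"block_M": [64, 128], "block_N": [64, 128], "num_stages": [2, 3]}
configs = build_configs(space)

print(len(configs))  # 8
print(configs[0])    # {'block_M': 64, 'block_N': 64, 'num_stages': 2}
```

The number of configurations grows multiplicatively with each added parameter, which is worth keeping in mind alongside the per-configuration timeout.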
- tilelang.autotuner.check_tensor_list_compatibility(list1: List[torch.Tensor], list2: List[torch.Tensor]) -> bool
Checks if two lists of tensors are compatible.
Compatibility checks performed include:
1. Lists have the same length.
2. Corresponding tensors have the same shape.
- Parameters:
list1 – First list of tensors.
list2 – Second list of tensors.
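The two checks listed above can be sketched without PyTorch by using any object that exposes a `shape` attribute. `FakeTensor` and `lists_compatible` are hypothetical stand-ins for illustration only, not the library's code.

```python
from typing import List, NamedTuple, Tuple


class FakeTensor(NamedTuple):
    """Minimal stand-in for torch.Tensor: only the shape matters here."""
    shape: Tuple[int, ...]


def lists_compatible(list1: List[FakeTensor], list2: List[FakeTensor]) -> bool:
    # Check 1: both lists have the same length.
    if len(list1) != len(list2):
        return False
    # Check 2: tensors at the same position have the same shape.
    return all(a.shape == b.shape for a, b in zip(list1, list2))


a = [FakeTensor((2, 3)), FakeTensor((4,))]
b = [FakeTensor((2, 3)), FakeTensor((4,))]
c = [FakeTensor((2, 3)), FakeTensor((5,))]

print(lists_compatible(a, b))      # True
print(lists_compatible(a, c))      # False
print(lists_compatible(a, a[:1]))  # False
```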
- tilelang.autotuner.get_available_cpu_count()
Gets the number of CPU cores available to the current process.
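The number of cores available to a process can be smaller than the machine total, for example under CPU affinity masks or container limits. A minimal standard-library sketch of that distinction follows. It is an illustration and not necessarily how tilelang implements this function, and `available_cpu_count` is a hypothetical name.

```python
import os


def available_cpu_count() -> int:
    """Return the number of CPU cores the current process may run on."""
    if hasattr(os, "sched_getaffinity"):
        # Available on Linux: respects the affinity mask of this process.
        return len(os.sched_getaffinity(0))
    # Fallback for other platforms: total logical cores, or 1 if unknown.
    return os.cpu_count() or 1


print(available_cpu_count())
```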
- tilelang.autotuner.jit(out_idx: Optional[List[int]] = None, supply_type: TensorSupplyType = TensorSupplyType.Auto, ref_prog: Optional[Callable] = None, supply_prog: Optional[Callable] = None, rtol: float = 0.01, atol: float = 0.01, max_mismatched_ratio: float = 0.01, skip_check: bool = False, cache_input_tensors: bool = True, target: Literal['auto', 'cuda', 'hip'] = 'auto') -> Callable
Just-In-Time compilation decorator for tilelang programs.
- Parameters:
out_idx – List of output tensor indices.
supply_type – Type of tensor supply mechanism. Ignored if supply_prog is provided.
ref_prog – Reference program for correctness validation.
supply_prog – Supply program for input tensors.
rtol – Relative tolerance for output validation.
atol – Absolute tolerance for output validation.
max_mismatched_ratio – Maximum allowed ratio of mismatched elements.
skip_check – Whether to skip validation checks.
cache_input_tensors – Whether to cache input tensors for each compilation.
target – Target platform ('auto', 'cuda', or 'hip').
- Returns:
Decorated function that performs JIT compilation.
- Return type:
Callable
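The rule that supply_type is ignored when supply_prog is provided can be pictured as a simple precedence check. The sketch below is purely illustrative: `SupplyKind` is a hypothetical stand-in for TensorSupplyType (whose members are not listed on this page), inputs are plain lists of floats, and the call signature assumed for supply_prog is invented for the example.

```python
import random
from enum import Enum
from typing import Callable, List, Optional, Sequence


class SupplyKind(Enum):
    """Hypothetical stand-in for TensorSupplyType."""
    ZERO = "zero"
    UNIFORM = "uniform"


def make_inputs(
    sizes: Sequence[int],
    supply_kind: SupplyKind,
    supply_prog: Optional[Callable[[Sequence[int]], List[List[float]]]] = None,
) -> List[List[float]]:
    """Build one input buffer per requested size."""
    if supply_prog is not None:
        # A user-provided supply program takes precedence over supply_kind.
        return supply_prog(sizes)
    if supply_kind is SupplyKind.ZERO:
        return [[0.0] * n for n in sizes]
    return [[random.random() for _ in range(n)] for n in sizes]


def ones(sizes: Sequence[int]) -> List[List[float]]:
    return [[1.0] * n for n in sizes]


print(make_inputs([2, 3], SupplyKind.ZERO))                    # [[0.0, 0.0], [0.0, 0.0, 0.0]]
print(make_inputs([2, 3], SupplyKind.ZERO, supply_prog=ones))  # [[1.0, 1.0], [1.0, 1.0, 1.0]]
```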