tilelang.autotuner package#

Module contents#

The auto-tune module for tilelang programs.

This module provides functionality for auto-tuning tilelang programs, including JIT compilation and performance optimization through configuration search.

class tilelang.autotuner.AutoTuner(fn: Callable, configs)#

Bases: object

Auto-tuner for tilelang programs.

This class handles the auto-tuning process by testing different configurations and finding the optimal parameters for program execution.

Parameters:
  • fn – The function to be auto-tuned.

  • configs – List of configurations to try during auto-tuning.
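
As an illustration of what a list of configurations might look like, the sketch below expands a small search space into one dict per candidate. It assumes each configuration is a dict of keyword arguments, which is consistent with the `config: dict` field of AutotuneResult; the parameter names `block_M`, `block_N`, and `num_stages` and the helper `build_configs` are hypothetical and not part of this module.

```python
from itertools import product

# Hypothetical tunable parameters; names and values are illustrative only.
search_space = {
    "block_M": [64, 128],
    "block_N": [64, 128],
    "num_stages": [2, 3],
}


def build_configs(space):
    """Expand a dict of candidate values into a list of config dicts."""
    keys = list(space)
    # product() yields one tuple of values per combination, in key order.
    return [dict(zip(keys, values)) for values in product(*space.values())]


configs = build_configs(search_space)
print(len(configs))  # 8 combinations (2 * 2 * 2)
print(configs[0])    # {'block_M': 64, 'block_N': 64, 'num_stages': 2}
```

A full Cartesian product grows quickly, so in practice the candidate values per parameter are usually kept small.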

classmethod from_kernel(kernel: Callable, configs)#

Create an AutoTuner instance from a kernel function.

Parameters:
  • kernel – The kernel function to auto-tune.

  • configs – List of configurations to try.

Returns:

A new AutoTuner instance.

Return type:

AutoTuner

run(warmup: int = 25, rep: int = 100, timeout: int = 30)#

Run the auto-tuning process.

Parameters:
  • warmup – Number of warmup iterations.

  • rep – Number of repetitions for timing.

  • timeout – Maximum time (in seconds) allowed per configuration.

Returns:

Results of the auto-tuning process.

Return type:

AutotuneResult
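
The roles of `warmup` and `rep` can be illustrated with a CPU-only sketch of a tuning loop: each candidate is run a few times untimed, then timed over `rep` runs, and the configuration with the lowest mean latency wins. This is not the library's implementation (which measures compiled GPU kernels and also enforces a timeout); `measure_latency`, `pick_best`, and `make_workload` are hypothetical helpers.

```python
import time


def measure_latency(fn, warmup=25, rep=100):
    """Return the mean wall-clock latency of fn() in milliseconds."""
    for _ in range(warmup):  # untimed warmup runs
        fn()
    start = time.perf_counter()
    for _ in range(rep):     # timed runs
        fn()
    return (time.perf_counter() - start) * 1e3 / rep


def pick_best(make_fn, configs, warmup=25, rep=100):
    """Time every config and return (best_latency_ms, best_config)."""
    best_latency, best_config = float("inf"), None
    for config in configs:
        latency = measure_latency(make_fn(**config), warmup, rep)
        if latency < best_latency:
            best_latency, best_config = latency, config
    return best_latency, best_config


# Hypothetical workload: summing a range whose size depends on the config.
def make_workload(n):
    return lambda: sum(range(n))


latency, config = pick_best(
    make_workload, [{"n": 200_000}, {"n": 10}], warmup=2, rep=20
)
print(config, f"{latency:.4f} ms")
```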

set_compile_args(out_idx: Union[List[int], int] = -1, supply_type: TensorSupplyType = TensorSupplyType.Auto, ref_prog: Optional[Callable] = None, supply_prog: Optional[Callable] = None, rtol: float = 0.01, atol: float = 0.01, max_mismatched_ratio: float = 0.01, skip_check: bool = False, cache_input_tensors: bool = True, target: Literal['auto', 'cuda', 'hip'] = 'auto')#

Set compilation arguments for the auto-tuner.

Parameters:
  • out_idx – Index, or list of indices, of the output tensors.

  • supply_type – Type of tensor supply mechanism. Ignored if supply_prog is provided.

  • ref_prog – Reference program for validation.

  • supply_prog – Supply program for input tensors.

  • rtol – Relative tolerance for validation.

  • atol – Absolute tolerance for validation.

  • max_mismatched_ratio – Maximum allowed mismatch ratio.

  • skip_check – Whether to skip validation.

  • cache_input_tensors – Whether to cache input tensors.

  • target – Target platform.

Returns:

Self for method chaining.

Return type:

AutoTuner
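
How `rtol`, `atol`, and `max_mismatched_ratio` might interact during validation is sketched below. The element-wise rule `|actual - expected| <= atol + rtol * |expected|` is an assumption borrowed from the common "is close" convention; the exact comparison used by the library may differ. `outputs_match` is a hypothetical helper operating on plain Python lists.

```python
def outputs_match(actual, expected, rtol=0.01, atol=0.01,
                  max_mismatched_ratio=0.01):
    """Return True if few enough elements fall outside the tolerance."""
    if len(actual) != len(expected):
        return False
    if not expected:
        return True
    # Count elements that violate the combined absolute/relative bound.
    mismatched = sum(
        1 for a, e in zip(actual, expected)
        if abs(a - e) > atol + rtol * abs(e)
    )
    return mismatched / len(expected) <= max_mismatched_ratio


expected = [1.0] * 200
one_bad = [2.0] + [1.0] * 199          # 0.5% of elements mismatched
three_bad = [2.0] * 3 + [1.0] * 197    # 1.5% of elements mismatched

print(outputs_match(one_bad, expected))    # True
print(outputs_match(three_bad, expected))  # False
```

Allowing a small mismatched ratio tolerates isolated outliers, such as those caused by low-precision accumulation, without accepting broadly wrong output.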

class tilelang.autotuner.AutotuneResult(latency: float, config: dict, ref_latency: float, libcode: str, func: Callable, kernel: Callable)#

Bases: object

Results from auto-tuning process.

latency#

Best achieved execution latency.

Type:

float

config#

Configuration that produced the best result.

Type:

dict

ref_latency#

Reference implementation latency.

Type:

float

libcode#

Generated library code.

Type:

str

func#

Optimized function.

Type:

Callable

kernel#

Compiled kernel function.

Type:

Callable

config: dict#
func: Callable#
kernel: Callable#
latency: float#
libcode: str#
ref_latency: float#
class tilelang.autotuner.CompileArgs(out_idx: Union[List[int], int] = -1, supply_type: TensorSupplyType = TensorSupplyType.Auto, ref_prog: Optional[Callable] = None, supply_prog: Optional[Callable] = None, rtol: float = 0.01, atol: float = 0.01, max_mismatched_ratio: float = 0.01, skip_check: bool = False, cache_input_tensors: bool = True, target: Literal['auto', 'cuda', 'hip'] = 'auto')#

Bases: object

Compile arguments for the auto-tuner.

out_idx#

Index, or list of indices, of the output tensors. Defaults to -1.

Type:

Union[List[int], int]

supply_type#

Type of tensor supply mechanism. Defaults to TensorSupplyType.Auto.

Type:

tilelang.utils.tensor.TensorSupplyType

ref_prog#

Reference program for correctness validation. Defaults to None.

Type:

Callable

supply_prog#

Supply program for input tensors. Defaults to None.

Type:

Callable

rtol#

Relative tolerance for output validation. Defaults to 1e-2.

Type:

float

atol#

Absolute tolerance for output validation. Defaults to 1e-2.

Type:

float

max_mismatched_ratio#

Maximum allowed ratio of mismatched elements. Defaults to 0.01.

Type:

float

skip_check#

Whether to skip validation checks. Defaults to False.

Type:

bool

cache_input_tensors#

Whether to cache input tensors for each compilation. Defaults to True.

Type:

bool

target#

Target platform ('auto', 'cuda', or 'hip'). Defaults to 'auto'.

Type:

Literal['auto', 'cuda', 'hip']

atol: float = 0.01#
cache_input_tensors: bool = True#
max_mismatched_ratio: float = 0.01#
out_idx: Union[List[int], int] = -1#
ref_prog: Callable = None#
rtol: float = 0.01#
skip_check: bool = False#
supply_prog: Callable = None#
supply_type: TensorSupplyType = TensorSupplyType.Auto#
target: Literal['auto', 'cuda', 'hip'] = 'auto'#
class tilelang.autotuner.JITContext(out_idx: List[int], ref_prog: Callable, supply_prog: Callable, rtol: float, atol: float, max_mismatched_ratio: float, skip_check: bool, cache_input_tensors: bool, kernel: JITKernel, supply_type: TensorSupplyType, target: Literal['cuda', 'hip'])#

Bases: object

Context object for Just-In-Time compilation settings.

out_idx#

List of output tensor indices.

Type:

List[int]

ref_prog#

Reference program for correctness validation.

Type:

Callable

supply_prog#

Supply program for input tensors.

Type:

Callable

rtol#

Relative tolerance for output validation.

Type:

float

atol#

Absolute tolerance for output validation.

Type:

float

max_mismatched_ratio#

Maximum allowed ratio of mismatched elements.

Type:

float

skip_check#

Whether to skip validation checks.

Type:

bool

cache_input_tensors#

Whether to cache input tensors for each compilation.

Type:

bool

kernel#

JITKernel instance for performance measurement.

Type:

tilelang.jit.kernel.JITKernel

supply_type#

Type of tensor supply mechanism.

Type:

tilelang.utils.tensor.TensorSupplyType

target#

Target platform ('cuda' or 'hip').

Type:

Literal['cuda', 'hip']

atol: float#
cache_input_tensors: bool#
kernel: JITKernel#
max_mismatched_ratio: float#
out_idx: List[int]#
ref_prog: Callable#
rtol: float#
skip_check: bool#
supply_prog: Callable#
supply_type: TensorSupplyType#
target: Literal['cuda', 'hip']#
tilelang.autotuner.autotune(configs: Any, warmup: int = 25, rep: int = 100, timeout: int = 100) → AutotuneResult#

Decorator for auto-tuning tilelang programs.

Parameters:
  • configs – Configuration space to explore during auto-tuning.

  • warmup – Number of warmup iterations before timing.

  • rep – Number of repetitions for timing measurements.

  • timeout – Maximum time (in seconds) allowed for each configuration.

Returns:

Decorated function that performs auto-tuning.

Return type:

Callable

tilelang.autotuner.check_tensor_list_compatibility(list1: List[torch.Tensor], list2: List[torch.Tensor]) → bool#

Checks if two lists of tensors are compatible.

Compatibility checks performed include:
  1. Lists have the same length.
  2. Corresponding tensors have the same shape.

Parameters:
  • list1 – First list of tensors.

  • list2 – Second list of tensors.
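
The two checks can be sketched as follows. To stay runnable without PyTorch, this sketch uses a hypothetical stand-in class `FakeTensor` that only carries a `.shape`; the function `lists_compatible` is illustrative and is not the library's implementation.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass
class FakeTensor:
    """Hypothetical stand-in for torch.Tensor; only .shape is modelled."""
    shape: Tuple[int, ...]


def lists_compatible(list1: Sequence, list2: Sequence) -> bool:
    """Return True if both lists have equal length and pairwise equal shapes."""
    if len(list1) != len(list2):  # check 1: same length
        return False
    # check 2: corresponding tensors have the same shape
    return all(a.shape == b.shape for a, b in zip(list1, list2))


a = [FakeTensor((128, 64)), FakeTensor((64,))]
b = [FakeTensor((128, 64)), FakeTensor((64,))]
c = [FakeTensor((128, 64)), FakeTensor((32,))]

print(lists_compatible(a, b))      # True
print(lists_compatible(a, c))      # False (second shape differs)
print(lists_compatible(a, b[:1]))  # False (lengths differ)
```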

tilelang.autotuner.get_available_cpu_count()#

Gets the number of CPU cores available to the current process.
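
One common way to obtain such a count with the standard library is sketched below. It assumes the process affinity mask is the relevant notion of "available" and falls back to the total core count on platforms without `os.sched_getaffinity`; whether the library does exactly this is an assumption, and `available_cpu_count` is a hypothetical name.

```python
import os


def available_cpu_count() -> int:
    """Return the number of CPU cores the current process may run on."""
    try:
        # Respects affinity restrictions (e.g. taskset, some containers);
        # only available on some Unix platforms.
        return len(os.sched_getaffinity(0))
    except AttributeError:
        # os.cpu_count() may return None, so fall back to 1.
        return os.cpu_count() or 1


print(available_cpu_count())
```

The affinity-based count can be smaller than `os.cpu_count()`, which matters when sizing a pool of parallel compilation workers.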

tilelang.autotuner.jit(out_idx: Optional[List[int]] = None, supply_type: TensorSupplyType = TensorSupplyType.Auto, ref_prog: Optional[Callable] = None, supply_prog: Optional[Callable] = None, rtol: float = 0.01, atol: float = 0.01, max_mismatched_ratio: float = 0.01, skip_check: bool = False, cache_input_tensors: bool = True, target: Literal['auto', 'cuda', 'hip'] = 'auto') → Callable#

Just-In-Time compilation decorator for tilelang programs.

Parameters:
  • out_idx – List of output tensor indices.

  • supply_type – Type of tensor supply mechanism. Ignored if supply_prog is provided.

  • ref_prog – Reference program for correctness validation.

  • supply_prog – Supply program for input tensors.

  • rtol – Relative tolerance for output validation.

  • atol – Absolute tolerance for output validation.

  • max_mismatched_ratio – Maximum allowed ratio of mismatched elements.

  • skip_check – Whether to skip validation checks.

  • cache_input_tensors – Whether to cache input tensors for each compilation.

  • target – Target platform ('auto', 'cuda', or 'hip').

Returns:

Decorated function that performs JIT compilation.

Return type:

Callable