tilelang.autotuner.param

The auto-tune parameters.

Attributes

Classes

CompileArgs

Compile arguments for the auto-tuner. Detailed description can be found in tilelang.jit.compile.

ProfileArgs

Profile arguments for the auto-tuner.

AutotuneResult

Results from auto-tuning process.

Module Contents

tilelang.autotuner.param.BEST_CONFIG_PATH = 'best_config.json'
tilelang.autotuner.param.FUNCTION_PATH = 'function.pkl'
tilelang.autotuner.param.LATENCY_PATH = 'latency.json'
tilelang.autotuner.param.KERNEL_PATH = 'kernel.cu'
tilelang.autotuner.param.WRAPPED_KERNEL_PATH = 'wrapped_kernel.cu'
tilelang.autotuner.param.KERNEL_LIB_PATH = 'kernel_lib.so'
tilelang.autotuner.param.PARAMS_PATH = 'params.pkl'
class tilelang.autotuner.param.CompileArgs

Compile arguments for the auto-tuner. Detailed description can be found in tilelang.jit.compile.

out_idx

List of output tensor indices.

execution_backend

Execution backend to use for kernel execution (default: “cython”).

target

Compilation target, either as a string or a TVM Target object (default: “auto”).

target_host

Target host for cross-compilation (default: None).

verbose

Whether to enable verbose output (default: False).

pass_configs

Additional keyword arguments to pass to the Compiler PassContext.

Available options

"tir.disable_vectorize": bool, default: False
"tl.disable_tma_lower": bool, default: False
"tl.disable_warp_specialized": bool, default: False
"tl.config_index_bitwidth": int, default: None
"tl.disable_dynamic_tail_split": bool, default: False
"tl.dynamic_vectorize_size_bits": int, default: 128
"tl.disable_safe_memory_legalize": bool, default: False
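As a rough illustration of how a pass_configs dictionary might be assembled from the options listed above, the sketch below builds one and checks its keys against that list. The helper unknown_options is hypothetical and not part of tilelang; only the option names come from this page.

```python
from typing import Any, Dict, List

# Option names copied from the list above.
KNOWN_OPTIONS = {
    "tir.disable_vectorize",
    "tl.disable_tma_lower",
    "tl.disable_warp_specialized",
    "tl.config_index_bitwidth",
    "tl.disable_dynamic_tail_split",
    "tl.dynamic_vectorize_size_bits",
    "tl.disable_safe_memory_legalize",
}

# Illustrative values only; pick whatever suits the kernel being tuned.
pass_configs: Dict[str, Any] = {
    "tl.disable_tma_lower": True,
    "tl.disable_warp_specialized": True,
    "tl.dynamic_vectorize_size_bits": 128,
}


def unknown_options(cfg: Dict[str, Any]) -> List[str]:
    """Hypothetical helper: return keys that are not in the documented list."""
    return sorted(set(cfg) - KNOWN_OPTIONS)
```

A check like this can catch a misspelled key before the dictionary is handed to the compiler.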

out_idx: List[int] | int | None = None
execution_backend: Literal['dlpack', 'ctypes', 'cython'] = 'cython'
target: Literal['auto', 'cuda', 'hip'] = 'auto'
target_host: str | tvm.target.Target = None
verbose: bool = False
pass_configs: Dict[str, Any] | None = None
compile_program(program)
Parameters:

program (tvm.tir.PrimFunc)

__hash__()
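The page does not describe how __hash__() is computed. Because pass_configs is a dict (and dicts are not hashable), one plausible approach is to hash a stable serialization of the fields. The sketch below shows that idea on a hypothetical stand-in class, CompileArgsSketch, which only mirrors the documented field names and defaults; it is not the library's implementation.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import Any, Dict, List, Optional, Union


@dataclass(frozen=True)
class CompileArgsSketch:
    """Hypothetical stand-in mirroring the documented CompileArgs fields."""

    out_idx: Union[List[int], int, None] = None
    execution_backend: str = "cython"
    target: str = "auto"
    target_host: Optional[str] = None
    verbose: bool = False
    pass_configs: Optional[Dict[str, Any]] = None

    def __hash__(self) -> int:
        # sort_keys makes the serialization independent of dict insertion order;
        # default=str is a fallback for values JSON cannot encode directly.
        payload = json.dumps(asdict(self), sort_keys=True, default=str)
        digest = hashlib.sha256(payload.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")
```

With a content-based hash like this, two argument sets with equal field values map to the same cache key regardless of how pass_configs was built.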
class tilelang.autotuner.param.ProfileArgs

Profile arguments for the auto-tuner.

warmup

Number of warmup iterations.

rep

Number of repetitions for timing.

timeout

Maximum time per configuration.

supply_type

Type of tensor supply mechanism (tilelang.TensorSupplyType, default: tilelang.TensorSupplyType.Auto).

ref_prog

Reference program for correctness validation (Callable, default: None).

supply_prog

Supply program for input tensors (Callable, default: None).

out_idx

Union[List[int], int], default: -1.

rtol

float, default: 1e-2.

atol

float, default: 1e-2.

max_mismatched_ratio

float, default: 0.01.

skip_check

bool, default: False.

manual_check_prog

Callable, default: None.

cache_input_tensors

bool, default: True.

warmup: int = 25
rep: int = 100
timeout: int = 30
supply_type: tilelang.TensorSupplyType
ref_prog: Callable = None
supply_prog: Callable = None
rtol: float = 0.01
atol: float = 0.01
max_mismatched_ratio: float = 0.01
skip_check: bool = False
manual_check_prog: Callable = None
cache_input_tensors: bool = True
__hash__()
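The page gives no definition for rtol, atol, and max_mismatched_ratio beyond their defaults. A common reading, assumed here, is that an element mismatches when |actual - expected| > atol + rtol * |expected|, and that the check passes when the fraction of mismatched elements does not exceed max_mismatched_ratio. The function within_tolerance below is a hypothetical sketch of that reading, not tilelang's checker.

```python
from typing import Sequence


def within_tolerance(
    actual: Sequence[float],
    expected: Sequence[float],
    rtol: float = 1e-2,
    atol: float = 1e-2,
    max_mismatched_ratio: float = 0.01,
) -> bool:
    """Return True if the share of out-of-tolerance elements is small enough."""
    if len(actual) != len(expected):
        raise ValueError("actual and expected must have the same length")
    # Count elements that fall outside the combined absolute/relative tolerance.
    mismatched = sum(
        1 for a, e in zip(actual, expected) if abs(a - e) > atol + rtol * abs(e)
    )
    return mismatched <= max_mismatched_ratio * len(expected)
```

Under this reading, the default ratio of 0.01 tolerates up to 2 outliers in a 200-element output.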
class tilelang.autotuner.param.AutotuneResult

Results from auto-tuning process.

latency

Best achieved execution latency.

config

Configuration that produced the best result.

ref_latency

Reference implementation latency.

libcode

Generated library code.

func

Optimized function.

kernel

Compiled kernel function.

latency: float | None = None
config: dict | None = None
ref_latency: float | None = None
libcode: str | None = None
func: Callable | None = None
kernel: Callable | None = None
save_to_disk(path, verbose=False)
Parameters:
  • path (pathlib.Path)

  • verbose (bool)

classmethod load_from_disk(path, compile_args)
Parameters:
  • path (pathlib.Path)

  • compile_args (CompileArgs)

Return type:

AutotuneResult
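The on-disk layout used by save_to_disk and load_from_disk is not described on this page. The sketch below shows one way a save/load round trip could work for the JSON parts only, assuming that best_config.json holds the config dict and latency.json holds the two latency values. The functions save_result and load_result are hypothetical; only the two file names come from the module constants above.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional, Tuple

# File names taken from the module-level constants documented above.
BEST_CONFIG_PATH = "best_config.json"
LATENCY_PATH = "latency.json"


def save_result(
    path: Path, config: dict, latency: float, ref_latency: Optional[float]
) -> None:
    """Hypothetical helper: write config and latency as JSON files under path."""
    path.mkdir(parents=True, exist_ok=True)
    (path / BEST_CONFIG_PATH).write_text(json.dumps(config))
    (path / LATENCY_PATH).write_text(
        json.dumps({"latency": latency, "ref_latency": ref_latency})
    )


def load_result(path: Path) -> Optional[Tuple[dict, float, Optional[float]]]:
    """Hypothetical helper: read the files back, or return None if any is missing."""
    config_file = path / BEST_CONFIG_PATH
    latency_file = path / LATENCY_PATH
    if not (config_file.exists() and latency_file.exists()):
        return None
    config = json.loads(config_file.read_text())
    timing = json.loads(latency_file.read_text())
    return config, timing["latency"], timing["ref_latency"]


# Round trip in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    cache_dir = Path(tmp) / "cache"
    save_result(cache_dir, {"block_M": 128}, 0.5, None)
    restored = load_result(cache_dir)
```

Returning None for a missing directory lets a caller fall back to a fresh tuning run when no cached result exists.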