tilelang.autotuner.tuner
========================

.. py:module:: tilelang.autotuner.tuner

.. autoapi-nested-parse::

   The auto-tune module for tilelang programs.

   This module provides functionality for auto-tuning tilelang programs, including JIT compilation
   and performance optimization through configuration search.


Attributes
----------

.. autoapisummary::

   tilelang.autotuner.tuner.logger


Exceptions
----------

.. autoapisummary::

   tilelang.autotuner.tuner.TimeoutException


Classes
-------

.. autoapisummary::

   tilelang.autotuner.tuner.AutoTuner


Functions
---------

.. autoapisummary::

   tilelang.autotuner.tuner.timeout_handler
   tilelang.autotuner.tuner.run_with_timeout
   tilelang.autotuner.tuner.get_available_cpu_count
   tilelang.autotuner.tuner.autotune


Module Contents
---------------

.. py:exception:: TimeoutException

   Bases: :py:obj:`Exception`


   Exception signalling that an operation exceeded its time limit.


.. py:function:: timeout_handler(signum, frame)

.. py:function:: run_with_timeout(func, timeout, *args, **kwargs)
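
The signatures of ``timeout_handler(signum, frame)`` and ``run_with_timeout`` suggest a signal-based timeout, although this page does not document the implementation. The following is a minimal sketch of that general pattern, assuming a POSIX system and a call made from the main thread. The names ``SketchTimeout``, ``sketch_timeout_handler``, and ``sketch_run_with_timeout`` are illustrative and are not the module's own code.

```python
import signal


class SketchTimeout(Exception):
    """Hypothetical stand-in for TimeoutException."""


def sketch_timeout_handler(signum, frame):
    # Signal handlers receive the signal number and the current stack frame.
    raise SketchTimeout("call exceeded its time limit")


def sketch_run_with_timeout(func, timeout, *args, **kwargs):
    # Install the handler and arm a one-shot alarm (whole seconds, POSIX only).
    previous = signal.signal(signal.SIGALRM, sketch_timeout_handler)
    signal.alarm(timeout)
    try:
        return func(*args, **kwargs)
    finally:
        # Always disarm the alarm and restore the previous handler.
        signal.alarm(0)
        signal.signal(signal.SIGALRM, previous)
```

With this approach a call such as ``sketch_run_with_timeout(compile_and_time, 30, config)`` (where ``compile_and_time`` is a hypothetical callable) either returns the function's result or raises the timeout exception once the alarm fires.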

.. py:data:: logger

.. py:function:: get_available_cpu_count()

   Gets the number of CPU cores available to the current process.
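
The number of cores available to a process can be smaller than the machine-wide count, for example under CPU affinity masks or container limits. A minimal sketch of one common way to query it with the standard library is shown here. ``available_cpus`` is a hypothetical helper, and whether ``get_available_cpu_count`` works this way is an assumption.

```python
import os


def available_cpus() -> int:
    # Hypothetical helper, not the module's implementation.
    try:
        # Linux: the set of CPUs this process is allowed to run on.
        return len(os.sched_getaffinity(0))
    except AttributeError:
        # Platforms without sched_getaffinity: fall back to the machine-wide count.
        return os.cpu_count() or 1
```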


.. py:class:: AutoTuner(fn, configs)

   Auto-tuner for tilelang programs.

   This class handles the auto-tuning process by testing different configurations
   and finding the optimal parameters for program execution.

   :param fn: The function to be auto-tuned.
   :param configs: List of configurations to try during auto-tuning.


   .. py:attribute:: compile_args


   .. py:attribute:: profile_args


   .. py:attribute:: cache_dir
      :type:  pathlib.Path


   .. py:attribute:: fn


   .. py:attribute:: configs


   .. py:attribute:: ref_latency_cache
      :value: None


   .. py:attribute:: jit_input_tensors
      :value: None


   .. py:attribute:: ref_input_tensors
      :value: None


   .. py:attribute:: jit_compile
      :value: None


   .. py:method:: from_kernel(kernel, configs)
      :classmethod:


      Create an AutoTuner instance from a kernel function.

      :param kernel: The kernel function to auto-tune.
      :param configs: List of configurations to try.

      :returns: A new AutoTuner instance.
      :rtype: AutoTuner


   .. py:method:: set_compile_args(out_idx = None, target = 'auto', execution_backend = 'cython', target_host = None, verbose = False, pass_configs = None)

      Set compilation arguments for the auto-tuner.

      :param out_idx: List of output tensor indices.
      :param target: Target platform.
      :param execution_backend: Execution backend to use for kernel execution.
      :param target_host: Target host for cross-compilation.
      :param verbose: Whether to enable verbose output.
      :param pass_configs: Additional keyword arguments to pass to the Compiler PassContext.

      :returns: Self for method chaining.
      :rtype: AutoTuner


   .. py:method:: set_profile_args(warmup = 25, rep = 100, timeout = 30, supply_type = tilelang.TensorSupplyType.Auto, ref_prog = None, supply_prog = None, rtol = 0.01, atol = 0.01, max_mismatched_ratio = 0.01, skip_check = False, manual_check_prog = None, cache_input_tensors = False)

      Set profiling arguments for the auto-tuner.

      :param supply_type: Type of tensor supply mechanism. Ignored if ``supply_prog`` is provided.
      :param ref_prog: Reference program for validation.
      :param supply_prog: Supply program for input tensors.
      :param rtol: Relative tolerance for validation.
      :param atol: Absolute tolerance for validation.
      :param max_mismatched_ratio: Maximum allowed mismatch ratio.
      :param skip_check: Whether to skip validation.
      :param manual_check_prog: Manual check program for validation.
      :param cache_input_tensors: Whether to cache input tensors.
      :param warmup: Number of warmup iterations.
      :param rep: Number of repetitions for timing.
      :param timeout: Maximum time per configuration.

      :returns: Self for method chaining.
      :rtype: AutoTuner


   .. py:method:: set_kernel_parameters(parameters)


   .. py:method:: generate_cache_key(parameters)

      Generate a cache key for the auto-tuning process.


   .. py:method:: run(warmup = 25, rep = 100, timeout = 30)

      Run the auto-tuning process.

      :param warmup: Number of warmup iterations.
      :param rep: Number of repetitions for timing.
      :param timeout: Maximum time per configuration.

      :returns: Results of the auto-tuning process.
      :rtype: AutotuneResult


   .. py:method:: __call__()

      Make the AutoTuner callable, running the auto-tuning process.

      :returns: Results of the auto-tuning process.
      :rtype: AutotuneResult
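
The selection step described for ``AutoTuner`` (try each configuration, time it with ``warmup`` and ``rep`` iterations, keep the fastest) can be sketched in plain Python. This sketch leaves out compilation, validation against a reference program, and timeouts. ``pick_best_config`` and ``build_sleeper`` are hypothetical names, and the stand-in "kernel" simply sleeps.

```python
import time


def pick_best_config(build, configs, warmup=2, rep=5):
    # Hypothetical search loop: build(**config) returns a zero-argument callable to time.
    best_config, best_latency = None, float("inf")
    for config in configs:
        candidate = build(**config)
        for _ in range(warmup):
            candidate()  # untimed runs so one-off startup costs do not skew the result
        start = time.perf_counter()
        for _ in range(rep):
            candidate()
        latency = (time.perf_counter() - start) / rep  # mean seconds per call
        if latency < best_latency:
            best_config, best_latency = config, latency
    return best_config, best_latency


def build_sleeper(delay):
    # Stand-in "kernel" whose cost depends on the config value.
    return lambda: time.sleep(delay)


configs = [{"delay": 0.02}, {"delay": 0.001}, {"delay": 0.01}]
best, latency = pick_best_config(build_sleeper, configs)
```

Averaging over ``rep`` timed calls after the untimed warmup runs keeps a single slow first call from deciding the winner.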


.. py:function:: autotune(func = None, *, configs, warmup = 25, rep = 100, timeout = 100, supply_type = tilelang.TensorSupplyType.Auto, ref_prog = None, supply_prog = None, rtol = 0.01, atol = 0.01, max_mismatched_ratio = 0.01, skip_check = False, manual_check_prog = None, cache_input_tensors = False)

   Auto-tuning decorator for TileLang functions.

   Because ``configs`` is a required keyword-only argument, apply the decorator with
   arguments, for example ``@autotune(configs=...)``.

   Tips:
       - To skip the auto-tuning process, override the tunable parameters by passing them explicitly in the call.
           ```python
           if enable_autotune:
               kernel = flashattn(batch, heads, seq_len, dim, is_causal)
           else:
               kernel = flashattn(
                   batch, heads, seq_len, dim, is_causal, groups=groups,
                   block_M=128, block_N=128, num_stages=2, threads=256)
           ```

   :param func: The function to be decorated. Left as ``None`` when the decorator is applied
                with arguments, as in ``@autotune(configs=...)``.
   :type func: Any, optional
   :param configs: Configuration space to explore during auto-tuning.
   :type configs: Dict or Callable
   :param warmup: Number of warmup iterations before timing.
   :type warmup: int, optional
   :param rep: Number of repetitions for timing measurements.
   :type rep: int, optional
   :param timeout: Maximum time per configuration.
   :type timeout: int, optional
   :param target: Compilation target for TVM (e.g., "cuda", "llvm"). Defaults to "auto".
   :type target: Union[str, Target], optional
   :param target_host: Target host for cross-compilation. Defaults to None.
   :type target_host: Union[str, Target], optional
   :param execution_backend: Backend for kernel execution and argument passing. Defaults to "cython".
   :type execution_backend: Literal["dlpack", "ctypes", "cython"], optional
   :param verbose: Enables verbose logging during compilation. Defaults to False.
   :type verbose: bool, optional
   :param pass_configs: Configurations for TVM's pass context. Defaults to None.
   :type pass_configs: Optional[Dict[str, Any]], optional
   :param debug_root_path: Directory to save compiled kernel source for debugging. Defaults to None.
   :type debug_root_path: Optional[str], optional

   :returns: Either a JIT-compiled wrapper around the input function, or a configured decorator
             instance that can then be applied to a function.
   :rtype: Callable
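
A configuration space is often written as a set of candidate values per tunable parameter and then expanded into one dictionary per combination. A minimal sketch of that expansion is shown here, using the parameter names from the tip above (``block_M``, ``block_N``, ``num_stages``, ``threads``). The candidate values and the helper ``make_configs`` are illustrative, and the exact shape that ``configs`` must take is an assumption, since this page describes it only as "Dict or Callable".

```python
import itertools


def make_configs(space):
    # Expand {"name": [values...]} into one dict per combination (Cartesian product).
    names = list(space)
    return [dict(zip(names, values)) for values in itertools.product(*space.values())]


configs = make_configs({
    "block_M": [64, 128],
    "block_N": [64, 128],
    "num_stages": [1, 2],
    "threads": [128, 256],
})
```

Four parameters with two candidates each give 16 configurations, so the search cost grows multiplicatively with every tunable added.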