tilelang.language.builtin module#

The language interface for tl programs.

tilelang.language.builtin.barrier_arrive(barrier_id: Union[int, PrimExpr, Call])#

Arrive at a memory barrier.

Parameters:

barrier_id – Optional[int, PrimExpr] The memory barrier to arrive at

tilelang.language.builtin.barrier_wait(barrier_id: Union[int, PrimExpr, Call], parity: Optional[Union[int, Var]] = None)#

Wait for a memory barrier to complete.

Parameters:
  • barrier_id – Optional[int, PrimExpr] The memory barrier to wait on

  • parity – Optional[int, Var] The parity value to wait for

Returns:

A handle to the barrier wait operation

Return type:

tir.Call

Current implementation is a sugar syntax for mbarrier_wait_parity, as we only support parity 0 and 1.

tilelang.language.builtin.create_list_of_mbarrier(*args: Any) Call#

Create a list of memory barrier handles.

Parameters:

*args (list or Any) – Either a single list of arguments, or multiple arguments directly.

Returns:

Handle to the created list of memory barriers.

Return type:

tvm.tir.Call

Raises:

TypeError – If the input is not a list or variadic arguments.

Examples

>>> create_list_of_mbarrier([128, 128])
>>> create_list_of_mbarrier(128, 128)
tilelang.language.builtin.create_tma_descriptor(*args)#

Create a Tensor Memory Access (TMA) descriptor.

Parameters:

*args – Variable arguments defining the TMA descriptor configuration

Returns:

A handle to the created TMA descriptor

Return type:

tir.Call

tilelang.language.builtin.dec_max_nreg(reg_count: int)#

Decrement the maximum number of registers to use.

tilelang.language.builtin.fence_proxy_async(*args)#

Create a fence for asynchronous proxy operations.

Parameters:

*args – Variable arguments for fence configuration

Returns:

A handle to the fence operation

Return type:

tir.Call

tilelang.language.builtin.get_mbarrier(*args)#

Retrieve a memory barrier operation.

Parameters:

*args – Variable arguments to specify which memory barrier to retrieve

Returns:

A handle to the requested memory barrier

Return type:

tir.Call

tilelang.language.builtin.inc_max_nreg(reg_count: int)#

Increment the maximum number of registers to use.

tilelang.language.builtin.mbarrier_arrive(mbarrier: Union[int, PrimExpr, Call])#

Arrive at memory barrier.

Parameters:

mbarrier – Optional[int, PrimExpr] The memory barrier to arrive at

tilelang.language.builtin.mbarrier_expect_tx(*args)#

Set expected transaction count for memory barrier.

Parameters:

*args – Variable arguments specifying the expected transaction count

Returns:

A handle to the barrier expectation operation

Return type:

tir.Call

tilelang.language.builtin.mbarrier_wait_parity(mbarrier: Union[int, PrimExpr, Call], parity: Union[int, Var])#

Wait for memory barrier parity condition.

Parameters:
  • mbarrier – Optional[int, PrimExpr] The memory barrier to wait on

  • parity – Optional[int, Var] The parity value to wait for

Examples

# Wait for parity 0 on barrier 0
T.mbarrier_wait_parity(0, 0)

# Wait for parity value in variable ko on barrier 1
T.mbarrier_wait_parity(1, ko)

# Wait using barrier handle
barrier = T.get_mbarrier(0)
T.mbarrier_wait_parity(barrier, 1)

# Common usage in pipelined kernels:
for ko in range(num_stages):
    # Producer waits for consumer to finish previous iteration
    T.mbarrier_wait_parity(1, ko ^ 1)
    # Producer copies data
    T.copy(A_global, A_shared)
    # Producer signals data ready
    T.mbarrier_arrive(0)

    # Consumer waits for producer data
    T.mbarrier_wait_parity(0, ko)
    # Consumer computes
    T.gemm(A_shared, B_shared, C_local)
    # Consumer signals completion
    T.mbarrier_arrive(1)
Returns:

A handle to the barrier wait operation

Return type:

tir.Call

tilelang.language.builtin.no_set_max_nreg()#

Disable the maximum register limit setting.

tilelang.language.builtin.set_max_nreg(reg_count: int, is_inc: int)#

Set the maximum number of registers to use. Detailed Documentation: https://docs.nvidia.com/cuda/parallel-thread-execution/#miscellaneous-instructions-setmaxnreg

Parameters:
  • reg_count – int The number of registers to allocate

  • is_inc – int Whether to increment or decrement the register count 0 if decrement, 1 if increment

Returns:

A handle to the register setting operation

Return type:

tir.Call

tilelang.language.builtin.shfl_down(value: Union[int, PrimExpr, Call], offset: Union[int, PrimExpr, Call])#

Perform a shuffle operation with down offset.

Parameters:

value – Optional[int, PrimExpr] The value to shuffle

tilelang.language.builtin.shfl_up(value: Union[int, PrimExpr, Call], offset: Union[int, PrimExpr, Call])#

Perform a shuffle operation with up offset.

Parameters:

value – Optional[int, PrimExpr] The value to shuffle

tilelang.language.builtin.shfl_xor(value: Union[int, PrimExpr, Call], offset: Union[int, PrimExpr, Call])#

Perform a shuffle operation with XOR offset.

Parameters:
  • value – Optional[int, PrimExpr] The value to shuffle

  • offset – Optional[int, PrimExpr] The offset for the shuffle operation

Returns:

A handle to the shuffle operation

Return type:

tir.Call

tilelang.language.builtin.sync_global()#

Synchronize all threads in a block.

tilelang.language.builtin.sync_thread_partial(barrier_id: Union[int, PrimExpr, Call])#

Synchronize threads within a warp.

Parameters:

barrier_id – Optional[int, PrimExpr] The memory barrier to synchronize

Returns:

A handle to the synchronization operation

Return type:

tir.Call

tilelang.language.builtin.sync_threads()#

Synchronize all threads in a warp.

tilelang.language.builtin.tma_load(*args)#

Perform a Tensor Memory Access (TMA) load operation.

Parameters:

*args – Variable arguments specifying the TMA load parameters

Returns:

A handle to the TMA load operation

Return type:

tir.Call

tilelang.language.builtin.tma_store_arrive(*args)#

Signal the arrival of a TMA store operation.

Parameters:

*args – Variable arguments for the store arrival operation

Returns:

A handle to the store arrive operation

Return type:

tir.Call

tilelang.language.builtin.tma_store_wait(*args)#

Wait for completion of TMA store operations.

Parameters:

*args – Variable arguments specifying which store operations to wait for

Returns:

A handle to the store wait operation

Return type:

tir.Call

tilelang.language.builtin.wait_wgmma(*args)#

Wait for WGMMA (Warp Group Matrix Multiply-Accumulate) operations to complete.

Parameters:

*args – Variable arguments specifying which operations to wait for

Returns:

A handle to the WGMMA wait operation

Return type:

tir.Call