tilelang.intrinsics.mma_sp_macro_generator

Attributes

lift

Classes

SparseTensorCoreIntrinEmitter

Emits TIR macros for sparse Tensor Core MMA operations, keeping Python syntax out of the TIR macros themselves.

Module Contents

tilelang.intrinsics.mma_sp_macro_generator.lift
class tilelang.intrinsics.mma_sp_macro_generator.SparseTensorCoreIntrinEmitter(a_dtype='float16', e_dtype='uint8', b_dtype='float16', accum_dtype='float16', a_transposed=False, b_transposed=False, e_transposed=False, block_row_warps=2, block_col_warps=2, warp_row_tiles=8, warp_col_tiles=8, warp_k=16, reduce_k=1, num_elems_per_byte=1, is_m_first=False, thread_var=None)

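Emits TIR macros for sparse Tensor Core MMA: operand loads (ldmatrix_a, ldmatrix_e, ldmatrix_b), the sparse multiply-accumulate (mma_sp), and result stores (stmatrix). Keeping this logic in a Python emitter class eliminates Python syntax from within the TIR macros themselves.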

Parameters:
  • a_dtype (str)

  • e_dtype (str)

  • b_dtype (str)

  • accum_dtype (str)

  • a_transposed (bool)

  • b_transposed (bool)

  • e_transposed (bool)

  • block_row_warps (int)

  • block_col_warps (int)

  • warp_row_tiles (int)

  • warp_col_tiles (int)

  • warp_k (int)

  • reduce_k (int)

  • num_elems_per_byte (int)

  • is_m_first (bool)

  • thread_var (tvm.tir.Var | None)
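The A operand of this emitter is sparse: SPARSE_FACTOR = 2 presumably means A is stored with half its K extent, while E (of dtype e_dtype) carries metadata that locates the kept values. Assuming NVIDIA's 2:4 structured-sparsity format for float16 (at most two nonzeros in every group of four K elements), the plain-Python sketch below compresses one dense row into kept values plus 2-bit in-group positions and packs those positions into uint8 metadata. The helper names and the low-bits-first packing order are hypothetical choices for illustration; the exact register layout the hardware expects is not reproduced here. With this packing, one uint8 covers 8 dense K elements, which would match the default e_factor = 8 if that attribute counts dense K elements per metadata element (an assumption).

```python
def compress_2to4(row):
    """Compress a dense row that has at most 2 nonzeros in every group of 4.

    Returns (values, indices): two kept values per group and, for each kept
    value, its 2-bit position (0..3) inside the group.
    """
    if len(row) % 4 != 0:
        raise ValueError("row length must be a multiple of 4")
    values, indices = [], []
    for start in range(0, len(row), 4):
        group = row[start:start + 4]
        kept = [i for i, v in enumerate(group) if v != 0]
        if len(kept) > 2:
            raise ValueError(f"group starting at {start} violates 2:4 sparsity")
        # Pad with unused positions (which hold zeros) so every group keeps exactly 2.
        for i in range(4):
            if len(kept) == 2:
                break
            if i not in kept:
                kept.append(i)
        kept.sort()
        values.extend(group[i] for i in kept)
        indices.extend(kept)
    return values, indices


def pack_indices_uint8(indices):
    """Pack 2-bit indices into bytes: 4 indices (2 groups) per byte, low bits first."""
    packed = []
    for start in range(0, len(indices), 4):
        byte = 0
        for j, idx in enumerate(indices[start:start + 4]):
            byte |= (idx & 0b11) << (2 * j)
        packed.append(byte)
    return packed


row = [0.0, 1.5, 0.0, 2.0, 3.0, 0.0, 0.0, 0.0]
values, indices = compress_2to4(row)
metadata = pack_indices_uint8(indices)
print(values, indices, metadata)  # [1.5, 2.0, 3.0, 0.0] [1, 3, 0, 1] [77]
```

The compressed row has len(row) // 2 values, which is the factor-of-two saving that SPARSE_FACTOR expresses.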

M_DIM = 16
SPARSE_FACTOR = 2
SPARSE_SELECTOR = 0
n_dim = 16
WARP_SIZE = 32
dtype_abbrv
E_FACTOR_MAP
E_REPLICATE_FACTOR
is_m_first = False
a_dtype = 'float16'
e_dtype = 'uint8'
b_dtype = 'float16'
accum_dtype = 'float16'
a_transposed = False
b_transposed = False
e_transposed = False
block_row_warps = 2
block_col_warps = 2
warp_row_tiles = 8
warp_col_tiles = 8
warp_k = 16
e_factor = 8
reduce_k = 1
threads = 128
num_elems_per_byte = 1
thread_var = None
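The default threads = 128 equals WARP_SIZE × block_row_warps × block_col_warps (32 × 2 × 2). The sketch below assumes that relation holds in general and also assumes, by analogy with dense warp tiling, that the block covers block_row_warps × warp_row_tiles rows and block_col_warps × warp_col_tiles columns of C. TileConfig and stored_a_k are hypothetical helpers, not part of tilelang.

```python
from dataclasses import dataclass

WARP_SIZE = 32      # documented class attribute
SPARSE_FACTOR = 2   # documented class attribute


@dataclass
class TileConfig:
    """Hypothetical stand-in holding the emitter's warp-tiling parameters."""
    block_row_warps: int = 2
    block_col_warps: int = 2
    warp_row_tiles: int = 8
    warp_col_tiles: int = 8

    @property
    def threads(self) -> int:
        # One warp per (row, col) warp slot in the block.
        return WARP_SIZE * self.block_row_warps * self.block_col_warps

    @property
    def block_m(self) -> int:
        # Assumption: each row of warps covers warp_row_tiles rows of C.
        return self.block_row_warps * self.warp_row_tiles

    @property
    def block_n(self) -> int:
        # Assumption: each column of warps covers warp_col_tiles columns of C.
        return self.block_col_warps * self.warp_col_tiles


def stored_a_k(k: int) -> int:
    """K extent of compressed A, assuming A keeps k / SPARSE_FACTOR values per row."""
    if k % SPARSE_FACTOR != 0:
        raise ValueError("K must be a multiple of SPARSE_FACTOR")
    return k // SPARSE_FACTOR


cfg = TileConfig()
print(cfg.threads, cfg.block_m, cfg.block_n, stored_a_k(32))  # 128 16 16 16
```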
get_thread_binding()
get_store_index_map(inverse=False)
Parameters:

inverse (bool)

Return type:

tvm.tir.IndexMap
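get_store_index_map relates positions in the accumulator tile to (thread, local slot) pairs, and inverse=True presumably yields the opposite direction. As an illustration only, the plain-Python sketch below builds such a pair of maps from the accumulator layout that the PTX ISA documents for m16n8 MMA shapes (row = groupID for c0 and c1, groupID + 8 for c2 and c3; col = 2 × threadID_in_group + (i & 1)), tiling two m16n8 halves into a 16 × 16 tile to match M_DIM = 16 and n_dim = 16. The numbering of local slots is an assumption; tilelang's actual map may order them differently, and the function names are hypothetical.

```python
M_TILE, N_TILE, WARP_SIZE = 16, 16, 32
LOCALS_PER_THREAD = M_TILE * N_TILE // WARP_SIZE  # 8 accumulator values per thread


def forward(thread, local):
    """(thread, local slot) -> (row, col) in a 16 x 16 accumulator tile."""
    n_half, i = divmod(local, 4)    # which m16n8 half, which c0..c3 value
    group, tig = divmod(thread, 4)  # PTX groupID and threadID_in_group
    row = group + 8 * (i // 2)
    col = 8 * n_half + 2 * tig + (i % 2)
    return row, col


def inverse(row, col):
    """(row, col) -> (thread, local slot); undoes forward()."""
    n_half, c = divmod(col, 8)
    tig, parity = divmod(c, 2)
    half, group = divmod(row, 8)
    return 4 * group + tig, 4 * n_half + 2 * half + parity


print(forward(5, 3), inverse(9, 3))  # (9, 3) (5, 3)
```

Every (row, col) of the tile is owned by exactly one (thread, local) pair, which is what makes the inverse map well defined.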

extract_thread_binding(thread_id, is_m_first=None)

is_m_first: If True, the thread binding has the form (tx, warp_n, warp_m), which represents [warp_size, block_row_warps (split n), block_col_warps (split m)]. Otherwise, it has the form [warp_size, block_col_warps (split m), block_row_warps (split n)]. If None, the instance attribute is_m_first is used.

Parameters:
  • thread_id (tvm.tir.PrimExpr)

  • is_m_first (bool | None)

Return type:

tuple[tvm.tir.PrimExpr, tvm.tir.PrimExpr, tvm.tir.PrimExpr]
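The binding described above is a mixed-radix split of the flat thread id, with the lane (tx) varying fastest. A plain-Python sketch of that arithmetic follows, using integers in place of PrimExprs. The helper names are hypothetical, and which warp index is the inner one (warp_m or warp_n) is determined by is_m_first as described above.

```python
def decompose_thread_id(thread_id, warp_size, inner_warps, outer_warps):
    """Split a flat thread id into (lane, inner warp index, outer warp index).

    The lane varies fastest, then the inner warp index, then the outer one.
    """
    lane = thread_id % warp_size
    inner = (thread_id // warp_size) % inner_warps
    outer = (thread_id // (warp_size * inner_warps)) % outer_warps
    return lane, inner, outer


def recompose_thread_id(lane, inner, outer, warp_size, inner_warps):
    """Inverse of decompose_thread_id for ids inside one block."""
    return lane + warp_size * (inner + inner_warps * outer)


# 128 threads = WARP_SIZE (32) x 2 x 2 warps, as with the documented defaults.
print(decompose_thread_id(77, 32, 2, 2))  # (13, 0, 1)
```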

ldmatrix_a(A_local_buf, A_shared_buf, ki, rk=0)
Parameters:
  • A_local_buf (tvm.tir.Buffer)

  • A_shared_buf (tvm.tir.Buffer)

  • ki (tvm.tir.PrimExpr)

  • rk (tvm.tir.PrimExpr)

ldmatrix_e(E_local_buf, E_shared_buf, ki, rk=0)
Parameters:
  • E_local_buf (tvm.tir.Buffer)

  • E_shared_buf (tvm.tir.Buffer)

  • ki (tvm.tir.PrimExpr)

  • rk (tvm.tir.PrimExpr)

ldmatrix_b(B_local_buf, B_shared_buf, ki, rk=0)
Parameters:
  • B_local_buf (tvm.tir.Buffer)

  • B_shared_buf (tvm.tir.Buffer)

  • ki (tvm.tir.PrimExpr)

  • rk (tvm.tir.PrimExpr)

mma_sp(A_local_buf, E_local_buf, B_local_buf, C_local_buf, k_inner=0)
Parameters:
  • A_local_buf (tvm.tir.Buffer)

  • E_local_buf (tvm.tir.Buffer)

  • B_local_buf (tvm.tir.Buffer)

  • C_local_buf (tvm.tir.Buffer)

  • k_inner (tvm.tir.PrimExpr)
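Numerically, a sparse MMA should produce the same result as a dense MMA on the decompressed A: D = decompress(A, E) · B + C. The plain-Python reference below sketches only that semantics, not the warp-level instruction or its fragment layouts. It assumes 2:4 compression with two kept values per group of four, and a_indices stands in for the metadata operand E in unpacked form (one in-group position per kept value); all names are hypothetical.

```python
def decompress_row(values, indices, k):
    """Rebuild a dense length-k row from 2:4 kept values and their in-group positions."""
    dense = [0.0] * k
    for slot, (value, pos) in enumerate(zip(values, indices)):
        group = slot // 2  # two kept values per group of four
        dense[4 * group + pos] = value
    return dense


def sparse_mma_reference(a_values, a_indices, b, c):
    """Return D = decompress(A) @ B + C using nested lists (row-major)."""
    k, n = len(b), len(b[0])
    d = []
    for r, (vals, idxs) in enumerate(zip(a_values, a_indices)):
        a_row = decompress_row(vals, idxs, k)
        d.append([c[r][j] + sum(a_row[kk] * b[kk][j] for kk in range(k))
                  for j in range(n)])
    return d


a_values = [[1.0, 2.0]]   # one row of A with K = 4; kept values at positions 1 and 3
a_indices = [[1, 3]]
b = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 3.0]]
c = [[0.5, 0.5]]
print(sparse_mma_reference(a_values, a_indices, b, c))  # [[4.5, 7.5]]
```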

stmatrix(C_local_buf, C_buf, pid_m=None, pid_n=None)
make_mma_load_layout(local_buf, matrix='A')

Create a layout function for loading an MMA input operand (matrix A or B) into a fragment buffer. The returned fragment maps fragment indices to threads and local indices.

Parameters:
  • local_buf (tir.Buffer) – The local buffer representing a fragment of a matrix.

  • matrix (Literal['A', 'B']) – Which MMA operand the fragment holds, 'A' or 'B' (default 'A').

Returns:

A fragment object that describes how threads and indices in local_buf are laid out.

Return type:

T.Fragment

Raises:

AssertionError – If local_buf is not detected to be a fragment buffer.

make_mma_store_layout(local_buf)

Create a layout function for storing MMA results into a fragment buffer. This layout is used in conjunction with inverse_mma_store_layout to map fragment indices to threads and local indices.

Parameters:

local_buf (tir.Buffer) – The local buffer representing a fragment of a matrix.

Returns:

A fragment object that describes how threads and indices in local_buf are laid out.

Return type:

T.Fragment

Raises:

AssertionError – If local_buf is not detected to be a fragment buffer.