tilelang.intrinsics.wgmma_macro_generator¶

Attributes¶

Classes¶

SwizzleMode

Enum where members are also (and must be) ints

TensorCoreIntrinEmitter

To eliminate Python syntax within TIR Macro.

Module Contents¶

tilelang.intrinsics.wgmma_macro_generator.lift¶
class tilelang.intrinsics.wgmma_macro_generator.SwizzleMode¶

Bases: enum.IntEnum

Enum where members are also (and must be) ints

NONE = 0¶
SWIZZLE_128B = 1¶
SWIZZLE_64B = 2¶
SWIZZLE_32B = 3¶
is_none()¶
Return type:

bool

is_swizzle_32b()¶
Return type:

bool

is_swizzle_64b()¶
Return type:

bool

is_swizzle_128b()¶
Return type:

bool

swizzle_byte_size()¶
Return type:

int

swizzle_atom_size()¶
Return type:

int

class tilelang.intrinsics.wgmma_macro_generator.TensorCoreIntrinEmitter(a_dtype='float16', b_dtype='float16', accum_dtype='float16', a_transposed=False, b_transposed=False, block_row_warps=2, block_col_warps=2, warp_row_tiles=8, warp_col_tiles=8, chunk=16, reduce_k=1, num_elems_per_byte=1, is_m_first=False, thread_var=None)¶

Bases: tilelang.intrinsics.mma_macro_generator.TensorCoreIntrinEmitter

To eliminate Python syntax within TIR Macro.

Parameters:
  • a_dtype (str)

  • b_dtype (str)

  • accum_dtype (str)

  • a_transposed (bool)

  • b_transposed (bool)

  • block_row_warps (int)

  • block_col_warps (int)

  • warp_row_tiles (int)

  • warp_col_tiles (int)

  • chunk (int)

  • reduce_k (int)

  • num_elems_per_byte (int)

  • is_m_first (Optional[bool])

  • thread_var (Optional[tvm.tir.Var])

wgmma_prefix: str¶
a_shared_layout: tilelang.layout.Layout = None¶
b_shared_layout: tilelang.layout.Layout = None¶
wgmma(A_buf, B_buf, C_local_buf, clear_accum=False)¶
Parameters:
  • A_buf (tvm.tir.Buffer)

  • B_buf (tvm.tir.Buffer)

  • C_local_buf (tvm.tir.Buffer)

  • clear_accum (tvm.tir.PrimExpr)

wgmma_rs(A_buf, B_buf, C_local_buf, clear_accum=False)¶
Parameters:
  • A_buf (tvm.tir.Buffer)

  • B_buf (tvm.tir.Buffer)

  • C_local_buf (tvm.tir.Buffer)

  • clear_accum (tvm.tir.PrimExpr)

make_mma_load_layout(local_buf, matrix='A')¶

Create a layout function for storing MMA results into a fragment buffer. This layout is used in conjunction with inverse_mma_store_layout to map fragment indices to threads and local indices.

Parameters:
  • local_buf (tir.Buffer) – The local buffer representing a fragment of a matrix.

  • matrix (str)

Returns:

A fragment object that describes how threads and indices in local_buf are laid out.

Return type:

T.Fragment

Raises:

AssertionError – If local_buf is not detected to be a fragment buffer.

make_mma_store_layout(local_buf)¶

Create a layout function for storing MMA results into a fragment buffer. This layout is used in conjunction with inverse_mma_store_layout to map fragment indices to threads and local indices.

Parameters:

local_buf (tir.Buffer) – The local buffer representing a fragment of a matrix.

Returns:

A fragment object that describes how threads and indices in local_buf are laid out.

Return type:

T.Fragment

Raises:

AssertionError – If local_buf is not detected to be a fragment buffer.