tilelang.intrinsics.wgmma_macro_generator¶
Attributes¶
Classes¶
Enum where members are also (and must be) ints |
|
To eliminate Python syntax within TIR Macro. |
Module Contents¶
- tilelang.intrinsics.wgmma_macro_generator.lift¶
- class tilelang.intrinsics.wgmma_macro_generator.SwizzleMode¶
Bases:
enum.IntEnumEnum where members are also (and must be) ints
- NONE = 0¶
- SWIZZLE_128B = 1¶
- SWIZZLE_64B = 2¶
- SWIZZLE_32B = 3¶
- swizzle_byte_size()¶
- Return type:
int
- swizzle_atom_size()¶
- Return type:
int
- class tilelang.intrinsics.wgmma_macro_generator.TensorCoreIntrinEmitter(a_dtype='float16', b_dtype='float16', accum_dtype='float16', a_transposed=False, b_transposed=False, block_row_warps=2, block_col_warps=2, warp_row_tiles=8, warp_col_tiles=8, chunk=16, reduce_k=1, num_elems_per_byte=1, is_m_first=False, thread_var=None)¶
Bases:
tilelang.intrinsics.mma_macro_generator.TensorCoreIntrinEmitterTo eliminate Python syntax within TIR Macro.
- Parameters:
- wgmma_prefix: str¶
- wgmma_inst_m: int¶
- wgmma_inst_n: int¶
- wgmma(A_region, B_region, C_region, clear_accum=False, wg_wait=0)¶
- Parameters:
A_region (tvm.tir.BufferRegion)
B_region (tvm.tir.BufferRegion)
C_region (tvm.tir.BufferRegion)
clear_accum (tvm.tir.PrimExpr)
wg_wait (int)
- wgmma_rs(A_region, B_region, C_region, clear_accum=False, wg_wait=0)¶
- Parameters:
A_region (tvm.tir.BufferRegion)
B_region (tvm.tir.BufferRegion)
C_region (tvm.tir.BufferRegion)
clear_accum (tvm.tir.PrimExpr)
wg_wait (int)
- make_mma_load_layout(local_buf, matrix='A')¶
Create a layout function for storing MMA results into a fragment buffer. This layout is used in conjunction with inverse_mma_store_layout to map fragment indices to threads and local indices.
- Parameters:
local_buf (tir.Buffer) – The local buffer representing a fragment of a matrix.
matrix (str)
- Returns:
A fragment object that describes how threads and indices in local_buf are laid out.
- Return type:
T.Fragment
- Raises:
AssertionError – If local_buf is not detected to be a fragment buffer.
- make_mma_store_layout(local_buf)¶
Create a layout function for storing MMA results into a fragment buffer. This layout is used in conjunction with inverse_mma_store_layout to map fragment indices to threads and local indices.
- Parameters:
local_buf (tir.Buffer) – The local buffer representing a fragment of a matrix.
- Returns:
A fragment object that describes how threads and indices in local_buf are laid out.
- Return type:
T.Fragment
- Raises:
AssertionError – If local_buf is not detected to be a fragment buffer.