tilelang.intrinsics.utils

Functions

get_ldmatrix_offset(matrix, row_idx, col_idx, stride)

shared_16x16_to_mma_32x8_layout(i, j)

shared_16x32_to_mma_32x16_layout(i, j)

shared_32x16_to_mma_32x16_layout(i, j)

mma_store_index_map(thread_id, local_id)

mfma_store_index_map(thread_id, local_id)

get_mma_micro_size(dtype)

index_to_coordinates(index, shape)

Module Contents

tilelang.intrinsics.utils.get_ldmatrix_offset(matrix, row_idx, col_idx, stride, dtype='float16', transposed=False)
Parameters:
  • matrix (Literal['A', 'B'])

  • dtype (Literal['float16', 'int8'])

  • transposed (bool)

tilelang.intrinsics.utils.shared_16x16_to_mma_32x8_layout(i, j)

tilelang.intrinsics.utils.shared_16x32_to_mma_32x16_layout(i, j)

tilelang.intrinsics.utils.shared_32x16_to_mma_32x16_layout(i, j)

tilelang.intrinsics.utils.mma_store_index_map(thread_id, local_id)

tilelang.intrinsics.utils.mfma_store_index_map(thread_id, local_id)

tilelang.intrinsics.utils.get_mma_micro_size(dtype)
Parameters:
  • dtype (Literal['float16', 'int8'])

tilelang.intrinsics.utils.index_to_coordinates(index, shape)
General Implementation of:

    vjj = index % (micro_size_k // num_elems_per_byte)
    coordinates[-1] = index % shape[-1];

    vii = index // (micro_size_k // num_elems_per_byte) % micro_size_y
    index = index // shape[-1]; coordinates[-2] = index % shape[-2];

    vj = index // (micro_size_k // num_elems_per_byte * micro_size_y) % block_K // (micro_size_k // num_elems_per_byte)
    index = index // shape[-2]; coordinates[-3] = index % shape[-3];

    vi = index // (micro_size_k // num_elems_per_byte * micro_size_y * (block_K // (micro_size_k // num_elems_per_byte))) % block_N // micro_size_y
    index = index // shape[-3]; coordinates[-4] = index % shape[-4];
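
The generic `coordinates[-k] = index % shape[-k]` and `index = index // shape[-k]` steps in the description above read as unravelling a flat row-major index, innermost dimension first. The following is a minimal sketch of that pattern under that reading. Both `index_to_coordinates_sketch` and `coordinates_to_index_sketch` are hypothetical helpers written for illustration; they are not the library's code, and the real function may return a different container type or operate on symbolic expressions.

```python
def index_to_coordinates_sketch(index, shape):
    """Unravel a flat row-major index into one coordinate per dimension.

    Hypothetical illustration of the modulo/floor-division pattern in the
    description; not taken from tilelang itself.
    """
    coordinates = []
    # Peel off the innermost (last) dimension first, then move outward.
    for extent in reversed(shape):
        coordinates.append(index % extent)
        index = index // extent
    # Coordinates were collected innermost-first; restore shape order.
    coordinates.reverse()
    return coordinates


def coordinates_to_index_sketch(coordinates, shape):
    """Inverse of the sketch above (row-major flattening), used as a check."""
    index = 0
    for coord, extent in zip(coordinates, shape):
        index = index * extent + coord
    return index


shape = (2, 3, 4, 5)
coords = index_to_coordinates_sketch(97, shape)
print(coords)  # [1, 1, 3, 2]

# Round-trip check: flattening the coordinates recovers the original index.
assert coordinates_to_index_sketch(coords, shape) == 97
```

For a four-dimensional `shape`, the loop performs the same four modulo and floor-division steps that the description spells out for `coordinates[-1]` through `coordinates[-4]`.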