tilelang.carver.arch.cuda¶
Attributes¶
- volta_tensorcore_supported
- ampere_tensorcore_supported
- ada_tensorcore_supported
- hopper_tensorcore_supported
Classes¶
- TensorInstruction
- CUDA: Represents the architecture of a computing device, capturing various hardware specifications.
Functions¶
- check_sm_version(arch)
- is_cuda_arch(arch)
- is_volta_arch(arch)
- is_ampere_arch(arch)
- is_ada_arch(arch)
- is_hopper_arch(arch)
- has_mma_support(arch)
- is_tensorcore_supported_precision(in_dtype, accum_dtype, arch)
Module Contents¶
- tilelang.carver.arch.cuda.check_sm_version(arch)¶
- Parameters:
arch (str)
- Return type:
int
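A minimal usage sketch, assuming check_sm_version parses the numeric suffix of an sm_XX architecture tag (the exact handling of malformed input is not documented here):

```python
from tilelang.carver.arch.cuda import check_sm_version

# "sm_80" is the target-architecture tag of an Ampere (A100-class) GPU.
print(check_sm_version("sm_80"))  # expected: 80 (assumed parsing of the numeric suffix)
```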
- tilelang.carver.arch.cuda.is_cuda_arch(arch)¶
- Parameters:
arch (TileDevice)
- Return type:
bool
- tilelang.carver.arch.cuda.is_volta_arch(arch)¶
- Parameters:
arch (TileDevice)
- Return type:
bool
- tilelang.carver.arch.cuda.is_ampere_arch(arch)¶
- Parameters:
arch (TileDevice)
- Return type:
bool
- tilelang.carver.arch.cuda.is_ada_arch(arch)¶
- Parameters:
arch (TileDevice)
- Return type:
bool
- tilelang.carver.arch.cuda.is_hopper_arch(arch)¶
- Parameters:
arch (TileDevice)
- Return type:
bool
- tilelang.carver.arch.cuda.has_mma_support(arch)¶
- Parameters:
arch (TileDevice)
- Return type:
bool
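A hedged sketch of how the architecture predicates above are typically used, assuming a CUDA instance (a TileDevice subclass, documented below) is passed as arch and that a CUDA-capable GPU is visible to TVM:

```python
from tilelang.carver.arch.cuda import (
    CUDA,
    is_ampere_arch,
    is_hopper_arch,
    has_mma_support,
)

arch = CUDA("cuda")  # assumption: TVM resolves "cuda" to the local GPU

# Each predicate classifies the device by its SM generation.
print(is_ampere_arch(arch))   # True on Ampere-generation devices (assumed semantics)
print(is_hopper_arch(arch))   # True on Hopper-generation devices (assumed semantics)
print(has_mma_support(arch))  # True where MMA tensor-core instructions are available (assumed)
```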
- tilelang.carver.arch.cuda.volta_tensorcore_supported = [('float16', 'float32'), ('float16', 'float16')]¶
- tilelang.carver.arch.cuda.ampere_tensorcore_supported = [('bfloat16', 'float32'), ('float16', 'float32'), ('float16', 'float16'), ('int8', 'int32'),...¶
- tilelang.carver.arch.cuda.ada_tensorcore_supported = [('bfloat16', 'float32'), ('float16', 'float32'), ('float16', 'float16'), ('int8', 'int32'),...¶
- tilelang.carver.arch.cuda.hopper_tensorcore_supported = [('bfloat16', 'float32'), ('float16', 'float32'), ('float16', 'float16'), ('int8', 'int32'),...¶
- tilelang.carver.arch.cuda.is_tensorcore_supported_precision(in_dtype, accum_dtype, arch)¶
- Parameters:
in_dtype (str)
accum_dtype (str)
arch (TileDevice)
- Return type:
bool
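A sketch of checking a precision pair against the per-generation support lists above, assuming a local CUDA GPU is available to build the arch argument:

```python
from tilelang.carver.arch.cuda import CUDA, is_tensorcore_supported_precision

arch = CUDA("cuda")  # assumption: a CUDA device is visible to TVM

# The (in_dtype, accum_dtype) pair is matched against the support list for the
# device's generation.
print(is_tensorcore_supported_precision("float16", "float32", arch))  # expected: True

# float64 does not appear among the pairs shown in the (truncated) lists above,
# so this is expected to be False.
print(is_tensorcore_supported_precision("float64", "float64", arch))
```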
- class tilelang.carver.arch.cuda.TensorInstruction(name, shape)¶
Bases:
object
- Parameters:
name (str)
shape (List[int])
- name: str¶
- shape: List[int]¶
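TensorInstruction is a simple record pairing an instruction name with a tile shape; a sketch with purely illustrative values (the concrete instruction names and shapes used internally are not shown on this page):

```python
from tilelang.carver.arch.cuda import TensorInstruction

# Illustrative values only; the library normally constructs these itself.
inst = TensorInstruction("mma", [16, 8])
print(inst.name, inst.shape)
```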
- class tilelang.carver.arch.cuda.CUDA(target)¶
Bases:
tilelang.carver.arch.arch_base.TileDevice
Represents the architecture of a computing device, capturing various hardware specifications.
- Parameters:
target (Union[tvm.target.Target, str])
- target¶
- sm_version¶
- name¶
- device: tvm.runtime.Device¶
- platform: str = 'CUDA'¶
- smem_cap¶
- compute_max_core¶
- warp_size¶
- compute_capability¶
- reg_cap: int = 65536¶
- max_smem_usage: int¶
- sm_partition: int = 4¶
- l2_cache_size_bytes: int¶
- transaction_size: List[int] = [32, 128]¶
- bandwidth: List[int] = [750, 12080]¶
- available_tensor_instructions: List[TensorInstruction] = None¶
- get_avaliable_tensorintrin_shapes()¶
- __repr__()¶
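A usage sketch for the CUDA descriptor, assuming TVM is installed and a CUDA GPU is visible; the attribute values mentioned in the comments are illustrative, not guaranteed:

```python
import tvm
from tilelang.carver.arch.cuda import CUDA

# The constructor accepts either a tvm.target.Target or a target string.
arch = CUDA(tvm.target.Target("cuda"))

print(arch)                     # __repr__ of the device descriptor
print(arch.compute_capability)  # e.g. "80" on an A100 (illustrative)
print(arch.smem_cap, arch.warp_size, arch.compute_max_core)

# Note: the method name is spelled this way in the API above.
print(arch.get_avaliable_tensorintrin_shapes())
```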