tilelang.quantize.utils module#

tilelang.quantize.utils.general_compress(lowprecision_weight, source_bits=4, storage_dtype=None)#

tilelang.quantize.utils.interleave_weight(qweight, nbits=4, target_dtype='float16')#

Interleave the weight to the target data type.

Parameters:

Returns:

_description_

Return type:

_type_

Example

qweight = torch.randint(0, 127, (10, 10), dtype=torch.int8).cuda() interleave_weight(qweight, 4, “float16”)