tilelang.transform¶

Wrapping transformations.

Submodules¶

Functions¶

get_pass_context()

Get the current pass context

ClusterPlanning()

ClusterPlanning

PipelinePlanning()

infer the fragment/shared memory layout

LayoutInference()

LayoutInference

LowerTileOp()

LowerTileOp

InjectSoftwarePipeline()

InjectSoftwarePipeline

FrontendLegalize()

FrontendLegalize

LegalizeNegativeIndex()

Legalize negative indices in buffer loads.

InjectAssumes()

Inject Assumes

LowerHopperIntrin()

LowerHopperIntrin

WarpSpecializedPipeline()

WarpSpecializedPipeline

RewriteWgmmaSync()

RewriteWgmmaSync

ThreadSync(storage_scope)

Insert sync between parallel read/write of shared buffers.

ThreadPartialSync(storage_scope)

Insert partial sync.

IfStmtBinding()

IfStmtBinding

MergeIfStmt()

MergeIfStmt

MultiVersionBuffer()

WarpSpecializedPipeline

WarpSpecialized()

WarpSpecializedPipeline

AnnotateWarpGroupRegAlloc()

Inject set_max_nreg calls into warp-specialized functions.

InjectTmaBarrier()

InjectTmaBarrier

InjectFenceProxy()

InjectFenceProxy

LegalizeVectorizedLoop()

LegalizeLoopVectorize

LegalizeSafeMemoryAccess()

LegalizeLoopVectorize

MakePackedAPI()

MakePackedAPI

AnnotateDeviceRegions()

AnnotateDeviceRegions

SplitHostDevice()

Split host/device functions even for empty kernels.

VectorizeLoop([enable_vectorize])

VectorizeLoop

InjectPTXAsyncCopy()

Rewrite global to shared memory copy on CUDA with asynchronous copy.

LowerDeviceStorageAccessInfo()

Lower attached storage access information on device.

ConfigIndexBitwidth()

Config index bitwidth.

FlattenBuffer()

FlattenBuffer

EliminateStorageSyncForMBarrier()

EliminateStorageSyncForMBarrier

MergeSharedMemoryAllocations([...])

MergeSharedMemoryAllocations

LowerL2Persistent()

LowerL2Persistent

PersistThreadblock()

PersistThreadblock

AlignDynamicSharedMemoryAllocations([align_bytes])

AlignDynamicSharedMemoryAllocations

LowerSharedBarrier()

LowerSharedBarrier

StorageRewrite()

StorageRewrite

LowerOpaqueBlock()

LowerOpaqueBlock

LowerThreadAllreduce()

LowerThreadAllreduce

LowerIntrin()

LowerIntrin

LowerDeviceKernelLaunch()

Create and return a transform pass that lowers device kernel launch constructs to target-specific IR.

LowerSharedTmem()

LowerSharedTmem

LayoutReducer()

Return a TVM transform pass that performs layout reduction/normalization.

Package Contents¶

tilelang.transform.get_pass_context()¶

Get the current pass context

tilelang.transform.ClusterPlanning()¶

ClusterPlanning

Returns:

fpass – The result pass

Return type:

tvm.transform.Pass

tilelang.transform.PipelinePlanning()¶

infer the fragment/shared memory layout

Returns:

fpass – The result pass

Return type:

tvm.transform.Pass

tilelang.transform.LayoutInference()¶

LayoutInference

Returns:

fpass – The result pass

Return type:

tvm.transform.Pass

tilelang.transform.LowerTileOp()¶

LowerTileOp

Returns:

fpass – The result pass

Return type:

tvm.transform.Pass

tilelang.transform.InjectSoftwarePipeline()¶

InjectSoftwarePipeline

Returns:

fpass – The result pass

Return type:

tvm.transform.Pass

tilelang.transform.FrontendLegalize()¶

FrontendLegalize

Returns:

fpass – The result pass

Return type:

tvm.transform.Pass

tilelang.transform.LegalizeNegativeIndex()¶

Legalize negative indices in buffer loads.

Returns:

fpass – The result pass

Return type:

tvm.transform.Pass

tilelang.transform.InjectAssumes()¶

Inject Assumes

Returns:¶

fpasstvm.transform.Pass

The result pass

tilelang.transform.LowerHopperIntrin()¶

LowerHopperIntrin

Returns:

fpass – The result pass

Return type:

tvm.transform.Pass

tilelang.transform.WarpSpecializedPipeline()¶

WarpSpecializedPipeline

Returns:

fpass – The result pass

Return type:

tvm.transform.Pass

tilelang.transform.RewriteWgmmaSync()¶

RewriteWgmmaSync

Returns:

fpass – The result pass

Return type:

tvm.transform.Pass

tilelang.transform.ThreadSync(storage_scope)¶

Insert sync between parallel read/write of shared buffers.

Parameters:

storage_scope (str) – The target storage scope.

Returns:

fpass – The result pass

Return type:

tvm.transform.Pass

tilelang.transform.ThreadPartialSync(storage_scope)¶

Insert partial sync.

Parameters:

storage_scope (str) – The target storage scope.

Returns:

fpass – The result pass

Return type:

tvm.transform.Pass

tilelang.transform.IfStmtBinding()¶

IfStmtBinding

Returns:

fpass – The result pass

Return type:

tvm.transform.Pass

tilelang.transform.MergeIfStmt()¶

MergeIfStmt

Returns:

fpass – The result pass

Return type:

tvm.transform.Pass

tilelang.transform.MultiVersionBuffer()¶

WarpSpecializedPipeline

Returns:

fpass – The result pass

Return type:

tvm.transform.Pass

tilelang.transform.WarpSpecialized()¶

WarpSpecializedPipeline

Returns:

fpass – The result pass

Return type:

tvm.transform.Pass

tilelang.transform.AnnotateWarpGroupRegAlloc()¶

Inject set_max_nreg calls into warp-specialized functions.

This pass analyzes the function to collect register hints from set_max_nreg and no_set_max_nreg calls, then injects appropriate set_max_nreg calls into producer and consumer branches of warp-specialized code.

Returns:

fpass – The result pass

Return type:

tvm.transform.Pass

tilelang.transform.InjectTmaBarrier()¶

InjectTmaBarrier

Returns:

fpass – The result pass

Return type:

tvm.transform.Pass

tilelang.transform.InjectFenceProxy()¶

InjectFenceProxy

Returns:

fpass – The result pass

Return type:

tvm.transform.Pass

tilelang.transform.LegalizeVectorizedLoop()¶

LegalizeLoopVectorize

Returns:

fpass – The result pass

Return type:

tvm.transform.Pass

tilelang.transform.LegalizeSafeMemoryAccess()¶

LegalizeLoopVectorize

Returns:

fpass – The result pass

Return type:

tvm.transform.Pass

tilelang.transform.MakePackedAPI()¶

MakePackedAPI

Returns:

fpass – The result pass

Return type:

tvm.transform.Pass

tilelang.transform.AnnotateDeviceRegions()¶

AnnotateDeviceRegions

Returns:

fpass – The result pass

Return type:

tvm.transform.Pass

tilelang.transform.SplitHostDevice()¶

Split host/device functions even for empty kernels.

Returns:

fpass – The result pass

Return type:

tvm.transform.Pass

tilelang.transform.VectorizeLoop(enable_vectorize=True)¶

VectorizeLoop

Returns:

fpass – The result pass

Return type:

tvm.transform.Pass

Parameters:

enable_vectorize (bool)

tilelang.transform.InjectPTXAsyncCopy()¶

Rewrite global to shared memory copy on CUDA with asynchronous copy.

Returns:

fpass – The result pass

Return type:

tvm.transform.Pass

tilelang.transform.LowerDeviceStorageAccessInfo()¶

Lower attached storage access information on device.

Returns:

fpass – The result pass

Return type:

tvm.transform.Pass

Note

Run this pass after all storage access analysis finish.

tilelang.transform.ConfigIndexBitwidth()¶

Config index bitwidth.

Returns:

  • fpass (tvm.transform.Pass) – The result pass

  • —-

tilelang.transform.FlattenBuffer()¶

FlattenBuffer

Returns:

fpass – The result pass

Return type:

tvm.transform.Pass

tilelang.transform.EliminateStorageSyncForMBarrier()¶

EliminateStorageSyncForMBarrier

tilelang.transform.MergeSharedMemoryAllocations(enable_aggressive_merge=False, align_bytes=16)¶

MergeSharedMemoryAllocations

Returns:

fpass – The result pass

Return type:

tvm.transform.Pass

Parameters:
  • enable_aggressive_merge (bool)

  • align_bytes (int)

tilelang.transform.LowerL2Persistent()¶

LowerL2Persistent

tilelang.transform.PersistThreadblock()¶

PersistThreadblock

tilelang.transform.AlignDynamicSharedMemoryAllocations(align_bytes=16)¶

AlignDynamicSharedMemoryAllocations

Parameters:

align_bytes (int) – The alignment bytes.

tilelang.transform.LowerSharedBarrier()¶

LowerSharedBarrier

tilelang.transform.StorageRewrite()¶

StorageRewrite

Returns:

fpass – The result pass

Return type:

tvm.transform.Pass

tilelang.transform.LowerOpaqueBlock()¶

LowerOpaqueBlock

tilelang.transform.LowerThreadAllreduce()¶

LowerThreadAllreduce

tilelang.transform.LowerIntrin()¶

LowerIntrin

tilelang.transform.LowerDeviceKernelLaunch()¶

Create and return a transform pass that lowers device kernel launch constructs to target-specific IR.

This pass transforms high-level device kernel launch and related intrinsics into lower-level IR suitable for backend code generation and device-side lowering.

Returns:

The transform pass that performs device kernel launch lowering.

Return type:

tvm.transform.Pass

tilelang.transform.LowerSharedTmem()¶

LowerSharedTmem

tilelang.transform.LayoutReducer()¶

Return a TVM transform pass that performs layout reduction/normalization.

This wrapper delegates to the underlying FFI implementation and returns a pass object suitable for use in a PassContext or pass pipeline. The pass is intended to simplify or reduce tensor/layout-related representations during relay/tile transformations.

Returns:

The transform pass object produced by the FFI backend.