[Relax][Pass] Lowering passes for GPU IPC memory and allreduce #16759
This PR introduces the lowering passes for GPU IPC memory and all-reduce. It contains the following changes:

- a pass `IPCAllreduceRewrite`, which rewrites `"runtime.disco.allreduce"` to `"runtime.disco.cuda_ipc.custom_allreduce"`, and rewrites the storage scope of the all-reduce inputs from `"global"` to `"ipc_memory"` accordingly,
- a memory planning enhancement that makes the planner aware of storage scopes, so that each storage scope is planned independently,
- a pass `LowerGPUIPCAllocStorage`, which rewrites the storage allocation of IPC memory from builtin ops to calls to the function `"runtime.disco.cuda_ipc.alloc_storage"`,
- support for the op `relax.builtin.alloc_tensor` with a storage scope; the default storage scope is `"global"`.
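The core of the `IPCAllreduceRewrite` transformation — renaming the op and moving its inputs into the `"ipc_memory"` scope — can be sketched on a toy IR. This is an illustrative sketch only: the `Buffer`/`Call` classes below are hypothetical stand-ins, not the Relax IR, and the real pass operates through TVM's pass infrastructure.

```python
from dataclasses import dataclass, replace as dc_replace

@dataclass(frozen=True)
class Buffer:
    """Hypothetical stand-in for a tensor value with a storage scope."""
    name: str
    scope: str  # e.g. "global" or "ipc_memory"

@dataclass(frozen=True)
class Call:
    """Hypothetical stand-in for a call node in the IR."""
    op: str
    args: tuple

def ipc_allreduce_rewrite(call: Call) -> Call:
    """Mimic IPCAllreduceRewrite: swap the op name and move inputs to IPC memory."""
    if call.op != "runtime.disco.allreduce":
        return call  # leave all other calls untouched
    new_args = tuple(dc_replace(b, scope="ipc_memory") for b in call.args)
    return Call("runtime.disco.cuda_ipc.custom_allreduce", new_args)

call = Call("runtime.disco.allreduce", (Buffer("x", "global"),))
rewritten = ipc_allreduce_rewrite(call)
print(rewritten.op)             # runtime.disco.cuda_ipc.custom_allreduce
print(rewritten.args[0].scope)  # ipc_memory
```

The point of the rewrite is that the custom all-reduce kernel requires its inputs to live in IPC-shared memory, so the storage-scope change must accompany the op substitution.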
We write the new passes in Python for experimentation and fast development. These passes are good demos showing that we can develop efficiently with the architecture enabled by TVM.
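The scope-aware memory planning mentioned above can also be sketched in miniature: allocations are grouped by storage scope, and each scope's pool is planned independently, so `"global"` and `"ipc_memory"` buffers never share storage. This is an assumed simplification (a naive bump allocator); the actual planner also accounts for liveness and reuse.

```python
from collections import defaultdict

def plan_by_scope(allocs):
    """allocs: list of (name, scope, size_bytes) tuples.

    Returns (plan, pool_sizes): plan maps name -> (scope, offset), where
    offsets are assigned per scope by a naive bump allocator.
    """
    pools = defaultdict(int)  # running size of each scope's pool
    plan = {}
    for name, scope, size in allocs:
        plan[name] = (scope, pools[scope])  # offset within that scope's pool
        pools[scope] += size
    return plan, dict(pools)

plan, pool_sizes = plan_by_scope([
    ("a", "global", 256),
    ("b", "ipc_memory", 128),
    ("c", "global", 64),
])
print(plan["c"])    # ('global', 256)
print(pool_sizes)   # {'global': 320, 'ipc_memory': 128}
```

Note that `"b"` gets offset 0 in its own `"ipc_memory"` pool even though a `"global"` allocation precedes it, which is exactly the independence the enhancement provides.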