[Relax][Pass] Lowering passes for GPU IPC memory and allreduce #16759

Merged

Conversation

MasterJH5574 (Contributor) commented:

This PR introduces the lowering passes for GPU IPC memory and all-reduce. It contains the following changes:

1. a pass `IPCAllreduceRewrite`, which rewrites `"runtime.disco.allreduce"` to `"runtime.disco.cuda_ipc.custom_allreduce"` and accordingly rewrites the storage scope of the all-reduce inputs from `"global"` to `"ipc_memory"` (a minimal pass sketch closes this description).

2. a memory planning enhancement that makes the planning aware of storage scopes, so that each storage scope is planned independently (a toy illustration follows this list).

3. a pass `LowerGPUIPCAllocStorage`, which rewrites the storage allocation of IPC memory from builtin ops to calls to the function `"runtime.disco.cuda_ipc.alloc_storage"`.

4. support for the op `relax.builtin.alloc_tensor` with a storage scope; the default storage scope is `"global"` (see the first snippet right after this list).
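
As a usage example for items 3 and 4, the snippet below builds a tiny Relax function that allocates a tensor in the `"ipc_memory"` scope. This is a hedged sketch assuming the post-PR Python signature `relax.op.builtin.alloc_tensor(shape, dtype, runtime_device_index, storage_scope)` with `storage_scope` defaulting to `"global"`:

```python
# Sketch: request a tensor in the "ipc_memory" storage scope.
# Assumes the post-PR signature of relax.op.builtin.alloc_tensor
# (shape, dtype, runtime_device_index, storage_scope="global").
import tvm
from tvm import relax

bb = relax.BlockBuilder()
with bb.function("alloc_demo", params=[]):
    with bb.dataflow():
        # A 64x64 fp16 tensor placed in IPC memory; omitting
        # storage_scope keeps the default "global".
        tensor = bb.emit(
            relax.op.builtin.alloc_tensor(
                relax.ShapeExpr([64, 64]),
                "float16",
                runtime_device_index=0,
                storage_scope="ipc_memory",
            )
        )
        out = bb.emit_output(tensor)
    bb.emit_func_output(out)

bb.get().show()  # inspect the IRModule before/after the lowering passes
```

After `LowerGPUIPCAllocStorage`, the storage backing such a tensor is allocated through `"runtime.disco.cuda_ipc.alloc_storage"` instead of the builtin alloc-storage op.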
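
For item 2, the key invariant is simply that buffers in different storage scopes never share a pool. The toy snippet below is a conceptual illustration only, not TVM's Relax memory planner (which operates on IR with real liveness information): it groups hypothetical allocation requests by scope and plans one reusable block per scope.

```python
# Toy illustration of scope-aware planning (not the actual TVM planner):
# allocations are pooled per storage scope, so an "ipc_memory" buffer is
# never folded into a "global" pool.
from collections import defaultdict
from typing import Dict, List, Tuple


def plan_per_scope(allocs: List[Tuple[str, int]]) -> Dict[str, int]:
    """Plan one reusable block per scope, assuming non-overlapping
    lifetimes so the block only needs to fit the largest request."""
    pools: Dict[str, int] = defaultdict(int)
    for scope, size in allocs:
        pools[scope] = max(pools[scope], size)
    return dict(pools)


print(plan_per_scope([("global", 1 << 20), ("ipc_memory", 1 << 16), ("global", 1 << 18)]))
# -> {'global': 1048576, 'ipc_memory': 65536}
```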

We write the new passes in Python for experimentation and fast development. They are good demonstrations of how the architecture enabled by TVM supports efficient development.
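
To make point 1 concrete, here is a minimal, hedged sketch of the shape such a Python pass can take. It is not the actual `IPCAllreduceRewrite` implementation (the real pass also retags the inputs' storage scope to `"ipc_memory"`), it assumes the all-reduce appears as a call to the `ExternFunc` `"runtime.disco.allreduce"`, and the name `SketchIPCAllreduceRewrite` is illustrative.

```python
# Minimal sketch of a Relax rewrite pass written in Python; the real
# IPCAllreduceRewrite additionally rewrites the inputs' storage scopes.
import tvm
from tvm import relax
from tvm.relax.expr_functor import PyExprMutator, mutator


@mutator
class _AllreduceRewriter(PyExprMutator):
    def visit_call_(self, call: relax.Call):
        call = super().visit_call_(call)  # mutate children first
        # Swap the callee when it is the generic disco all-reduce.
        if (
            isinstance(call.op, relax.ExternFunc)
            and call.op.global_symbol == "runtime.disco.allreduce"
        ):
            return relax.Call(
                relax.ExternFunc("runtime.disco.cuda_ipc.custom_allreduce"),
                call.args,
                call.attrs,
                call.sinfo_args,
            )
        return call


@tvm.transform.module_pass(opt_level=0, name="SketchIPCAllreduceRewrite")
class SketchIPCAllreduceRewrite:
    def transform_module(self, mod: tvm.IRModule, _ctx) -> tvm.IRModule:
        rewriter = _AllreduceRewriter(mod)
        for gvar, func in mod.functions.items():
            if isinstance(func, relax.Function):
                mod[gvar] = rewriter.visit_expr(func)
        return mod
```

Registered this way, the sketch composes with the rest of the Relax pipeline, e.g. inside a `tvm.transform.Sequential`.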

@MasterJH5574 force-pushed the tvm-dev/2024-03-20-lowering-ipc-mem branch from 0211a3a to 3b8183f on Mar 21, 2024.
@tqchen merged commit 858486f into apache:main on Mar 21, 2024.
19 checks passed
thaisacs pushed a commit to thaisacs/tvm that referenced this pull request on Apr 3, 2024.