MeshBuffer integration in TTNN #17215
omilyutin-tt added a commit that referenced this issue on Jan 29, 2025:
### Ticket
#17215

### Problem description
Explicit deallocation of the `MeshBuffer` is required because TTNN allows users to explicitly deallocate tensors that are no longer in use. Setting tensors to `None` or using `del` is one way forward, but that is a larger effort that would require refactoring hundreds of files.

### What's changed
Added `is_allocated` and `deallocate` methods and updated the corresponding test. Minor documentation and code-style changes; no functional changes.

### Checklist
- [ ] [All post commit tests](https://github.com/tenstorrent/tt-metal/actions/runs/13020708085) - CI seems to be entirely broken; ran `distributed_unit_tests` locally, as these are the only tests that can potentially affect `MeshBuffer`.
- [X] New/Existing tests provide coverage for changes - ran the `MeshBuffer.Deallocation` test locally.
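For context, here is a minimal, self-contained sketch of the explicit-deallocation pattern described in the commit above. The class and helper names (`MeshBufferSketch`, `FakeDeviceAllocation`) are illustrative placeholders rather than the real tt-metal types; only the `is_allocated` / `deallocate` method names come from the commit description.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Stand-in for a single device-side allocation; the real tt-metal types differ.
class FakeDeviceAllocation {
public:
    explicit FakeDeviceAllocation(std::size_t bytes) : bytes_(bytes) {}
    std::size_t size() const { return bytes_; }

private:
    std::size_t bytes_;
};

// Minimal model of a mesh-wide buffer with the explicit-deallocation API
// named in the commit message (`is_allocated` / `deallocate`).
class MeshBufferSketch {
public:
    MeshBufferSketch(std::size_t bytes_per_shard, std::size_t num_devices) {
        for (std::size_t i = 0; i < num_devices; ++i) {
            shards_.push_back(std::make_unique<FakeDeviceAllocation>(bytes_per_shard));
        }
    }

    bool is_allocated() const { return !shards_.empty(); }

    // Frees the backing allocations eagerly, instead of waiting for the owning
    // tensor to be garbage-collected or `del`-eted on the Python side.
    void deallocate() { shards_.clear(); }

private:
    std::vector<std::unique_ptr<FakeDeviceAllocation>> shards_;
};

int main() {
    MeshBufferSketch buffer(/*bytes_per_shard=*/4096, /*num_devices=*/8);
    assert(buffer.is_allocated());
    buffer.deallocate();             // explicit, deterministic release
    assert(!buffer.is_allocated());
    return 0;
}
```

The point of the pattern is that mesh memory can be released deterministically while the owning handle (or the TTNN tensor wrapping it) is still alive.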
omilyutin-tt added a commit that referenced this issue on Jan 30, 2025:
### Ticket
#17215

### Problem description
See #17215.

### What's changed
Extend `MultiDeviceStorage` with `MeshBuffer`, which is optionally created to back the individual per-device shards. This allows us to switch over to the `MeshBuffer`-backed variant incrementally, without breaking any of the existing ops.

Long-term plan for tensor storage:
* `MeshBuffer`-backed `MultiDeviceStorage` will become the default in TTNN. It will eventually be renamed to `MeshDeviceStorage`, and `DeviceStorage` will be removed.
* Interactions with `MeshBuffer` will be entirely synchronous and will happen on the main thread. This makes it possible to remove the async code in `Tensor`.

Next steps for integrating with `MeshBuffer`:
- [X] Implement an explicit dealloc routine for `MeshBuffer` (done in #17265 and integrated here).
- [ ] Implement read / write shard APIs for `MeshBuffer`. From the TTNN perspective, interacting with these APIs will be entirely synchronous.
- [ ] Use the read / write shard APIs when writing data to `MeshBuffer`-backed `MultiDeviceStorage`.
- [ ] When launching multi-device operations, create a `MeshBuffer`-backed `MultiDeviceStorage` first, then supply the individual shards to the ops. This performs allocation in lock-step across the mesh while maintaining compatibility with the existing ops infra. Note that this will change with the introduction of `MeshWorkload`, which will require further exploration.

### Checklist
- [X] [Post commit CI passes](https://github.com/tenstorrent/tt-metal/actions/runs/13036677827)
- [X] [T3K unit tests](https://github.com/tenstorrent/tt-metal/actions/runs/13036686228)
- [X] New/Existing tests provide coverage for changes
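A rough sketch of the "optional `MeshBuffer` backing" idea from the commit above, using hypothetical stand-in types (`MultiDeviceStorageSketch`, `PerDeviceBuffer`, `MeshWideBuffer`) rather than the actual tt-metal/TTNN definitions:

```cpp
#include <cstddef>
#include <iostream>
#include <memory>
#include <optional>
#include <vector>

// Stand-in for an existing per-device shard buffer.
struct PerDeviceBuffer {
    int device_id = 0;
    std::size_t bytes = 0;
};

// Stand-in for a single allocation spanning the whole mesh.
struct MeshWideBuffer {
    std::size_t bytes_per_shard = 0;
    std::size_t num_devices = 0;
};

struct MultiDeviceStorageSketch {
    // Existing path: one buffer per device, still what today's ops consume.
    std::vector<std::shared_ptr<PerDeviceBuffer>> buffers;

    // New, optional path: when present, the per-device buffers are backed by a
    // single mesh-wide allocation, so existing ops keep working unmodified
    // while allocation happens in lock-step across the mesh.
    std::optional<std::shared_ptr<MeshWideBuffer>> mesh_buffer;

    bool mesh_buffer_backed() const { return mesh_buffer.has_value(); }
};

int main() {
    MultiDeviceStorageSketch legacy;
    legacy.buffers = {std::make_shared<PerDeviceBuffer>(PerDeviceBuffer{0, 4096}),
                      std::make_shared<PerDeviceBuffer>(PerDeviceBuffer{1, 4096})};

    MultiDeviceStorageSketch mesh_backed = legacy;
    mesh_backed.mesh_buffer = std::make_shared<MeshWideBuffer>(MeshWideBuffer{4096, 2});

    std::cout << "legacy mesh-backed? " << legacy.mesh_buffer_backed() << "\n"
              << "new mesh-backed?    " << mesh_backed.mesh_buffer_backed() << "\n";
    return 0;
}
```

Keeping the per-device buffer list intact is what lets existing ops keep consuming individual shards unchanged while the allocation strategy migrates underneath them.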
williamlyTT pushed a commit that referenced this issue on Jan 30, 2025.
williamlyTT pushed a commit that referenced this issue on Jan 30, 2025.
yieldthought pushed a commit that referenced this issue on Jan 31, 2025.
yieldthought pushed a commit that referenced this issue on Jan 31, 2025.
nikileshx pushed a commit to nikileshx/tt-metal that referenced this issue on Feb 3, 2025.
omilyutin-tt added a commit that referenced this issue on Feb 7, 2025:
…#17513)

### Ticket
#17215

### Problem description
Tensors allocated on a mesh buffer (aka "mesh tensors") need read and write APIs exposed to TTNN.

### What's changed
* Extended the mesh CQ interface to read / write shards, to accommodate the TTNN multi-device sharding APIs.
  * Future work includes parallelizing the per-device dispatches internally, within Metal.
* Added `to_device_mesh_tensor` and `to_host_mesh_tensor`, which will be the main APIs used in TTNN to read / write mesh buffer tensors.

### Checklist
- [X] [Post commit CI passes](https://github.com/tenstorrent/tt-metal/actions/runs/13167605541) - pending
- [X] [T3K unit tests](https://github.com/tenstorrent/tt-metal/actions/runs/13167605541)
- [X] New/Existing tests provide coverage for changes
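The synchronous shard read / write flow described above can be pictured with the following sketch. The function names `write_shards` / `read_shards` and all types here are hypothetical; the real mesh command-queue and `to_device_mesh_tensor` / `to_host_mesh_tensor` APIs in tt-metal/TTNN will differ.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

using HostShard = std::vector<std::uint32_t>;

// Pretend device-side shard; in reality this memory lives on a device in the mesh.
struct FakeDeviceShard {
    HostShard data;
};

// Blocking write of every host shard to its corresponding device shard,
// issued one at a time from the main thread.
void write_shards(std::vector<FakeDeviceShard>& device_shards,
                  const std::vector<HostShard>& host_shards) {
    for (std::size_t i = 0; i < host_shards.size(); ++i) {
        device_shards[i].data = host_shards[i];   // blocks until complete
    }
}

// Blocking read of every device shard back into host memory.
std::vector<HostShard> read_shards(const std::vector<FakeDeviceShard>& device_shards) {
    std::vector<HostShard> out;
    for (const auto& shard : device_shards) {
        out.push_back(shard.data);
    }
    return out;
}

int main() {
    std::vector<HostShard> host = {{1, 2, 3}, {4, 5, 6}};        // one shard per device
    std::vector<FakeDeviceShard> device(host.size());
    write_shards(device, host);                                  // "to device" direction
    std::vector<HostShard> round_trip = read_shards(device);     // "to host" direction
    return round_trip == host ? 0 : 1;
}
```

The per-device loop is the part the commit calls out as a future optimization: the dispatches could later be parallelized internally within Metal without changing the synchronous contract seen by TTNN.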
As part of the TT-Distributed effort, `MeshBuffer` will be integrated into TTNN to abstract away allocations made across the entire mesh of devices. The integration work includes:

* `MeshBuffer`-backed storage as one of the tensor storage variants (sketched below).
* `MeshBuffer`-backed storage for all initializations of storage across a mesh of devices.

Follow-up work will include refactoring the tensor storage variants to unify the single- and multi-device code paths.