
MeshBuffer integration in TTNN #17215

Open
omilyutin-tt opened this issue Jan 28, 2025 · 0 comments

@omilyutin-tt
Contributor

As part of the TT-Distributed effort, MeshBuffer will be integrated into TTNN to abstract away allocations made across the entire mesh of devices.

The integration work includes:

  1. Add MeshBuffer-backed storage as one of the tensor storage variants (a sketch follows below).
  2. Extend the processing logic to handle this new case; e.g. allocations, reads, and writes should go through the corresponding mesh APIs.
  3. Switch to the MeshBuffer-backed storage for all initializations of storage across a mesh of devices.

Follow-up work will include refactoring the tensor storage variants to unify the single- and multi-device code paths.
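
A minimal sketch of what item 1 could look like, assuming a new storage struct is added alongside the existing tensor storage variants; the `MeshDeviceStorage` name and its members are assumptions for illustration, not the final design:

```cpp
// Hypothetical sketch only: the struct name and members are assumptions.
#include <memory>
#include <variant>

namespace tt::tt_metal {

class MeshBuffer;  // single allocation spanning the entire mesh of devices

// A storage variant backed by one mesh-wide allocation instead of
// individually managed per-device buffers.
struct MeshDeviceStorage {
    std::shared_ptr<MeshBuffer> mesh_buffer;
};

// The tensor storage variant set would gain the new alternative, e.g.:
// using Storage = std::variant<OwnedStorage, DeviceStorage,
//                              MultiDeviceStorage, MeshDeviceStorage>;

}  // namespace tt::tt_metal
```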

omilyutin-tt self-assigned this Jan 28, 2025
omilyutin-tt added a commit that referenced this issue Jan 29, 2025
### Ticket
#17215

### Problem description
Explicit deallocation at the `MeshBuffer` level is required because TTNN
allows users to explicitly deallocate tensors that are no longer in use.
Setting tensors to `None` or using `del` is one possible path forward, but
that is a larger effort requiring the refactoring of hundreds of files.

### What's changed
Added `is_allocated` and `deallocate` methods to `MeshBuffer` and modified
the corresponding test.

Minor changes to documentation and code style; no functional changes.
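
A minimal sketch of the added methods and how TTNN could use them; the signatures below are assumptions based on this description, not the actual implementation:

```cpp
// Hypothetical sketch: signatures are assumed from the description above.
class MeshBuffer {
public:
    // True while the mesh-wide allocation is still live.
    bool is_allocated() const;

    // Explicitly releases the allocation across the entire mesh, so device
    // memory is freed without waiting for the owning tensor to be destroyed.
    void deallocate();
};

// Example of an explicit tensor-deallocation path calling into the buffer.
void deallocate_mesh_backed_tensor(MeshBuffer& buffer) {
    if (buffer.is_allocated()) {
        buffer.deallocate();
    }
}
```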

### Checklist
- [ ] [All post commit
tests](https://github.com/tenstorrent/tt-metal/actions/runs/13020708085)
- CI appears to be entirely broken. Ran `distributed_unit_tests` locally; these are the only tests that can potentially affect `MeshBuffer`.
- [X] New/Existing tests provide coverage for changes - ran the
`MeshBuffer.Deallocation` test locally.
omilyutin-tt added a commit that referenced this issue Jan 30, 2025
### Ticket
#17215 

### Problem description
See #17215

### What's changed
Extend `MultiDeviceStorage` with a `MeshBuffer` that is optionally created to
back the individual per-device shards. This allows incrementally switching
over to the `MeshBuffer`-backed variant without breaking any of the existing
ops.
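
A rough sketch of the shape this extension could take; the member names below are illustrative assumptions, not the actual definition:

```cpp
// Hypothetical sketch: field names and the device-id key are illustrative only.
#include <memory>
#include <unordered_map>

namespace tt::tt_metal {

class Buffer;      // existing per-device buffer consumed by ops
class MeshBuffer;  // mesh-wide allocation

struct MultiDeviceStorage {
    // Existing per-device shard buffers, keyed here by device id.
    std::unordered_map<int, std::shared_ptr<Buffer>> buffers;

    // Optionally set when the shards are views into a single mesh-wide
    // allocation; ops keep consuming the per-device buffers unchanged.
    std::shared_ptr<MeshBuffer> mesh_buffer;

    bool is_mesh_buffer_backed() const { return mesh_buffer != nullptr; }
};

}  // namespace tt::tt_metal
```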

Long-term plan for tensor storage:
* `MeshBuffer`-backed `MultiDeviceStorage` will become the default in TTNN. It will eventually be renamed to `MeshDeviceStorage`, and `DeviceStorage` will be removed.
* Interactions with `MeshBuffer` will be entirely synchronous and will happen on the main thread. This allows removing all of the async code in `Tensor`.

Next steps for integrating with `MeshBuffer`:
- [X] Implement an explicit dealloc routine for `MeshBuffer` (done in #17265 and integrated here).
- [ ] Implement shard read / write APIs for `MeshBuffer`. From the TTNN perspective, interacting with these APIs will be entirely synchronous (a hypothetical sketch follows after this list).
- [ ] Use the shard read / write APIs when writing data to `MeshBuffer`-backed `MultiDeviceStorage`.
- [ ] When launching multi-device operations, create a `MeshBuffer`-backed `MultiDeviceStorage` first, then supply the individual shards to the ops. This allows allocation to be performed in lock-step across the mesh, while maintaining compatibility with the existing ops infra. Note this will change with the introduction of `MeshWorkload` and will require further exploration.
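
To make the unchecked items above concrete, here is one possible shape for the shard write flow; every name and signature below is a placeholder for APIs that are still to be designed, stubbed out so the sketch compiles:

```cpp
// Placeholder sketch: write_shard and MeshCoordinate are hypothetical
// stand-ins for the shard APIs described above.
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

struct MeshCoordinate { uint32_t row = 0; uint32_t col = 0; };
struct MeshBuffer {};  // stand-in for the mesh-wide allocation

// Assumed to be synchronous and called from the main thread; a real
// implementation would dispatch through the mesh command queue.
void write_shard(MeshBuffer& /*buffer*/, const MeshCoordinate& /*coord*/,
                 const void* /*host_data*/, std::size_t /*size_bytes*/) {}

// Lock-step style flow: the MeshBuffer is allocated once for the whole mesh,
// then each host shard is written to its mesh coordinate.
void write_all_shards(
        MeshBuffer& buffer,
        const std::vector<std::pair<MeshCoordinate, std::vector<std::byte>>>& host_shards) {
    for (const auto& [coord, data] : host_shards) {
        write_shard(buffer, coord, data.data(), data.size());
    }
}
```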

### Checklist
- [X] [Post commit CI
passes](https://github.com/tenstorrent/tt-metal/actions/runs/13036677827)
- [X] [T3K unit
tests](https://github.com/tenstorrent/tt-metal/actions/runs/13036686228)
- [X] New/Existing tests provide coverage for changes
williamlyTT pushed a commit that referenced this issue Jan 30, 2025
williamlyTT pushed a commit that referenced this issue Jan 30, 2025
yieldthought pushed a commit that referenced this issue Jan 31, 2025
yieldthought pushed a commit that referenced this issue Jan 31, 2025
nikileshx pushed a commit to nikileshx/tt-metal that referenced this issue Feb 3, 2025
omilyutin-tt added a commit that referenced this issue Feb 7, 2025
…#17513)

### Ticket
#17215

### Problem description
Tensors allocated on a mesh buffer (aka "mesh tensors") need read and write
APIs exposed to TTNN.

### What's changed
* Extended the mesh CQ interface to read / write shards, to accommodate the TTNN multi-device sharding APIs.
* Future work includes parallelizing the per-device dispatches internally, within Metal.
* Added `to_device_mesh_tensor` and `to_host_mesh_tensor`, which will be the main APIs used in TTNN to read / write mesh buffer tensors (see the sketch below).
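
A hedged usage sketch of the two new APIs named above; only the function names come from this description, while the types, signatures, and stub bodies below are stand-ins so the example compiles:

```cpp
// Hypothetical sketch: signatures and parameter types are assumptions.
class Tensor {};        // stand-in for the TTNN tensor type
class MeshDevice {};    // stand-in for the mesh of devices
class MemoryConfig {};  // stand-in for the target memory configuration

// Assumed shape: synchronously writes the host shards into a mesh-buffer-backed
// device tensor (stubbed here).
Tensor to_device_mesh_tensor(const Tensor& /*host_tensor*/, MeshDevice& /*mesh_device*/,
                             const MemoryConfig& /*memory_config*/) {
    return Tensor{};
}

// Assumed shape: synchronously reads all shards back into host storage (stubbed here).
Tensor to_host_mesh_tensor(const Tensor& /*device_tensor*/) { return Tensor{}; }

// Round-trip: write a sharded host tensor to the mesh, then read it back.
Tensor round_trip(const Tensor& host_tensor, MeshDevice& mesh_device, const MemoryConfig& memory_config) {
    Tensor device_tensor = to_device_mesh_tensor(host_tensor, mesh_device, memory_config);
    return to_host_mesh_tensor(device_tensor);
}
```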

### Checklist
- [X] [Post commit CI
passes](https://github.com/tenstorrent/tt-metal/actions/runs/13167605541)
- pending
- [X] [T3K unit
tests](https://github.com/tenstorrent/tt-metal/actions/runs/13167605541)
- [X] New/Existing tests provide coverage for changes