Add perf reporting for ccl async mode #16658
Thanks Aswin! Minor request for additional specific test cases
"num_devices, num_links, output_shape, dim, layout",
[
(4, 1, [1, 1, 64, 512], 3, ttnn.TILE_LAYOUT),
# (4, 1, [1, 1, 32, 32768], 3, ttnn.TILE_LAYOUT),
Can you please add the following shapes/configurations (each with bfp8 and fp16)?
4-chip, in_shape: [1, 1, 32, 1280], dim=0
4-chip, in_shape: [1, 1, 32, 7168], dim=0
8-chip, in_shape: [1, 1, 32, 2048], dim=0
4-chip, in_shape: [1, 1, 32, 3584], dim=0
4-chip, in_shape: [1, 1, 32, 32], dim=0
4-chip, in_shape: [1, 1, 8, 32], dim=2 // This one may not work yet with the new all-gather. If it doesn't, we can open a separate issue to add support for padded tiles.
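One way to cover every requested configuration with both data types is to generate the test cases as a cross product instead of writing each tuple by hand. The sketch below is illustrative only: `REQUESTED_CONFIGS`, `DTYPE_LABELS`, and `build_cases` are hypothetical names, and the dtype entries are plain string labels standing in for whatever dtype objects the real test file uses.

```python
from itertools import product

# Configurations requested in the review: (num_devices, input_shape, dim).
REQUESTED_CONFIGS = [
    (4, [1, 1, 32, 1280], 0),
    (4, [1, 1, 32, 7168], 0),
    (8, [1, 1, 32, 2048], 0),
    (4, [1, 1, 32, 3584], 0),
    (4, [1, 1, 32, 32], 0),
    (4, [1, 1, 8, 32], 2),  # may not work until padded tiles are supported
]

# Placeholder labels (assumption): a real test would map these to dtype objects.
DTYPE_LABELS = ["bfp8", "fp16"]


def build_cases(configs, dtypes):
    """Return one (num_devices, input_shape, dim, dtype) tuple per config/dtype pair."""
    return [
        (num_devices, list(shape), dim, dtype)
        for (num_devices, shape, dim), dtype in product(configs, dtypes)
    ]


CASES = build_cases(REQUESTED_CONFIGS, DTYPE_LABELS)
```

A list built this way can be passed as the argument values of a parametrized test, so adding a shape or a dtype later is a one-line change.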
Sorry, can you please also add equivalent tests for TG? There are some example tests of the cluster axis API usage in this test file.
@@ -168,7 +168,7 @@ def run_all_gather_impl(

# create global semaphore handles
ccl_semaphore_handles = create_global_semaphore_with_same_address(mesh_device, ccl_sub_device_crs, 0)

output_shape[dim] *= num_devices
This should be moved to the caller, where your new tests are. Otherwise you'll need to update every other test to pass the input shape instead of the output shape.
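A minimal sketch of what computing the shape on the caller side could look like, assuming the new tests are parametrized by input shape while the shared implementation keeps receiving the output shape. The helper name `gathered_shape` is hypothetical and not part of the PR. It returns a copy so the parametrized input shape is not mutated between test cases.

```python
def gathered_shape(input_shape, dim, num_devices):
    """Return the all-gather output shape: input_shape with `dim` scaled by num_devices."""
    if not -len(input_shape) <= dim < len(input_shape):
        raise ValueError(f"dim {dim} is out of range for shape {input_shape}")
    output_shape = list(input_shape)  # copy, so the caller's list is left untouched
    output_shape[dim] *= num_devices
    return output_shape
```

A new test would call this helper first and pass the result on as `output_shape`, which leaves the existing tests that already pass an output shape unchanged.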
(8, 1, [1, 1, 32, 2048], 0, ttnn.TILE_LAYOUT),
(4, 1, [1, 1, 32, 3584], 0, ttnn.TILE_LAYOUT),
(4, 1, [1, 1, 32, 32], 0, ttnn.TILE_LAYOUT),
# (4, 1, [1, 1, 8, 32], 2, ttnn.TILE_LAYOUT),
Please remove the commented-out cases.
### Ticket
#16648

### Problem description
Need perf reporting for async all gather

### What's changed
<img width="1292" alt="Screenshot 2025-01-24 at 8 42 29 PM" src="https://github.com/user-attachments/assets/33c64cf1-9f31-4567-89ed-813acc16e49d" />

### Checklist
- [ ] Post commit CI passes
- [ ] Blackhole Post commit (if applicable)
- [ ] Model regression CI testing passes (if applicable)
- [ ] Device performance regression CI testing passes (if applicable)
- [ ] **(For models and ops writers)** Full [new models](https://github.com/tenstorrent/tt-metal/actions/workflows/full-new-models-suite.yaml) tests passes
- [ ] New/Existing tests provide coverage for changes