Add HF and DeepSeek-R1-Distill-Llama-70B support #17420
Closed
+22,475
−11,745
### What's changed
Removed all CI tests running TG llama on the old codebase, since it is outdated and development now happens only on the new codebase.
Running TG pipelines: https://github.com/tenstorrent/tt-metal/actions/runs/12933778923

### Checklist
- [ ] Post commit CI passes
- [ ] Blackhole Post commit (if applicable)
- [ ] Model regression CI testing passes (if applicable)
- [ ] Device performance regression CI testing passes (if applicable)
- [ ] **(For models and ops writers)** Full [new models](https://github.com/tenstorrent/tt-metal/actions/workflows/full-new-models-suite.yaml) tests passes
- [ ] New/Existing tests provide coverage for changes
### Ticket
Link to Github Issue #16144

### Problem description
- Binary_ng ops support only bfloat16 datatype
- binary bitwise ops, rsub, pow, add(int32) are not present in binary_ng

### What's changed
- Added float32 support for binary_ng ops
- Added bitwise ops
- Added add(int32), rsub and pow to binary_ng
- Fixed bias_gelu logic

### Checklist
- [x] Post commit CI passes
  https://github.com/tenstorrent/tt-metal/actions/runs/12796247845
  https://github.com/tenstorrent/tt-metal/actions/runs/12834197476
  https://github.com/tenstorrent/tt-metal/actions/runs/12922250157
  https://github.com/tenstorrent/tt-metal/actions/runs/12948620892
- [x] Blackhole Post commit (if applicable)
  https://github.com/tenstorrent/tt-metal/actions/runs/12805291674
  https://github.com/tenstorrent/tt-metal/actions/runs/12834199165
  https://github.com/tenstorrent/tt-metal/actions/runs/12914904913
  https://github.com/tenstorrent/tt-metal/actions/runs/12948614665
- [ ] Model regression CI testing passes (if applicable)
- [ ] Device performance regression CI testing passes (if applicable)
- [ ] **(For models and ops writers)** Full [new models](https://github.com/tenstorrent/tt-metal/actions/workflows/full-new-models-suite.yaml) tests
  https://github.com/tenstorrent/tt-metal/actions/runs/12842128253
  https://github.com/tenstorrent/tt-metal/actions/runs/12916189194
- [x] New/Existing tests provide coverage for changes

---------
Co-authored-by: Patrick Roberts <[email protected]>
…6627)

### Ticket
- #16626

### Problem description
In the current use case of Matmul1D with gather_in0 in the Llama models, the activations and weights need to be padded. This results in significant overhead.

### What's changed
- Added support to skip part of in0_block_w that is padding information
- Pad the Kt and Nt in the host code for gather_in0

### Checklist
- [x] Post commit CI passes (https://github.com/tenstorrent/tt-metal/actions/runs/12893880800)
- [x] New/Existing tests provide coverage for changes (https://github.com/tenstorrent/tt-metal/actions/runs/12893883783)
…o dispatch s and increase the dispatch s page size to avoid having to split some commands when having multiple sub-devices instead
…torch
- Use ttnn.from_device to convert multi-device tensors first
- This fixes perf regression for falcon demo tests
### Ticket
#17040

### Problem description
Since the ARCH_NAME dependency was removed, there is no longer a reason to have multiple images.

### What's changed
- Change the workflows to generate only a single release image.
- Update the documentation
- Change paths in the Docker registry

### Checklist
- [x] Post commit CI passes
- [ ] Blackhole Post commit (if applicable)
- [ ] Model regression CI testing passes (if applicable)
- [ ] Device performance regression CI testing passes (if applicable)
- [ ] **(For models and ops writers)** Full [new models](https://github.com/tenstorrent/tt-metal/actions/workflows/full-new-models-suite.yaml) tests passes
- [ ] New/Existing tests provide coverage for changes
- Move validation to tensor layout and tensor spec construction
  * Refactor as helper functions in tensor layout and tensor spec
  * Tensor layout checks for shard spec + tile spec
    ** Physical shard shape must be divisible by tile shape if TILE layout
  * Tensor spec checks for shard spec + tensor shape
    ** Core grid is valid for number of shards along rows/cols
- Update validation for sharding (only call if is_sharded and shard_spec is not None)
  * Reword asserts to be more descriptive
  * Remove check on shard shape for row major sharding
  * Switch to query physical shape and physical shard shape
- Add gtests for illegal tensor layout and tensor spec creation
  * TODO (issue #17060): Flip to TT_FATAL
  * Rename sharding_with_alignment to more generic file name
  * Update tests to provide correct shard spec
    ** Add non-zero grid size for sharding
    ** Add TensorMemoryLayout matching intended spec

#0: Fix incorrect TensorMemoryLayout in test_scaled_dot_product_attention_decode.py
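The tile-divisibility rule in the commit message above can be illustrated with a small standalone check. This is only a sketch: the function name `check_tiled_shard_shape` and the default 32x32 tile shape are assumptions made for illustration, not the helpers or constants used in the tt-metal code.

```python
def check_tiled_shard_shape(physical_shard_shape, tile_shape=(32, 32)):
    """Raise ValueError if a TILE-layout shard is not a whole number of tiles.

    Hypothetical helper: the name and the default tile shape are assumptions.
    """
    shard_h, shard_w = physical_shard_shape
    tile_h, tile_w = tile_shape
    if shard_h % tile_h != 0 or shard_w % tile_w != 0:
        raise ValueError(
            f"Physical shard shape {physical_shard_shape} must be divisible "
            f"by tile shape {tile_shape} for TILE layout"
        )


# A 64x128 shard is exactly 2x4 tiles, so this passes silently.
check_tiled_shard_shape((64, 128))
```

A shard such as `(48, 32)` would be rejected under these assumptions, because 48 is not a multiple of the tile height.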
### Ticket
[Link to Github Issue](#16954)

### Problem description
Most CBs between 1 and 16 are unused. This causes the dispatcher to waste time initializing many unneeded CBs, so it would be better to pack them starting at 0.

### Checklist
- [x] Post commit CI passes
- [x] Blackhole Post commit (if applicable)
- [ ] Model regression CI testing passes (if applicable)
- [ ] Device performance regression CI testing passes (if applicable)
- [ ] **(For models and ops writers)** Full [new models](https://github.com/tenstorrent/tt-metal/actions/workflows/full-new-models-suite.yaml) tests passes
- [ ] New/Existing tests provide coverage for changes
…er dealloc issues
- Add tests for reading and writing shards with Interleaved and Sharded configs
- Add test for deallocation, verifying addresses
Noncontiguous CB ranges cause performance problems for the dispatcher, because it initializes all CBs up to the max used index. Warn when a program's CB indices are not contiguous.
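To make the contiguity condition concrete, here is a minimal Python sketch of such a check. The function name `find_cb_gaps` and the use of Python's `warnings` module are assumptions for illustration; the actual warning is implemented inside tt-metal and may differ in detail.

```python
import warnings


def find_cb_gaps(used_indices):
    """Return the unused CB indices below the highest used index.

    Hypothetical helper: a non-empty result means the dispatcher would
    initialize CBs that the program never uses.
    """
    used = set(used_indices)
    if not used:
        return []
    gaps = [i for i in range(max(used) + 1) if i not in used]
    if gaps:
        warnings.warn(
            f"CB indices are not contiguous from 0; unused indices: {gaps}"
        )
    return gaps


# Indices 0..2 are packed from 0, so no gaps are reported.
packed = find_cb_gaps([0, 1, 2])
```

With indices `[0, 16]`, the same function would report indices 1 through 15 as gaps, which mirrors the "most CBs between 1 and 16 are unused" situation described in the related commit.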
### Ticket
[Link to Github Issue](#16679)

### Problem description
TopK currently supports max sorting, where K max values are returned. We need to add necessary changes to LLKs to support returning the K min values.

### What's changed
LLKs were updated to pass down a flag specifying which behavior (largest or smallest k values) is expected. Ckernel updated to place min values into register instead of max values when flag is set, returning k min values as a result.

### Checklist
- [x] [Post commit CI passes](https://github.com/tenstorrent/tt-metal/actions/runs/12932508914)
- [x] [Blackhole Post commit](https://github.com/tenstorrent/tt-metal/actions/runs/12932523648) (if applicable)
- [ ] Model regression CI testing passes (if applicable)
- [ ] Device performance regression CI testing passes (if applicable)
- [ ] **(For models and ops writers)** Full [new models](https://github.com/tenstorrent/tt-metal/actions/workflows/full-new-models-suite.yaml) tests passes
- [ ] New/Existing tests provide coverage for changes
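The intended semantics of the flag can be sketched on the host side in plain Python. This is a reference illustration only: the function `topk_reference` and its `largest` parameter are assumed names, and the code says nothing about how the LLKs or ckernel implement the selection on device.

```python
import heapq


def topk_reference(values, k, largest=True):
    """Return (values, indices) of the k largest or k smallest elements.

    Hypothetical host-side reference; `largest` plays the role of the flag
    described in the commit message.
    """
    pick = heapq.nlargest if largest else heapq.nsmallest
    pairs = pick(k, enumerate(values), key=lambda pair: pair[1])
    return [v for _, v in pairs], [i for i, _ in pairs]


data = [3.0, 1.0, 4.0, 0.5, 5.0]
max_vals, max_idx = topk_reference(data, 2, largest=True)   # two largest
min_vals, min_idx = topk_reference(data, 2, largest=False)  # two smallest
```

A reference of this shape could be used to compare against device output for both settings of the flag.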
…17097)

### Ticket
#17095

### Problem description
We don't even compile our repo automatically in PRs. We require devs to navigate the maze of GH to find the right button to mash. And we can't just auto-run APC because that's crazy long (and heavy on the infra).

### What's changed
A new workflow that does a simple build. We'll expand later, but with an eye on robustness and speed.
### Ticket
[Link to Github Issue](#16956 (comment))

### Checklist
- [x] Post commit CI passes
- [x] Blackhole Post commit (if applicable)
- [ ] Model regression CI testing passes (if applicable)
- [ ] Device performance regression CI testing passes (if applicable)
- [ ] **(For models and ops writers)** Full [new models](https://github.com/tenstorrent/tt-metal/actions/workflows/full-new-models-suite.yaml) tests passes
- [ ] New/Existing tests provide coverage for changes
- Add top level EnqueueWriteMeshBuffer and EnqueueReadMeshBuffer APIs to distributed.hpp
yieldthought requested review from skhorasganiTT, sraizada-tt, TT-BrianLiu, cfjchu, omilyutin-tt, bbradelTT, sjameelTT, SeanNijjar, jvegaTT, a team, ntarafdar, yugi957, jaykru-tt, llongTT, nardoTT, yugaoTT, mo-tenstorrent, rtawfik01, ttmtrajkovic, rdjogoTT, davorchap, mywoodstock, tt-asaigal, aliuTT, ubcheema, aagarwalTT, abhullar-tt, pgkeller and tt-dma as code owners January 31, 2025 13:20
Problem description
The existing codebase loads the Meta checkpoint format, but many derivative models are available only on HuggingFace.
What's changed
Add support for loading HuggingFace model formats. This paves the way for full Qwen support (pending a YaRN RoPE implementation) and adds DeepSeek-R1-Distill-Llama-70B support.
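One part of bridging the two formats is translating parameter names. The sketch below shows the commonly used correspondence between HuggingFace Llama parameter names and Meta checkpoint names; the function and table names are hypothetical, and this is not necessarily how the PR implements loading. A rename alone is also not sufficient in practice, because HuggingFace stores the q and k projection weights in a permuted order for its rotary embedding layout.

```python
import re

# Hypothetical rename tables (HuggingFace name -> Meta name).
_TOP_LEVEL = {
    "model.embed_tokens.weight": "tok_embeddings.weight",
    "model.norm.weight": "norm.weight",
    "lm_head.weight": "output.weight",
}
_PER_LAYER = {
    "self_attn.q_proj.weight": "attention.wq.weight",
    "self_attn.k_proj.weight": "attention.wk.weight",
    "self_attn.v_proj.weight": "attention.wv.weight",
    "self_attn.o_proj.weight": "attention.wo.weight",
    "mlp.gate_proj.weight": "feed_forward.w1.weight",
    "mlp.down_proj.weight": "feed_forward.w2.weight",
    "mlp.up_proj.weight": "feed_forward.w3.weight",
    "input_layernorm.weight": "attention_norm.weight",
    "post_attention_layernorm.weight": "ffn_norm.weight",
}
_LAYER_RE = re.compile(r"^model\.layers\.(\d+)\.(.+)$")


def hf_key_to_meta(key):
    """Translate one HuggingFace parameter name to its Meta-style name."""
    if key in _TOP_LEVEL:
        return _TOP_LEVEL[key]
    match = _LAYER_RE.match(key)
    if match and match.group(2) in _PER_LAYER:
        return f"layers.{match.group(1)}.{_PER_LAYER[match.group(2)]}"
    raise KeyError(f"No Meta-format name known for {key!r}")


def rename_state_dict(hf_state_dict):
    """Return a new dict with Meta-style keys; values are left untouched."""
    return {hf_key_to_meta(k): v for k, v in hf_state_dict.items()}


renamed = rename_state_dict({"model.layers.0.mlp.up_proj.weight": None})
```

Raising on unknown keys, rather than passing them through, makes unsupported parameters visible early when a new derivative model is tried.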
Checklist