[Add Mixtral] Adds support for the Mixtral MoE
#27942
Merged
Commits
111 commits
15ef543
up
younesbelkada 3367d25
up
younesbelkada f9da444
test
younesbelkada f59eacc
logits ok
younesbelkada 7e0968a
up
younesbelkada 0bfcd75
up
younesbelkada 6b84e42
few fixes
younesbelkada 2896e2f
conversion script
younesbelkada 92d143f
up
younesbelkada d3261c1
nits
ArthurZucker 407f8a8
Merge branch 'add-mixtral-alternative' of https://github.com/huggingf…
ArthurZucker 65bbd30
nits
ArthurZucker 6afc8f3
update
ArthurZucker bfef811
Merge branch 'main' into add-mixtral-alternative
younesbelkada 7a54d1a
nuke
younesbelkada b9f3fc0
Merge branch 'add-mixtral-alternative' of https://github.com/huggingf…
younesbelkada f8513e8
Merge branch 'add-mixtral-alternative' of https://github.com/huggingf…
ArthurZucker 0d31424
more updates
ArthurZucker c8987cb
nites
ArthurZucker d82c8ee
fix many issues
younesbelkada ccc6011
nit
younesbelkada 356d484
scatter
ArthurZucker e858c01
nit
younesbelkada 82037ca
nuke megablocks
younesbelkada e66d1a9
nits
ArthurZucker 49eb7f0
fix conversion script
younesbelkada ffc8463
Merge branch 'add-mixtral-alternative' of https://github.com/huggingf…
ArthurZucker 82e4a1b
Merge branch 'add-mixtral-alternative' of https://github.com/huggingf…
ArthurZucker 0b1ca52
nit
younesbelkada 3616d3b
remove
ArthurZucker 6d73a58
nits
ArthurZucker 4c1fbf3
nit
younesbelkada a922710
Merge branch 'add-mixtral-alternative' of https://github.com/huggingf…
ArthurZucker 1abf6bd
update
ArthurZucker b938a30
oupsssss
ArthurZucker 12ddba9
change
younesbelkada 445e6e6
nits device
ArthurZucker 1e83d0e
Merge branch 'add-mixtral-alternative' of https://github.com/huggingf…
ArthurZucker 263310f
nits
ArthurZucker b2bedb1
fixup
ArthurZucker 4afd7e4
update
ArthurZucker c0e6dfd
merge
ArthurZucker dd33a59
add copied from
ArthurZucker 0de7081
fix the copy mentions
ArthurZucker 48945de
update tests
ArthurZucker d927baf
more fixes
ArthurZucker 7402aca
nits
ArthurZucker 54bee10
conversion script
younesbelkada 1ae98dc
Merge branch 'add-mixtral-alternative' of https://github.com/huggingf…
younesbelkada 8bf257f
Merge branch 'main' of https://github.com/huggingface/transformers in…
ArthurZucker 01b2969
Merge branch 'add-mixtral-alternative' of https://github.com/huggingf…
ArthurZucker 80c593e
add parts of the readme
ArthurZucker dab227f
Update tests/models/mixtral/test_modeling_mixtral.py
younesbelkada ca1f7d0
new test + conversion script
younesbelkada e4237a3
Apply suggestions from code review
younesbelkada 0c04bc3
Apply suggestions from code review
younesbelkada bd7c786
fix
younesbelkada b2b8e01
fix copies
younesbelkada badceae
fix copies
younesbelkada 11a4db9
ooops
younesbelkada 419ddb3
fix config
younesbelkada bbbd1b2
Apply suggestions from code review
younesbelkada 2b23c47
fix nits
younesbelkada 76a65e6
Merge branch 'add-mixtral-alternative' of https://github.com/huggingf…
younesbelkada 18caab8
nit
younesbelkada a00ad3a
add copies
younesbelkada 657dd95
add batched tests
younesbelkada a092648
docs
younesbelkada 67e8e03
fix flash attention
younesbelkada 72542dd
let's add more verbose
younesbelkada d3f5abb
add correct outputs
ArthurZucker e900e36
support router ouptus
ArthurZucker 68b8b41
ignore copies where needed
ArthurZucker ded6028
Merge branch 'add-mixtral-alternative' of https://github.com/huggingf…
ArthurZucker 1ceb940
fix
ArthurZucker 38eef46
cat list if list is given for now
ArthurZucker 8d3f83f
nits
ArthurZucker ee5f3e9
Update docs/source/en/model_doc/mixtral.md
younesbelkada e834e89
finish router refactoring
ArthurZucker 1b6358e
Merge branch 'add-mixtral-alternative' of https://github.com/huggingf…
ArthurZucker 54f2a48
fix forward
ArthurZucker 19be169
fix expected values
ArthurZucker 5c929df
nits
ArthurZucker 872ee24
fixup
ArthurZucker eaa2a5f
fix
younesbelkada 703672d
Merge branch 'add-mixtral-alternative' of https://github.com/huggingf…
ArthurZucker 10760c1
fix bug
younesbelkada 3499c98
Merge branch 'add-mixtral-alternative' of https://github.com/huggingf…
younesbelkada 19e4aea
fix
ArthurZucker 9bc7d5b
fix dtype mismatch
younesbelkada 290f621
fix
ArthurZucker 1f411a1
Merge branch 'add-mixtral-alternative' of https://github.com/huggingf…
ArthurZucker 9ebf661
grrr grrr I support item assignment
ArthurZucker 23abc46
fix CI
younesbelkada 6549f48
docs
younesbelkada 39e38ed
Merge branch 'main' of https://github.com/huggingface/transformers in…
ArthurZucker e4b84bc
fixup
ArthurZucker 80d49aa
remove some copied form
ArthurZucker fbde97b
fix weird diff
younesbelkada 20091dc
skip doctest fast on the config and modeling
ArthurZucker adc7113
mark that is supports flash attention in the doc
ArthurZucker c6ddca8
update
ArthurZucker 3f62433
Update src/transformers/models/mixtral/modeling_mixtral.py
ArthurZucker 6c6df4e
Update docs/source/en/model_doc/mixtral.md
ArthurZucker d4e826f
revert router logits config issue
ArthurZucker d17b756
update doc accordingly
ArthurZucker e86facd
Update src/transformers/models/mixtral/convert_mixtral_weights_to_hf.py
younesbelkada 2c85405
nits
ArthurZucker bb88c76
use torch testing asssert close
ArthurZucker 6624e9c
fixup
ArthurZucker c26aaa4
doc nits
ArthurZucker
Setting output_router_logits = True should automatically add the aux_loss
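
To make the suggestion concrete, here is a minimal sketch of the behavior being asked for. It is not the code from this PR: it uses plain Python lists instead of tensors, assumes a Switch-Transformer-style load-balancing loss, and the names `load_balancing_loss`, `total_loss`, and `aux_loss_coef` are hypothetical. Only the `output_router_logits` flag and the term `aux_loss` come from the comment above.

```python
import math


def softmax(row):
    # Numerically stable softmax over one token's router logits.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def load_balancing_loss(router_logits, top_k=2):
    """Assumed auxiliary loss: num_experts * sum_e(f_e * P_e).

    router_logits is a [num_tokens][num_experts] nested list.
    f_e is the fraction of routing slots given to expert e,
    P_e is the mean router probability for expert e.
    """
    num_tokens = len(router_logits)
    num_experts = len(router_logits[0])
    probs = [softmax(row) for row in router_logits]

    # Count how often each expert lands in a token's top-k.
    counts = [0.0] * num_experts
    for p in probs:
        top = sorted(range(num_experts), key=lambda e: p[e], reverse=True)[:top_k]
        for e in top:
            counts[e] += 1.0

    tokens_per_expert = [c / (num_tokens * top_k) for c in counts]
    mean_prob = [sum(p[e] for p in probs) / num_tokens for e in range(num_experts)]
    return num_experts * sum(f * q for f, q in zip(tokens_per_expert, mean_prob))


def total_loss(lm_loss, router_logits, output_router_logits=False, aux_loss_coef=0.01):
    # Hypothetical wrapper: the aux loss is added only when the flag is set,
    # so callers do not have to add it themselves.
    if not output_router_logits:
        return lm_loss, None
    aux_loss = load_balancing_loss(router_logits)
    return lm_loss + aux_loss_coef * aux_loss, aux_loss
```

With the flag off, the language-modeling loss is returned unchanged. With it on, the weighted auxiliary term is folded in and also returned so it can be logged separately.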