[Distributed]Exposure softmax_lse & seed_offset in FlashAttention #56066
Merged: ForFishes merged 1 commit into PaddlePaddle:incubate/new_frl from ForFishes:fix_flash_attn on Aug 9, 2023
Conversation
Your PR has been submitted successfully. Thank you for contributing to this open-source project!
The branch was force-pushed from d875732 to 2a07ff5, and again from 2a07ff5 to 9f9e3d2.
sneaxiy approved these changes on Aug 9, 2023.
hitywt pushed commits to hitywt/Paddle that referenced this pull request on Oct 17, Oct 23, Oct 26, Nov 7, Nov 8, Nov 9, Nov 14, Nov 25, Nov 28 (×3), Nov 29, Dec 4 (×5), and Dec 5 (×8), 2023.
zhiqiu pushed a commit that referenced this pull request on Dec 6, 2023:
* part-3 cherry from: add check for cembedding (#55621)
* part-3 fix cherry from: add check for cembedding
* part-3 fix c_embedding
* fix test_gpt_with_pir caused by pir
* part-3 cherry from: [Distributed] Support dp/sharding overlap in virtual pp (#55651)
  * Add virtual pp and dp overlap
  * add sharding/dp overlap
  * add dp/vpp overlap
  * fix code
  * fix log
* part-3 cherry from: [cherry-pick] Integration flash attention 2 (#56015)
  * [FlashAttn] add flash randomness control (#52902)
    * add flash randomness control
    * fix VLOG undefied
  * [WIP] Integration flash attention 2 (#55758)
    * Work for fa-2 padded fwd. Code to be cleaned.
    * Work for fa2 unpadded fwd.
    * Work for padded-bwd, dk get small diff on np.random.seed(0)
    * Anyway I pass paddle's utest, except return softmax without dropout.
    * Clean code.
    * Modify interface.
    * Clean code and add some check.
    * Easy compile for dev.
    * Fix ci.
    * Fix ci-build.
    * Add std c++17 option again.
    * Limit max job when compiling fa2.
    * Remove const_cast
    * Add fwd params, to be cleaned.
    * Clean code.
    * Add bwd params.
    * Clean code.
    * Add enforce.
    * Use v2.0.4
    * Pass RNG state to fa2 capi
    * Fix review.
    * Add assert
    * Skip compile for sm less than 80.
  ---------
  Co-authored-by: Chitsing KUI <[email protected]>
* part-4 cherry from: fix codestyle (#56066)
* part-4 cherry from(no change): Add assert for static and other plateform (#56044)
* part-4 cherry-pick from: dp and sharding coexist (#56096)
  * dp and sharding coexist
  * dp
* part-4 cherry from: [Distributed] Add debug information for processgroupnccl (#56441)
  * add debug information
  * fix log
  * fix log
  * add detach for pp
* part-4 cherry from: [BugFix]Fix bug in paddle.device.cdua.synchronize() (#56451)
  * fix bug in synchronize
  * fix bug in synchronize
* part-4 cherry from: add fused gradient (#57048)
* part-4 cherry from: [Distribtued] add eager_communication_connection for eager mode in nccl (#57517)
  * add eager_nccl_connection
  * add eager_connection
  * add eager_connection
* part-4 cherry from: Add auto growth allocator for CUDA pinned allocator (#57625)
  * fix h2d bandwidth
  * remove useless flags
* fix cherrry pick #56066
* part-5 cherry from: Add allocation debug FLAGS (#57797)
  * Add allocation debug FLAGS
  * add sync after value set
  * refine flags
* part-5 cherry from: fix softmax backward (#57971)
* part-5 cherry from: [Distributed]Optimize memory in processgroup (#58299)
  * optimize memory in processgroupnccl
* part-5 cherry from: [Distributed]Add unbalance batch for virtual pp (#58383)
  * add unbalanced batch for vpp
  * fix
  * fix comments
* fix kunlun compatibility issues
* fix test_fused_rotary_position_embedding.py
* fix allocator.h
* tinyfix
* fix conflicts
* fix new ir translator c_embedding failure
---------
Co-authored-by: ShenLiang <[email protected]>
Co-authored-by: umiswing <[email protected]>
Co-authored-by: Chitsing KUI <[email protected]>
Co-authored-by: niuliling123 <[email protected]>
Co-authored-by: liuzhenhai93 <[email protected]>
Co-authored-by: sneaxiy <[email protected]>
PR types: New features
PR changes: Others
Description: Expose softmax_lse & seed_offset in FlashAttention.
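The description does not show what the newly exposed outputs contain, so below is a small, self-contained sketch of what softmax_lse means. This is not the Paddle FlashAttention kernel or its real signature; `naive_attention_with_lse` is a hypothetical reference function that computes attention the slow way and returns the per-query-row log-sum-exp that the fused kernel would expose. (The other exposed output, seed_offset, carries the dropout RNG state so a recomputation pass can reproduce the same dropout mask; it is kernel-internal and not illustrated here.)

```python
# Minimal reference sketch (an assumption for illustration, not the Paddle
# kernel or its API): softmax_lse is the per-query-row log-sum-exp of the
# scaled attention logits. Exposing it lets callers re-normalize or merge
# partial attention results and recompute backward without keeping the full
# score matrix.
import paddle


def naive_attention_with_lse(q, k, v):
    # q, k, v: [batch, num_heads, seq_len, head_dim]
    scale = q.shape[-1] ** -0.5
    scores = paddle.matmul(q, k, transpose_y=True) * scale  # [b, h, sq, sk]
    lse = paddle.logsumexp(scores, axis=-1)                  # softmax_lse: [b, h, sq]
    probs = paddle.exp(scores - lse.unsqueeze(-1))           # softmax computed via lse
    out = paddle.matmul(probs, v)                            # [b, h, sq, head_dim]
    return out, lse


q = paddle.randn([1, 2, 8, 16])
k = paddle.randn([1, 2, 8, 16])
v = paddle.randn([1, 2, 8, 16])
out, softmax_lse = naive_attention_with_lse(q, k, v)
print(out.shape, softmax_lse.shape)  # [1, 2, 8, 16] [1, 2, 8]
```

Keeping softmax_lse around is what makes it possible to combine attention outputs computed over different key/value shards (as in sequence-parallel or ring-attention style schemes), which is why distributed code wants it returned from the op rather than discarded inside the kernel.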