
Use flash for dropout != 0 #194

Closed
mikekgfb opened this issue Mar 5, 2023 · 1 comment

Comments


mikekgfb commented Mar 5, 2023

PyTorch 2.0 (nightlies) now supports Flash Attention with dropout > 0.0, so the check that disables flash when dropout is nonzero can be removed.
(Even on back-level PyTorch builds, passing dropout > 0.0 is safe: the dispatcher just falls back to sdpa_math(), which is essentially the same SDPA math implementation that nanoGPT uses today when flash is not available.)
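
For reference, a minimal sketch of the proposed call (the q/k/v tensors and the dropout value below are placeholders; the `dropout_p if self.training else 0.0` idiom matches what nanoGPT already does in `CausalSelfAttention`):

```python
import torch
import torch.nn.functional as F

# Placeholder tensors shaped (B, n_head, T, head_size), following nanoGPT's
# CausalSelfAttention conventions; fp16 on CUDA is one of the flash
# dispatch requirements.
q = torch.randn(1, 12, 64, 64, device="cuda", dtype=torch.float16)
k = torch.randn(1, 12, 64, 64, device="cuda", dtype=torch.float16)
v = torch.randn(1, 12, 64, 64, device="cuda", dtype=torch.float16)

dropout_p = 0.1  # nonzero dropout no longer needs to force the slow path
y = F.scaled_dot_product_attention(
    q, k, v,
    attn_mask=None,
    dropout_p=dropout_p,  # in a module: self.dropout if self.training else 0.0
    is_causal=True,
)
```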

Constraints to dispatch to Flash Attention are defined in use_flash_attention
https://github.com/pytorch/pytorch/blob/4973ca5e3e2c07311a879f49ac8983b7cae81a2d/aten/src/ATen/native/transformers/cuda/sdp_utils.h#L576
as follows:
https://github.com/pytorch/pytorch/blob/4973ca5e3e2c07311a879f49ac8983b7cae81a2d/aten/src/ATen/native/transformers/cuda/sdp_utils.h#L584-L592
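
To check that a given call actually satisfies these constraints, the dispatcher can be pinned to the flash backend with the `torch.backends.cuda.sdp_kernel` context manager (the API in these 2.0-era builds; later releases move this to `torch.nn.attention.sdpa_kernel`). A sketch with placeholder tensors: with the other backends disabled, an unmet constraint surfaces as an error instead of a silent fallback.

```python
import torch
import torch.nn.functional as F

q = k = v = torch.randn(1, 12, 64, 64, device="cuda", dtype=torch.float16)

# Disable the math and mem-efficient backends so only flash is eligible;
# a violated constraint (dtype, head_dim, etc.) then raises rather than
# silently dispatching to sdpa_math().
with torch.backends.cuda.sdp_kernel(
    enable_flash=True, enable_math=False, enable_mem_efficient=False
):
    y = F.scaled_dot_product_attention(q, k, v, dropout_p=0.1, is_causal=True)
```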


mikekgfb commented Mar 7, 2023

Fixed by #195

mikekgfb closed this as completed Mar 7, 2023
gkielian added a commit to gkielian/ReaLLMASIC_nanogpt that referenced this issue Jul 27, 2024
gkielian added a commit to gkielian/ReaLLMASIC_nanogpt that referenced this issue Sep 5, 2024