
Use flash for dropout != 0 #194

Closed
mikekgfb opened this issue Mar 5, 2023 · 1 comment

Comments


mikekgfb commented Mar 5, 2023

PyTorch 2.0 (nightlies) now supports Flash Attention with dropout > 0.0, so the check that disables flash when dropout is nonzero can be removed.
(Even on back-level PyTorch builds, passing dropout > 0.0 is safe: the dispatcher just falls back to sdpa_math(), which is essentially the same SDPA math implementation that nanoGPT uses today when flash is not available.)
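
For reference, a minimal sketch of the proposed call (the q/k/v tensors and the dropout value below are placeholders; the `dropout_p if self.training else 0.0` idiom matches what nanoGPT already does in `CausalSelfAttention`):

```python
import torch
import torch.nn.functional as F

# Placeholder tensors shaped (B, n_head, T, head_size), following nanoGPT's
# CausalSelfAttention conventions; fp16 on CUDA is one of the flash
# dispatch requirements.
q = torch.randn(1, 12, 64, 64, device="cuda", dtype=torch.float16)
k = torch.randn(1, 12, 64, 64, device="cuda", dtype=torch.float16)
v = torch.randn(1, 12, 64, 64, device="cuda", dtype=torch.float16)

dropout_p = 0.1  # nonzero dropout no longer needs to force the slow path
y = F.scaled_dot_product_attention(
    q, k, v,
    attn_mask=None,
    dropout_p=dropout_p,  # in a module: self.dropout if self.training else 0.0
    is_causal=True,
)
```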

Constraints to dispatch to Flash Attention are defined in use_flash_attention
https://github.com/pytorch/pytorch/blob/4973ca5e3e2c07311a879f49ac8983b7cae81a2d/aten/src/ATen/native/transformers/cuda/sdp_utils.h#L576
as follows:
https://github.com/pytorch/pytorch/blob/4973ca5e3e2c07311a879f49ac8983b7cae81a2d/aten/src/ATen/native/transformers/cuda/sdp_utils.h#L584-L592
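
To check that a given call actually satisfies these constraints, the dispatcher can be pinned to the flash backend with the `torch.backends.cuda.sdp_kernel` context manager (the API in these 2.0-era builds; later releases move this to `torch.nn.attention.sdpa_kernel`). A sketch with placeholder tensors: with the other backends disabled, an unmet constraint surfaces as an error instead of a silent fallback.

```python
import torch
import torch.nn.functional as F

q = k = v = torch.randn(1, 12, 64, 64, device="cuda", dtype=torch.float16)

# Disable the math and mem-efficient backends so only flash is eligible;
# a violated constraint (dtype, head_dim, etc.) then raises rather than
# silently dispatching to sdpa_math().
with torch.backends.cuda.sdp_kernel(
    enable_flash=True, enable_math=False, enable_mem_efficient=False
):
    y = F.scaled_dot_product_attention(q, k, v, dropout_p=0.1, is_causal=True)
```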


mikekgfb commented Mar 7, 2023

Fixed by #195

mikekgfb closed this as completed Mar 7, 2023
gkielian added a commit to gkielian/ReaLLMASIC_nanogpt that referenced this issue Jul 27, 2024
gkielian added a commit to gkielian/ReaLLMASIC_nanogpt that referenced this issue Sep 5, 2024