Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Partial Fix for Sliding Window Attention #994

Merged

Conversation

cdoko
Copy link
Collaborator

@cdoko cdoko commented Dec 16, 2024

Currently, in models like Phi3 and Mistral, with paged attention disabled, sliding window attention fails when the input sequence length exceeds the sliding window size, due to a "start > dim_len" error caused by the mask tensor being narrowed using its original length.

An initial fix was updating the mask length after the first narrow operation, but this brings up another error where the mask's dimension was one higher than the corresponding dimension in the key and value tensors, leading to a shape mismatch in the naive_sdpa function.

The issue was resolved by removing the concatenation of the mask tensor, resulting in coherent outputs for non-flash attention, although flash attention remains non-functional.

Any feedback or review of this change would be greatly appreciated!

Copy link

Code Metrics Report
  ===============================================================================
 Language            Files        Lines         Code     Comments       Blanks
===============================================================================
 C Header                2           35           28            0            7
 Dockerfile              1           41           22           10            9
 JSON                   12          105          104            0            1
 Python                 63         2706         2338           71          297
 Shell                   1           57           22           18           17
 Plain Text              3         3723            0         2413         1310
 TOML                   18          602          535            2           65
 YAML                    2           21           19            2            0
-------------------------------------------------------------------------------
 Jupyter Notebooks       4            0            0            0            0
 |- Markdown             2           77           32           31           14
 |- Python               2          205          178            1           26
 (Total)                            282          210           32           40
-------------------------------------------------------------------------------
 Markdown               43         3324            0         2520          804
 |- BASH                 6          101           98            0            3
 |- JSON                 1           12           12            0            0
 |- Python               7          121          109            0           12
 |- Rust                12          406          344            0           62
 |- TOML                 2           75           63            0           12
 (Total)                           4039          626         2520          893
-------------------------------------------------------------------------------
 Rust                  288        87673        78724         1802         7147
 |- Markdown           138         1526           25         1387          114
 (Total)                          89199        78749         3189         7261
===============================================================================
 Total                 437        98287        81792         6838         9657
===============================================================================
  

Copy link
Owner

@EricLBuehler EricLBuehler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

@EricLBuehler EricLBuehler merged commit c582196 into EricLBuehler:master Dec 19, 2024
12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants