Partial Fix for Sliding Window Attention #994

cdoko · 2024-12-16T08:14:23Z

Currently, in models like Phi3 and Mistral, with paged attention disabled, sliding window attention fails when the input sequence length exceeds the sliding window size, due to a "start > dim_len" error caused by the mask tensor being narrowed using its original length.

An initial fix was updating the mask length after the first narrow operation, but this brings up another error where the mask's dimension was one higher than the corresponding dimension in the key and value tensors, leading to a shape mismatch in the naive_sdpa function.

The issue was resolved by removing the concatenation of the mask tensor, resulting in coherent outputs for non-flash attention, although flash attention remains non-functional.

Any feedback or review of this change would be greatly appreciated!

github-actions · 2024-12-16T08:15:27Z

Code Metrics Report

  ===============================================================================
 Language            Files        Lines         Code     Comments       Blanks
===============================================================================
 C Header                2           35           28            0            7
 Dockerfile              1           41           22           10            9
 JSON                   12          105          104            0            1
 Python                 63         2706         2338           71          297
 Shell                   1           57           22           18           17
 Plain Text              3         3723            0         2413         1310
 TOML                   18          602          535            2           65
 YAML                    2           21           19            2            0
-------------------------------------------------------------------------------
 Jupyter Notebooks       4            0            0            0            0
 |- Markdown             2           77           32           31           14
 |- Python               2          205          178            1           26
 (Total)                            282          210           32           40
-------------------------------------------------------------------------------
 Markdown               43         3324            0         2520          804
 |- BASH                 6          101           98            0            3
 |- JSON                 1           12           12            0            0
 |- Python               7          121          109            0           12
 |- Rust                12          406          344            0           62
 |- TOML                 2           75           63            0           12
 (Total)                           4039          626         2520          893
-------------------------------------------------------------------------------
 Rust                  288        87673        78724         1802         7147
 |- Markdown           138         1526           25         1387          114
 (Total)                          89199        78749         3189         7261
===============================================================================
 Total                 437        98287        81792         6838         9657
===============================================================================

EricLBuehler

Thank you!

fix mismatched dims

e8b87bb

EricLBuehler approved these changes Dec 19, 2024

View reviewed changes

EricLBuehler merged commit c582196 into EricLBuehler:master Dec 19, 2024
12 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Partial Fix for Sliding Window Attention #994

Partial Fix for Sliding Window Attention #994

cdoko commented Dec 16, 2024

github-actions bot commented Dec 16, 2024

EricLBuehler left a comment

Partial Fix for Sliding Window Attention #994

Partial Fix for Sliding Window Attention #994

Conversation

cdoko commented Dec 16, 2024

github-actions bot commented Dec 16, 2024

EricLBuehler left a comment

Choose a reason for hiding this comment