Second matmul for fully custom attention #227

Open

wants to merge 10 commits into master
Conversation

ngc92 (Contributor) commented Apr 22, 2024

So far this lives only in the /dev files, because for the main script we would also need to touch the backward pass.
For some reason I see a considerable speed-up in the benchmarks here, but in my attempts to use this in the main model, that hasn't really translated.

FeSens (Contributor) commented Apr 24, 2024

How does the speed of matmul_tri compare with cublas?

ngc92 (Contributor, Author) commented Apr 24, 2024

On my A4000, cublas (without tensor cores) is reported at 52% of FP32 capacity, whereas this kernel gets 33%. So it is slower per FLOP, but since it only computes half of the score matrix, it still wins out overall. That changes with tensor cores, though.

I think it's the writing back of results that is still quite bad here.
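
For context, the "calculates only half" point comes from the causal mask: every entry of the (T, T) score matrix with key index j greater than query index i is masked out anyway, so tiles strictly above the diagonal never need to be computed. Below is a minimal sketch of that idea, assuming a row-major (T, HS) layout for Q and K of a single (batch, head) pair; the kernel name and signature are hypothetical and this is not the PR's actual matmul_tri:

```cuda
// Sketch: tiled Q @ K^T that only produces the lower triangle of the
// (T, T) attention score matrix. Thread blocks above the diagonal exit
// immediately, so roughly half the FLOPs of a full matmul are done.
#include <cuda_runtime.h>

#define TILE 32

__global__ void matmul_tri_sketch(float* out, const float* q, const float* k,
                                  int T, int HS, float scale) {
    int row = blockIdx.y * TILE + threadIdx.y;  // query index i
    int col = blockIdx.x * TILE + threadIdx.x;  // key index j

    // tiles strictly above the diagonal contain only masked entries (j > i)
    if (blockIdx.x > blockIdx.y) return;

    __shared__ float qs[TILE][TILE];
    __shared__ float ks[TILE][TILE];

    float acc = 0.0f;
    for (int t = 0; t < HS; t += TILE) {
        qs[threadIdx.y][threadIdx.x] = (row < T && t + threadIdx.x < HS)
            ? q[row * HS + t + threadIdx.x] : 0.0f;
        ks[threadIdx.y][threadIdx.x] = (col < T && t + threadIdx.y < HS)
            ? k[col * HS + t + threadIdx.y] : 0.0f;
        __syncthreads();
        for (int kk = 0; kk < TILE; kk++) {
            acc += qs[threadIdx.y][kk] * ks[kk][threadIdx.x];
        }
        __syncthreads();
    }
    // only the causal (lower-triangular) part is ever written back
    if (row < T && col <= row) {
        out[row * T + col] = acc * scale;
    }
}
```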

ngc92 (Contributor, Author) commented Apr 27, 2024

Some more optimizations, and now it's slightly faster than the tensor-core counterparts. Combined with getting rid of the permutes, this yields a substantial net speedup for the attention kernel (see the sketch below).
Unfortunately, we cannot yet use this in the main model, because the backward pass still assumes the permutations.
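
To illustrate what "getting rid of the permutes" means here: instead of first rearranging the fused QKV activations into a (B, NH, T, HS) layout with separate permute/unpermute kernels, the matmul can index the original fused layout directly, saving that extra memory traffic. A minimal sketch, assuming an llm.c-style (B, T, 3, NH, HS) QKV layout; the helper name is hypothetical and not the PR's actual code:

```cuda
// Sketch: compute the pointer to the query vector of head nh_idx at
// position t directly in the fused (B, T, 3, NH, HS) QKV buffer, so no
// standalone permute kernel is needed before the triangular matmul.
__device__ __forceinline__ const float* query_ptr(const float* qkv,
                                                  int b, int nh_idx, int t,
                                                  int T, int NH, int HS) {
    // queries occupy slot 0 of the "3" dimension; keys would be slot 1, values slot 2
    return qkv + ((size_t)b * T + t) * 3 * NH * HS + (size_t)nh_idx * HS;
}
```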
