CUDA: add FP32 FlashAttention vector kernel #12040
| Job | Run time |
|---|---|
| | 26m 22s |
| | 29m 18s |
| | 26m 17s |
| | 25m 19s |
| | 26m 36s |
| | 24m 37s |
| | 10m 49s |
| | 10m 53s |
| | 10m 19s |
| | 7m 26s |
| | 7m 26s |
| Total | 3h 25m 22s |