kernel: use tensor cores for flashinfer gqa kernels #1403

yzh119 · 2024-09-12T06:55:42Z

This PR fixes the kernel dispatch rule for the FlashInfer backend.

Motivation

When the GQA group size is greater than 4, using tensor cores is faster.
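
A minimal sketch of the dispatch rule described above, not the actual code in this PR. The function name `should_use_tensor_cores` and its parameters are hypothetical, and it assumes that "group size" means the number of query heads divided by the number of KV heads:

```python
def should_use_tensor_cores(num_qo_heads: int, num_kv_heads: int, threshold: int = 4) -> bool:
    """Return True when the tensor-core kernel should be selected.

    Hypothetical helper: the names and the threshold parameter are
    illustrative assumptions, not taken from the sglang or FlashInfer source.
    """
    if num_kv_heads <= 0 or num_qo_heads % num_kv_heads != 0:
        raise ValueError("num_qo_heads must be a positive multiple of num_kv_heads")
    # Assumed definition: GQA group size = query heads per KV head.
    group_size = num_qo_heads // num_kv_heads
    # Rule from the PR description: tensor cores are faster when group size > 4.
    return group_size > threshold
```

Under these assumptions, a model with 32 query heads and 4 KV heads (group size 8) would select the tensor-core path, while 32 query heads and 8 KV heads (group size 4) would not.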

Checklist

Format your code according to the Contributor Guide.
Add unit tests as outlined in the Contributor Guide.
Update documentation as needed, including docstrings or example tutorials.

zhyncs · 2024-09-12T12:10:39Z

Amazing work!

upd

0f56138

merrymercy merged commit debbdb5 into sgl-project:main Sep 12, 2024
3 of 9 checks passed

Ying1123 added a commit that referenced this pull request Sep 25, 2024

Revert "kernel: use tensor cores for flashinfer gqa kernels (#1403)"

46ed962

This reverts commit debbdb5.

Ying1123 mentioned this pull request Sep 25, 2024

Revert "kernel: use tensor cores for flashinfer gqa kernels" #1511

Merged
