Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] rafactor: make gqa_group_size a function argument instead of template parameter #262

Closed
wants to merge 1 commit into from

Conversation

yzh119
Copy link
Collaborator

@yzh119 yzh119 commented May 27, 2024

Our previous implementation treats gqa_group_size as a template parameter.
We also have an implicit assumption that 8 should be divisible by gqa_group_size, this is not documented and cause lots of confusion for developers.

#223 could be a workaround but this would quickly increase the binary size of this library if we keep adding more and more gqa group sizes.

This PR refactors the code so that gqa_group_size becomes a runtime function argument, and we don't need to compile one kernel for each of them. This would be bring very slight performance degradation to gqa_group_size=1 but should work good in general. This PR also removes the requirement that "8 should be divisible by gqa_group_size", so that any gqa group size should be supported.

This PR should resolve #142 #181 #246 #254 #258 and so on.

Still working in progress, estimated finish time: May 27th at noon (PST time). May 30th June 3rd.

@yzh119 yzh119 force-pushed the group-size-parameter branch 2 times, most recently from 34b0475 to c4cff52 Compare May 30, 2024 22:04
yzh119 added a commit that referenced this pull request Jun 2, 2024
#262 will degrade performance because divide by group_size will be slow.

This PR adds fastdiv support so that we can use faster shift and ifma
operations to compute division.

We copied the code from figure 10-2 in [Hacker's
Delight](https://www.oreilly.com/library/view/hackers-delight-second/9780133084993/)
version 2. The data structure was inspired by
https://github.com/milakov/int_fastdiv/ project, we kept its license.
@yzh119 yzh119 force-pushed the group-size-parameter branch from 31b5e4d to 2d1b7d0 Compare June 3, 2024 08:55
@yzh119 yzh119 force-pushed the group-size-parameter branch 3 times, most recently from 5e33542 to 4ee6a5b Compare June 10, 2024 09:53
This reverts commit 3b3ce05.

wip

rebase

wip

wip

upd

upd

remove macros that are no longer used

upd

upd

upd

bugfix

太阳,请赐予我完成它的力量吧

fix

fix

remove redundant code

remove conflict
@yzh119
Copy link
Collaborator Author

yzh119 commented Jun 14, 2024

Follow up in #301

@yzh119 yzh119 closed this Jun 14, 2024
yzh119 added a commit that referenced this pull request Jun 15, 2024
…ments (#301)

#262 is out of sync with main, this PR rebased the code on main branch.
@yzh119 yzh119 mentioned this pull request Jul 10, 2024
yzh119 added a commit that referenced this pull request Jul 10, 2024
Alibi experienced a performance degradation after #262 because of
increased number of integer division.
This PR fixes the issue.
@yzh119 yzh119 deleted the group-size-parameter branch August 11, 2024 07:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Feature Request] Versatile head dimension
1 participant