[Kernel] [Triton] [AMD] Adding Triton implementations awq_dequantize and awq_gemm to support AWQ #7386
Merged: tlrmchlsmth merged 59 commits into vllm-project:main from rasmith:ransmith_awq_gemm_triton on Aug 28, 2024
Commits
ff27ffa Add awq_dequantize_triton (rasmith)
f9b6e74 Add awq_dequantize_triton (rasmith)
7b49a76 Merge branch 'ransmith_awq_dequantize_triton' of github.com:rasmith/v… (rasmith)
e2c3ba5 Merge branch 'vllm-project:main' into ransmith_awq_dequantize_triton (rasmith)
ec14fe9 Use any instead of all (rasmith)
fd80f7f ruff checks (rasmith)
370c9f0 run isort (rasmith)
bdd0ab7 run yapf (rasmith)
915e0ae Format for PR (rasmith)
3b3a563 Merge branch 'ransmith_awq_dequantize_triton' of github.com:rasmith/v… (rasmith)
150db8c Merge branch 'vllm-project:main' into ransmith_awq_dequantize_triton (rasmith)
00dee49 Merge branch 'main' into ransmith_awq_dequantize_triton (rasmith)
a8ef8c2 Merge branch 'ransmith_awq_dequantize_triton' of github.com:rasmith/v… (rasmith)
2ebd212 Have working awq_gemm in Triton (rasmith)
e3073bc Merge branch 'main' into ransmith_awq_gemm_triton (rasmith)
5326dde Optimizations to awq_gemm (rasmith)
fb43aa4 Small cleanup (rasmith)
43abe7a ruff and yapf linting/formatting (rasmith)
91c6741 isort/ruff fixing (rasmith)
962ea59 add env VLLM_USE_TRITON_AWQ (rasmith)
c9df260 Add tests (rasmith)
c7b63e8 awq for rocm in config (rasmith)
5cf14db add dimension assertions (rasmith)
23cf001 fix typo (rasmith)
f94c1b0 yappity yapf (rasmith)
5887e77 merge main (rasmith)
8594e25 Merge branch 'vllm-project:main' into ransmith_awq_gemm_triton (rasmith)
64e5251 Merge main (rasmith)
86f2ec6 warning message for AWQ on ROCm and not setting VLLM_USE_TRITON_AWQ (rasmith)
d32212a VLLM_USE_TRITON_AWQ enabled automatically (rasmith)
6514622 parameterized unit tests (rasmith)
8a1f6f2 cleanup (rasmith)
39d44a2 ruff (rasmith)
34e06b5 yapf (rasmith)
4f3148f yapf (rasmith)
010c80e Merge branch 'main' into ransmith_awq_gemm_triton (rasmith)
4895074 test cleanup (rasmith)
0e1862c test cleanup (rasmith)
24a6b3b yapf (rasmith)
3d2854c merge main (rasmith)
c3b8102 Adjust threshold (rasmith)
a84c7d7 Merge branch 'main' into ransmith_awq_gemm_triton (rasmith)
c7fbacf simplify unit test and use assert_close (rasmith)
11860d6 clean up test (rasmith)
bea93a2 Merge branch 'main' into ransmith_awq_gemm_triton (rasmith)
0c45b68 use marlin tolerance (rasmith)
bbfb4d9 update test (rasmith)
13bb612 Merge branch 'main' into ransmith_awq_gemm_triton (rasmith)
226e7fb Merge branch 'main' into ransmith_awq_gemm_triton (rasmith)
c4e3fd1 Merge branch 'vllm-project:main' into ransmith_awq_gemm_triton (rasmith)
62612ee Support more group sizes (rasmith)
5d91e78 Merge branch 'main' into ransmith_awq_gemm_triton (rasmith)
ba434dc Merge branch 'ransmith_awq_gemm_triton' of github.com:rasmith/vllm in… (rasmith)
2db93e0 assert added (rasmith)
f07c241 ruff (rasmith)
e95dfc4 ruff (rasmith)
efbd8a5 isort (rasmith)
69573dd test update (rasmith)
d456232 update comment (rasmith)
Is there a plan to set this variable?
I'm not sure about this. Maybe there is a better place for it, such as an environment variable? I will ask for a suggestion, unless you already have one, and I will also look myself.
OK, I added a new environment variable for it.
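For readers following the thread, here is a minimal sketch of how an opt-in flag like this might be read from the environment. The variable name VLLM_USE_TRITON_AWQ comes from the commit list; the helper name `use_triton_awq`, the default of "0", and the integer parsing are assumptions made for illustration, not necessarily what this PR implements.

```python
import os


def use_triton_awq(environ=None) -> bool:
    """Hypothetical helper: report whether the Triton AWQ path is requested.

    Assumes the flag is an integer string, where "0" or unset means off
    and any other integer means on.
    """
    env = os.environ if environ is None else environ
    raw = env.get("VLLM_USE_TRITON_AWQ", "0")
    try:
        return bool(int(raw))
    except ValueError:
        # Treat values that are not integers as "off" rather than raising.
        return False


if __name__ == "__main__":
    print("Triton AWQ requested:", use_triton_awq())
```

Passing a plain dict as `environ` keeps the helper easy to unit test without touching the real process environment.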
@tlrmchlsmth I closed #6850, so this PR will be the sole AWQ Triton PR for my work. Another PR may follow for KV cache and Flash attention. As a result, the awq_dequantize_kernel is now part of this PR too.