[kernel] Use sgl_kernel rope #3169
Please update this version limit. Line 30 in 741fccd
7de4b68 to 7e888cc
needs #3173
self.is_neox_style,
)
if _is_cuda_available:
apply_rope_with_cos_sin_cache_inplace(
Everything is going well except for an accuracy issue with test_session_control.
I suspect the test is flaky by nature. I switched from the 1B to the 8B model on main, and test_session_control failed with a different output: #3184.
Here are my findings so far:
| Kernel | 1B Model | 8B Model |
|---|---|---|
| vLLM | Pass | Fail |
| flashinfer | Fail | Fail |
Given that the other accuracy tests all pass, I think the correctness looks OK.
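For checking a rotary-embedding kernel separately from end-to-end tests such as test_session_control, a small pure-Python reference can help. The sketch below is illustrative only: `build_cos_sin_cache` and `rope_inplace` are hypothetical names, not the sgl_kernel or vLLM API, and the cache layout (cos values in the first half of each row, sin values in the second half) is an assumption.

```python
import math


def build_cos_sin_cache(max_pos, rotary_dim, base=10000.0):
    """Hypothetical helper: row p holds cos(p * f_i) then sin(p * f_i)."""
    half = rotary_dim // 2
    inv_freq = [base ** (-2.0 * i / rotary_dim) for i in range(half)]
    cache = []
    for p in range(max_pos):
        angles = [p * f for f in inv_freq]
        cache.append([math.cos(a) for a in angles] + [math.sin(a) for a in angles])
    return cache


def rope_inplace(vec, position, cache, is_neox_style=True):
    """Rotate the first rotary_dim entries of vec in place.

    Neox style pairs element i with element i + half;
    the interleaved style pairs element 2i with element 2i + 1.
    """
    row = cache[position]
    half = len(row) // 2
    cos, sin = row[:half], row[half:]
    for i in range(half):
        a, b = (i, i + half) if is_neox_style else (2 * i, 2 * i + 1)
        x, y = vec[a], vec[b]
        vec[a] = x * cos[i] - y * sin[i]
        vec[b] = x * sin[i] + y * cos[i]
```

Because each pair is rotated by a plain 2D rotation, the vector norm is preserved and position 0 leaves the input unchanged. Both properties make cheap sanity checks when comparing a fused kernel against a reference like this one.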
2bf7531 to 826e2e5
826e2e5 to 4a5a0b8
c76880b to 1b0c7ac
Byron is cooking!! Great work.
"Byron" XD
Motivation
Depends on https://github.com/sgl-project/sglang/actions/runs/12984127304
Modifications
Checklist