[Bug] Failure to Dispatch Head Dimension 80 in sglang with Specific Configurations #1109
Comments
By the way, I'm waiting for the head_dim=80 dispatch failure to be fixed, and then I'll merge the model support from my local branch into the main branch.
Consider adding support in FlashInfer or in the Triton kernel. For quick verification, I suggest you modify the Triton kernel.
I hit a different error; not sure what is wrong on my side.
I am fixing the kernel for the non-2^n case. Happy to resolve this together, but I am not able to reproduce.
@ByronHsu you can try the existing stablelm model code (which already supports stablelm-2) with the stabilityai/stablelm-3b-4e1t model name. The latter also has a head dim of 80. Your fix makes it work.
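Following that suggestion, a minimal reproduction sketch using sglang's Python Runtime API might look like the code below. This is an illustrative sketch only, not the reporter's actual command; the exact arguments and available flags vary by sglang version.

```python
import sglang as sgl

# stabilityai/stablelm-3b-4e1t also has a head dim of 80 (per the comment above),
# so it exercises the same dispatch path without needing the local XVERSE branch.
runtime = sgl.Runtime(model_path="stabilityai/stablelm-3b-4e1t")
sgl.set_default_backend(runtime)


@sgl.function
def smoke_test(s):
    # A single short generation is enough to hit the attention dispatch.
    s += "Hello, " + sgl.gen("greeting", max_new_tokens=16)


state = smoke_test.run()
print(state["greeting"])
runtime.shutdown()
```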
@zhyncs we can close this now
Checklist
Describe the bug
Issue Description:
When running sglang with a model whose attention head dimension is 80, the following exceptions are encountered under different configurations:
With enable_cuda_graph set to True:
The complete stack trace is as follows:
With disable_cuda_graph set to True:
The complete stack trace is as follows:
With disable_flashinfer set to True:
AssertionError: assert Lq in {16, 32, 64, 128, 256, 576}, where Lq = 80.
The complete stack trace is as follows:
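The Triton path asserts Lq in {16, 32, 64, 128, 256, 576}, so a head dim of 80 trips the assertion. One common way to handle a non-power-of-two head dimension in a Triton kernel is to pad the compile-time block size up to the next power of two and mask the padded lanes. The sketch below only illustrates that masking pattern; it is not sglang's actual attention kernel, and the kernel and wrapper names are hypothetical.

```python
import torch
import triton
import triton.language as tl


@triton.jit
def _copy_head_dim_kernel(
    q_ptr, out_ptr,
    stride_q, stride_o,
    HEAD_DIM: tl.constexpr,      # real head dim, e.g. 80
    BLOCK_DMODEL: tl.constexpr,  # next power of two >= HEAD_DIM, e.g. 128
):
    pid = tl.program_id(0)
    offs_d = tl.arange(0, BLOCK_DMODEL)
    mask_d = offs_d < HEAD_DIM  # mask out the padded lanes 80..127
    q = tl.load(q_ptr + pid * stride_q + offs_d, mask=mask_d, other=0.0)
    tl.store(out_ptr + pid * stride_o + offs_d, q, mask=mask_d)


def copy_head_dim(q: torch.Tensor) -> torch.Tensor:
    # q: [num_tokens, head_dim] with head_dim == 80
    num_tokens, head_dim = q.shape
    out = torch.empty_like(q)
    BLOCK_DMODEL = triton.next_power_of_2(head_dim)  # 80 -> 128
    _copy_head_dim_kernel[(num_tokens,)](
        q, out,
        q.stride(0), out.stride(0),
        HEAD_DIM=head_dim,
        BLOCK_DMODEL=BLOCK_DMODEL,
    )
    return out


if __name__ == "__main__":
    x = torch.randn(4, 80, device="cuda", dtype=torch.float16)
    assert torch.allclose(copy_head_dim(x), x)
```

Applied to an attention kernel, the same mask would cover the q/k/v loads and the output store so that the padded lanes never contribute to the dot products.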
Reproduction
run command
model
xverse/XVERSE-MoE-A4.2B-Chat
Steps to Reproduce:
I am currently adding support for XVERSE models in a local branch and hit this issue when the server failed to launch. Since the model support lives in a local branch, the issue cannot be reproduced directly, but the cause and possible fixes can be inferred from the error message and the configuration file.
Below is the config.json file, where hidden_size / num_attention_heads = 2560 / 32 = 80.
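For reference, the head dimension can be derived from those config fields; a small check (not part of the original report, and assuming the model downloads with trust_remote_code) might look like this:

```python
from transformers import AutoConfig

# Derive the attention head dimension from the model's config.json.
cfg = AutoConfig.from_pretrained(
    "xverse/XVERSE-MoE-A4.2B-Chat", trust_remote_code=True
)
head_dim = cfg.hidden_size // cfg.num_attention_heads
print(head_dim)  # 2560 // 32 == 80, not in {16, 32, 64, 128, 256, 576}
```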
Environment