DeepSeek V3 Support #35425
Historical perspective on DeepSeek: DeepSeek V2 support was never added by the DeepSeek team. There is a community V2 PR that never made it out of the draft phase. Unless it is added by the OSS community or HF, history shows DeepSeek will not proactively add HF support, as their priority is SGLang, LMDeploy, and others. Let's hope someone in the community or HF picks up the ball on this, because this is not a simple model to support. |
The DeepSeek V3 code is mostly available already, though. They put an MIT license on the code in their repository, so a PR mostly needs a multi-token prediction implementation. https://huggingface.co/deepseek-ai/DeepSeek-V3/blob/main/modeling_deepseek.py |
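For anyone who wants to experiment before a native port lands, here is a minimal sketch of loading the checkpoint through that remote modeling code via `trust_remote_code`, assuming the Hub repo wires it up through `auto_map` the way the V2 repos did (dtype and device settings are purely illustrative, and the weights are of course enormous):

```python
# Minimal sketch (assumption: the Hub repo exposes modeling_deepseek.py via
# auto_map, as the DeepSeek-V2 repos did). Not a native transformers port.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-V3"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,  # runs the repo's own modeling code
    torch_dtype="auto",
    device_map="auto",
)
```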
This isn't quite correct, right? DeepSeek-V2/V2.5 never got natively implemented (see #31976), and there is modeling code for V2 as well. https://huggingface.co/deepseek-ai/DeepSeek-V2/blob/main/modeling_deepseek.py |
Looking at the ADD_NEW_MODEL_PROPOSAL_TEMPLATE.md from the repo, I'm willing to try my hand at a native DeepSeek-V3 implementation, but since that README was last updated 10 months ago: is the mentor system still valid, and is this the up-to-date way to go about it? |
cc @Cyrilvallez on the last comment above regarding guiding model integrations ^ |
Thanks! I would love to do this per protocol, as I am dying to use transformers tooling on this model! |
https://huggingface.co/deepseek-ai/DeepSeek-V3/blob/main/modeling_deepseek.py#L439 |
Bumping for @Cyrilvallez's advisement. We (really @fairydreaming, not me) skipped transformers entirely in favor of llama.cpp, but native support would be very much appreciated for those who want to try formats like EXL2 or AWQ. |
Hey @Nottlespike! Very nice that you want to tackle this! Sorry for the late answer, I was still on vacation. Regarding model integration rules, you can also check here and the modular rules. |
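To make the "modular" route concrete, here is a hypothetical sketch of what a modular file for a native port could look like, following transformers' convention of inheriting from existing model classes and only overriding what differs. The file and class names are assumptions, and the actual port would need far more than this:

```python
# Hypothetical modular_deepseek_v3.py sketch (names are assumptions, not the
# real port). The modular converter expands these inherited classes into a
# full modeling file; only the genuinely new pieces (MLA, DeepSeek MoE
# routing, MTP) would need real code.
from transformers.models.llama.modeling_llama import LlamaRMSNorm
from transformers.models.qwen2_moe.modeling_qwen2_moe import Qwen2MoeSparseMoeBlock


class DeepseekV3RMSNorm(LlamaRMSNorm):
    pass


class DeepseekV3MoE(Qwen2MoeSparseMoeBlock):
    # Override the routing here (shared experts, DeepSeek-style gating, etc.).
    pass
```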
Model description
Transformer model
DeepSeek V3 is a Transformer model that utilizes Mixture of Experts (similar to Qwen2 MoE) and Multi-head Latent Attention (MLA).
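For readers unfamiliar with the MoE pattern mentioned above, below is a rough, self-contained sketch of top-k routing with shared experts in the Qwen2-MoE style. It is purely illustrative; names, shapes, and gating details are not the actual DeepSeek-V3 code:

```python
# Illustrative top-k MoE block with shared experts (not DeepSeek's code).
import torch
import torch.nn as nn


class ToyMoEBlock(nn.Module):
    def __init__(self, hidden=64, n_experts=8, top_k=2, n_shared=1):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(hidden, n_experts, bias=False)  # router
        make_expert = lambda: nn.Sequential(
            nn.Linear(hidden, 4 * hidden), nn.SiLU(), nn.Linear(4 * hidden, hidden)
        )
        self.experts = nn.ModuleList(make_expert() for _ in range(n_experts))
        self.shared = nn.ModuleList(make_expert() for _ in range(n_shared))

    def forward(self, x):                              # x: (tokens, hidden)
        probs = self.gate(x).softmax(dim=-1)           # (tokens, n_experts)
        weights, idx = probs.topk(self.top_k, dim=-1)  # per-token top-k experts
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e                  # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, k, None] * expert(x[mask])
        for expert in self.shared:                     # shared experts see all tokens
            out = out + expert(x)
        return out


# Usage: y = ToyMoEBlock()(torch.randn(10, 64))
```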
Multi-token Prediction
The model can predict multiple tokens sequentially at each step through the MTP modules. The first token is generated by the causal LM, which feeds the output token into what I would describe as a "Transformer head" to generate additional tokens for the current step. DeepSeek notes in their release that "MTP support is currently under active development within the community, and we welcome your contributions and feedback" (i.e. the code for this has not been released).
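Since the MTP code has not been released, the following is only an illustrative sketch of the idea described above: a light "Transformer head" that consumes the backbone's hidden states together with the embedding of the token just produced and predicts the token after that. Every name and shape here is an assumption, not DeepSeek's implementation:

```python
# Illustrative MTP-style head (assumption/sketch only; DeepSeek's MTP module
# is not released). Fuses backbone hidden states with the new token's
# embedding and predicts one additional token for the current step.
import torch
import torch.nn as nn


class ToyMTPHead(nn.Module):
    def __init__(self, hidden=64, vocab_size=32000, n_heads=8):
        super().__init__()
        self.proj = nn.Linear(2 * hidden, hidden)  # fuse backbone state + token embedding
        self.block = nn.TransformerEncoderLayer(hidden, n_heads, batch_first=True)
        self.lm_head = nn.Linear(hidden, vocab_size, bias=False)

    def forward(self, backbone_hidden, next_token_emb):
        # backbone_hidden, next_token_emb: (batch, seq, hidden)
        fused = self.proj(torch.cat([backbone_hidden, next_token_emb], dim=-1))
        return self.lm_head(self.block(fused))     # logits for the token after next
```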
Open source status
Provide useful links for the implementation
Transformers Code: https://huggingface.co/deepseek-ai/DeepSeek-V3
GitHub Code (minimal implementation): https://github.com/deepseek-ai/DeepSeek-V3/tree/main/inference
Paper: https://github.com/deepseek-ai/DeepSeek-V3/blob/main/DeepSeek_V3.pdf