
DeepSeek V3 Support #35425

Open

casper-hansen opened this issue Dec 26, 2024 · 9 comments · May be fixed by #35926

Comments

@casper-hansen commented Dec 26, 2024

Model description

Transformer model

DeepSeek V3 is a Transformer model that utilizes Mixture of Experts (similar to Qwen2 MoE) and Multi-head Latent Attention (MLA).

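To make the architecture description above more concrete, here is a minimal PyTorch sketch of the routed-plus-shared-expert pattern, similar in spirit to Qwen2 MoE. All class and parameter names are illustrative assumptions, not DeepSeek's actual implementation (which, for example, uses sigmoid gating and auxiliary-loss-free load balancing).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleExpert(nn.Module):
    """One feed-forward expert (SwiGLU-style MLP)."""
    def __init__(self, hidden_size: int, intermediate_size: int):
        super().__init__()
        self.gate_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.up_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.down_proj = nn.Linear(intermediate_size, hidden_size, bias=False)

    def forward(self, x):
        return self.down_proj(F.silu(self.gate_proj(x)) * self.up_proj(x))

class SimpleMoEBlock(nn.Module):
    """Illustrative top-k routed MoE block with an always-on shared expert."""
    def __init__(self, hidden_size=64, intermediate_size=128,
                 num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(hidden_size, num_experts, bias=False)
        self.experts = nn.ModuleList(
            [SimpleExpert(hidden_size, intermediate_size) for _ in range(num_experts)]
        )
        self.shared_expert = SimpleExpert(hidden_size, intermediate_size)

    def forward(self, hidden_states):
        # hidden_states: (num_tokens, hidden_size)
        scores = self.router(hidden_states).softmax(dim=-1)
        top_scores, top_idx = scores.topk(self.top_k, dim=-1)
        top_scores = top_scores / top_scores.sum(dim=-1, keepdim=True)

        routed_out = torch.zeros_like(hidden_states)
        for expert_id, expert in enumerate(self.experts):
            # tokens that routed to this expert, and in which top-k slot
            token_pos, slot = (top_idx == expert_id).nonzero(as_tuple=True)
            if token_pos.numel() == 0:
                continue
            weight = top_scores[token_pos, slot].unsqueeze(-1)
            routed_out[token_pos] += weight * expert(hidden_states[token_pos])

        # The shared expert is applied to every token, on top of routed experts.
        return routed_out + self.shared_expert(hidden_states)

tokens = torch.randn(10, 64)
print(SimpleMoEBlock()(tokens).shape)  # torch.Size([10, 64])
```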

Multi-token Prediction

The model is able to predict multiple tokens sequentially at each step through the MTP modules. The first token is generated by the causal LM, which feeds the output token into what I would describe as a "Transformer head" to generate additional tokens for the current step. DeepSeek notes in their release that "MTP support is currently under active development within the community, and we welcome your contributions and feedback." (i.e. the code for this is not released).

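Since the MTP code is not released, the following is only a rough, hypothetical sketch of the idea described above: an extra prediction module that combines the main model's hidden state with the embedding of the token just produced and predicts one additional token for the same step. The module name, layer choices, and sizes are assumptions for illustration.

```python
import torch
import torch.nn as nn

class TinyMTPModule(nn.Module):
    """Hypothetical multi-token-prediction head: fuses the main model's last
    hidden state with the embedding of the token it just produced, runs one
    extra transformer block, and predicts one additional token."""
    def __init__(self, hidden_size=64, vocab_size=1000):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)
        self.proj = nn.Linear(2 * hidden_size, hidden_size, bias=False)
        self.block = nn.TransformerEncoderLayer(
            d_model=hidden_size, nhead=4, batch_first=True
        )
        self.lm_head = nn.Linear(hidden_size, vocab_size, bias=False)

    def forward(self, last_hidden, prev_token_ids):
        # last_hidden: (batch, seq, hidden) from the main causal LM
        # prev_token_ids: (batch, seq) tokens predicted by the previous head
        fused = self.proj(torch.cat([last_hidden, self.embed(prev_token_ids)], dim=-1))
        hidden = self.block(fused)
        return self.lm_head(hidden), hidden  # logits for the extra token

# Usage sketch: the main LM produces hidden states and a first token,
# then the MTP module predicts a second token for the same position.
batch, seq, hidden_size, vocab = 2, 5, 64, 1000
main_hidden = torch.randn(batch, seq, hidden_size)
first_tokens = torch.randint(0, vocab, (batch, seq))
mtp = TinyMTPModule(hidden_size, vocab)
extra_logits, _ = mtp(main_hidden, first_tokens)
second_tokens = extra_logits.argmax(dim=-1)
```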

Open source status

  • The model implementation is available
  • The model weights are available

Provide useful links for the implementation

Transformers Code: https://huggingface.co/deepseek-ai/DeepSeek-V3
GitHub Code (minimal implementation): https://github.com/deepseek-ai/DeepSeek-V3/tree/main/inference
Paper: https://github.com/deepseek-ai/DeepSeek-V3/blob/main/DeepSeek_V3.pdf

@Qubitium
Contributor

Historical perspective on DeepSeek: DeepSeek V2 support was never added by the DeepSeek team. There is a community V2 PR that never made it out of the draft phase.

#31976

Unless it is added by the OSS community or HF, history shows DeepSeek will not proactively add HF support, as their priority is SGLang, LMDeploy, and others.

Let's hope someone in the community or HF picks up the ball on this, as this is not a simple model to support.

@ArthurZucker

@casper-hansen
Author

The DeepSeek v3 code is mostly available already though. They put an MIT license on the code in their repository. So a PR mostly needs a multi-token prediction implementation.

https://huggingface.co/deepseek-ai/DeepSeek-V3/blob/main/modeling_deepseek.py

@Nottlespike

> The DeepSeek v3 code is mostly available already though. They put an MIT license on the code in their repository. So a PR mostly needs a multi-token prediction implementation.
>
> https://huggingface.co/deepseek-ai/DeepSeek-V3/blob/main/modeling_deepseek.py

This isn't quite correct, right? DeepSeek-V2/V2.5 never got natively implemented, as per #31976, and there is modeling code for V2 as well: https://huggingface.co/deepseek-ai/DeepSeek-V2/blob/main/modeling_deepseek.py

@Nottlespike

Looking at ADD_NEW_MODEL_PROPOSAL_TEMPLATE.md from the repo, I'm willing to try my hand at a native DeepSeek-V3 implementation, but since that README was last updated 10 months ago: is the mentor system still valid, and is this the up-to-date way to go about it?

@LysandreJik
Member

cc @Cyrilvallez on the last comment above regarding guiding model integrations ^

@Nottlespike

> cc @Cyrilvallez on the last comment above regarding guiding model integrations ^

Thanks! Would love to do this as per protocol as I am dying to use transformers tools on this model!

@IYoreI

IYoreI commented Jan 2, 2025

https://huggingface.co/deepseek-ai/DeepSeek-V3/blob/main/modeling_deepseek.py#L439
What does this mean? It seems that this code can only run inference.

@Nottlespike

Bumping for @Cyrilvallez's advisement. We (well, mostly @fairydreaming, not really me) effectively skipped transformers in favor of llama.cpp, but having this would be very much appreciated for those who want to try formats like EXL2 or AWQ.

@Cyrilvallez
Member

Hey @Nottlespike! Very nice that you want to tackle this! Sorry for the late answer, I was still on vacation. Regarding model integration rules, you can also check here and the modular rules.
What you want to do is isolate the small individual differences between your model (deepseek) and existing models in the library (e.g. mixtral, qwen2 moe, deepseek v1...), then create a modular file based on those differences. You can check e.g. this past model integration for an example of a model addition with modular.
The modeling code already on the Hub should provide a strong starting point.
Let me know if you have any questions 🤗
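For anyone unfamiliar with the modular approach mentioned above, here is a rough sketch of what a modular_deepseek_v3.py could look like: classes inherit from existing models in the library and only override what is different. The class names and the choice of Qwen2 MoE / LLaMA as parents are assumptions for illustration; the actual PR would diverge wherever DeepSeek V3 differs (MLA, sigmoid gating, etc.).

```python
# Hypothetical sketch of a transformers "modular" file. Names and parent
# choices are assumptions, not the actual PR.
from transformers.models.llama.modeling_llama import LlamaRMSNorm
from transformers.models.qwen2_moe.modeling_qwen2_moe import (
    Qwen2MoeDecoderLayer,
    Qwen2MoeForCausalLM,
    Qwen2MoeModel,
)


class DeepseekV3RMSNorm(LlamaRMSNorm):
    pass  # identical to LLaMA's RMSNorm


class DeepseekV3DecoderLayer(Qwen2MoeDecoderLayer):
    # Override only the pieces that differ, e.g. swap in Multi-head Latent
    # Attention and DeepSeek's gating instead of the Qwen2 MoE defaults.
    pass


class DeepseekV3Model(Qwen2MoeModel):
    pass


class DeepseekV3ForCausalLM(Qwen2MoeForCausalLM):
    pass
```

The converter then generates the full modeling file from this, so reviewers only have to look at the actual differences from already-supported models.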
