OpenCoder Series #1858

Open
ysjprojects opened this issue Dec 7, 2024 · 0 comments
Labels
enhancement New feature or request


@ysjprojects
Contributor

OpenCoder is an open and reproducible code LLM family that includes 1.5B and 8B base and chat models, supporting both English and Chinese. Trained from scratch on 2.5 trillion tokens composed of 90% raw code and 10% code-related web data, OpenCoder reaches the performance of top-tier code LLMs. We provide not only model weights and inference code, but also the reproducible training data, the complete data processing pipeline, rigorous experimental ablation results, and detailed training protocols. Empowering researchers to build and innovate, OpenCoder is your open foundation for advancing code AI.

A state-of-the-art code LLM that outperforms Qwen2.5-Coder models of equivalent size.

Project page: https://opencoder-llm.github.io/
Paper: https://arxiv.org/pdf/2411.04905

Hugging Face checkpoints:
https://huggingface.co/infly/OpenCoder-1.5B
https://huggingface.co/infly/OpenCoder-1.5B-Instruct
https://huggingface.co/infly/OpenCoder-8B
https://huggingface.co/infly/OpenCoder-8B-Instruct
