GitHub - BBuf/sglang at refs/heads/main

Latest commit: ispobock, Remove fp8 monkey patch (sgl-project#2960), 656dcc1 · Jan 18, 2025 · 1,756 Commits

Name	Last commit message	Last commit date
.devcontainer	feat: add devcontainer.json for VSCode development (sgl-project#2745)	Jan 6, 2025
.github	Update pr template (sgl-project#2951)	Jan 17, 2025
3rdparty/amd	Add a new api configure_logging to allow dumping the requests (sgl-pr…	Jan 13, 2025
assets	Add OpenAI backend to the CI test (sgl-project#869)	Aug 1, 2024
benchmark	Multi-turn benchmark for hierarchical caching (sgl-project#2942)	Jan 18, 2025
docker	chore: bump v0.4.1.post6 (sgl-project#2899)	Jan 15, 2025
docs	Fix Llama-3.1-405B References Docs (sgl-project#2944)	Jan 17, 2025
examples	Eagle speculative decoding part 4: Add EAGLE2 worker (sgl-project#2150)	Jan 2, 2025
python	Remove fp8 monkey patch (sgl-project#2960)	Jan 18, 2025
scripts	update ci install dependency (sgl-project#2949)	Jan 17, 2025
sgl-kernel	minor: use bear for compilation database (sgl-project#2919)	Jan 16, 2025
sgl-router	docs: update link (sgl-project#2857)	Jan 13, 2025
test	support e4m3 kvcache in qwen2 & add kv scaling facotr json (sgl-proje…	Jan 18, 2025
.editorconfig	minor: Add basic editorconfig and pre-commit hooks to enforce style f…	Nov 6, 2024
.gitignore	Add `update_weights_from_tensor` (sgl-project#2631)	Dec 28, 2024
.gitmodules	introduce CUB in sgl-kernel (sgl-project#2887)	Jan 14, 2025
.isort.cfg	minor: Add basic editorconfig and pre-commit hooks to enforce style f…	Nov 6, 2024
.pre-commit-config.yaml	feat(pre-commit): trim unnecessary notebook metadata from git history (…	Nov 22, 2024
LICENSE	docs: fix module docstrings and copyright headers (sgl-project#2077)	Nov 22, 2024
Makefile	[UTILS] improve makefile a bit by adding help info (sgl-project#2570)	Dec 26, 2024
README.md	docs: add Cursor for adoption and sponsorship (sgl-project#2950)	Jan 17, 2025

News

[2024/12] 🔥 SGLang v0.4: Zero-Overhead Batch Scheduler, Cache-Aware Load Balancer, Faster Structured Outputs (blog).
[2024/10] 🔥 The First SGLang Online Meetup (slides).
[2024/09] SGLang v0.3 Release: 7x Faster DeepSeek MLA, 1.5x Faster torch.compile, Multi-Image/Video LLaVA-OneVision (blog).
[2024/07] Faster Llama3 Serving with SGLang Runtime (vs. TensorRT-LLM, vLLM) (blog).

[2024/04] SGLang is used by the official LLaVA-NeXT (video) release (blog).
[2024/02] SGLang enables 3x faster JSON decoding with compressed finite state machine (blog).
[2024/01] SGLang provides up to 5x faster inference with RadixAttention (blog).
[2024/01] SGLang powers the serving of the official LLaVA v1.6 release demo (usage).

About

SGLang is a fast serving framework for large language models and vision language models. It makes your interaction with models faster and more controllable by co-designing the backend runtime and frontend language. The core features include:

Fast Backend Runtime: Provides efficient serving with RadixAttention for prefix caching, jump-forward constrained decoding, overhead-free CPU scheduler, continuous batching, token attention (paged attention), tensor parallelism, FlashInfer kernels, chunked prefill, and quantization (FP8/INT4/AWQ/GPTQ).
Flexible Frontend Language: Offers an intuitive interface for programming LLM applications, including chained generation calls, advanced prompting, control flow, multi-modal inputs, parallelism, and external interactions.
Extensive Model Support: Supports a wide range of generative models (Llama, Gemma, Mistral, Qwen, DeepSeek, LLaVA, etc.), embedding models (e5-mistral, gte, mcdse), and reward models (Skywork), with easy extensibility for integrating new models.
Active Community: SGLang is open-source and backed by an active community with industry adoption.
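
To make the prefix-caching idea in the feature list more concrete, here is a minimal sketch of reusing work across requests that share a prompt prefix. It is a toy illustration, not SGLang's implementation: it uses an uncompressed token trie rather than a radix tree, it tracks only which prefixes were seen rather than real KV-cache tensors, and it has no eviction. The names `ToyPrefixCache`, `match_prefix`, `insert`, and `serve` are hypothetical and chosen for this example only.

```python
from dataclasses import dataclass, field
from typing import Dict, Sequence, Tuple


@dataclass
class TrieNode:
    """One cached token position; children are keyed by the next token id."""
    children: Dict[int, "TrieNode"] = field(default_factory=dict)


class ToyPrefixCache:
    """Token-level trie recording which prompt prefixes were already processed."""

    def __init__(self) -> None:
        self.root = TrieNode()

    def match_prefix(self, tokens: Sequence[int]) -> int:
        """Return how many leading tokens of `tokens` are already cached."""
        node, matched = self.root, 0
        for tok in tokens:
            child = node.children.get(tok)
            if child is None:
                break
            node, matched = child, matched + 1
        return matched

    def insert(self, tokens: Sequence[int]) -> int:
        """Cache a token sequence; return the number of newly added tokens."""
        node, added = self.root, 0
        for tok in tokens:
            if tok not in node.children:
                node.children[tok] = TrieNode()
                added += 1
            node = node.children[tok]
        return added


def serve(cache: ToyPrefixCache, tokens: Sequence[int]) -> Tuple[int, int]:
    """Simulate one request: returns (tokens reused from cache, tokens computed)."""
    reused = cache.match_prefix(tokens)
    cache.insert(tokens)
    return reused, len(tokens) - reused


if __name__ == "__main__":
    cache = ToyPrefixCache()
    system_prompt = [1, 2, 3, 4]  # hypothetical token ids of a shared system prompt
    print(serve(cache, system_prompt + [10, 11]))  # first request: nothing cached yet
    print(serve(cache, system_prompt + [20]))      # second request reuses the shared prefix
```

In this sketch the second request only needs to process the tokens after the shared system prompt, which is the kind of saving prefix caching aims for when many requests start with the same context.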

Getting Started

Benchmark and Performance

Learn more in the release blogs: v0.2 blog, v0.3 blog, v0.4 blog

Roadmap

Development Roadmap (2024 Q4)

Adoption and Sponsorship

The project is supported by (alphabetically): AMD, Baseten, Cursor, DataCrunch, Etched, Hyperbolic, Jam & Tea Studios, LinkedIn, LMSYS.org, Meituan, NVIDIA, RunPod, Stanford, UC Berkeley, UCLA, xAI, 01.AI.

Acknowledgment and Citation

We learned the design and reused code from the following projects: Guidance, vLLM, LightLLM, FlashInfer, Outlines, and LMQL. Please cite the paper, SGLang: Efficient Execution of Structured Language Model Programs, if you find the project useful.

Languages

Python 94.1%

Rust 2.5%

Cuda 1.6%

C++ 1.2%

Shell 0.4%

Dockerfile 0.1%

Makefile 0.1%
