55555 #19

Merged
merged 258 commits into from
Nov 27, 2024
258 commits
539df95
Imporve openai api documents (#1827)
zhaochenyang20 Oct 30, 2024
b548801
Update docs (#1839)
merrymercy Oct 30, 2024
3184aa9
Update README.md (#1840)
merrymercy Oct 30, 2024
4e2af03
[Production] Drain requests before exit when receive SIGTERM (#1838)
Ying1123 Oct 30, 2024
5f65e2b
[Performance, Hardware] MoE weights padding to AMD MI300x GPUs (#1836)
HaiShaw Oct 30, 2024
4ba815b
Fix suggest edit (#1842)
zhaochenyang20 Oct 30, 2024
2d4ce1b
[Performance, Triton Kernel Args] _decode_grouped_softmax_reducev_fwd…
HaiShaw Oct 31, 2024
a7a0a68
Make decode log interval configurable (#1847)
ByronHsu Oct 31, 2024
f7102fb
Fix mixed chunked prefill (#1850)
merrymercy Oct 31, 2024
438526a
Refactor tokenizer manager (#1846)
ByronHsu Oct 31, 2024
0ab7bca
Simplify documentation in README.md (#1851)
merrymercy Oct 31, 2024
d913d52
Fix warnings in doc build (#1852)
merrymercy Oct 31, 2024
8ce202a
delete unused character (#1855)
geeker-smallwhite Oct 31, 2024
a2e0424
Fix memory leak for chunked prefill 2 (#1858)
merrymercy Oct 31, 2024
d8e9d61
[Build, ROCm] Dockerfile.rocm for Instinct GPUs, with package updates…
HaiShaw Oct 31, 2024
b9fd178
Fix retraction + overlap (#1860)
hnyls2002 Nov 1, 2024
61cf00e
change file tree (#1859)
zhaochenyang20 Nov 1, 2024
16eb33f
Update vocab embedding deps and add TP switch (#1856)
ispobock Nov 1, 2024
d86a2d6
minor: add human eval (#1754)
zhyncs Nov 1, 2024
3bf3d01
Add vlm document (#1866)
zhaochenyang20 Nov 1, 2024
104bf26
minor: update nightly eval (#1867)
zhyncs Nov 1, 2024
d59a478
[3rdparty, document] Updated Documentation that covers performance tu…
yichiche Nov 1, 2024
d1b31b0
Improve docs and fix the broken links (#1875)
merrymercy Nov 2, 2024
a54f278
Add a FAQ documentation (#1877)
merrymercy Nov 2, 2024
2134f08
Fix links in the docs (#1878)
merrymercy Nov 2, 2024
066e8a4
Update docs title (#1879)
merrymercy Nov 2, 2024
2565cb0
Update docs and workflow (#1881)
merrymercy Nov 2, 2024
660ecb7
Fix doc links (#1882)
merrymercy Nov 2, 2024
146f613
Fix incorrect context length for llama3.2-11b (#1873)
rchen19 Nov 2, 2024
72e979b
add native api docs (#1883)
zhaochenyang20 Nov 2, 2024
5a9a4f4
Update index.rst (#1885)
merrymercy Nov 2, 2024
3b60558
Native api (#1886)
zhaochenyang20 Nov 2, 2024
7b394e5
Fix docs (#1889)
merrymercy Nov 2, 2024
5a5f184
Fix docs ci (#1888)
zhaochenyang20 Nov 2, 2024
be7986e
Fix docs (#1890)
merrymercy Nov 2, 2024
f4cd804
Fix ci and link error (#1892)
zhaochenyang20 Nov 3, 2024
908dd7f
Add engine api (#1894)
zhaochenyang20 Nov 3, 2024
6aed044
turn off log (#1895)
zhaochenyang20 Nov 3, 2024
efbc116
Do not use longest prefix matching when #queue-req is large (#1896)
merrymercy Nov 3, 2024
838dcda
Simplify tokenizer manager (#1899)
merrymercy Nov 3, 2024
916b3cd
Allow passing dtype and max_new_tokens to HF reference script (#1903)
janimo Nov 3, 2024
c17c578
Simplify tokenizer manager (#1904)
merrymercy Nov 3, 2024
0abbf28
Unify the model type checking (#1905)
merrymercy Nov 3, 2024
1363b51
Escape backwards slash (#1902)
inakineitor Nov 3, 2024
793b79d
feat: support truss endpoint for benchmark serving (#1906)
zhyncs Nov 3, 2024
2ce32db
Let reward model take text inputs instead of message lists (#1907)
merrymercy Nov 3, 2024
6585975
Release v0.3.5 (#1908)
merrymercy Nov 3, 2024
1853c35
Fix regex docs (#1909)
merrymercy Nov 3, 2024
704f8e8
Add Reward API Docs etc (#1910)
zhaochenyang20 Nov 4, 2024
3cd2809
[Docs, ROCm] update install to cover ROCm with MI GPUs (#1915)
HaiShaw Nov 4, 2024
530ff54
[router] Impl radix tree and set up CI (#1893)
ByronHsu Nov 4, 2024
463d56b
Update CODEOWNERS (#1916)
ByronHsu Nov 5, 2024
0275576
Change judge to classify & Modify make file (#1920)
zhaochenyang20 Nov 5, 2024
f5113e5
[Doc] improve relative links and structure (#1924)
merrymercy Nov 5, 2024
a146d99
support prometheus metrics (#1853)
Lzhang-hub Nov 6, 2024
9676610
[rust] refactor server and router (#1922)
ByronHsu Nov 6, 2024
a5e0def
minor: Add basic editorconfig and pre-commit hooks to enforce style f…
XuehaiPan Nov 6, 2024
4b1d7a2
Add Rust Router Python Binding (#1891)
austin362667 Nov 7, 2024
dca87ec
[Docs] fix 404 - Contributor Guide (#1942)
HaiShaw Nov 7, 2024
c77c1e0
fix black in pre-commit (#1940)
zhaochenyang20 Nov 7, 2024
1ae270c
[Doc] fix docs (#1949)
merrymercy Nov 8, 2024
67c424c
[Performance, Triton Kernel Args] extend_attention, optimize kern arg…
HaiShaw Nov 8, 2024
d32fba2
[ENV, ROCm] update environment settings (#1939)
HaiShaw Nov 8, 2024
691808d
Add a timeout for execute-notebook.yml (#1951)
merrymercy Nov 8, 2024
a71a44f
Update setup_github_runner.md (#1952)
merrymercy Nov 8, 2024
5bc2508
Monitoring documentation (#1933)
binarycrayon Nov 8, 2024
f16eb15
Gemma2 reward model support (#1954)
aqweteddy Nov 8, 2024
8dc84da
Remove the useless to_srt_kwargs (#1955)
merrymercy Nov 8, 2024
4ade15d
Adjust reward model's score module and pooler module order for reduci…
aqweteddy Nov 8, 2024
f9a377f
[Release, ROCm] release ROCm docker build for AMD MI GPUs (#1957)
HaiShaw Nov 8, 2024
7ef0084
Add sentence_transformers to CI dependency (#1958)
merrymercy Nov 8, 2024
a509552
[minor] Improve code style and compatibility (#1961)
merrymercy Nov 8, 2024
e3126e3
Update README.md's Slack invitation link (#1962)
zhaochenyang20 Nov 8, 2024
d1150e9
Updated Instructions on Profiling SGLang Infer System with AMD GPUs (…
leishaoSC Nov 9, 2024
95a4ed1
Fix metrics (#1963)
binarycrayon Nov 9, 2024
f11eb90
Initialize model_worker_batch variable (#1973)
qeternity Nov 9, 2024
d9aada9
Introducing SGLang Guru on Gurubase.io (#1745)
kursataktas Nov 9, 2024
760552e
Update README.md (#1974)
merrymercy Nov 9, 2024
a1f3286
Update pr-test-rust.yml to add a "finish" step (#1975)
merrymercy Nov 9, 2024
549e8b8
[Minor] Fix a typo in test_torchao.py (#1976)
merrymercy Nov 9, 2024
9c939a3
Clean up metrics code (#1972)
merrymercy Nov 9, 2024
520f009
[CI] balance unit tests (#1977)
merrymercy Nov 10, 2024
ed53ac8
Specify `zmq` Version Requirement (#1982)
HuanzhiMao Nov 10, 2024
1929c06
Simplify prometheus metrics (#1981)
merrymercy Nov 10, 2024
b3523af
fix: update pyzmq version (#1983)
zhyncs Nov 10, 2024
47ffe7a
docs: add shm size for docker run (#1986)
zhyncs Nov 10, 2024
a8aad93
qwen2vl fix bug for #1971 #1897 (#1984)
yizhang2077 Nov 10, 2024
3d04331
[CI] Balance unit tests (#1988)
merrymercy Nov 10, 2024
8169c6f
Add gen-shared-prefix dataset in bench_serving (#1990)
ByronHsu Nov 11, 2024
087ab83
[Performance, Triton] Optimize over mask compute to tl.load in fused_…
HaiShaw Nov 11, 2024
f9633fa
[rust] cache-aware DP - approx tree (#1934)
ByronHsu Nov 11, 2024
aaf0a31
docs: add slides link in README (#1997)
zhyncs Nov 11, 2024
ddeb9d4
Add engine encode (#1995)
james-p-xu Nov 11, 2024
00ffde2
setup router python binding ci (#1999)
ByronHsu Nov 11, 2024
9d42726
Add Engine::encode example (#2000)
james-p-xu Nov 11, 2024
239eafb
Fix rust unit test and pypi token (#2001)
ByronHsu Nov 11, 2024
e728258
release router from py38 to py312 (#2002)
ByronHsu Nov 11, 2024
0d94f1d
Bump router to 0.0.3 (#2004)
ByronHsu Nov 11, 2024
3e33574
run rust test on ubuntu instead of 1-gpu-runner (#2003)
ByronHsu Nov 11, 2024
f18b9c7
support internlm2-reward (#1994)
RangiLyu Nov 11, 2024
86c37d0
fix sglang_router not found (#2005)
ByronHsu Nov 11, 2024
59a5ba9
[Minor] Remove unused imports (#2006)
merrymercy Nov 11, 2024
befc6be
Fix a typo in io_struct.py (#2008)
merrymercy Nov 12, 2024
530ae1b
Fix weight loading for tied word embedding when TP > 1 (#2009)
merrymercy Nov 12, 2024
602ebc6
remove sglang folder in rust (#2010)
ByronHsu Nov 12, 2024
b808a38
Filter empty prompt in random bench serving (#2011)
ispobock Nov 12, 2024
027e652
support echo=true and logprobs in openai api when logprobs=1 in lm-ev…
BBuf Nov 12, 2024
78c1d64
Fix finish reason (#2013)
merrymercy Nov 12, 2024
a1bd719
fix a bug in v1_embeeding_request (#2014)
BBuf Nov 12, 2024
eff468d
fix test_embedding_models prompt length too long's bug (#2015)
BBuf Nov 12, 2024
125b119
support parallel grammar preprocessing (#1996)
DarkSharpness Nov 12, 2024
ba069a2
Fix grammar backend (#2018)
merrymercy Nov 13, 2024
54479d6
Fix grammar backend for tensor parallelism (#2020)
merrymercy Nov 13, 2024
f407fcf
Release v0.3.5.post1 (#2022)
merrymercy Nov 13, 2024
218ab36
Do not let invalid grammar crash the server (#2023)
merrymercy Nov 13, 2024
c722d9b
Fix dependency and error message for xgrammar (#2024)
merrymercy Nov 13, 2024
fb9fb35
set content to empty string (#2026)
chottolabs Nov 14, 2024
df246e6
chore: open lto and optimization in release profile (#2028)
ethe Nov 14, 2024
13ce3e4
Add download_dir ServerArgs property (#2027)
pjyi2147 Nov 14, 2024
b275ce0
Github runner instructions for AMD (#2031)
HaiShaw Nov 14, 2024
c3eac1b
Fix torch.compile for MoE (#2033)
merrymercy Nov 14, 2024
aae5434
Fix unit tests (#2034)
merrymercy Nov 14, 2024
a10d530
Fix outlines version (#2036)
merrymercy Nov 14, 2024
ea53c63
Expose no_stop_trim and skip_special_tokens in openai api (#2039)
merrymercy Nov 15, 2024
f6dd648
Offline LLM Engine Benchmark Throughput (#1968)
zolinthecow Nov 15, 2024
29ebe3d
fix: align enable_overlap_scheduler naming between code and docs (#2038)
w1ndseeker Nov 15, 2024
2558d6a
Fix the default arguments of bench_offline_throughput.py & simplify d…
merrymercy Nov 15, 2024
954f4e6
benchmark json schema (#2030)
DarkSharpness Nov 15, 2024
c29b98e
Fix json benchmark (#2043)
merrymercy Nov 15, 2024
b01df48
[Fix] Adjust default chunked prefill size and cuda graph max bs accor…
merrymercy Nov 15, 2024
32c9a7e
Release v0.3.5.post2 (#2046)
merrymercy Nov 15, 2024
023d0a7
fix small typos in docs (#2047)
BBuf Nov 15, 2024
e5c6715
Fix core (MI300X) with --enable-overlap (#2048)
HaiShaw Nov 16, 2024
cf24897
Add Tensor Parallel to torch_native_llama (#1876)
kwen2501 Nov 16, 2024
2ffe0a7
Add get_amdgpu_memory_capacity() (#2049)
HaiShaw Nov 16, 2024
2f2e074
Fix weight update for data parallelism (#2050)
merrymercy Nov 16, 2024
976bc30
Support DP MLA (#1970)
ispobock Nov 16, 2024
edad373
Fix illegal memory access in overlap mode & Use more fused triton ker…
merrymercy Nov 17, 2024
f719d9a
Launch dp ranks in parallel (#2053)
merrymercy Nov 17, 2024
3b87886
chore: update torch v2.5.1 (#1849)
zhyncs Nov 17, 2024
c1f401f
Revert "chore: update torch v2.5.1" (#2063)
merrymercy Nov 17, 2024
38625e2
Remove monkey_patch_vllm_dummy_weight_loader (#2064)
merrymercy Nov 17, 2024
11f881d
Deprecate --disable-flashinfer and --disable-flashinfer-sampling (#2065)
merrymercy Nov 18, 2024
62832bb
Support cuda graph for DP attention (#2061)
ispobock Nov 18, 2024
ebaa2f3
Rename arguments `--disable-nan-detection` to `--enable-nan-detection…
merrymercy Nov 18, 2024
9c745d0
[Performance] Update xgrammar-related constrained decoding (#2056)
DarkSharpness Nov 18, 2024
8c280ce
add phi-3 small support (#2062)
Tushar-ml Nov 18, 2024
a9e90b4
[Minor] Fix styles for overlap mode (#2068)
merrymercy Nov 18, 2024
1166853
Fix cuda illegal memory access in overlap mode (#2070)
merrymercy Nov 18, 2024
a7164b6
Tune the threshold for accuracy tests in CI (#2071)
merrymercy Nov 18, 2024
df7fe45
Crash the CI jobs on model import errors (#2072)
merrymercy Nov 18, 2024
4af3f88
Simplify flashinfer indices update for prefill (#2074)
merrymercy Nov 18, 2024
2a3992b
support set role as 'tool' (#2075)
yukavio Nov 18, 2024
7661926
feat: update torch 2.5.1 (#2069)
zhyncs Nov 18, 2024
66318ff
Rename layer_idx to layer_id for consistency (#2078)
janimo Nov 18, 2024
80e2c4a
Fix chunked prefill with output logprob (#2083)
merrymercy Nov 18, 2024
3b44bbe
Allow passing extra request body to bench_offline_throughput.py (#2085)
merrymercy Nov 18, 2024
b110453
Simplify logits penalizer (#2086)
merrymercy Nov 19, 2024
b7a065e
Use cuda event wait and synchronization instead of busy waiting (#2089)
merrymercy Nov 19, 2024
929c762
Fix: incorrect top_logprobs in chat completion (#2088)
ajwaitz Nov 19, 2024
f239268
minor: update gsm8k eval (#2091)
zhyncs Nov 19, 2024
e57c3e1
Use native fp8 format on MI300X (#2094)
HaiShaw Nov 19, 2024
55bd97f
minor: add dataset dump and questions shuffle (#2093)
zhyncs Nov 19, 2024
ffd20fc
Make constrained decoding work for overlap scheduler (#2095)
merrymercy Nov 19, 2024
699384c
Set schedule policy more conservative for DP attention (#2096)
ispobock Nov 20, 2024
7d671e4
Enable overlap by default (#2067)
merrymercy Nov 20, 2024
63a395b
Update nightly-eval.yml (#2100)
merrymercy Nov 20, 2024
5942dfc
[feat] Add session control (#2073)
Ying1123 Nov 20, 2024
3295cd8
Allow skipping warmup in bench_offline_throughput.py (#2103)
merrymercy Nov 20, 2024
56a347f
Move test_session_id.py to playground (#2104)
merrymercy Nov 20, 2024
722530f
Enable overlap scheduler by default for the triton attention backend …
merrymercy Nov 20, 2024
5c6a41f
Error out when torchao-config option is not recognized (#2107)
jerryzh168 Nov 21, 2024
7f8fcd3
Turn off autotune for scaled mm for fp8 dynamic quant in torchao (#2116)
jerryzh168 Nov 21, 2024
f35cb46
ROCm: Fix MoE padding for none FP8 cases (#2111)
HaiShaw Nov 21, 2024
f6f7137
Add support for Qwen2-VL-based embedding models (#2055)
james-p-xu Nov 21, 2024
30af7df
[router] add base_gpu_id server args & merged radix tree python refer…
ByronHsu Nov 22, 2024
8048c28
Fix #2037 - Context length check does not take into out pad tokens fo…
jakep-allenai Nov 22, 2024
dfec7fc
Rename sglang.bench_latency to sglang.bench_one_batch (#2118)
merrymercy Nov 22, 2024
ad30d5c
Benchmark with Pytorch Profiler easily (#2110)
bjmsong Nov 22, 2024
2369e88
[minor] Clean up unused imports (#2122)
merrymercy Nov 22, 2024
4f8c3ae
minor: update gsm8k threshold (#2125)
zhyncs Nov 22, 2024
9a00e6f
chore: bump v0.3.6 (#2120)
zhyncs Nov 22, 2024
2797bc3
fix: add xgrammar dependency (#2126)
zhyncs Nov 22, 2024
62a4a33
docs: fix module docstrings and copyright headers (#2077)
XuehaiPan Nov 22, 2024
72f87b7
feat(pre-commit): trim unnecessary notebook metadata from git history…
XuehaiPan Nov 22, 2024
c35cd1f
Expose max total num tokens from Runtime & Engine API (#2092)
henryhmko Nov 22, 2024
e1b6362
Only stream output on tp rank 0 (#2124)
merrymercy Nov 22, 2024
66d4859
Revert "Only stream output on tp rank 0" (#2130)
merrymercy Nov 22, 2024
865233e
Add initial support for intel Gaudi accelerators (#2121)
ankurneog Nov 23, 2024
d98fa1e
Add simple CPU offloading support. (#2081)
janimo Nov 23, 2024
c5f8650
Fix grid size in Triton decoding kernel (#2134)
ispobock Nov 23, 2024
a78d8f8
[CI] Fix test cases (#2137)
merrymercy Nov 23, 2024
60769be
Add concurrency option for benchmark (#2136)
cermeng Nov 23, 2024
751c3a0
Fix dp print message (#2138)
merrymercy Nov 23, 2024
ad47749
fix: resolve bench_serving args (#2139)
zhyncs Nov 23, 2024
cbedd1d
[router] cache-aware load-balancing router v1 (#2114)
ByronHsu Nov 23, 2024
505d7f7
Bump sglang-router to 0.0.5 (#2142)
ByronHsu Nov 23, 2024
145c0dd
update router doc (#2143)
ByronHsu Nov 23, 2024
52f58fc
fix dp_rank env (#2144)
ByronHsu Nov 23, 2024
bbb81c2
Add more api routes (completion, health, etc) to the router (#2146)
ByronHsu Nov 23, 2024
7921690
add prefix match for certain tenant (#2147)
ByronHsu Nov 23, 2024
32293a2
Improve sglang router (#2148)
ByronHsu Nov 24, 2024
84a1698
Update release-pypi-router.yml
ByronHsu Nov 24, 2024
dbe1729
Merged three native APIs into one: get_server_info (#2152)
henryhmko Nov 24, 2024
b509db5
feat: remove the dependency on FusedMoE (#2153)
zhyncs Nov 24, 2024
9e8f8fb
feat: update gitignore and add tuning config for FusedMoE (#2155)
zhyncs Nov 24, 2024
d90c3d6
fix: resolve end-of-file-fixer (#2157)
zhyncs Nov 24, 2024
c211e7b
Simplify batch update (#2154)
merrymercy Nov 24, 2024
e3938b2
feat: update other MoE models deps (#2156)
zhyncs Nov 24, 2024
5652c56
Update CI threshold & Improve code style (#2159)
merrymercy Nov 24, 2024
fa27161
fix: use torch.sum for compatible (#2161)
zhyncs Nov 24, 2024
731146f
Fix mixed chunked prefill in overlap mode (#2158)
merrymercy Nov 24, 2024
fe5d3e8
Balance CI tests (#2162)
merrymercy Nov 24, 2024
be0124b
Rename triton_fused_moe -> fused_moe_triton (#2163)
merrymercy Nov 24, 2024
8912b76
Fix docs (#2164)
merrymercy Nov 24, 2024
dd44173
[Fused moe] add tuning fused configs for qwen2 57b and mixtral 8x7b (…
BBuf Nov 25, 2024
8e1adb8
Allow overwrite flashinfer use_tensorcore (#2169)
merrymercy Nov 25, 2024
4b0a1c9
Replace prob based with threshold based load balancing (#2170)
ByronHsu Nov 25, 2024
a866b65
Bump rust router to 0.0.8
ByronHsu Nov 25, 2024
55842eb
feat: fused_moe fp8 monkey patch (#2174)
zhyncs Nov 25, 2024
538fa0a
[Fix] Avoid calling fill_vocab_mask for terminated requests (#2175)
Ubospica Nov 25, 2024
254fd13
[CI] Split test cases in CI for better load balancing (#2180)
merrymercy Nov 25, 2024
5ada33f
Bump rustls from 0.23.16 to 0.23.18 in /rust (#2182)
dependabot[bot] Nov 25, 2024
e1e595d
[feat] Refactor session control interface and add CI (#2173)
Ying1123 Nov 25, 2024
4d62bca
[router] Replace print with logger (#2183)
ByronHsu Nov 25, 2024
c4336b2
Use custom allreduce w/ torch.compile (#2185)
merrymercy Nov 25, 2024
10189d0
[Performance]: Process affinity to CPU cores with multiple sockets su…
HaiShaw Nov 25, 2024
3c5538f
Update CI threshold (#2186)
merrymercy Nov 25, 2024
7f076c2
Update XGrammar to the latest API (#2176)
Ubospica Nov 25, 2024
1f76fc6
[router] Rust e2e test (#2184)
ByronHsu Nov 26, 2024
1aea19f
Input_embeds support (#2052)
RinRin-32 Nov 26, 2024
1605ae1
[CI] Minor fix for CI (#2187)
merrymercy Nov 26, 2024
ea34350
Rename double sparsity config file (#2188)
merrymercy Nov 26, 2024
ac5a0f0
Release v0.3.6.post1 (#2189)
merrymercy Nov 26, 2024
ba4ee37
Update sampler.py to skip the success check (#2197)
merrymercy Nov 26, 2024
e4118b1
remove unused imports (#2195)
WrRan Nov 26, 2024
88c7763
Remove unresolved reference 'self' (#2198)
apemost Nov 26, 2024
867e092
using `is not` not `!=` to test `None` (#2196)
WrRan Nov 26, 2024
bc1f6fd
fix: add cuda-python for xgrammar (#2199)
zhyncs Nov 26, 2024
30ce5b5
minor: update check_env (#2201)
zhyncs Nov 26, 2024
19f33b3
add sglang version to get_server_info (#2206)
binarycrayon Nov 26, 2024
de3b67b
docs: update adoption (#2204)
zhyncs Nov 26, 2024
2763c0a
Bump router to 0.0.9 with better logging (#2207)
ByronHsu Nov 26, 2024
0b46b95
Fix rust warning (#2208)
ByronHsu Nov 26, 2024
c754652
Fix flasky tests (#2212)
merrymercy Nov 27, 2024
37c8a57
[feat] Support session control for vision language models (#2210)
Ying1123 Nov 27, 2024
a0e5874
Use an env var SGLANG_SET_CPU_AFFINITY to set cpu affinity; turn it o…
merrymercy Nov 27, 2024
6997e28
Revert "Use an env var SGLANG_SET_CPU_AFFINITY to set cpu affinity; t…
merrymercy Nov 27, 2024
fb6e04a
Use an env var SGLANG_SET_CPU_AFFINITY to set cpu affinity; turn it o…
merrymercy Nov 27, 2024
fed4c69
Release v0.3.6.post2 (#2214)
merrymercy Nov 27, 2024
25 changes: 25 additions & 0 deletions .editorconfig
@@ -0,0 +1,25 @@
# https://editorconfig.org/

root = true

[*]
charset = utf-8
end_of_line = lf
indent_style = space
indent_size = 4
trim_trailing_whitespace = true
insert_final_newline = true

[*.{json,yaml,yml}]
indent_size = 2

[*.md]
indent_size = 2
x-soft-wrap-text = true

[*.rst]
indent_size = 4
x-soft-wrap-text = true

[Makefile]
indent_style = tab
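
To make the effect of the new `.editorconfig` concrete, here is a minimal Python sketch of how the sections above could resolve for a given file name. It is an illustration only, not how any editor or the EditorConfig core library works: the helpers `expand_braces` and `resolve` are hypothetical names, matching is done on base names with `fnmatch`, and only a single `{a,b,c}` group per section header is handled.

```python
import configparser
import fnmatch
import re

# Copy of the settings added in this PR (comment line omitted).
EDITORCONFIG = """\
root = true

[*]
charset = utf-8
end_of_line = lf
indent_style = space
indent_size = 4
trim_trailing_whitespace = true
insert_final_newline = true

[*.{json,yaml,yml}]
indent_size = 2

[*.md]
indent_size = 2
x-soft-wrap-text = true

[*.rst]
indent_size = 4
x-soft-wrap-text = true

[Makefile]
indent_style = tab
"""


def expand_braces(pattern):
    """Expand one {a,b,c} group into plain fnmatch patterns (hypothetical helper)."""
    match = re.search(r"\{([^{}]*)\}", pattern)
    if not match:
        return [pattern]
    head, tail = pattern[:match.start()], pattern[match.end():]
    return [head + alt + tail for alt in match.group(1).split(",")]


def resolve(filename, text=EDITORCONFIG):
    """Return the merged settings for a base file name (hypothetical helper)."""
    parser = configparser.ConfigParser()
    # `root = true` has no section header, so park it under a dummy section.
    parser.read_string("[__preamble__]\n" + text)
    settings = {}
    for section in parser.sections():
        if section == "__preamble__":
            continue
        if any(fnmatch.fnmatch(filename, p) for p in expand_braces(section)):
            # Later sections override earlier ones, as in EditorConfig.
            settings.update(parser[section])
    return settings
```

Calling `resolve("config.yaml")` or `resolve("Makefile")` shows which keys the more specific sections override on top of the `[*]` defaults.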
1 change: 1 addition & 0 deletions .github/CODEOWNERS
@@ -11,3 +11,4 @@
/python/sglang/srt/sampling @merrymercy @hnyls2002
/test/lang @merrymercy @Ying1123 @ByronHsu
/test/srt @merrymercy @Ying1123 @zhyncs
/rust @ByronHsu @Ying1123
6 changes: 3 additions & 3 deletions .github/pull_request_template.md
@@ -10,6 +10,6 @@

## Checklist

-- [ ] Format your code according to the [Contributor Guide](https://github.com/sgl-project/sglang/blob/main/docs/contributor_guide.md).
-- [ ] Add unit tests as outlined in the [Contributor Guide](https://github.com/sgl-project/sglang/blob/main/docs/contributor_guide.md).
-- [ ] Update documentation as needed, including docstrings or example tutorials.
+- [ ] Format your code according to the [Contributor Guide](https://github.com/sgl-project/sglang/blob/main/docs/references/contributor_guide.md).
+- [ ] Add unit tests as outlined in the [Contributor Guide](https://github.com/sgl-project/sglang/blob/main/docs/references/contributor_guide.md).
+- [ ] Update documentation as needed, including docstrings or example tutorials.
14 changes: 7 additions & 7 deletions .github/workflows/close-inactive-issues.yml
@@ -20,10 +20,10 @@ jobs:
github-token: ${{secrets.GITHUB_TOKEN}}
script: |
const sixtyDaysAgo = new Date(Date.now() - 60 * 24 * 60 * 60 * 1000);

const [owner, repo] = process.env.GITHUB_REPOSITORY.split('/');
console.log(`Owner: ${owner}, Repo: ${repo}`);

async function fetchIssues(page = 1) {
console.log(`Fetching issues for ${owner}/${repo}, page ${page}`);
return await github.rest.issues.listForRepo({
@@ -36,23 +36,23 @@ jobs:
page: page
});
}

async function processIssues() {
console.log('Starting to process issues');
console.log(`Repository: ${owner}/${repo}`);

let page = 1;
let hasMoreIssues = true;
while (hasMoreIssues) {
try {
const issues = await fetchIssues(page);
console.log(`Fetched ${issues.data.length} issues on page ${page}`);

if (issues.data.length === 0) {
hasMoreIssues = false;
break;
}

for (const issue of issues.data) {
if (new Date(issue.updated_at) < sixtyDaysAgo) {
try {
@@ -87,5 +87,5 @@ jobs:
}
console.log('Finished processing issues');
}

await processIssues();
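
The staleness rule used by this workflow script (an issue counts as inactive when its `updated_at` is more than sixty days old) can be sketched in Python as follows. This is an assumption-laden illustration, not the workflow's code: `parse_github_timestamp` and `find_inactive` are hypothetical helpers, and the issues are plain dictionaries rather than GitHub API responses.

```python
from datetime import datetime, timedelta, timezone

# Same window as `sixtyDaysAgo` in the workflow script.
INACTIVITY_WINDOW = timedelta(days=60)


def parse_github_timestamp(value):
    """Parse an ISO 8601 timestamp such as '2024-09-01T12:00:00Z'."""
    # Older Python versions do not accept a trailing 'Z' in fromisoformat.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def find_inactive(issues, now):
    """Return the numbers of issues last updated before the cutoff."""
    cutoff = now - INACTIVITY_WINDOW
    return [
        issue["number"]
        for issue in issues
        if parse_github_timestamp(issue["updated_at"]) < cutoff
    ]


# Usage with made-up sample data and a fixed "now" for reproducibility.
sample_issues = [
    {"number": 1, "updated_at": "2024-08-01T00:00:00Z"},
    {"number": 2, "updated_at": "2024-11-20T00:00:00Z"},
]
fixed_now = datetime(2024, 11, 27, tzinfo=timezone.utc)
stale = find_inactive(sample_issues, fixed_now)
```

Passing `now` explicitly keeps the function deterministic, which makes the cutoff logic easy to check in isolation.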
13 changes: 4 additions & 9 deletions .github/workflows/execute-notebook.yml
@@ -18,7 +18,7 @@ concurrency:
group: execute-notebook-${{ github.ref }}
cancel-in-progress: true


jobs:
run-all-notebooks:
runs-on: 1-gpu-runner
@@ -42,13 +42,8 @@ jobs:
python -m ipykernel install --user --name python3 --display-name "Python 3"

- name: Execute notebooks
timeout-minutes: 30
run: |
cd docs
for nb in *.ipynb; do
if [ -f "$nb" ]; then
echo "Executing $nb"
jupyter nbconvert --to notebook --execute --inplace "$nb" \
--ExecutePreprocessor.timeout=600 \
--ExecutePreprocessor.kernel_name=python3
fi
done
make clean
make compile
11 changes: 9 additions & 2 deletions .github/workflows/nightly-eval.yml
@@ -25,9 +25,16 @@ jobs:
 - name: Install dependencies
 run: |
 bash scripts/ci_install_dependency.sh
+pip install --upgrade "evalplus[vllm] @ git+https://github.com/evalplus/evalplus"
 
-- name: Nightly gsm8k Accuracy
-timeout-minutes: 60
+- name: Test gsm8k
+timeout-minutes: 120
 run: |
 cd test/srt
 python3 test_nightly_gsm8k_eval.py
+
+- name: Test human eval
+timeout-minutes: 120
+run: |
+cd test/srt
+python3 test_nightly_human_eval.py
71 changes: 71 additions & 0 deletions .github/workflows/pr-test-rust.yml
@@ -0,0 +1,71 @@
name: PR Test (Rust)

on:
push:
branches: [ main ]
paths:
- "rust/**"
pull_request:
branches: [ main ]
paths:
- "rust/**"
workflow_dispatch:

concurrency:
group: pr-test-rust-${{ github.ref }}
cancel-in-progress: true

jobs:
unit-test-rust:
if: github.repository == 'sgl-project/sglang' || github.event_name == 'pull_request'
runs-on: ubuntu-latest
steps:
- name: Checkout code
uses: actions/checkout@v3

- name: Install dependencies
run: |
bash scripts/ci_install_rust.sh

- name: Run fmt
run: |
source "$HOME/.cargo/env"
cd rust/
cargo fmt -- --check

- name: Run test
timeout-minutes: 20
run: |
source "$HOME/.cargo/env"
cd rust/
cargo test

e2e-rust:
if: github.repository == 'sgl-project/sglang' || github.event_name == 'pull_request'
runs-on: 1-gpu-runner
steps:
- name: Checkout code
uses: actions/checkout@v3

- name: Install rust dependencies
run: |
bash scripts/ci_install_rust.sh

- name: Build python binding
run: |
source "$HOME/.cargo/env"
cd rust
pip install setuptools-rust wheel build
python3 -m build
pip install dist/*.whl
- name: Run e2e test
run: |
cd rust/py_test
python3 run_suite.py

finish:
needs: [unit-test-rust, e2e-rust]
runs-on: ubuntu-latest
steps:
- name: Finish
run: echo "This is an empty step to ensure that all jobs are completed."
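
The `needs: [unit-test-rust, e2e-rust]` entry makes `finish` wait for both test jobs. As a rough illustration of that ordering, the sketch below models the three jobs as a dependency graph with Python's standard `graphlib` module. The function `job_waves` is a hypothetical helper, and the result only mirrors the declared dependencies; it does not reproduce how GitHub Actions actually schedules runners.

```python
from graphlib import TopologicalSorter

# Each job maps to the jobs it needs, as declared in the workflow above.
NEEDS = {
    "unit-test-rust": [],
    "e2e-rust": [],
    "finish": ["unit-test-rust", "e2e-rust"],
}


def job_waves(needs):
    """Group jobs into waves; jobs in one wave have no unmet dependencies."""
    sorter = TopologicalSorter(needs)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        waves.append(ready)
        sorter.done(*ready)
    return waves


waves = job_waves(NEEDS)
```

Here the two test jobs land in the first wave and `finish` in the second, which is why a single required status check on `finish` can stand in for both test jobs.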