fix: set ulimit -n 65535 #647

zhyncs · 2024-07-18T08:59:40Z

Motivation

cc @merrymercy @Ying1123 @hnyls2002

When --num-prompts is large (for example, 10,000), most requests fail because the default open-file limit (ulimit -n) on my development machine is only 1024. This fix makes the server raise the limit automatically at startup. The benchmark client likewise needs a raised limit, set with ulimit -n before it runs.
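For reference, raising the limit from inside a Python process can be done with the standard-library `resource` module. This is a minimal sketch, not the code merged in this PR. It assumes a Unix platform, and the helper name `raise_nofile_limit` is hypothetical. Only the soft limit is raised, capped at the hard limit, because an unprivileged process cannot exceed the hard limit.

```python
import resource


def raise_nofile_limit(target_soft: int = 65535) -> int:
    """Hypothetical helper (not the PR's actual code).

    Raise the soft open-file limit (the value `ulimit -n` shows) toward
    `target_soft`, never above the hard limit and never lowering it.
    Returns the soft limit in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # RLIM_INFINITY as the hard limit means there is no ceiling to respect.
    ceiling = target_soft if hard == resource.RLIM_INFINITY else min(target_soft, hard)
    if soft == resource.RLIM_INFINITY or soft >= ceiling:
        return soft  # already high enough; leave it alone
    try:
        resource.setrlimit(resource.RLIMIT_NOFILE, (ceiling, hard))
    except (ValueError, OSError) as exc:
        # Some platforms (macOS, for instance) may reject large values.
        print(f"could not raise RLIMIT_NOFILE: {exc}")
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]


if __name__ == "__main__":
    print("soft open-file limit now:", raise_nofile_limit())
```

Calling something like this once, early in server startup, is enough. The function is a no-op when the limit is already high.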

hardware: A100 80GB
client: https://github.com/vllm-project/vllm/blob/main/benchmarks/benchmark_serving.py
Note: disable streaming
ref: https://www.ibm.com/support/pages/recommended-values-os-ulimit-feature

# server
python3 -m sglang.launch_server --model /root/Meta-Llama-3-8B-Instruct --trust-remote-code --port 23333 --disable-radix-cache

# client ok (open-file limit raised first)
ulimit -n 65535 && python3 benchmark_serving.py --backend openai --host 127.0.0.1 --port 23333 --dataset ShareGPT_V3_unfiltered_cleaned_split.json --model /root/Meta-Llama-3-8B-Instruct --tokenizer /root/Meta-Llama-3-8B-Instruct --num-prompts 10000 --request-rate 128 --trust-remote-code
# client not (default ulimit -n)
python3 benchmark_serving.py --backend openai --host 127.0.0.1 --port 23333 --dataset ShareGPT_V3_unfiltered_cleaned_split.json --model /root/Meta-Llama-3-8B-Instruct --tokenizer /root/Meta-Llama-3-8B-Instruct --num-prompts 10000 --request-rate 128 --trust-remote-code
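Why a limit of 1024 matters here: at --request-rate 128, all 10,000 requests are dispatched within roughly 78 seconds (10000 / 128). Meanwhile, the server completes only about 26 requests per second, so thousands of requests are outstanding at once. Assume each outstanding request holds at least one socket, and therefore one file descriptor, on both client and server. New connections then fail with EMFILE ("Too many open files") once the soft limit is reached. The sketch below uses a hypothetical helper, `count_sockets_until_emfile`, and is Unix only. It reproduces that failure in isolation by temporarily lowering the soft limit.

```python
import errno
import resource
import socket


def count_sockets_until_emfile(soft_limit: int) -> int:
    """Hypothetical demo: lower the soft open-file limit, open TCP sockets
    until the process runs out of descriptors, then undo everything.
    Returns how many sockets could be opened."""
    old = resource.getrlimit(resource.RLIMIT_NOFILE)
    socks = []
    try:
        resource.setrlimit(resource.RLIMIT_NOFILE, (soft_limit, old[1]))
        while True:
            try:
                # Each socket consumes one file descriptor.
                socks.append(socket.socket(socket.AF_INET, socket.SOCK_STREAM))
            except OSError as exc:
                if exc.errno == errno.EMFILE:  # "Too many open files"
                    return len(socks)
                raise
    finally:
        for s in socks:
            s.close()
        resource.setrlimit(resource.RLIMIT_NOFILE, old)  # restore original limits


if __name__ == "__main__":
    print("sockets opened under a limit of 128:", count_sockets_until_emfile(128))
```

The count comes out slightly below the limit, because stdin, stdout, stderr and any other descriptors the process already holds occupy slots too.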

server ok, client ok
============ Serving Benchmark Result ============
Successful requests:                     10000
Benchmark duration (s):                  376.37
Total input tokens:                      2206428
Total generated tokens:                  1870511
Request throughput (req/s):              26.57
Input token throughput (tok/s):          5862.32
Output token throughput (tok/s):         4969.82
---------------Time to First Token----------------
Mean TTFT (ms):                          143441.34
Median TTFT (ms):                        145877.07
P99 TTFT (ms):                           291688.43
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          0.00
Median TPOT (ms):                        0.00
P99 TPOT (ms):                           0.00
---------------Inter-token Latency----------------
Mean ITL (ms):                           154188.26
Median ITL (ms):                         156223.75
P99 ITL (ms):                            292479.30
==================================================

server ok, client not
============ Serving Benchmark Result ============
Successful requests:                     3145
Benchmark duration (s):                  122.54
Total input tokens:                      688371
Total generated tokens:                  582171
Request throughput (req/s):              25.67
Input token throughput (tok/s):          5617.53
Output token throughput (tok/s):         4750.87
---------------Time to First Token----------------
Mean TTFT (ms):                          30375.16
Median TTFT (ms):                        24228.75
P99 TTFT (ms):                           100871.26
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          0.00
Median TPOT (ms):                        0.00
P99 TPOT (ms):                           0.00
---------------Inter-token Latency----------------
Mean ITL (ms):                           32828.14
Median ITL (ms):                         27608.77
P99 ITL (ms):                            101425.95
==================================================

server not, client not
============ Serving Benchmark Result ============
Successful requests:                     3103
Benchmark duration (s):                  125.92
Total input tokens:                      685120
Total generated tokens:                  577594
Request throughput (req/s):              24.64
Input token throughput (tok/s):          5440.98
Output token throughput (tok/s):         4587.04
---------------Time to First Token----------------
Mean TTFT (ms):                          29561.66
Median TTFT (ms):                        22865.67
P99 TTFT (ms):                           99962.48
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          0.00
Median TPOT (ms):                        0.00
P99 TPOT (ms):                           0.00
---------------Inter-token Latency----------------
Mean ITL (ms):                           31773.40
Median ITL (ms):                         26089.01
P99 ITL (ms):                            100067.45
==================================================

server not, client ok
============ Serving Benchmark Result ============
Successful requests:                     3082
Benchmark duration (s):                  119.42
Total input tokens:                      658200
Total generated tokens:                  584397
Request throughput (req/s):              25.81
Input token throughput (tok/s):          5511.84
Output token throughput (tok/s):         4893.81
---------------Time to First Token----------------
Mean TTFT (ms):                          29668.30
Median TTFT (ms):                        22791.54
P99 TTFT (ms):                           98546.82
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          0.00
Median TPOT (ms):                        0.00
P99 TPOT (ms):                           0.00
---------------Inter-token Latency----------------
Mean ITL (ms):                           31782.30
Median ITL (ms):                         25786.61
P99 ITL (ms):                            99130.20
==================================================
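The four runs can be compared directly from the reported numbers. This small calculation uses values copied from the results above. Only the run with the raised limit on both sides completes all 10,000 requests. The other three complete between 30.82% and 31.45%, while the throughput of the requests that do finish stays around 25 to 27 req/s (26.57, 25.67, 24.64 and 25.81, matching the reported figures).

```python
# Values copied from the four benchmark results above (10,000 prompts per run).
TOTAL_PROMPTS = 10000
runs = {
    # label: (successful requests, benchmark duration in seconds)
    "server ok, client ok": (10000, 376.37),
    "server ok, client not": (3145, 122.54),
    "server not, client not": (3103, 125.92),
    "server not, client ok": (3082, 119.42),
}

# Success rate, and request throughput as the benchmark reports it
# (successful requests divided by benchmark duration).
summary = {
    label: (ok / TOTAL_PROMPTS, ok / duration_s)
    for label, (ok, duration_s) in runs.items()
}

for label, (rate, throughput) in summary.items():
    print(f"{label:<24} success {rate:7.2%}   {throughput:5.2f} req/s")
```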

Modification

As titled: raise the open-file limit (ulimit -n) to 65535 when the server starts.
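Raising the limit at server startup also covers any worker processes the server launches afterwards. On Unix, resource limits are inherited by child processes and preserved across exec. The sketch below checks this with a throwaway child process. `demo_inheritance` and `child_soft_limit` are hypothetical helpers, not part of sglang. The soft limit is set to a small value only so the change is observable; any value up to the hard limit can be set without privileges.

```python
import resource
import subprocess
import sys

# Child program that reports the soft open-file limit it inherited.
CHILD_CODE = "import resource; print(resource.getrlimit(resource.RLIMIT_NOFILE)[0])"


def child_soft_limit() -> int:
    """Start a fresh Python process and return the soft RLIMIT_NOFILE it sees."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE], capture_output=True, text=True, check=True
    )
    return int(result.stdout.strip())


def demo_inheritance(new_soft: int = 512) -> tuple[int, int]:
    """Set the parent's soft limit, then report (parent limit, child limit)."""
    old = resource.getrlimit(resource.RLIMIT_NOFILE)
    try:
        resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, old[1]))
        return resource.getrlimit(resource.RLIMIT_NOFILE)[0], child_soft_limit()
    finally:
        resource.setrlimit(resource.RLIMIT_NOFILE, old)  # restore the original limits


if __name__ == "__main__":
    parent, child = demo_inheritance()
    print(f"parent soft limit: {parent}, child soft limit: {child}")
```

The same holds in the other direction. If the limit is raised only after worker processes have started, those workers keep the old limit, so the call belongs at the very beginning of startup.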

Checklist

Ensure pre-commit or other linting tools are used to fix potential lint issues.
Confirm that modifications are covered by complete unit tests. If not, please add more unit tests for correctness.
Modify documentation as needed, such as docstrings or example tutorials.

python/sglang/srt/server.py

zhyncs · 2024-07-18T09:20:47Z

I'll merge the main branch.

zhyncs · 2024-07-18T09:31:29Z

done cc @Ying1123

fix: set ulimit -n 65535

6ad957b

zhyncs requested review from Ying1123, merrymercy and hnyls2002 July 18, 2024 09:00

Ying1123 requested changes Jul 18, 2024


python/sglang/srt/server.py (outdated)

fix comment

6f7617c

fix

9976860

Ying1123 merged commit b050d92 into sgl-project:main Jul 18, 2024

zhyncs deleted the ulimit branch July 18, 2024 09:36
