
fix: set ulimit -n 65535 #647

Merged
merged 3 commits into sgl-project:main from the ulimit branch
Jul 18, 2024

Conversation

zhyncs (Member) commented Jul 18, 2024

Motivation

cc @merrymercy @Ying1123 @hnyls2002

When --num-prompts is relatively large, for example 10,000, the default `ulimit -n` of 1024 on my development machine causes most requests to fail with too many open files. This fix makes the server raise the limit automatically at startup; the client must still raise its own limit before running the benchmark.
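As a rough illustration of what "automatically sets ulimit upon startup" means in practice, the soft open-file limit can be raised from Python with the standard-library `resource` module (Unix only). The function name and default below are illustrative, not necessarily the exact code merged in python/sglang/srt/server.py:

```python
import resource

def set_ulimit(target: int = 65535) -> None:
    """Raise the soft RLIMIT_NOFILE (max open files) toward `target`.

    The soft limit can be raised up to the hard limit without privileges;
    raising the hard limit itself requires root, so we only touch the soft one.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft < target:
        # Cap at the hard limit unless the hard limit is unlimited.
        new_soft = target if hard == resource.RLIM_INFINITY else min(target, hard)
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
        except ValueError as exc:
            # Keep the server running even if the limit cannot be raised.
            print(f"could not raise RLIMIT_NOFILE: {exc}")
```

Calling this once at server startup avoids the accept/connect failures seen with the 1024 default under thousands of concurrent requests.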

hardware: A100 80G
client: https://github.com/vllm-project/vllm/blob/main/benchmarks/benchmark_serving.py
Note: disable streaming
ref: https://www.ibm.com/support/pages/recommended-values-os-ulimit-feature

# server
python3 -m sglang.launch_server --model /root/Meta-Llama-3-8B-Instruct --trust-remote-code --port 23333 --disable-radix-cache

# client ok
ulimit && python3 benchmark_serving.py --backend openai --host 127.0.0.1 --port 23333 --dataset ShareGPT_V3_unfiltered_cleaned_split.json --model /root/Meta-Llama-3-8B-Instruct --tokenizer /root/Meta-Llama-3-8B-Instruct --num-prompts 10000 --request-rate 128 --trust-remote-code
# client not
python3 benchmark_serving.py --backend openai --host 127.0.0.1 --port 23333 --dataset ShareGPT_V3_unfiltered_cleaned_split.json --model /root/Meta-Llama-3-8B-Instruct --tokenizer /root/Meta-Llama-3-8B-Instruct --num-prompts 10000 --request-rate 128 --trust-remote-code
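For the client side, the limit has to be raised in the benchmarking shell itself before launching benchmark_serving.py; a minimal sketch (the 65535 value follows the IBM recommendation linked above):

```shell
# Print the current soft limit on open file descriptors (often 1024 by default).
ulimit -n

# Raise the soft limit for this shell session, up to the hard limit (`ulimit -Hn`).
# This is what the client needs before a 10,000-prompt benchmark run.
ulimit -n 65535

# Confirm the new limit.
ulimit -n
```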
server ok, client ok
============ Serving Benchmark Result ============
Successful requests:                     10000
Benchmark duration (s):                  376.37
Total input tokens:                      2206428
Total generated tokens:                  1870511
Request throughput (req/s):              26.57
Input token throughput (tok/s):          5862.32
Output token throughput (tok/s):         4969.82
---------------Time to First Token----------------
Mean TTFT (ms):                          143441.34
Median TTFT (ms):                        145877.07
P99 TTFT (ms):                           291688.43
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          0.00
Median TPOT (ms):                        0.00
P99 TPOT (ms):                           0.00
---------------Inter-token Latency----------------
Mean ITL (ms):                           154188.26
Median ITL (ms):                         156223.75
P99 ITL (ms):                            292479.30
==================================================

server ok, client not
============ Serving Benchmark Result ============
Successful requests:                     3145
Benchmark duration (s):                  122.54
Total input tokens:                      688371
Total generated tokens:                  582171
Request throughput (req/s):              25.67
Input token throughput (tok/s):          5617.53
Output token throughput (tok/s):         4750.87
---------------Time to First Token----------------
Mean TTFT (ms):                          30375.16
Median TTFT (ms):                        24228.75
P99 TTFT (ms):                           100871.26
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          0.00
Median TPOT (ms):                        0.00
P99 TPOT (ms):                           0.00
---------------Inter-token Latency----------------
Mean ITL (ms):                           32828.14
Median ITL (ms):                         27608.77
P99 ITL (ms):                            101425.95
==================================================

server not, client not
============ Serving Benchmark Result ============
Successful requests:                     3103
Benchmark duration (s):                  125.92
Total input tokens:                      685120
Total generated tokens:                  577594
Request throughput (req/s):              24.64
Input token throughput (tok/s):          5440.98
Output token throughput (tok/s):         4587.04
---------------Time to First Token----------------
Mean TTFT (ms):                          29561.66
Median TTFT (ms):                        22865.67
P99 TTFT (ms):                           99962.48
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          0.00
Median TPOT (ms):                        0.00
P99 TPOT (ms):                           0.00
---------------Inter-token Latency----------------
Mean ITL (ms):                           31773.40
Median ITL (ms):                         26089.01
P99 ITL (ms):                            100067.45
==================================================

server not, client ok
============ Serving Benchmark Result ============
Successful requests:                     3082
Benchmark duration (s):                  119.42
Total input tokens:                      658200
Total generated tokens:                  584397
Request throughput (req/s):              25.81
Input token throughput (tok/s):          5511.84
Output token throughput (tok/s):         4893.81
---------------Time to First Token----------------
Mean TTFT (ms):                          29668.30
Median TTFT (ms):                        22791.54
P99 TTFT (ms):                           98546.82
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          0.00
Median TPOT (ms):                        0.00
P99 TPOT (ms):                           0.00
---------------Inter-token Latency----------------
Mean ITL (ms):                           31782.30
Median ITL (ms):                         25786.61
P99 ITL (ms):                            99130.20
==================================================

Modification

As titled: raise `ulimit -n` to 65535 automatically at server startup.

Checklist

  1. Ensure pre-commit or other linting tools are used to fix potential lint issues.
  2. Confirm that modifications are covered by complete unit tests. If not, please add more unit tests for correctness.
  3. Modify documentation as needed, such as docstrings or example tutorials.

Review comment on python/sglang/srt/server.py (outdated, resolved)
zhyncs (Member, Author) commented Jul 18, 2024

I'll merge the main branch.

zhyncs (Member, Author) commented Jul 18, 2024

done cc @Ying1123

@Ying1123 Ying1123 merged commit b050d92 into sgl-project:main Jul 18, 2024
@zhyncs zhyncs deleted the ulimit branch July 18, 2024 09:36