Balance test in CI #1411

Merged: 3 commits, Sep 13, 2024
32 changes: 16 additions & 16 deletions .github/workflows/pr-test.yml

@@ -88,29 +88,23 @@ jobs:
           pip install -e "python[all]"
           pip install flashinfer -i https://flashinfer.ai/whl/cu121/torch2.4/ --force-reinstall

-      - name: Benchmark Offline Throughput
-        timeout-minutes: 10
-        run: |
-          cd test/srt
-          python3 -m unittest test_bench_serving.TestBenchServing.test_offline_throughput_default
-
-      - name: Benchmark Offline Throughput (w/o RadixAttention)
+      - name: Benchmark Single Latency
         timeout-minutes: 10
         run: |
           cd test/srt
-          python3 -m unittest test_bench_serving.TestBenchServing.test_offline_throughput_without_radix_cache
+          python3 -m unittest test_bench_latency.TestBenchLatency.test_default

-      - name: Benchmark Offline Throughput (w/o ChunkedPrefill)
+      - name: Benchmark Online Latency
         timeout-minutes: 10
         run: |
           cd test/srt
-          python3 -m unittest test_bench_serving.TestBenchServing.test_offline_throughput_without_chunked_prefill
+          python3 -m unittest test_bench_serving.TestBenchServing.test_online_latency_default

-      - name: Benchmark Offline Throughput (w/ Triton)
+      - name: Benchmark Offline Throughput
         timeout-minutes: 10
         run: |
           cd test/srt
-          python3 -m unittest test_bench_serving.TestBenchServing.test_offline_throughput_with_triton_attention_backend
+          python3 -m unittest test_bench_serving.TestBenchServing.test_offline_throughput_default

   performance-test-1-gpu-part-2:
     if: github.repository == 'sgl-project/sglang' || github.event_name == 'pull_request'
@@ -125,17 +119,23 @@ jobs:
           pip install -e "python[all]"
           pip install flashinfer -i https://flashinfer.ai/whl/cu121/torch2.4/ --force-reinstall

-      - name: Benchmark Single Latency
+      - name: Benchmark Offline Throughput (w/o RadixAttention)
         timeout-minutes: 10
         run: |
           cd test/srt
-          python3 -m unittest test_bench_latency.TestBenchLatency.test_default
+          python3 -m unittest test_bench_serving.TestBenchServing.test_offline_throughput_without_radix_cache

-      - name: Benchmark Online Latency
+      - name: Benchmark Offline Throughput (w/o ChunkedPrefill)
         timeout-minutes: 10
         run: |
           cd test/srt
-          python3 -m unittest test_bench_serving.TestBenchServing.test_online_latency_default
+          python3 -m unittest test_bench_serving.TestBenchServing.test_offline_throughput_without_chunked_prefill

+      - name: Benchmark Offline Throughput (w/ Triton)
+        timeout-minutes: 10
+        run: |
+          cd test/srt
+          python3 -m unittest test_bench_serving.TestBenchServing.test_offline_throughput_with_triton_attention_backend
+
   performance-test-2-gpu:
     if: github.repository == 'sgl-project/sglang' || github.event_name == 'pull_request'
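Each CI step invokes exactly one unittest test case by its dotted name and caps it at 10 minutes, so the two part jobs can be rebalanced simply by moving steps between them. As a rough illustration of the selection mechanism, here is a hypothetical stand-in test in the style of test_bench_serving.py (the class body and the 2500 threshold are placeholders, not sglang's real benchmark code):

```python
import os
import unittest

# Stand-in for a benchmark test case; the real sglang tests launch a
# server and measure throughput or latency before asserting anything.
class TestBenchServing(unittest.TestCase):
    def test_offline_throughput_default(self):
        res = {"output_throughput": 2600.0}  # placeholder benchmark result
        # Thresholds are only enforced inside CI, as in the sglang tests.
        if os.getenv("SGLANG_IS_IN_CI", "false") == "true":
            self.assertGreater(res["output_throughput"], 2500)

# `python3 -m unittest pkg.TestClass.test_method` resolves the dotted name
# to a single test case; loading the class directly builds the same
# one-test suite here without needing an importable module path.
suite = unittest.TestLoader().loadTestsFromTestCase(TestBenchServing)
result = unittest.TestResult()
suite.run(result)
```

Because each step runs one test, a step's wall-clock time maps directly onto one benchmark, which is what makes balancing the two GPU jobs straightforward.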
2 changes: 1 addition & 1 deletion python/sglang/README.md

@@ -7,5 +7,5 @@
 - `bench_latency.py`: Benchmark a single static batch.
 - `bench_serving.py`: Benchmark online serving with dynamic requests.
 - `global_config.py`: The global configs and constants.
-- `launch_server.py`: The entry point of launching local server.
+- `launch_server.py`: The entry point for launching the local server.
 - `utils.py`: Common utilities.
2 changes: 1 addition & 1 deletion test/srt/test_bench_serving.py

@@ -69,7 +69,7 @@ def test_online_latency_default(self):

         if os.getenv("SGLANG_IS_IN_CI", "false") == "true":
             assert res["median_e2e_latency_ms"] < 12000
-            assert res["median_ttft_ms"] < 78
+            assert res["median_ttft_ms"] < 80
             assert res["median_itl_ms"] < 12

     def test_moe_offline_throughput_default(self):
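The only threshold change in this PR relaxes the median time-to-first-token bound from 78 ms to 80 ms; the assertions only fire when SGLANG_IS_IN_CI is set, so local runs are unaffected by hardware variance. A minimal sketch of that gating pattern (the helper name and sample numbers are illustrative, not from the repo):

```python
import os

def check_online_latency(res: dict) -> None:
    # Enforce latency bounds only inside CI, mirroring the
    # SGLANG_IS_IN_CI gate in test_bench_serving.py.
    if os.getenv("SGLANG_IS_IN_CI", "false") == "true":
        assert res["median_e2e_latency_ms"] < 12000
        assert res["median_ttft_ms"] < 80  # relaxed from 78 in this PR
        assert res["median_itl_ms"] < 12

# Simulate the CI environment: a 79 ms TTFT now passes under the new bound.
os.environ["SGLANG_IS_IN_CI"] = "true"
sample = {"median_e2e_latency_ms": 9500, "median_ttft_ms": 79, "median_itl_ms": 10}
check_online_latency(sample)
```

Outside CI the check is a no-op, which keeps the benchmark tests runnable on arbitrary developer machines.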