Offline LLM Engine Benchmark Throughput #1968
Conversation
@ByronHsu do you want me to add this to CI?
This PR tries to reuse the script bench_serving.py. However, I think a better approach is to use a standalone script.
Reasons:
- bench_serving.py is for online serving, but the most common use case of the Engine API is offline. We should benchmark the non-async version of the Engine to see the maximum throughput we can get without all of the streaming/asyncio overhead.
- We want to pass in many other arguments to the server. The new script should be similar to bench_latency.py, which takes the full ServerArgs as arguments.
sglang/python/sglang/bench_latency.py
Lines 536 to 541 in 760552e
parser = argparse.ArgumentParser()
ServerArgs.add_cli_args(parser)
BenchArgs.add_cli_args(parser)
args = parser.parse_args()
server_args = ServerArgs.from_cli_args(args)
bench_args = BenchArgs.from_cli_args(args)
Can you try to write a standalone script bench_offline_throughput.py that takes the same arguments as bench_latency.py?
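Not the actual PR code, but a minimal sketch of what such a standalone bench_offline_throughput.py could look like: it reuses the ServerArgs/BenchArgs CLI pattern shown above and assumes the Engine can be constructed from ServerArgs fields as keyword arguments. The BenchArgs fields, the toy prompts, and the final print are illustrative only.

```python
# bench_offline_throughput.py -- illustrative sketch, not the PR's implementation.
import argparse
import dataclasses
import time

import sglang as sgl
from sglang.srt.server_args import ServerArgs


@dataclasses.dataclass
class BenchArgs:
    num_prompts: int = 10   # illustrative benchmark-only options
    output_len: int = 256

    @staticmethod
    def add_cli_args(parser):
        parser.add_argument("--num-prompts", type=int, default=BenchArgs.num_prompts)
        parser.add_argument("--output-len", type=int, default=BenchArgs.output_len)

    @classmethod
    def from_cli_args(cls, args):
        return cls(num_prompts=args.num_prompts, output_len=args.output_len)


if __name__ == "__main__":
    # Same pattern as bench_latency.py: the full ServerArgs plus bench-only args.
    parser = argparse.ArgumentParser()
    ServerArgs.add_cli_args(parser)
    BenchArgs.add_cli_args(parser)
    args = parser.parse_args()
    server_args = ServerArgs.from_cli_args(args)
    bench_args = BenchArgs.from_cli_args(args)

    # Assumption: Engine accepts ServerArgs fields as keyword arguments.
    engine = sgl.Engine(**dataclasses.asdict(server_args))
    prompts = ["Hello, my name is"] * bench_args.num_prompts
    sampling_params = {"max_new_tokens": bench_args.output_len}

    start = time.perf_counter()
    outputs = engine.generate(prompts, sampling_params)
    elapsed = time.perf_counter() - start
    print(f"Ran {len(outputs)} prompts in {elapsed:.2f} s (non-async, offline)")
```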
will do
@merrymercy updated the script, what do you think?
This looks better! Sorry for the back-and-forth, but I think we should support loading real datasets and better random dataset generation, similar to what bench_serving.py does.
bench_latency.py uses a simple way to generate synthetic data because it does not support continuous batching or inputs with variable lengths. The engine supports everything, so we can use more realistic data.
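For illustration only, variable-length random inputs could be generated along the lines of bench_serving.py's random dataset: sample a length per request and decode random token ids. The helper name and signature below are assumptions, not the PR's code.

```python
import random


def gen_random_prompts(tokenizer, num_prompts, max_input_len, range_ratio=0.5):
    """Hypothetical helper: build prompts of varying length from random token ids,
    so the benchmark exercises continuous batching with realistic input shapes."""
    prompts = []
    for _ in range(num_prompts):
        length = random.randint(int(max_input_len * range_ratio), max_input_len)
        token_ids = [random.randint(0, tokenizer.vocab_size - 1) for _ in range(length)]
        prompts.append(tokenizer.decode(token_ids))
    return prompts
```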
Also, please add a unit test here https://github.com/sgl-project/sglang/blob/main/test/srt/test_srt_engine.py to run this benchmark for 10 random prompts.
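A hedged sketch of what such a test could look like in test_srt_engine.py, assuming the new script exposes a callable entry point. The throughput_test function, the result key, and the use of DEFAULT_SMALL_MODEL_NAME_FOR_TEST as the test model are assumptions for illustration.

```python
import unittest

from sglang.bench_offline_throughput import BenchArgs, throughput_test  # assumed entry point
from sglang.srt.server_args import ServerArgs
from sglang.test.test_utils import DEFAULT_SMALL_MODEL_NAME_FOR_TEST  # assumed constant


class TestBenchOfflineThroughput(unittest.TestCase):
    def test_offline_throughput(self):
        # Run the benchmark end to end on 10 random prompts and check that
        # it reports a positive throughput number.
        server_args = ServerArgs(model_path=DEFAULT_SMALL_MODEL_NAME_FOR_TEST)
        bench_args = BenchArgs(num_prompts=10)
        result = throughput_test(server_args, bench_args)
        self.assertGreater(result["total_throughput"], 0)


if __name__ == "__main__":
    unittest.main()
```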
@merrymercy made the changes
Can we have an option for
Sure. I would suggest that we keep input + output, since it's a pretty standard way to measure throughput (total system throughput) as far as I know. I'll definitely include both, though. Maybe add input + output to bench_serving as well?
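To make the distinction concrete, here is a minimal sketch of the two numbers being discussed; the function and key names are illustrative, not the PR's API.

```python
def compute_throughputs(num_input_tokens, num_output_tokens, elapsed_s):
    """Total system throughput counts prompt + generated tokens per second;
    output-only (decode) throughput counts generated tokens alone."""
    return {
        "total_token_throughput": (num_input_tokens + num_output_tokens) / elapsed_s,
        "output_token_throughput": num_output_tokens / elapsed_s,
    }
```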
The script now looks good. @ByronHsu can you verify the output format? @zolinthecow can you resolve the remaining comments? Then we can merge this soon!
LGTM. @ByronHsu you can do the final merge. |
Slightly refactored the code and attempted to fix the test. Can merge if the test passes |
Thanks for the contribution!! @zolinthecow |
@zolinthecow great work! |
Motivation
#1865
Add throughput benchmark for engine.generate
Modifications
Added the ability to specify an engine instead of an API URL in the benchmarks (see the sketch below)
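Roughly, this lets the benchmark's request path dispatch on whether an in-process Engine or an API URL was supplied. The sketch below is an assumption about the shape of that change, not the PR's actual diff; the /generate payload mirrors sglang's HTTP API.

```python
import requests


def run_generate(prompt, sampling_params, engine=None, api_url=None):
    """Illustrative dispatch: use the in-process Engine when one is given,
    otherwise fall back to the server's HTTP /generate endpoint."""
    if engine is not None:
        return engine.generate(prompt, sampling_params)
    resp = requests.post(
        f"{api_url}/generate",
        json={"text": prompt, "sampling_params": sampling_params},
    )
    return resp.json()
```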
Checklist