
Offline LLM Engine Benchmark Throughput #1968

Merged (35 commits into sgl-project:main on Nov 15, 2024)

Conversation

@zolinthecow (Contributor)

Motivation

Add a throughput benchmark for engine.generate (#1865).

Modifications

Added the ability to specify an engine instead of an API URL in the benchmarks.

Checklist

  • Format your code according to the Contributor Guide.
  • Add unit tests as outlined in the Contributor Guide.
  • Update documentation as needed, including docstrings or example tutorials.

@zolinthecow (Contributor, Author)

@ByronHsu do you want me to add this to CI?

@merrymercy (Contributor) left a comment

This PR tries to reuse the script bench_serving.py. However, I think a better approach is to use a standalone script.

Reasons:

  • bench_serving.py is for online serving, but the most common use of the Engine API is offline. We should benchmark the non-async version of the Engine to see the maximum throughput we can get without all of the streaming/asyncio overhead.
  • We want to pass many other arguments to the server. The new script should be similar to bench_latency.py, which takes the full ServerArgs as arguments:
    # Combine server and benchmark arguments on one CLI parser.
    import argparse

    from sglang.srt.server_args import ServerArgs

    parser = argparse.ArgumentParser()
    ServerArgs.add_cli_args(parser)
    BenchArgs.add_cli_args(parser)  # BenchArgs would be defined in the benchmark script itself
    args = parser.parse_args()
    server_args = ServerArgs.from_cli_args(args)
    bench_args = BenchArgs.from_cli_args(args)

Can you try to write a standalone script bench_offline_throughput.py that takes the same arguments as bench_latency.py?
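
To make the pattern concrete, a minimal sketch of the BenchArgs half could look like this (the field names and defaults are assumptions for illustration, not the merged code):

    import argparse
    import dataclasses

    @dataclasses.dataclass
    class BenchArgs:
        num_prompts: int = 1000
        dataset_name: str = "random"

        @staticmethod
        def add_cli_args(parser: argparse.ArgumentParser):
            # Expose each field as a CLI flag, mirroring ServerArgs.add_cli_args.
            parser.add_argument("--num-prompts", type=int, default=BenchArgs.num_prompts)
            parser.add_argument("--dataset-name", type=str, default=BenchArgs.dataset_name)

        @classmethod
        def from_cli_args(cls, args: argparse.Namespace):
            # Pick out only the fields this dataclass declares.
            attrs = [attr.name for attr in dataclasses.fields(cls)]
            return cls(**{attr: getattr(args, attr) for attr in attrs})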

@zolinthecow (Contributor, Author)

Will do.

@zolinthecow (Contributor, Author) commented on Nov 11, 2024

@merrymercy updated the script, what do you think?

@merrymercy (Contributor) left a comment

This looks better! Sorry for the back-and-forth, but I think we should support loading real datasets and better random dataset generation, similar to what bench_serving.py does.

bench_latency.py generates synthetic data in a simple way because it does not support continuous batching or variable-length inputs. The engine supports all of that, so we can use more realistic data, e.g. something like the sketch below.
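
A minimal sketch of variable-length random prompt generation in the spirit of bench_serving.py (the function name and length distribution are assumptions):

    import random

    def gen_random_requests(tokenizer, num_prompts, input_len, output_len, range_ratio=0.5):
        """Return (prompt, input_len, output_len) triples with varied lengths."""
        requests = []
        for _ in range(num_prompts):
            # Sample lengths uniformly between range_ratio * target and target.
            in_len = random.randint(int(input_len * range_ratio), input_len)
            out_len = random.randint(int(output_len * range_ratio), output_len)
            # Decode random token ids so the prompt length roughly matches in_len.
            token_ids = [random.randint(0, tokenizer.vocab_size - 1) for _ in range(in_len)]
            requests.append((tokenizer.decode(token_ids), in_len, out_len))
        return requests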

@merrymercy (Contributor)

Also, please add a unit test here https://github.com/sgl-project/sglang/blob/main/test/srt/test_srt_engine.py to run this benchmark for 10 random prompts.
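
A rough sketch of such a test, assuming the script exposes a throughput_test(server_args, bench_args) entry point (the function name, result key, and model path are assumptions, not the merged code):

    import unittest

    from sglang.bench_offline_throughput import BenchArgs, throughput_test
    from sglang.srt.server_args import ServerArgs

    class TestBenchOfflineThroughput(unittest.TestCase):
        def test_offline_throughput(self):
            # Run the offline benchmark end to end on 10 random prompts.
            server_args = ServerArgs(model_path="meta-llama/Llama-3.1-8B-Instruct")
            bench_args = BenchArgs(num_prompts=10, dataset_name="random")
            result = throughput_test(server_args=server_args, bench_args=bench_args)
            self.assertGreater(result["total_throughput"], 0)

    if __name__ == "__main__":
        unittest.main()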

@zolinthecow (Contributor, Author)

@merrymercy made the changes

@ByronHsu (Collaborator)

Can we have an option for a runtime backend? Then we could easily benchmark runtime.generate against engine.generate. Related to #1872.
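
One way that option could look (a sketch under assumed plumbing, not the merged code):

    import dataclasses

    import sglang as sgl

    def launch_backend(backend: str, server_args):
        # Build either the in-process Engine or the Runtime from the same
        # ServerArgs so the two generate() paths are directly comparable.
        kwargs = dataclasses.asdict(server_args)
        if backend == "engine":
            return sgl.Engine(**kwargs)
        if backend == "runtime":
            return sgl.Runtime(**kwargs)
        raise ValueError(f"Unknown backend: {backend}")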

@zolinthecow (Contributor, Author)

Sure. I would suggest that we keep input + output, since as far as I know it is a pretty standard way to measure throughput (total system throughput). I will definitely include both, though. Maybe add input + output to bench_serving as well?
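
For clarity, "total system throughput" in the sense used above (a one-line sketch; the names are illustrative):

    def total_throughput(num_input_tokens: int, num_output_tokens: int, elapsed_s: float) -> float:
        # Input + output tokens processed per second over the whole run.
        return (num_input_tokens + num_output_tokens) / elapsed_s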

@merrymercy (Contributor)

The script now looks good. @ByronHsu, can you verify the output format? @zolinthecow, can you resolve the remaining comments? Then we can merge this soon!

@merrymercy closed this on Nov 14, 2024
@merrymercy reopened this on Nov 14, 2024
@merrymercy (Contributor)

LGTM. @ByronHsu you can do the final merge.

@ByronHsu (Collaborator)

Slightly refactored the code and attempted to fix the test. We can merge once the test passes.

@ByronHsu merged commit f6dd648 into sgl-project:main on Nov 15, 2024 (13 checks passed)
@ByronHsu (Collaborator)

Thanks for the contribution!! @zolinthecow

@zolinthecow deleted the benchmark-script branch on November 15, 2024 at 07:10
@merrymercy (Contributor)

@zolinthecow great work!
