[Serving] Support batched prefill and benchmark #1250

Merged 1 commit on Nov 13, 2023

Conversation

MasterJH5574
Member

This PR adds batched prefill to the current serving framework, which improves prefill throughput.
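As a rough illustration of what batched prefill means, here is a minimal sketch in Python. It is not the code from this PR: `Request`, `select_prefill_batch`, `flatten_batch`, and the `max_batch_tokens` budget are all hypothetical names chosen for the example. The idea shown is only that several waiting requests are prefilled in one forward pass instead of one pass per request.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Request:
    """Hypothetical request record; not the PR's actual data structure."""
    request_id: str
    prompt_tokens: List[int]


def select_prefill_batch(waiting: List[Request], max_batch_tokens: int) -> List[Request]:
    """Greedily take requests from the front of the queue while the total
    prompt length stays within the (assumed) token budget."""
    batch: List[Request] = []
    total = 0
    for req in waiting:
        n = len(req.prompt_tokens)
        # Always admit the first request, even if it exceeds the budget,
        # so an oversized prompt cannot block the queue forever.
        if batch and total + n > max_batch_tokens:
            break
        batch.append(req)
        total += n
    return batch


def flatten_batch(batch: List[Request]) -> Tuple[List[int], List[int]]:
    """Concatenate all prompts into one token list and record the offsets
    that mark where each request's tokens begin and end."""
    tokens: List[int] = []
    offsets = [0]
    for req in batch:
        tokens.extend(req.prompt_tokens)
        offsets.append(len(tokens))
    return tokens, offsets
```

With a flattened token list and offsets, a model runner could process every prompt in the batch in a single call and then split the results per request. How the actual engine lays out its batch is not stated in this PR description.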

Some data structures are adjusted to reduce runtime overhead.

This PR also adds a benchmark for the serving engine that takes a real-time dataset as input.
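To make the benchmark idea concrete, the following sketch shows one way throughput numbers could be derived from per-request timing records. It is an assumption-laden illustration, not the benchmark script added by this PR: `RequestRecord`, `summarize`, and the reported metric names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class RequestRecord:
    """Hypothetical per-request measurement collected during a benchmark run."""
    num_prompt_tokens: int
    num_output_tokens: int
    start_time: float   # seconds, when the request was submitted
    finish_time: float  # seconds, when the last token was produced


def summarize(records: List[RequestRecord]) -> Dict[str, float]:
    """Compute aggregate throughput and mean latency over a run."""
    if not records:
        raise ValueError("no records to summarize")
    # Wall-clock span from the first submission to the last completion.
    wall = max(r.finish_time for r in records) - min(r.start_time for r in records)
    if wall <= 0:
        raise ValueError("benchmark duration must be positive")
    return {
        "requests_per_second": len(records) / wall,
        "prompt_tokens_per_second": sum(r.num_prompt_tokens for r in records) / wall,
        "output_tokens_per_second": sum(r.num_output_tokens for r in records) / wall,
        "mean_latency_seconds": sum(r.finish_time - r.start_time for r in records) / len(records),
    }
```

Prompt tokens per second is the metric most directly affected by batched prefill, since it reflects how quickly the engine ingests prompts.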

junrushao merged commit 06b6c1f into mlc-ai:serving Nov 13, 2023
MasterJH5574 added a commit that referenced this pull request Nov 16, 2023
MasterJH5574 added a commit that referenced this pull request Nov 20, 2023
MasterJH5574 added a commit that referenced this pull request Nov 22, 2023
MasterJH5574 added a commit that referenced this pull request Nov 29, 2023
MasterJH5574 added a commit that referenced this pull request Dec 8, 2023
junrushao pushed a commit to junrushao/mlc-llm that referenced this pull request Dec 12, 2023
MasterJH5574 added a commit that referenced this pull request Dec 12, 2023
MasterJH5574 added a commit to MasterJH5574/mlc-llm that referenced this pull request Dec 25, 2023
MasterJH5574 added a commit that referenced this pull request Dec 25, 2023
MasterJH5574 added a commit that referenced this pull request Dec 30, 2023
MasterJH5574 added a commit that referenced this pull request Jan 1, 2024
MasterJH5574 added a commit to MasterJH5574/mlc-llm that referenced this pull request Jan 2, 2024
MasterJH5574 added a commit to MasterJH5574/mlc-llm that referenced this pull request Jan 4, 2024
MasterJH5574 added a commit to MasterJH5574/mlc-llm that referenced this pull request Jan 10, 2024
MasterJH5574 added a commit that referenced this pull request Jan 10, 2024
MasterJH5574 added a commit to MasterJH5574/mlc-llm that referenced this pull request Jan 12, 2024
tqchen pushed a commit that referenced this pull request Jan 12, 2024

2 participants