[Core] Priority-based scheduling in async engine #8850
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -411,11 +411,15 @@ async def add_request_async( | |
lora_request: Optional[LoRARequest] = None, | ||
trace_headers: Optional[Mapping[str, str]] = None, | ||
prompt_adapter_request: Optional[PromptAdapterRequest] = None, | ||
+ priority: int = 0, | ||
) -> None: | ||
"""Async version of :meth:`add_request`.""" | ||
if lora_request is not None and not self.lora_config: | ||
raise ValueError(f"Got lora_request {lora_request} but LoRA is " | ||
"not enabled!") | ||
+ if priority > 0 and not self.scheduler_config.policy == "priority": | ||
+ raise ValueError(f"Got priority {priority} but " | ||
+ "Priority scheduling is not enabled.") | ||
if arrival_time is None: | ||
arrival_time = time.time() | ||
|
||
|
@@ -435,6 +439,7 @@ async def add_request_async( | |
lora_request=lora_request, | ||
prompt_adapter_request=prompt_adapter_request, | ||
trace_headers=trace_headers, | ||
+ priority=priority, | ||
I think currently the only way of actually using the feature is to manually change the policy after creating the engine.
Yes, but shall we allow folks to specify it through args, like the other params in the scheduler_config? I feel that's better for wider adoption of this feature.
Sorry, I misread your comment. Yes, I think this needs to go in the
Yeah, it's in SchedulerConfig but still needs to be wired into EngineArgs to be able to enable it externally. Agreed that this could be done in a separate PR. |
||
) | ||
|
||
async def check_health_async(self) -> None: | ||
|
@@ -782,7 +787,8 @@ async def add_request( | |
arrival_time: Optional[float] = None, | ||
lora_request: Optional[LoRARequest] = None, | ||
trace_headers: Optional[Mapping[str, str]] = None, | ||
- prompt_adapter_request: Optional[PromptAdapterRequest] = None | ||
+ prompt_adapter_request: Optional[PromptAdapterRequest] = None, | ||
+ priority: int = 0, | ||
) -> AsyncGenerator[Union[RequestOutput, EmbeddingRequestOutput], None]: | ||
if not self.is_running: | ||
if self.start_engine_loop: | ||
|
@@ -802,7 +808,9 @@ async def add_request( | |
arrival_time=arrival_time or time.time(), | ||
lora_request=lora_request, | ||
trace_headers=trace_headers, | ||
- prompt_adapter_request=prompt_adapter_request) | ||
+ prompt_adapter_request=prompt_adapter_request, | ||
+ priority=priority, | ||
+ ) | ||
|
||
return stream.generator() | ||
|
||
|
@@ -813,7 +821,8 @@ async def generate( | |
request_id: str, | ||
lora_request: Optional[LoRARequest] = None, | ||
trace_headers: Optional[Mapping[str, str]] = None, | ||
- prompt_adapter_request: Optional[PromptAdapterRequest] = None | ||
+ prompt_adapter_request: Optional[PromptAdapterRequest] = None, | ||
+ priority: int = 0, | ||
) -> AsyncGenerator[RequestOutput, None]: | ||
"""Generate outputs for a request. | ||
|
||
|
@@ -831,6 +840,8 @@ async def generate( | |
trace_headers: OpenTelemetry trace headers. | ||
prompt_adapter_request: Prompt Adapter request to use | ||
for generation, if any. | ||
+ priority: The priority of the request. | ||
+ Only applicable with priority scheduling. | ||
|
||
Yields: | ||
The output `RequestOutput` objects from the LLMEngine | ||
|
@@ -886,6 +897,7 @@ async def generate( | |
lora_request=lora_request, | ||
trace_headers=trace_headers, | ||
prompt_adapter_request=prompt_adapter_request, | ||
+ priority=priority, | ||
): | ||
yield LLMEngine.validate_output(output, RequestOutput) | ||
|
||
|
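The diff above only threads `priority` from `generate` through `add_request` to the engine; it does not show how the scheduler orders requests. The following is a minimal sketch of one way a priority policy could order a waiting queue. `QueuedRequest` and `PriorityRequestQueue` are hypothetical names, not vLLM classes, and the ordering convention (lower value served first, ties broken by arrival time) is an assumption that this excerpt does not confirm.

```python
import heapq
import itertools
import time
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(order=True)
class QueuedRequest:
    # Sort key, in order: priority, arrival time, insertion counter.
    # ASSUMPTION: a lower priority value means "serve earlier".
    priority: int
    arrival_time: float
    seq: int
    # The request id does not take part in ordering.
    request_id: str = field(compare=False)


class PriorityRequestQueue:
    """Hypothetical waiting queue; not the vLLM scheduler."""

    def __init__(self) -> None:
        self._heap: List[QueuedRequest] = []
        self._counter = itertools.count()

    def add_request(self,
                    request_id: str,
                    priority: int = 0,
                    arrival_time: Optional[float] = None) -> None:
        # Mirrors the diff: default the arrival time to "now".
        if arrival_time is None:
            arrival_time = time.time()
        entry = QueuedRequest(priority, arrival_time,
                              next(self._counter), request_id)
        heapq.heappush(self._heap, entry)

    def pop_next(self) -> str:
        # Remove and return the id of the request that sorts first.
        return heapq.heappop(self._heap).request_id

    def __len__(self) -> int:
        return len(self._heap)
```

With this ordering, a request submitted with a negative priority is served before the default-priority requests, which is why negative values matter for the validation discussed in the review thread.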
Negative priorities are also allowed, so the `priority > 0` check misses them.
@schoennenbeck I see that you copied this from LLMEngine ... perhaps you could fix it there too?
Sure, will do.
You can also merge from `main` to fix the test failures now.
Done and done. I also added the error handling to `add_request` in `AsyncLLMEngine` since it was previously missing.
@DarkLight1337 The tests still seem to fail for reasons not related to the PR itself.
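For reference, the check discussed in this thread could be sketched as follows, taking the "negative is also allowed" remark to mean that any non-default priority should be rejected when priority scheduling is off. `SchedulerConfigSketch` and `check_priority` are hypothetical stand-ins rather than the PR's final code, and the `"fcfs"` default policy is an assumption.

```python
from dataclasses import dataclass


@dataclass
class SchedulerConfigSketch:
    # Hypothetical stand-in for the scheduler config.
    # ASSUMPTION: "fcfs" is the default (non-priority) policy.
    policy: str = "fcfs"


def check_priority(priority: int, config: SchedulerConfigSketch) -> None:
    """Reject any non-default priority unless priority scheduling is on."""
    # `!= 0` rather than `> 0`, so negative priorities are caught as well.
    if priority != 0 and config.policy != "priority":
        raise ValueError(f"Got priority {priority} but "
                         "Priority scheduling is not enabled.")
```

Calling `check_priority(-1, SchedulerConfigSketch())` raises `ValueError`, while the default `priority=0` always passes, and any value passes once `policy` is `"priority"`.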