Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

请求增加优先级参数,优先调度高优先级的请求 #2669

Closed
781574155 opened this issue Dec 13, 2024 · 6 comments
Closed

请求增加优先级参数,优先调度高优先级的请求 #2669

781574155 opened this issue Dec 13, 2024 · 6 comments

Comments

@781574155
Copy link

Feature request / 功能建议

如题。我们在实际生产环境中,需要长时间跑一些文件处理任务(比如写摘要,且对于单个大文件又要拆分为很多小的请求)。与此同时,用户可能有大模型聊天的需求。如果没有优先级,会导致聊天请求要排队等待前面的文件处理任务执行完,才慢慢调度到聊天的请求。这显然是不能接受的。现在的情况是,我们一有文件在处理(比如几百个),就(基本)无法聊天了

Motivation / 动机

我希望咱们能扩展openai的参数,增加一个priority的参数。我不希望是增加新的接口。我希望能够保持我的程序能随时切换到任何支持openai接口的大模型。

Your contribution / 您的贡献

@XprobeBot XprobeBot added this to the v1.x milestone Dec 13, 2024
@781574155
Copy link
Author

vllm是支持优先级的:vllm-project/vllm#8850

@qinxuye
Copy link
Contributor

qinxuye commented Dec 16, 2024

可以尝试在 generate_config 里支持 priority。

def _sanitize_generate_config(
generate_config: Optional[Dict] = None,
) -> VLLMGenerateConfig:
if not generate_config:
generate_config = {}
sanitized = VLLMGenerateConfig()
response_format = generate_config.pop("response_format", None)
guided_decoding_backend = generate_config.get("guided_decoding_backend", None)
guided_json_object = None
guided_json = None
if response_format is not None:
if response_format.get("type") == "json_object":
guided_json_object = True
elif response_format.get("type") == "json_schema":
json_schema = response_format.get("json_schema")
assert json_schema is not None
guided_json = json_schema.get("json_schema")
if guided_decoding_backend is None:
guided_decoding_backend = "outlines"
sanitized.setdefault("lora_name", generate_config.get("lora_name", None))
sanitized.setdefault("n", generate_config.get("n", 1))
sanitized.setdefault("best_of", generate_config.get("best_of", None))
sanitized.setdefault(
"presence_penalty", generate_config.get("presence_penalty", 0.0)
)
sanitized.setdefault(
"frequency_penalty", generate_config.get("frequency_penalty", 0.0)
)
sanitized.setdefault("temperature", generate_config.get("temperature", 1.0))
sanitized.setdefault("top_p", generate_config.get("top_p", 1.0))
sanitized.setdefault("top_k", generate_config.get("top_k", -1))
sanitized.setdefault("max_tokens", generate_config.get("max_tokens", 1024))
sanitized.setdefault("stop", generate_config.get("stop", None))
sanitized.setdefault(
"stop_token_ids", generate_config.get("stop_token_ids", None)
)
sanitized.setdefault("stream", generate_config.get("stream", False))
sanitized.setdefault(
"stream_options", generate_config.get("stream_options", None)
)
sanitized.setdefault(
"skip_special_tokens", generate_config.get("skip_special_tokens", True)
)
sanitized.setdefault(
"guided_json", generate_config.get("guided_json", guided_json)
)
sanitized.setdefault("guided_regex", generate_config.get("guided_regex", None))
sanitized.setdefault(
"guided_choice", generate_config.get("guided_choice", None)
)
sanitized.setdefault(
"guided_grammar", generate_config.get("guided_grammar", None)
)
sanitized.setdefault(
"guided_whitespace_pattern",
generate_config.get("guided_whitespace_pattern", None),
)
sanitized.setdefault(
"guided_json_object",
generate_config.get("guided_json_object", guided_json_object),
)
sanitized.setdefault(
"guided_decoding_backend",
generate_config.get("guided_decoding_backend", guided_decoding_backend),
)
return sanitized

欢迎提交 PR。

@781574155
Copy link
Author

@qinxuye 大哥,这个需求应该很多人都需要。因为私有化部署,往往资源是有限的。不可能部署两套,一套做批处理任务,一套做实时任务。你就动动你的金手指,分分钟把它写了算了!我也想贡献个PR,但是我搞不懂啊!

@qinxuye
Copy link
Contributor

qinxuye commented Dec 16, 2024

这个是个企业级特性,我们只会在企业版上支持类似能力,不过开源欢迎社区贡献。

Copy link

This issue is stale because it has been open for 7 days with no activity.

Copy link

This issue was closed because it has been inactive for 5 days since being marked as stale.

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Dec 29, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants