
feature: hpa for jointinference #465

Merged 1 commit into kubeedge:main on Feb 20, 2025

Conversation

tangming1996 (Collaborator)
What type of PR is this?
/kind feature

What this PR does / why we need it:
In large-model inference scenarios, the resource requirements of inference tasks usually grow significantly as the number of requests increases. In the current cloud-edge joint-inference architecture, a fixed single-instance configuration struggles to cope with such fluctuations, leading to either under-utilized resources or performance bottlenecks. By configuring an HPA (Horizontal Pod Autoscaler) on the deployment, the number of inference instances can be adjusted automatically according to the real-time request volume, dynamically scaling resources out or in. This mechanism adds instances during high-load periods and removes them during low-load periods, improving concurrent processing capability, optimizing resource utilization, and ensuring the efficiency and scalability of the inference service.
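As a rough illustration of the mechanism described above, a standard Kubernetes `autoscaling/v2` HPA could target the cloud-side inference Deployment; the resource names, replica bounds, and CPU threshold below are illustrative assumptions, not values taken from this PR:

```yaml
# Hypothetical HPA for a joint-inference cloud Deployment.
# The target Deployment name and thresholds are assumptions for illustration.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: joint-inference-cloud-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: joint-inference-example-cloud   # assumed Deployment name
  minReplicas: 1    # scale in during low-load periods
  maxReplicas: 5    # scale out under high request volume
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 80   # add replicas above 80% average CPU
```

With this in place, the HPA controller adjusts the replica count between `minReplicas` and `maxReplicas` as observed CPU utilization crosses the target, which matches the scale-out/scale-in behavior the PR description aims for.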
Which issue(s) this PR fixes:

Fixes #

@kubeedge-bot kubeedge-bot added the kind/feature Categorizes issue or PR as related to a new feature. label Feb 14, 2025
@kubeedge-bot kubeedge-bot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Feb 14, 2025
@hsj576 (Member) commented Feb 20, 2025

/lgtm

@kubeedge-bot kubeedge-bot added the lgtm Indicates that a PR is ready to be merged. label Feb 20, 2025
@MooreZheng (Collaborator) left a comment

This HPA PR is necessary when the number of inference requests fluctuates. It would be a good idea to have a new example for HPA, especially for joint inference using a large model. @hsj576

@MooreZheng (Collaborator)
/lgtm

@MooreZheng (Collaborator)

/approve

@kubeedge-bot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: MooreZheng

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

- Approvers can indicate their approval by writing /approve in a comment
- Approvers can cancel approval by writing /approve cancel in a comment

@kubeedge-bot kubeedge-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 20, 2025
@kubeedge-bot kubeedge-bot merged commit d234200 into kubeedge:main Feb 20, 2025
11 checks passed
4 participants