[FEATURE] Unsynchronize the deployment for ml-commons remote models in an OpenSearch domain
Is your feature request related to a problem?
#2970. We keep getting bug reports about models stuck in the deploying or partially_deployed status, among other issues.
Deploying remote models across the entire cluster and running regular sync jobs to update the deployment status on each node incurs significant overhead. This approach also breaks down in edge cases such as version upgrades, cluster scaling, and node changes.
What solution would you like?
Remote model deployment is quick (approximately 10 ms), so it requires neither pre-deployment nor maintaining a domain-level deployment status before use.
Remote models should be deployed locally on a node only when it receives a prediction request, and the deployment should be cached with a TTL. Additionally, the "Model Status" field for remote models should be removed or hidden, since there is no need to synchronize remote models across the entire cluster's memory.
When customers send a high volume of requests that reaches all nodes, each node caches the remote model on its first prediction request, minimizing latency. With lighter traffic, only a subset of nodes may receive requests and cache the model. This is acceptable, as the added latency of a cold invocation is small.
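For illustration, below is a minimal sketch of the proposed lazy-deploy-and-cache flow. The class, method names, and the 10-minute TTL are hypothetical placeholders, not the existing ml-commons API:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: deploy a remote model on a node only when the first
// prediction request arrives, and treat the cached deployment as expired
// after a TTL. No cluster-wide deployment or status sync is involved.
public class LocalRemoteModelCache {

    private static final long TTL_MILLIS = 10 * 60 * 1000L; // assumed 10-minute TTL

    private record CachedModel(Object deployedModel, long deployedAtMillis) {}

    private final Map<String, CachedModel> cache = new ConcurrentHashMap<>();

    // Returns the locally deployed model, deploying it on demand if it is
    // missing or its TTL has expired.
    public Object getOrDeploy(String modelId) {
        CachedModel entry = cache.compute(modelId, (id, cached) -> {
            long now = System.currentTimeMillis();
            if (cached != null && now - cached.deployedAtMillis() < TTL_MILLIS) {
                return cached; // still fresh: reuse the local deployment
            }
            // Cold invocation: deploy locally (~10 ms for a remote model),
            // with no cluster-wide sync job or status update required.
            return new CachedModel(deployLocally(id), now);
        });
        return entry.deployedModel();
    }

    // Placeholder for the fast local setup of a remote model (e.g. building
    // the connector client); not the real ml-commons implementation.
    private Object deployLocally(String modelId) {
        return new Object();
    }
}
```

Because each node manages its own cache independently, there is no deploying or partially_deployed state to track at the domain level; a node either has a fresh local deployment or creates one on the next request.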
What alternatives have you considered?
Auto-deploy for remote models has already mitigated many model deployment issues, but it does not cover all edge cases. This proposal is an additional effort to further enhance our model deployment strategy.
Do you have any additional context?
#2050
#2376
#2382