The current approach to loading and deploying local ML models for integration tests is problematic. Each test method deploys its own model and undeploys it when the method finishes.
This leads to many deploy/undeploy calls on a single test cluster.
We currently use ml-commons to deploy the models. According to the ml-commons team, the engine they use (PyTorch) is not optimized for repeated model redeployments.
In an environment with limited memory this can lead to high memory consumption, which trips the Native Memory Circuit Breaker (CB) in ml-commons. Once the breaker is open, no new model can be deployed and a CB exception is returned.
The suggested approach is to change the paradigm from one model per test case to a set of models shared across the whole test suite. Models can then be deployed once during cluster setup, used by the tests, and undeployed in the tear-down phase. This seems feasible because the models are used in read-only mode and the number of different local models is limited. Currently 3 different models are used in integ tests (see https://github.com/opensearch-project/neural-search/tree/main/src/test/resources/processor).
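A minimal sketch of the shared-model idea, not the actual neural-search test code. The class name `SharedModelRegistry`, its methods, and the `deployer`/`undeployer` callbacks are hypothetical stand-ins for whatever the test base class uses to call ml-commons. The point is only that each distinct model config is deployed once and everything is undeployed in a single tear-down step.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Consumer;
import java.util.function.Function;

/**
 * Hypothetical helper: deploys each distinct model config at most once
 * per test suite and undeploys all of them in one tear-down call.
 */
public class SharedModelRegistry {
    // config path -> id of the model deployed from that config
    private final Map<String, String> modelIdsByConfig = new LinkedHashMap<>();
    // Assumed callback that uploads and deploys a model, returning its id.
    private final Function<String, String> deployer;
    // Assumed callback that undeploys (and possibly deletes) a model by id.
    private final Consumer<String> undeployer;

    public SharedModelRegistry(Function<String, String> deployer, Consumer<String> undeployer) {
        this.deployer = deployer;
        this.undeployer = undeployer;
    }

    /** Returns the id of the shared model, deploying it only on first use. */
    public synchronized String getOrDeploy(String configPath) {
        return modelIdsByConfig.computeIfAbsent(configPath, deployer);
    }

    /** Undeploys every shared model; intended for the suite tear-down phase. */
    public synchronized void undeployAll() {
        modelIdsByConfig.values().forEach(undeployer);
        modelIdsByConfig.clear();
    }

    /** Number of models currently tracked as deployed. */
    public synchronized int deployedCount() {
        return modelIdsByConfig.size();
    }
}
```

Test methods would call `getOrDeploy(...)` instead of deploying their own model, and the suite-level tear-down would call `undeployAll()`. This only works because the tests use the models read-only, as noted above.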
The ml-commons team has implemented a fix that allows the memory CB to be disabled (opensearch-project/ml-commons#2469). The fix on the neural-search side is to disable the CB for integ and BWC tests.
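As a rough illustration of what disabling the CB from test setup could look like, the sketch below builds a cluster settings update request. It assumes that the breaker is controlled by the `plugins.ml_commons.native_memory_threshold` setting and that a value of 100 turns it off; both the setting name and the value should be checked against the ml-commons version in use. The class `MemoryCircuitBreakerSettings` and the base URL parameter are hypothetical, and the request is only constructed here, not sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;

/** Hypothetical helper that prepares a request to disable the native memory CB. */
public class MemoryCircuitBreakerSettings {
    // Assumption: this setting controls the ml-commons native memory circuit breaker.
    static final String SETTING = "plugins.ml_commons.native_memory_threshold";
    // Assumption: a threshold of 100 (percent) disables the breaker.
    static final int DISABLED_THRESHOLD = 100;

    /** Builds the JSON body for a cluster settings update. */
    static String settingsBody() {
        return "{\"persistent\":{\"" + SETTING + "\":" + DISABLED_THRESHOLD + "}}";
    }

    /** Builds (but does not send) a PUT /_cluster/settings request. */
    static HttpRequest disableRequest(String clusterBaseUrl) {
        return HttpRequest.newBuilder(URI.create(clusterBaseUrl + "/_cluster/settings"))
            .header("Content-Type", "application/json")
            .PUT(HttpRequest.BodyPublishers.ofString(settingsBody()))
            .build();
    }
}
```

In the actual integ and BWC tests the same setting would more likely be applied through the test framework's own REST client during cluster setup.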
Ref:
`deploy` API causes exception from Memory Circuit Breaker ml-commons#2308
neural-search: Optimizing integ tests for less model upload calls #683