
[BUG] NeuralSearch Plugin Cannot Invoke Predict API for Sparse Model #3321

Closed
bzhangam opened this issue Jan 1, 2025 · 6 comments
Labels
bug Something isn't working


@bzhangam

bzhangam commented Jan 1, 2025

What is the bug?
On 12/31/2024 we started seeing an error related to the neural sparse query.

We got the following error logs:

...
{"error":{"root_cause":[{"type":"null_pointer_exception","reason":"Cannot invoke \"org.opensearch.ml.common.output.model.ModelTensorOutput.getMlModelOutputs()\" because \"modelTensorOutput\" is null"}],"type":"null_pointer_exception","reason":"Cannot invoke \"org.opensearch.ml.common.output.model.ModelTensorOutput.getMlModelOutputs()\" because \"modelTensorOutput\" is null"},"status":500}
Suite: Test class org.opensearch.neuralsearch.bwc.NeuralSparseSearchIT
        at __randomizedtesting.SeedInfo.seed([78661D5556610BA0:BA5AFDCC4BFDB914]:0)
        at app//org.opensearch.client.RestClient.convertResponse(RestClient.java:501)
        at app//org.opensearch.client.RestClient.performRequest(RestClient.java:384)
        at app//org.opensearch.client.RestClient.performRequest(RestClient.java:359)
        at app//org.opensearch.neuralsearch.BaseNeuralSearchIT.addSparseEncodingDoc(BaseNeuralSearchIT.java:767)
        at app//org.opensearch.neuralsearch.bwc.NeuralSparseSearchIT.testSparseEncodingProcessor_E2EFlow(NeuralSparseSearchIT.java:58)
        
...

Failed to init instance for type MODEL_TENSOR
java.lang.reflect.InvocationTargetException: null
        at java.base/jdk.internal.reflect.DirectConstructorHandleAccessor.newInstance(DirectConstructorHandleAccessor.java:74) ~[?:?]
        at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:502) ~[?:?]
        at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:486) ~[?:?]
        at org.opensearch.ml.common.MLCommonsClassLoader.init(MLCommonsClassLoader.java:242) [opensearch-ml-client-3.0.0.0-SNAPSHOT.jar:?]
        at org.opensearch.ml.common.MLCommonsClassLoader.initMLInstance(MLCommonsClassLoader.java:206) [opensearch-ml-client-3.0.0.0-SNAPSHOT.jar:?]
        at org.opensearch.ml.common.output.MLOutput.fromStream(MLOutput.java:36) [opensearch-ml-client-3.0.0.0-SNAPSHOT.jar:?]
        at org.opensearch.ml.common.transport.MLTaskResponse.<init>(MLTaskResponse.java:39) [opensearch-ml-client-3.0.0.0-SNAPSHOT.jar:?]
        at org.opensearch.ml.common.transport.MLTaskResponse.fromActionResponse(MLTaskResponse.java:55) [opensearch-ml-client-3.0.0.0-SNAPSHOT.jar:?]
        at org.opensearch.ml.client.MachineLearningNodeClient.lambda$getMlPredictionTaskResponseActionListener$22(MachineLearningNodeClient.java:419) [opensearch-ml-client-3.0.0.0-SNAPSHOT.jar:?]
        at org.opensearch.ml.client.MachineLearningNodeClient.lambda$wrapActionListener$24(MachineLearningNodeClient.java:440) [opensearch-ml-client-3.0.0.0-SNAPSHOT.jar:?]
        at org.opensearch.core.action.ActionListener$1.onResponse(ActionListener.java:82) [opensearch-core-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
        at org.opensearch.action.support.TransportAction$1.onResponse(TransportAction.java:115) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
        at org.opensearch.action.support.TransportAction$1.onResponse(TransportAction.java:109) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
        at org.opensearch.core.action.ActionListener$6.onResponse(ActionListener.java:301) [opensearch-core-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
        at org.opensearch.core.action.ActionListener$5.onResponse(ActionListener.java:268) [opensearch-core-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
        at org.opensearch.core.action.ActionListener$5.onResponse(ActionListener.java:268) [opensearch-core-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
        at org.opensearch.ml.task.MLPredictTaskRunner.runPredict(MLPredictTaskRunner.java:443) [opensearch-ml-3.0.0.0-SNAPSHOT.jar:3.0.0.0-SNAPSHOT]
        at org.opensearch.ml.task.MLPredictTaskRunner.predict(MLPredictTaskRunner.java:355) [opensearch-ml-3.0.0.0-SNAPSHOT.jar:3.0.0.0-SNAPSHOT]
        at org.opensearch.ml.task.MLPredictTaskRunner.lambda$executePredictionByInputDataType$10(MLPredictTaskRunner.java:300) [opensearch-ml-3.0.0.0-SNAPSHOT.jar:3.0.0.0-SNAPSHOT]
...   

The issue happens when the Neural Search plugin invokes the ML predict API to run inference on the data (code).
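For illustration only, here is a minimal sketch of a defensive check a caller could apply before dereferencing a predict result, so that a missing output fails with a descriptive error instead of the NullPointerException above. `ModelOutput` and `handleOutput` are hypothetical stand-ins, not the real ml-commons or neural-search classes, and such a guard would only improve the error message; it would not fix the underlying problem.

```java
import java.util.List;
import java.util.function.Consumer;

public class NullOutputGuard {
    // Hypothetical stand-in for a model output type; NOT the real ml-commons ModelTensorOutput.
    record ModelOutput(List<String> tensors) {}

    // Hypothetical helper: checks the output before dereferencing it and reports a
    // descriptive error through the failure callback instead of throwing an NPE.
    static void handleOutput(ModelOutput output,
                             Consumer<List<String>> onSuccess,
                             Consumer<Exception> onFailure) {
        if (output == null) {
            onFailure.accept(new IllegalStateException(
                "predict returned no model output; check response deserialization"));
            return;
        }
        onSuccess.accept(output.tensors());
    }

    public static void main(String[] args) {
        // A null output is routed to the failure callback with a clear message.
        handleOutput(null,
            tensors -> System.out.println("ok: " + tensors),
            e -> System.out.println("failed: " + e.getMessage()));
        // A present output is passed through to the success callback.
        handleOutput(new ModelOutput(List.of("t0")),
            tensors -> System.out.println("ok: " + tensors),
            e -> System.out.println("failed: " + e.getMessage()));
    }
}
```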

We raised an empty-change PR to confirm that the issue is not caused by a change in the Neural Search plugin, since the last PR merged into the Neural Search plugin on 12/27/2024 passed all the tests.

How can one reproduce the bug?
The issue reproduces consistently whenever we use a sparse model to run inference on data through the Neural Search plugin (we can reproduce it locally by following this doc).

What is the expected behavior?

  1. The Neural Search plugin should be able to use a sparse model to run inference on the data successfully.
  2. The empty change PR should pass all the tests.

What is your host/environment?

  • OS: macOS
  • Version: 15.2
  • Plugins: ML-commons plugin

Do you have any screenshots?
N/A

Do you have any additional context?
If we invoke the ML predict API directly with a sparse model, inference works (we can follow this doc to do that).

bzhangam added the bug and untriaged labels on Jan 1, 2025
@dhrubo-os
Collaborator

@zhichao-aws @xinyual could you guys please take a look?

@zhichao-aws
Member

Hi @bzhangam @dhrubo-os I've taken a look at the issue and here are some key findings:

  1. When neural-search invokes MachineLearningNodeClient.predict, the response is null, and this is the direct cause of the error message (code).

  2. I looked into the ml-commons code and found that the response becomes null at this line, after MLTaskResponse.fromActionResponse.

The bug is likely caused by a change on the ml-commons side. Will you fix it? @bzhangam
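The second log in the issue shows `java.lang.reflect.InvocationTargetException: null` thrown from `Constructor.newInstance`, which hides the real failure: reflection wraps whatever the constructor threw, and the original exception is available through `getCause()`. A minimal, stdlib-only sketch of unwrapping it when reporting the error; `FailingOutput` and `tryInit` are hypothetical names and are not part of ml-commons.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.InvocationTargetException;

public class UnwrapReflectionError {
    // Hypothetical class whose constructor always fails, standing in for an output
    // type that cannot be built from its input.
    public static class FailingOutput {
        public FailingOutput(String payload) {
            throw new IllegalArgumentException("cannot parse payload: " + payload);
        }
    }

    // Instantiates a class reflectively and describes the root cause instead of
    // the bare InvocationTargetException wrapper.
    static String tryInit(Class<?> type, String payload) {
        try {
            Constructor<?> ctor = type.getConstructor(String.class);
            ctor.newInstance(payload);
            return "ok";
        } catch (InvocationTargetException e) {
            // The exception thrown inside the constructor is the wrapper's cause.
            Throwable cause = e.getCause();
            return cause.getClass().getSimpleName() + ": " + cause.getMessage();
        } catch (ReflectiveOperationException e) {
            // Missing constructor, abstract class, access problems, and similar.
            return e.getClass().getSimpleName() + ": " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Prints "IllegalArgumentException: cannot parse payload: {}"
        System.out.println(tryInit(FailingOutput.class, "{}"));
    }
}
```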

@bzhangam
Author

bzhangam commented Jan 2, 2025


Hi @zhichao-aws. Thanks for your investigation. I'm not working on the ml-commons plugin. @dhrubo-os Do you know who can work on it?

@pyek-bot
Contributor

pyek-bot commented Jan 2, 2025

I'm looking into this and will update once I have something.

@pyek-bot
Contributor

pyek-bot commented Jan 3, 2025

Resolved with: opensearch-project/neural-search#1055

The issue was caused by conflicting Jackson dependency versions across ml-commons, core, and neural-search. The fix updates the build to use the Jackson version that comes from core, for consistency.
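When the same library can arrive at different versions through several plugins, one quick runtime check is to ask which JAR a class was actually loaded from. A minimal, stdlib-only sketch; the helper name `describe` is hypothetical, and the Jackson class name in `main` is only an example input.

```java
import java.security.CodeSource;

public class WhereLoaded {
    // Returns the location (usually a JAR path) a class was loaded from, plus the
    // Implementation-Version from its package manifest when one is present.
    static String describe(String className) {
        try {
            Class<?> c = Class.forName(className);
            CodeSource src = c.getProtectionDomain().getCodeSource();
            // Bootstrap (JDK) classes have no CodeSource.
            String location = (src == null || src.getLocation() == null)
                ? "<bootstrap>" : src.getLocation().toString();
            Package p = c.getPackage();
            String version = (p == null || p.getImplementationVersion() == null)
                ? "unknown" : p.getImplementationVersion();
            return className + " from " + location + " (version " + version + ")";
        } catch (ClassNotFoundException e) {
            return className + " not on classpath";
        }
    }

    public static void main(String[] args) {
        System.out.println(describe("com.fasterxml.jackson.databind.ObjectMapper"));
        System.out.println(describe("java.lang.String"));
    }
}
```

Running the same check from each plugin's class loader would show whether they resolve the class to different JARs.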

@pyek-bot
Contributor

pyek-bot commented Jan 3, 2025

@dhrubo-os Let's close this issue, thank you!


4 participants