[BUG] NeuralSearch Plugin Cannot Invoke Predict API for Sparse Model #3321

bzhangam · 2025-01-01T01:12:22Z

What is the bug?
On 12/31/2024 we started seeing an error related to the neural sparse query.

We got the following error logs:

...
{"error":{"root_cause":[{"type":"null_pointer_exception","reason":"Cannot invoke \"org.opensearch.ml.common.output.model.ModelTensorOutput.getMlModelOutputs()\" because \"modelTensorOutput\" is null"}],"type":"null_pointer_exception","reason":"Cannot invoke \"org.opensearch.ml.common.output.model.ModelTensorOutput.getMlModelOutputs()\" because \"modelTensorOutput\" is null"},"status":500}
Suite: Test class org.opensearch.neuralsearch.bwc.NeuralSparseSearchIT
        at __randomizedtesting.SeedInfo.seed([78661D5556610BA0:BA5AFDCC4BFDB914]:0)
        at app//org.opensearch.client.RestClient.convertResponse(RestClient.java:501)
        at app//org.opensearch.client.RestClient.performRequest(RestClient.java:384)
        at app//org.opensearch.client.RestClient.performRequest(RestClient.java:359)
        at app//org.opensearch.neuralsearch.BaseNeuralSearchIT.addSparseEncodingDoc(BaseNeuralSearchIT.java:767)
        at app//org.opensearch.neuralsearch.bwc.NeuralSparseSearchIT.testSparseEncodingProcessor_E2EFlow(NeuralSparseSearchIT.java:58)
        
...

Failed to init instance for type MODEL_TENSOR
java.lang.reflect.InvocationTargetException: null
        at java.base/jdk.internal.reflect.DirectConstructorHandleAccessor.newInstance(DirectConstructorHandleAccessor.java:74) ~[?:?]
        at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:502) ~[?:?]
        at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:486) ~[?:?]
        at org.opensearch.ml.common.MLCommonsClassLoader.init(MLCommonsClassLoader.java:242) [opensearch-ml-client-3.0.0.0-SNAPSHOT.jar:?]
        at org.opensearch.ml.common.MLCommonsClassLoader.initMLInstance(MLCommonsClassLoader.java:206) [opensearch-ml-client-3.0.0.0-SNAPSHOT.jar:?]
        at org.opensearch.ml.common.output.MLOutput.fromStream(MLOutput.java:36) [opensearch-ml-client-3.0.0.0-SNAPSHOT.jar:?]
        at org.opensearch.ml.common.transport.MLTaskResponse.<init>(MLTaskResponse.java:39) [opensearch-ml-client-3.0.0.0-SNAPSHOT.jar:?]
        at org.opensearch.ml.common.transport.MLTaskResponse.fromActionResponse(MLTaskResponse.java:55) [opensearch-ml-client-3.0.0.0-SNAPSHOT.jar:?]
        at org.opensearch.ml.client.MachineLearningNodeClient.lambda$getMlPredictionTaskResponseActionListener$22(MachineLearningNodeClient.java:419) [opensearch-ml-client-3.0.0.0-SNAPSHOT.jar:?]
        at org.opensearch.ml.client.MachineLearningNodeClient.lambda$wrapActionListener$24(MachineLearningNodeClient.java:440) [opensearch-ml-client-3.0.0.0-SNAPSHOT.jar:?]
        at org.opensearch.core.action.ActionListener$1.onResponse(ActionListener.java:82) [opensearch-core-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
        at org.opensearch.action.support.TransportAction$1.onResponse(TransportAction.java:115) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
        at org.opensearch.action.support.TransportAction$1.onResponse(TransportAction.java:109) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
        at org.opensearch.core.action.ActionListener$6.onResponse(ActionListener.java:301) [opensearch-core-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
        at org.opensearch.core.action.ActionListener$5.onResponse(ActionListener.java:268) [opensearch-core-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
        at org.opensearch.core.action.ActionListener$5.onResponse(ActionListener.java:268) [opensearch-core-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
        at org.opensearch.ml.task.MLPredictTaskRunner.runPredict(MLPredictTaskRunner.java:443) [opensearch-ml-3.0.0.0-SNAPSHOT.jar:3.0.0.0-SNAPSHOT]
        at org.opensearch.ml.task.MLPredictTaskRunner.predict(MLPredictTaskRunner.java:355) [opensearch-ml-3.0.0.0-SNAPSHOT.jar:3.0.0.0-SNAPSHOT]
        at org.opensearch.ml.task.MLPredictTaskRunner.lambda$executePredictionByInputDataType$10(MLPredictTaskRunner.java:300) [opensearch-ml-3.0.0.0-SNAPSHOT.jar:3.0.0.0-SNAPSHOT]
...

The issue happens when the Neural Search plugin invokes the ML predict API to run inference on the data. (code)

We raised an empty-change PR to confirm that the issue is not caused by a change in the Neural Search plugin: the last PR merged into the Neural Search plugin, on 12/27/2024, passed all the tests.

How can one reproduce the bug?
The issue is consistent: it occurs whenever we use a sparse model to run inference on the data in the Neural Search plugin. (We can reproduce it locally by following this doc.)

What is the expected behavior?

The NeuralSearch plugin should be able to use a sparse model to run inference on the data successfully.
The empty change PR should pass all the tests.

What is your host/environment?

OS: macOS
Version: 15.2
Plugins: ML-commons plugin

Do you have any screenshots?
N/A

Do you have any additional context?
If we invoke the ML predict API directly with a sparse model to run inference on the data, it works. (We can follow this doc to do that.)


dhrubo-os · 2025-01-01T01:19:42Z

@zhichao-aws @xinyual could you guys please take a look?

zhichao-aws · 2025-01-02T07:29:27Z

Hi @bzhangam @dhrubo-os I've taken a look at the issue and here are some key findings:

When MachineLearningNodeClient.predict is invoked from neural-search, the response is null. This is the direct cause of the error message.
code
I looked into the ml-commons code and found that the response becomes null at this line, after MLTaskResponse.fromActionResponse.

The bug appears to be caused by a change on the ml-commons side. Will you fix it? @bzhangam
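To illustrate the failure pattern described above, here is a minimal, self-contained sketch of how a reflective constructor failure that is logged and swallowed can surface later as a null output on the caller side. Everything in it is hypothetical: the class names SwallowedInitFailureSketch and TensorOutput and the methods initInstance and readOutputs are stand-ins and are not the ml-commons code. It only assumes, from the log above, that an InvocationTargetException is raised during instantiation and that the caller then receives null.

```java
import java.lang.reflect.Constructor;
import java.util.List;

public class SwallowedInitFailureSketch {

    /** Hypothetical stand-in for an output type whose constructor can fail at runtime. */
    public static class TensorOutput {
        private final List<String> outputs;

        public TensorOutput(String payload) {
            if (payload == null) {
                // Simulates any failure raised inside the constructor.
                throw new IllegalStateException("cannot deserialize payload");
            }
            this.outputs = List.of(payload);
        }

        public List<String> getOutputs() {
            return outputs;
        }
    }

    /** Reflective factory that logs a failure and returns null instead of rethrowing. */
    public static <T> T initInstance(Class<T> type, String payload) {
        try {
            Constructor<T> ctor = type.getConstructor(String.class);
            return ctor.newInstance((Object) payload);
        } catch (ReflectiveOperationException e) {
            // InvocationTargetException wraps whatever the constructor threw.
            System.err.println("Failed to init instance for type "
                + type.getSimpleName() + ": " + e.getCause());
            return null;
        }
    }

    /** Caller-side guard: fail with a descriptive message rather than a bare NullPointerException. */
    public static List<String> readOutputs(TensorOutput output) {
        if (output == null) {
            throw new IllegalStateException(
                "prediction returned no output; check the log for an init failure");
        }
        return output.getOutputs();
    }

    public static void main(String[] args) {
        TensorOutput ok = initInstance(TensorOutput.class, "token:0.7");
        System.out.println(readOutputs(ok));

        TensorOutput broken = initInstance(TensorOutput.class, null);
        try {
            readOutputs(broken);
        } catch (IllegalStateException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```

Without the null check in readOutputs, the second call would fail with a NullPointerException far from the original error, which is the shape of the symptom reported in this issue.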

bzhangam · 2025-01-02T17:55:31Z

Hi @zhichao-aws. Thanks for your investigation. I'm not working on the ml-commons plugin. @dhrubo-os Do you know who can work on it?

pyek-bot · 2025-01-02T23:39:10Z

I'm looking into this and will update once I have something.

pyek-bot · 2025-01-03T20:46:05Z

Resolved with: opensearch-project/neural-search#1055

The issue was caused by conflicting Jackson dependency versions across ml-commons, core, and neural-search. The fix uses the version that comes from core, for consistency.
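When a dependency version conflict like this is suspected, one generic way to investigate is to print which jar a class was actually loaded from. The sketch below uses only standard JDK calls. The class name ClassOriginSketch and the method describeOrigin are hypothetical, and the two Jackson class names in main are an assumption chosen for illustration, since the thread does not say which Jackson classes were involved. It is meant to be run as a standalone program with the same classpath as the failing code, not inside the plugin.

```java
import java.net.URL;
import java.security.CodeSource;

public class ClassOriginSketch {

    /** Describes where a loaded class came from and which version its package declares. */
    public static String describeOrigin(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        URL location = (source == null) ? null : source.getLocation();
        Package pkg = type.getPackage();
        String version = (pkg == null) ? null : pkg.getImplementationVersion();
        return type.getName()
            + " loaded from " + (location == null ? "<bootstrap or unknown>" : location)
            + ", implementation version " + (version == null ? "<not declared>" : version);
    }

    /** Same as above, but resolves the class by name and reports load failures as text. */
    public static String describeOrigin(String className) {
        try {
            return describeOrigin(Class.forName(className));
        } catch (ClassNotFoundException | LinkageError e) {
            return className + " could not be loaded: " + e;
        }
    }

    public static void main(String[] args) {
        // Assumed class names for illustration; pass others as arguments.
        String[] names = args.length > 0 ? args : new String[] {
            "com.fasterxml.jackson.databind.ObjectMapper",
            "com.fasterxml.jackson.core.JsonFactory"
        };
        for (String name : names) {
            System.out.println(describeOrigin(name));
        }
    }
}
```

If two components report different jar locations or versions for the same library, that points to the kind of mismatch described in the comment above. The implementation version is read from the jar manifest and may be absent.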

pyek-bot · 2025-01-03T20:49:42Z

@dhrubo-os Let's close this issue, thank you!

bzhangam added the bug and untriaged labels Jan 1, 2025

vibrantvarun mentioned this issue Jan 2, 2025

Pagination in Hybrid query opensearch-project/neural-search#1048

Merged

5 tasks

dhrubo-os assigned pyek-bot Jan 2, 2025

dhrubo-os removed the untriaged label Jan 2, 2025

junqiu-lei mentioned this issue Jan 3, 2025

Optimize ML inference connection retry logic opensearch-project/neural-search#1054

Merged

5 tasks

martin-gaievski mentioned this issue Jan 3, 2025

Added runtime dependencies for jackson lib opensearch-project/neural-search#1055

Merged

1 task

dhrubo-os closed this as completed Jan 4, 2025
