[BUG] bug with GET /your_index/_search and POST _reindex in OS 2.11 #626

Closed
LandryK opened this issue Mar 7, 2024 · 7 comments
Labels
bug Something isn't working good first issue Good for newcomers


@LandryK

LandryK commented Mar 7, 2024

What is the bug?

When an index is configured with a default search pipeline (see below), GET /your_index/_search returns a 500 error caused by a NullPointerException (NPE).
By default, the _reindex API appears to search the source index in the same way. So if you reindex from a source index that has index.search.default_pipeline configured, the request also fails with an NPE:

POST _reindex
{
   "source":{
      "index":"source-index"
   },
   "dest":{
      "index":"destination-index",
      "pipeline": "your-pipeline"
   }
}
How can one reproduce the bug?

1- Create the search pipeline:
PUT /_search/pipeline/test-pipeline
{
  "request_processors": [
    {
      "neural_query_enricher" : {
        "tag": "tag1",
        "description": "your description",
        "default_model_id": "your_model_id"
      }
    }
  ]
}

2- Create the index and set its default search pipeline:

PUT /source-index
{
  "settings": {
    "index": {
      "knn": true,
      "knn.algo_param.ef_search": 100
    },
    "default_pipeline": "test-pipeline",
    "index.search.default_pipeline" : "test-pipeline"
  },
  "mappings": {
    "properties": {
      "text": {
        "type": "text"
      },
      "passage_embedding": {
        "type": "knn_vector",
        "dimension": 384,
        "method": {
          "name": "hnsw",
          "space_type": "l2",
          "engine": "nmslib",
          "parameters": {
            "ef_construction": 128,
            "m": 24
          }
        }
      }
    }
  }
}

3- Add documents:

PUT /source-index/_doc/1
{
  "text": "The emergence of resistance of bacteria to antibiotics is a common phenomenon. Emergence of resistance often reflects evolutionary processes that take place during antibiotic therapy."
}

PUT /source-index/_doc/2
{
  "text": "The successful outcome of antimicrobial therapy with antibacterial compounds depends on several factors. These include host defense mechanisms, the location of infection, and the pharmacokinetic and pharmacodynamic properties of the antibacterial."
}

4- Search (this returns a 500 error caused by an NPE):
GET /source-index/_search
The NPE is caused by "index.search.default_pipeline" : "test-pipeline". If this setting is removed from the index settings, the query works as expected.

5- Create a destination index:

PUT /destination-index
{
  "settings": {
    "index": {
      "knn": true,
      "knn.algo_param.ef_search": 100
    },
    "default_pipeline": "test-pipeline",
    "index.search.default_pipeline" : "test-pipeline"
  },
  "mappings": {
    "properties": {
      "text": {
        "type": "text"
      },
      "passage_embedding": {
        "type": "knn_vector",
        "dimension": 384,
        "method": {
          "name": "hnsw",
          "space_type": "cosinesimil",
          "engine": "nmslib",
          "parameters": {
            "ef_construction": 128,
            "m": 24
          }
        }
      }
    }
  }
}

6- Attempt to reindex (this also returns a 500 error caused by an NPE):

POST _reindex
{
   "source":{
      "index":"source-index"
   },
   "dest":{
      "index":"destination-index",
      "pipeline": "test-pipeline"
   }
}

This fails for the same reason as step 4: searching source-index throws an NPE because of "index.search.default_pipeline" : "test-pipeline".
If that setting is removed from the index settings, the reindex request succeeds.

7- Reindexing with an explicit match_all query works:

POST _reindex
{
   "source":{
      "index":"source-index",
      "query": {
          "match_all": {}
      }
   },
   "dest":{
      "index":"destination-index",
      "pipeline": "test-pipeline"
   }
}

Bugs:

1- By default, the _reindex API appears to search the source index without a query, as in GET /source-index/_search, rather than with an explicit GET /source-index/_search {"query":{"match_all":{}}}. Because the former throws a NullPointerException, _reindex fails with the same error and cannot retrieve the documents in the source index.

2- GET /source-index/_search fails with an NPE if "index.search.default_pipeline" : "test-pipeline" is present in the index settings.

OpenSearch Version

OS 2.11

@LandryK LandryK added bug Something isn't working untriaged labels Mar 7, 2024
@navneet1v
Collaborator

@vibrantvarun, can you take a look at this issue? It seems to be a problem with the NeuralQueryEnricher processor.

@navneet1v navneet1v moved this to Backlog (Hot) in Vector Search RoadMap Apr 1, 2024
@vamshin vamshin moved this from Backlog (Hot) to 2.14.0 in Vector Search RoadMap Apr 1, 2024
@vamshin vamshin removed the untriaged label Apr 1, 2024
@LandryK
Author

LandryK commented Apr 15, 2024

@vamshin @vibrantvarun Any updates on this?
Thanks

@vamshin vamshin moved this from 2.14.0 to 2.16.0 in Vector Search RoadMap Jul 2, 2024
@naveentatikonda naveentatikonda moved this from 2.16.0 to Backlog (Hot) in Vector Search RoadMap Sep 18, 2024
@naveentatikonda naveentatikonda added the good first issue Good for newcomers label Sep 18, 2024
@jmazanec15 jmazanec15 assigned jmazanec15 and unassigned vibrantvarun Oct 2, 2024
@owaiskazi19
Member

owaiskazi19 commented Nov 13, 2024

I tested it. In 2.11 there is no null check around the line below, which causes the NPE:

QueryBuilder queryBuilder = searchRequest.source().query();

This was handled in #615, which should resolve the issue. @martin-gaievski @vibrantvarun, can you verify so we can close this one out?
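To make the failure mode concrete, here is a minimal Java sketch of the kind of null guard described above. It uses hypothetical, simplified stand-in classes (Request, SearchSource, Query) rather than the real OpenSearch SearchRequest/QueryBuilder types, and it does not reproduce the actual change in #615. It assumes that a search without a query clause (a body-less GET /source-index/_search, or a reindex without "query") reaches the processor with a null query, and it also guards against a null source in case the body itself is missing.

```java
// Hypothetical stand-ins for illustration only; these are NOT the real OpenSearch classes.
final class Query {
    final String name;
    Query(String name) { this.name = name; }
}

final class SearchSource {
    private Query query; // stays null when the request has no "query" clause
    Query query() { return query; }
    SearchSource query(Query q) { this.query = q; return this; }
}

final class Request {
    private SearchSource source; // stays null when the request has no body
    SearchSource source() { return source; }
    Request source(SearchSource s) { this.source = s; return this; }
}

public class NullSafeEnricher {
    // Mirrors the unguarded pattern: dereferences source and query without checks.
    static String enrichUnsafe(Request request) {
        Query query = request.source().query();
        return "enriched:" + query.name; // throws NullPointerException if query is null
    }

    // Guarded variant: requests without a query are passed through untouched.
    static String enrichSafe(Request request) {
        SearchSource source = request.source();
        if (source == null || source.query() == null) {
            return "passthrough";
        }
        return "enriched:" + source.query().name;
    }

    public static void main(String[] args) {
        Request noQuery = new Request().source(new SearchSource()); // like a body-less search
        Request matchAll = new Request().source(new SearchSource().query(new Query("match_all")));

        System.out.println(enrichSafe(noQuery));  // passthrough
        System.out.println(enrichSafe(matchAll)); // enriched:match_all
        try {
            enrichUnsafe(noQuery);
        } catch (NullPointerException e) {
            System.out.println("unguarded variant threw NullPointerException");
        }
    }
}
```

The guarded variant treats "no query" as "nothing to enrich", which matches why the explicit match_all workaround in step 7 avoids the error: it simply gives the processor a non-null query.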

@martin-gaievski
Member

Good catch, that PR should fix the issue. I'll run the test and update here whether or not we're good to close the issue.

@owaiskazi19
Member

@martin-gaievski did you get a chance to verify this?

@martin-gaievski
Member

I think information about the ingest pipeline is missing from the provided steps. Document ingestion will not work without that pipeline, because it is what creates the embeddings.

My understanding is that the following details need to be added to the provided steps:

Create the ingest pipeline:

PUT {{base_url}}/_ingest/pipeline/ingest-pipeline
{
    "description": "An NLP ingest pipeline",
    "processors": [
        {
            "text_embedding": {
                "model_id": "<model_id>",
                "field_map": {
                    "text": "passage_embedding"
                }
            }
        }
    ]
}

Then set it as the default pipeline for both indexes, as shown below for source-index. This is needed because default_pipeline takes an ingest pipeline, not a search pipeline.

PUT {{base_url}}/source-index
{
    "settings": {
        "index": {
            "knn": true,
            "knn.algo_param.ef_search": 100
        },
        "default_pipeline": "ingest-pipeline",
        "index.search.default_pipeline": "test-pipeline"
    },
    "mappings": {
        "properties": {
            "text": {
                "type": "text"
            },
            "passage_embedding": {
                "type": "knn_vector",
                "dimension": 384,
                "method": {
                    "engine": "lucene",
                    "space_type": "l2",
                    "name": "hnsw"
                }
            }
        }
    }
}
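The difference between the two settings can be expressed as a small consistency check: index.default_pipeline should name an ingest pipeline, while index.search.default_pipeline should name a search pipeline. The Java sketch below is a hypothetical helper, not an OpenSearch API. It assumes the caller already knows which pipeline IDs are ingest pipelines and which are search pipelines, and that settings are given in flattened form (so the report's top-level "default_pipeline" is read as index.default_pipeline).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class PipelineSettingsCheck {
    // Hypothetical helper: reports pipeline settings that point at the wrong kind of pipeline.
    static List<String> check(Map<String, String> settings,
                              Set<String> ingestPipelines,
                              Set<String> searchPipelines) {
        List<String> problems = new ArrayList<>();
        // index.default_pipeline runs at ingest time, so it must be an ingest pipeline.
        String ingest = settings.get("index.default_pipeline");
        if (ingest != null && !ingestPipelines.contains(ingest)) {
            problems.add("index.default_pipeline '" + ingest + "' is not an ingest pipeline"
                    + (searchPipelines.contains(ingest) ? " (it is a search pipeline)" : ""));
        }
        // index.search.default_pipeline runs at search time, so it must be a search pipeline.
        String search = settings.get("index.search.default_pipeline");
        if (search != null && !searchPipelines.contains(search)) {
            problems.add("index.search.default_pipeline '" + search + "' is not a search pipeline"
                    + (ingestPipelines.contains(search) ? " (it is an ingest pipeline)" : ""));
        }
        return problems;
    }

    public static void main(String[] args) {
        Set<String> ingest = Set.of("ingest-pipeline");
        Set<String> search = Set.of("test-pipeline");

        // Settings from the original report: both keys point at the search pipeline.
        Map<String, String> reported = Map.of(
                "index.default_pipeline", "test-pipeline",
                "index.search.default_pipeline", "test-pipeline");
        // Settings as corrected in this comment.
        Map<String, String> corrected = Map.of(
                "index.default_pipeline", "ingest-pipeline",
                "index.search.default_pipeline", "test-pipeline");

        System.out.println(check(reported, ingest, search));
        System.out.println(check(corrected, ingest, search)); // []
    }
}
```

Run against the settings from the original report, the helper flags index.default_pipeline because test-pipeline is a search pipeline; the corrected settings produce no problems.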

Similarly, the reindex request should use the ingest pipeline:

POST {{base_url}}/_reindex
{
   "source":{
      "index":"source-index"
   },
   "dest":{
      "index":"destination-index",
      "pipeline": "ingest-pipeline"
   }
}

With all of these changes, the reindex operation works for me. Tested on the latest 2.x (2.18).

@LandryK, please review my assessment and correct me if I'm wrong. Based on @owaiskazi19's finding, I believe this issue stems from either an incomplete configuration or a problem that has already been fixed.

@heemin32
Collaborator

The issue seems to be fixed in the latest version. Please reopen it if the issue still exists.

@github-project-automation github-project-automation bot moved this from Backlog (Hot) to ✅ Done in Vector Search RoadMap Dec 26, 2024