
Annotated highlight not matching when search contains both annotation and annotated term #91944

Closed
thelink2012 opened this issue Nov 25, 2022 · 1 comment · Fixed by #92920
Labels: >bug, :Search Relevance/Highlighting, Team:Search Relevance

Comments

@thelink2012
Contributor

Elasticsearch Version

v8.5.2

Installed Plugins

mapper-annotated-text

Java Version

bundled

OS Version

Linux 5.15.0-52-generic #58~20.04.1-Ubuntu SMP Thu Oct 13 13:09:46 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

Problem Description

Hi,

We use the mapper-annotated-text plugin in production to tag named entities and found a particularly bad behavior during highlighting.

Basically, if we have the text aaaa [bbbb cccc](annotated) dddd and search for bbbb annotated, we get the highlight aaaa [bbbb](_hit_term=bbbb) cccc dddd, whereas we'd expect the whole span bbbb cccc to be highlighted because annotated is also in the query.

On the other hand, searching for cccc annotated produces the expected result: aaaa [bbbb cccc](_hit_term=annotated&annotated) dddd.
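To see why the two matches collide, it helps to look at offsets in the plain text that remains once the markup is stripped. The sketch below is a hypothetical, simplified parser (the class `MarkupSketch` and its records are illustrative names, not the plugin's real parser): it ignores URL-decoding and `&`-separated multiple values, and only records which character range of the plain text each annotation value covers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not the mapper-annotated-text parser: strips
// [text](value) markup and records where each annotation lands in the plain text.
final class MarkupSketch {
    record Annotation(int start, int end, String value) {}
    record Parsed(String text, List<Annotation> annotations) {}

    // Matches "[covered text](value)" without nested brackets.
    private static final Pattern MARKUP = Pattern.compile("\\[([^\\]]+)\\]\\(([^)]+)\\)");

    static Parsed parse(String markup) {
        StringBuilder plain = new StringBuilder();
        List<Annotation> annotations = new ArrayList<>();
        Matcher m = MARKUP.matcher(markup);
        int last = 0;
        while (m.find()) {
            plain.append(markup, last, m.start()); // text before the markup
            int start = plain.length();
            plain.append(m.group(1));              // the covered text itself
            annotations.add(new Annotation(start, plain.length(), m.group(2)));
            last = m.end();
        }
        plain.append(markup, last, markup.length()); // trailing text
        return new Parsed(plain.toString(), annotations);
    }
}
```

For aaaa [bbbb cccc](annotated) dddd this gives the plain text aaaa bbbb cccc dddd with the annotation covering characters 5 to 14, so the annotated match starts at the same offset as bbbb (5 to 9) and fully contains cccc (10 to 14).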

This happens because the annotation highlighter receives the passage matches sorted in ascending offset order, and when a match overlaps a previously seen one, it is ignored. So in the first case, bbbb is seen first, and the annotated match is discarded because it overlaps bbbb. In the latter case, cccc is the one discarded, since it appears later in the list of offsets.
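The behavior described above can be modeled with a greedy pass that keeps a match only if it starts at or after the end of the last kept match. This is a hypothetical sketch (`GreedyHighlightSketch` and `Match` are illustrative names, not Elasticsearch classes), assuming the offsets worked out for aaaa bbbb cccc dddd: bbbb is 5 to 9, cccc is 10 to 14, and the annotation is 5 to 14.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the reported behavior: matches arrive in offset order
// and any match overlapping an already-kept one is silently dropped.
final class GreedyHighlightSketch {
    record Match(int start, int end, String term) {}

    static List<Match> keepNonOverlapping(List<Match> sortedByOffset) {
        List<Match> kept = new ArrayList<>();
        int lastEnd = -1;
        for (Match m : sortedByOffset) {
            if (m.start() < lastEnd) {
                continue; // overlaps the previous kept match: discarded
            }
            kept.add(m);
            lastEnd = m.end();
        }
        return kept;
    }
}
```

Fed [bbbb, annotated], only bbbb survives, matching the bad highlight; fed [annotated, cccc], only the annotation survives, matching the case that looks correct. The outcome depends purely on which overlapping match happens to come first.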

We've prototyped a fix in this branch and would be glad to iterate on it and open a PR if this is considered an issue.

Steps to Reproduce

```
PUT example
{
  "mappings": {
    "properties": {
      "body": {
        "type": "annotated_text"
      }
    }
  }
}

POST example/_doc?refresh
{
  "body": "aaaa [bbbb cccc](annotated) dddd"
}

POST example/_search
{
  "query": {
    "match_all": {}
  },
  "highlight": {
    "type": "annotated",
    "order": "score",
    "fields": {
      "body": {
        "highlight_query": {
          "bool": {
            "must": {
              "match": {
                "body": {
                  "query": "bbbb annotated"
                }
              }
            }
          }
        }
      }
    }
  }
}
```


thelink2012 added the >bug and needs:triage labels Nov 25, 2022
nik9000 added the :Search Relevance/Highlighting label Nov 25, 2022
elasticsearchmachine added the Team:Search label and removed the needs:triage label Nov 25, 2022
@elasticsearchmachine
Collaborator

Pinging @elastic/es-search (Team:Search)

romseygeek pushed a commit that referenced this issue Jan 23, 2023
…both annotation and annotated term (#92920)

The annotation highlighter can miss annotations if they overlap with another search
term.  This commit re-sorts incoming passages to ensure that all terms are seen
by the highlighter.

Fixes #91944
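The commit message only says that incoming passages are re-sorted so that every term is seen. The sketch below is one hypothetical way to get that effect and is not the code from #92920: for equal start offsets it puts the longer span (such as an annotation) first, then folds every overlapping match into the current highlight instead of dropping it. `MergingHighlightSketch`, `Match` and `Highlight` are illustrative names.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch, not the Elasticsearch fix: sort by start ascending,
// then end descending, and merge overlapping matches, keeping all their terms.
final class MergingHighlightSketch {
    record Match(int start, int end, String term) {}
    record Highlight(int start, int end, Set<String> terms) {}

    static List<Highlight> merge(List<Match> matches) {
        List<Match> sorted = new ArrayList<>(matches);
        sorted.sort((a, b) -> a.start() != b.start()
                ? Integer.compare(a.start(), b.start())
                : Integer.compare(b.end(), a.end())); // longer span first on ties
        List<Highlight> out = new ArrayList<>();
        int start = -1;
        int end = -1;
        Set<String> terms = null;
        for (Match m : sorted) {
            if (terms != null && m.start() < end) {
                end = Math.max(end, m.end()); // extend the current highlight
                terms.add(m.term());          // record the term as a hit
            } else {
                if (terms != null) {
                    out.add(new Highlight(start, end, terms));
                }
                start = m.start();
                end = m.end();
                terms = new LinkedHashSet<>();
                terms.add(m.term());
            }
        }
        if (terms != null) {
            out.add(new Highlight(start, end, terms));
        }
        return out;
    }
}
```

With this ordering, both bbbb annotated and cccc annotated produce a single highlight over bbbb cccc (offsets 5 to 14) that lists every matched term, regardless of the order in which the matches arrive.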
javanna added the Team:Search Relevance label and removed the Team:Search label Jul 12, 2024