Annotated highlight not matching when search contains both annotation and annotated term #91944
Labels
>bug
:Search Relevance/Highlighting
Team:Search Relevance
Elasticsearch Version
v8.5.2
Installed Plugins
mapper-annotated-text
Java Version
bundled
OS Version
Linux 5.15.0-52-generic #58~20.04.1-Ubuntu SMP Thu Oct 13 13:09:46 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Problem Description
Hi,
We use the `mapper-annotated-text` plugin in production to tag named entities and found a particularly bad behavior during highlighting.

Basically, if we have the text `aaaa [bbbb cccc](annotated) dddd` and search for `bbbb annotated`, we get the highlight `aaaa [bbbb](_hit_term=bbbb) cccc dddd`, where we'd expect `bbbb cccc` to be highlighted because `annotated` is in the query too.

On the other hand, searching for `cccc annotated` produces the expected result of `aaaa [bbbb cccc](_hit_term=annotated&annotated) dddd`.

This happens because the annotation process receives the passage matches sorted in ascending offset order, and when a match overlaps with a previously seen one, it ignores it. So in the first case, it sees `bbbb` first, and when it sees `annotated` it discards it because it overlaps with `bbbb`. In the latter case, `cccc` is the one discarded since it appears later in the list of offsets.

We've prototyped a fix in this branch and would be glad to iterate on it and open a PR if this is considered an issue.
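
The overlap handling described above can be illustrated with a minimal Python sketch. This is not the plugin's actual code: the function name `keep_non_overlapping` and the `(start, end, term)` tuple layout are hypothetical, the offsets refer to the plain text `aaaa bbbb cccc dddd` with the annotation markup stripped, and the tie-break (the shorter match sorts first when two matches start at the same offset) is an assumption chosen to reproduce the order reported here.

```python
def keep_non_overlapping(matches):
    """Walk matches in ascending offset order and drop any match that
    overlaps one already kept (a sketch of the reported behavior only)."""
    kept = []
    last_end = 0
    # Assumption: ties on the start offset are broken by the end offset,
    # so `bbbb` (5-9) is seen before the `annotated` span (5-14).
    for start, end, term in sorted(matches, key=lambda m: (m[0], m[1])):
        if start < last_end:
            # Overlaps a previously kept match, so it is ignored.
            continue
        kept.append((start, end, term))
        last_end = end
    return kept


# Query "bbbb annotated": the annotation span is discarded.
first_case = keep_non_overlapping([(5, 9, "bbbb"), (5, 14, "annotated")])

# Query "cccc annotated": the plain term is discarded instead.
second_case = keep_non_overlapping([(5, 14, "annotated"), (10, 14, "cccc")])
```

In the first case only `bbbb` survives, which corresponds to the unexpected highlight; in the second case only the `annotated` span survives, which corresponds to the expected one.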
Steps to Reproduce
Logs (if relevant)
No response