fix(search): fix search highlighting of entities containing stop words #3718
So, earlier today when we were talking about search, I assured Lars that "yes, we can now match all kinds of entities in search queries". I realized shortly thereafter that this wasn't the case.
The problem is that for countries like "Trinidad and Tobago" or "Saint Vincent and the Grenadines", Algolia would remove stop words from the highlighted results, and then the matching based on highlighted results wouldn't include them.
I've now fixed this by also running the "dumber" `extractRegionNamesFromSearchQuery` as a first pass that matches country/region names purely based on the search query. Only after that does it run the other matching logic, in order to also catch non-region matches like `Salmon (farmed)` or `Africa (UN)`.

This now means that non-country entities that contain a stop word will not be matched (something like `Salmon and tuna`, maybe), but I think this is very much acceptable.

There's a big code comment now explaining the rationale for all this logic; I hope it mostly clears things up!
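
For reviewers who want the gist without opening the diff, here is a rough sketch of the two-pass idea. It is not the code in this PR: `matchRegionsInQuery`, `matchEntitiesInHighlights`, `pickEntities`, and the sample name lists are all hypothetical stand-ins, and the real `extractRegionNamesFromSearchQuery` may have a different signature and smarter matching.

```typescript
// Hypothetical sample data; the real code has its own sources for these names.
const REGION_NAMES: string[] = [
    "Trinidad and Tobago",
    "Saint Vincent and the Grenadines",
    "France",
]
const OTHER_ENTITY_NAMES: string[] = ["Salmon (farmed)", "Salmon and tuna"]

// Pass 1 (simplified stand-in): find region names that appear verbatim,
// case-insensitively, in the raw search query. Stop words are still present
// in the query, so "Trinidad and Tobago" can match. Naive substring check.
function matchRegionsInQuery(query: string, regionNames: string[]): string[] {
    const normalized = query.toLowerCase()
    return regionNames.filter((name) =>
        normalized.includes(name.toLowerCase())
    )
}

// Pass 2 (simplified stand-in): match entities whose every word was
// highlighted by the search engine. Stop words are assumed to be missing
// from `highlightedWords`, so names containing them cannot match here.
function matchEntitiesInHighlights(
    highlightedWords: string[],
    entityNames: string[]
): string[] {
    const highlighted = new Set(highlightedWords.map((w) => w.toLowerCase()))
    return entityNames.filter((name) =>
        name
            .toLowerCase()
            .split(/\s+/)
            .every((word) => highlighted.has(word))
    )
}

// Run the query-based pass first, then the highlight-based pass, and
// return the de-duplicated union of both.
function pickEntities(
    query: string,
    highlightedWords: string[],
    regionNames: string[],
    otherEntityNames: string[]
): string[] {
    const regions = matchRegionsInQuery(query, regionNames)
    const others = matchEntitiesInHighlights(highlightedWords, otherEntityNames)
    return [...new Set([...regions, ...others])]
}
```

With this sketch, the query `trinidad and tobago gdp` matches `Trinidad and Tobago` through the first pass even when the highlighted words are only `trinidad`, `tobago`, and `gdp`, while `Salmon and tuna` stays unmatched because it is not a region and its stop word never appears among the highlights.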
Before / After
Link