fix(search): fix search highlighting of entities containing stop words #3718

Merged: 2 commits into master from search-country-matching-stop-words, Jun 25, 2024


@marcelgerber (Member) commented Jun 18, 2024

So, earlier today when we were talking about search, I assured Lars that "yes, we can now match all kinds of entities in search queries". I realized shortly thereafter that this wasn't the case.

The problem is that for countries like "Trinidad and Tobago" or "Saint Vincent and the Grenadines", Algolia removes stop words (like "and" and "the") from the highlighted results, so the matching that is based on those highlighted results doesn't include them and fails to recognize these country names.
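
To make the failure mode concrete, here is a minimal TypeScript sketch. It assumes that highlighted words arrive wrapped in <em> and </em> tags and that stop words removed from the query are left unwrapped. highlightedTerms and matchesFromHighlight are hypothetical helpers for illustration only, not the actual OWID code or an Algolia API.

```typescript
// Hypothetical sketch of the failure mode; not the actual OWID implementation.
// Assumption: highlighted words are wrapped in <em>...</em>, and stop words that
// were removed from the query (like "and") are left unwrapped.

/** Returns the text of every <em>...</em> span in a highlighted string. */
function highlightedTerms(highlighted: string): string[] {
    const terms: string[] = [];
    const re = /<em>(.*?)<\/em>/g;
    let m: RegExpExecArray | null;
    while ((m = re.exec(highlighted)) !== null) {
        terms.push(m[1]);
    }
    return terms;
}

/** Naive check: an entity counts as matched only if every one of its words was highlighted. */
function matchesFromHighlight(entity: string, highlighted: string): boolean {
    const words = highlightedTerms(highlighted).map((t) => t.toLowerCase());
    return entity
        .toLowerCase()
        .split(/\s+/)
        .every((w) => words.includes(w));
}

// "and" is a stop word, so it is not highlighted and the country is missed (false):
console.log(matchesFromHighlight("Trinidad and Tobago", "<em>Trinidad</em> and <em>Tobago</em>"));
// Names without stop words are unaffected (true):
console.log(matchesFromHighlight("Burkina Faso", "<em>Burkina</em> <em>Faso</em>"));
```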

I fixed this by additionally running the "dumber" extractRegionNamesFromSearchQuery as a first pass, which matches country/region names purely based on the search query.
Only after that does the other matching logic run, so that we also catch non-region matches like Salmon (farmed) or Africa (UN).
This means that non-country entities containing a stop word (something like Salmon and tuna, maybe) will not be matched, but I think that's very much acceptable.
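
The two-pass ordering can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the merged code: regionsFromQuery is a hypothetical stand-in for extractRegionNamesFromSearchQuery (whose real implementation is not shown in this PR), entitiesFromHighlights is a hypothetical stand-in for the existing highlight-based matching, and the highlighted words are passed in directly instead of being parsed from an Algolia response.

```typescript
// Hypothetical sketch of the two-pass matching described above. Function names
// and entity lists are illustrative; the real extractRegionNamesFromSearchQuery
// may work differently.

/** Lowercases and collapses punctuation to single spaces (ASCII-only, for brevity). */
function normalize(s: string): string {
    return s.toLowerCase().replace(/[^a-z0-9]+/g, " ").trim();
}

/** Pass 1: match region names against the raw query, so stop words are kept. */
function regionsFromQuery(query: string, regions: string[]): string[] {
    const padded = ` ${normalize(query)} `;
    return regions.filter((r) => padded.includes(` ${normalize(r)} `));
}

/** Pass 2: match any entity whose words were all highlighted by the search engine. */
function entitiesFromHighlights(highlightedWords: string[], entities: string[]): string[] {
    const words = highlightedWords.map(normalize);
    return entities.filter((e) => normalize(e).split(" ").every((w) => words.includes(w)));
}

/** Runs pass 1 first, then appends highlight-based matches that pass 1 did not find. */
function matchEntities(
    query: string,
    highlightedWords: string[],
    regions: string[],
    allEntities: string[]
): string[] {
    const matched = regionsFromQuery(query, regions);
    for (const e of entitiesFromHighlights(highlightedWords, allEntities)) {
        if (!matched.includes(e)) matched.push(e);
    }
    return matched;
}

const regions = ["Trinidad and Tobago"];
const entities = ["Trinidad and Tobago", "Salmon (farmed)", "Salmon and tuna"];

// "and" is missing from the highlights, but pass 1 still returns ["Trinidad and Tobago"]:
console.log(matchEntities("co2 trinidad and tobago", ["co2", "trinidad", "tobago"], regions, entities));
// A non-region entity is found via highlights in pass 2, returning ["Salmon (farmed)"]:
console.log(matchEntities("salmon farmed", ["salmon", "farmed"], regions, entities));
// Accepted limitation: a non-region entity containing a stop word is not matched, returning []:
console.log(matchEntities("salmon and tuna", ["salmon", "tuna"], regions, entities));
```

The ordering is the point of the design: pass 1 sees the raw query, so "and" and "the" survive, while pass 2 only sees what was highlighted, which is why something like Salmon and tuna is still missed.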

There's now a big code comment explaining the rationale for all this logic; I hope it mostly clears things up!

Before / After

[Screenshot: CleanShot 2024-06-18 at 18 04 41]


@owidbot (Contributor) commented Jun 18, 2024

Quick links (staging server):

Site | Admin | Wizard

Login: ssh owid@staging-site-search-country-matching-stop-words

SVG tester:

Number of differences (default views): 0 ✅
Number of differences (all views): 0 ✅

Edited: 2024-06-18 17:13:59 UTC
Execution time: 1.13 seconds

@ikesau (Member) left a comment


Very nice! Tested locally and got good results for Salmon and for all our countries with "and" in their name. (The only weird one is Serbia and Montenegro, since Serbia and Montenegro are also two standalone countries: https://ourworldindata.org/grapher/population?tab=chart&country=MNE~SRB~OWID_SRM 🥴)

For a couple of the Caribbean islands, I wonder if we should add shortcodes like ["Saint Kitts", "St Kitts"], ["Saint Vincent", "St Vincent"], etc. Not a blocker for this PR at all, of course.
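
A minimal sketch of how such shortcodes might be applied, assuming they are stored as [canonical, alias] pairs in the format suggested above and applied to the query before region matching. REGION_ALIASES, escapeRegExp and applyAliases are hypothetical names; nothing like this is part of this PR.

```typescript
// Hypothetical alias table in the [canonical, alias] format suggested above.
const REGION_ALIASES: Array<[string, string]> = [
    ["Saint Kitts", "St Kitts"],
    ["Saint Vincent", "St Vincent"],
];

/** Escapes characters that have a special meaning in regular expressions. */
function escapeRegExp(s: string): string {
    return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

/** Rewrites alias spellings in a query to their canonical form (whole words, case-insensitive). */
function applyAliases(query: string, aliases: Array<[string, string]>): string {
    let result = query;
    for (const [canonical, alias] of aliases) {
        const re = new RegExp(`\\b${escapeRegExp(alias)}\\b`, "gi");
        result = result.replace(re, canonical);
    }
    return result;
}

// Prints "population Saint Kitts and nevis":
console.log(applyAliases("population st kitts and nevis", REGION_ALIASES));
// Prints "Saint Vincent and the Grenadines gdp":
console.log(applyAliases("St Vincent and the Grenadines gdp", REGION_ALIASES));
```

One caveat with this naive approach: "St. Kitts" written with a period would not match the "St Kitts" alias, so a real version would probably need to normalize punctuation first.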

@marcelgerber merged commit f603a6b into master on Jun 25, 2024
28 checks passed
@marcelgerber deleted the search-country-matching-stop-words branch on June 25, 2024 at 15:14
3 participants