Use jaro_winkler similarity instead of rapidfuzz #491

Merged: 4 commits, Aug 6, 2024

@Rotheem Rotheem commented Jul 7, 2024

Description

Use the Jaro-Winkler similarity from the jellyfish module instead of rapidfuzz. It runs at the same speed and gives better results.
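
For readers unfamiliar with the metric, here is a minimal pure-Python sketch of Jaro-Winkler similarity applied to a name search. It is an illustration only: it is not the jellyfish implementation and not the code in this PR. The names `jaro`, `jaro_winkler`, `rank_names` and the sample data are hypothetical, and it assumes the common defaults of a 0.1 prefix weight and a 4-character prefix cap.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]; 1.0 means identical strings."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0

    # Characters only count as matching if they are at most `window` apart.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0

    # Matched characters that appear in a different order are transpositions.
    out_of_order = 0
    k = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order / 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1: str, s2: str, prefix_weight: float = 0.1) -> float:
    """Jaro similarity boosted for a shared prefix of up to 4 characters."""
    base = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return base + prefix * prefix_weight * (1 - base)


def rank_names(query: str, names: list[str]) -> list[str]:
    """Return names sorted from most to least similar to the query."""
    q = query.lower()
    return sorted(names, key=lambda n: jaro_winkler(q, n.lower()), reverse=True)


print(round(jaro_winkler("MARTHA", "MARHTA"), 4))  # 0.9611
print(rank_names("jon", ["Jane Roe", "John Doe"]))
```

The prefix boost is what makes the metric a reasonable fit for name search: a query that matches the start of a name scores higher than one that matches the same characters elsewhere.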

codecov bot commented Jul 7, 2024

Codecov Report

Attention: Patch coverage is 88.23529% with 2 lines in your changes missing coverage. Please review.

Project coverage is 83.11%. Comparing base (ab3733e) to head (cb09030).
Report is 1 commit behind head on main.

| Files | Patch % | Lines |
| --- | --- | --- |
| app/utils/tools.py | 86.66% | 2 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #491      +/-   ##
==========================================
- Coverage   83.12%   83.11%   -0.02%     
==========================================
  Files         104      104              
  Lines        6412     6420       +8     
==========================================
+ Hits         5330     5336       +6     
- Misses       1082     1084       +2     


"""

# We can give a dictionary of {object: string used for the comparison} to the extract function
# https://maxbachmann.github.io/RapidFuzz/Usage/process.html#extract

# TODO: we may want to cache this object. Its generation may take some time if there is a big user base

Isn't it time to do it now?


Instead of caching, we could explore generating the list in the database:
https://docs.sqlalchemy.org/en/20/core/functions.html#sqlalchemy.sql.functions.concat
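
One possible shape for that idea, sketched with the standard-library `sqlite3` module instead of SQLAlchemy so that it runs standalone. The `users` table, its columns, and `load_search_strings` are hypothetical and not taken from this repository; in SQLAlchemy the `||` concatenation would presumably be written with the `concat` function linked above.

```python
import sqlite3


def load_search_strings(conn: sqlite3.Connection) -> dict[int, str]:
    """Let the database build one comparison string per user.

    Returns {user id: "firstname name nickname"}, so Python only has to
    score the strings instead of assembling them for every search.
    """
    rows = conn.execute(
        # In SQLite, `||` with a NULL operand yields NULL, hence COALESCE.
        "SELECT id, firstname || ' ' || name || ' ' || COALESCE(nickname, '') "
        "FROM users"
    )
    return {user_id: text.strip() for user_id, text in rows}


# Hypothetical schema and data, for demonstration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users ("
    "id INTEGER PRIMARY KEY, firstname TEXT, name TEXT, nickname TEXT)"
)
conn.executemany(
    "INSERT INTO users (firstname, name, nickname) VALUES (?, ?, ?)",
    [("Ada", "Lovelace", None), ("Alan", "Turing", "Prof")],
)

search_strings = load_search_strings(conn)
print(search_strings)
```

This keeps the string assembly in the query, but every search still reads all users, so it does not by itself remove the case for caching on a large user base.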

@Rotheem Rotheem force-pushed the better-user-search branch from 122c315 to a9c42e7 Compare July 22, 2024 09:34
@Rotheem Rotheem force-pushed the better-user-search branch 2 times, most recently from a6f73f5 to eb7e4e5 Compare August 6, 2024 18:20
@armanddidierjean armanddidierjean left a comment


Nice!

@Rotheem Rotheem merged commit cadf2f1 into main Aug 6, 2024
7 checks passed
@Rotheem Rotheem deleted the better-user-search branch August 6, 2024 18:35