T2118 fuzzy search #4401
Looks good to me
Testing this on TEST, it’s great that quoting a term will turn off fuzziness! 👏 One could imagine that, one day, a “relevance” sort order will help ElasticSearch prioritise more useful fuzzy matches over less useful ones, maybe? (Andrew mentions ES could penalise typos so they rank below relevant matches.)
lgtm, all works nicely! Just a couple of small simplifications in the config parser
Co-authored-by: Andrew Nowak <[email protected]>
Suggested changes all committed as suggested
Seen on auth, metadata-editor, thrall, cropper, kahuna (created by @AndyKilmory and merged by @andrew-nowak 11 minutes and 24 seconds ago) Please check your changes!
Seen on leases, usage, media-api (created by @AndyKilmory and merged by @andrew-nowak 11 minutes and 29 seconds ago) Please check your changes!
Seen on collections, image-loader (created by @AndyKilmory and merged by @andrew-nowak 11 minutes and 34 seconds ago) Please check your changes!
What does this change?
This introduces optional fuzziness in basic searching (i.e. text typed into the search bar, but not chips) to help when users misspell their search terms.
Fuzziness can be switched on and off via an API config parameter, and some of the variables controlling its behaviour have also been exposed:
search.fuzziness.enabled : Boolean = true/false (default = false) <-- whether fuzziness is activated
search.fuzziness.prefixLength : Int = 0..x (default = 1) <-- how many of the initial characters must match exactly
search.fuzziness.editDistance : String = AUTO or AUTO:short,med or [1,2,3,4] (default = AUTO) <-- sets the allowed edit distance. AUTO sets the edit distance based on the search token length; AUTO:short,med configures the word-length boundaries between exact-match, single-edit and double-edit words
search.fuzziness.maxExpansions : Int = 0..x (default = 50) <-- max number of variations created
For more details see https://www.elastic.co/guide/en/elasticsearch/reference/7.17/query-dsl-fuzzy-query.html
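As a rough illustration of how these settings might translate into a query, here is a minimal Python sketch (not the Scala code in this PR). It assumes the four config values map directly onto the `fuzziness`, `prefix_length` and `max_expansions` parameters of an Elasticsearch `multi_match` query; the function names, the field names and the `and` operator are hypothetical. `auto_edit_distance` mirrors the documented AUTO behaviour, where the default boundaries are 3 and 6.

```python
from typing import Any, Dict, List


def auto_edit_distance(term: str, low: int = 3, high: int = 6) -> int:
    """Edits allowed for a term under AUTO:low,high (Elasticsearch defaults to 3,6)."""
    if len(term) < low:
        return 0  # short terms must match exactly
    if len(term) < high:
        return 1  # medium terms may differ by one edit
    return 2      # longer terms may differ by two edits


def build_word_query(
    text: str,
    fields: List[str],
    enabled: bool = False,         # search.fuzziness.enabled
    prefix_length: int = 1,        # search.fuzziness.prefixLength
    edit_distance: str = "AUTO",   # search.fuzziness.editDistance
    max_expansions: int = 50,      # search.fuzziness.maxExpansions
) -> Dict[str, Any]:
    """Build a multi_match body for a word query (hypothetical helper)."""
    # The "and" operator is an assumption made for this sketch.
    query: Dict[str, Any] = {"query": text, "fields": fields, "operator": "and"}
    if enabled:
        # Elasticsearch does not accept fuzziness with cross_fields,
        # so the fuzzy variant uses best_fields.
        query.update({
            "type": "best_fields",
            "fuzziness": edit_distance,
            "prefix_length": prefix_length,
            "max_expansions": max_expansions,
        })
    else:
        query["type"] = "cross_fields"
    return {"multi_match": query}


if __name__ == "__main__":
    # Field names are placeholders, not the fields Grid actually searches.
    print(build_word_query("lightrom develop", ["title", "description"], enabled=True))
    print(auto_edit_distance("fox"), auto_edit_distance("lightroom"))
```

With the defaults, a three-letter term such as `fox` tolerates one edit and a nine-letter term such as `lightroom` tolerates two, while `prefix_length = 1` still requires the first character to be typed correctly.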
Fuzzy search is only applied to Word queries, e.g. Lightroom Develop, and not to Phrase queries, e.g. "Lightroom Develop" (wrapped in quotes), as phrase queries are treated as exact term matches.
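The word/phrase distinction can be sketched as follows. This is an illustrative Python approximation, not the parser used in this PR: `split_search` and `build_clauses` are hypothetical names, the field list is a placeholder, and the quote handling is deliberately naive (an unbalanced quote is simply treated as part of a word).

```python
import re
from typing import Any, Dict, List, Tuple

# Either a double-quoted phrase or a run of non-whitespace characters.
_TOKEN = re.compile(r'"([^"]*)"|(\S+)')


def split_search(text: str) -> Tuple[List[str], List[str]]:
    """Split free text into (phrases, words); phrases are the quoted parts."""
    phrases: List[str] = []
    words: List[str] = []
    for phrase, word in _TOKEN.findall(text):
        if phrase:
            phrases.append(phrase)
        elif word:
            words.append(word)
    return phrases, words


def build_clauses(text: str, fields: List[str], fuzziness: str = "AUTO") -> List[Dict[str, Any]]:
    """Phrase clauses get no fuzziness; the remaining words share one fuzzy clause."""
    phrases, words = split_search(text)
    clauses: List[Dict[str, Any]] = [
        {"multi_match": {"query": p, "fields": fields, "type": "phrase"}}
        for p in phrases
    ]
    if words:
        clauses.append({
            "multi_match": {
                "query": " ".join(words),
                "fields": fields,
                "type": "best_fields",
                "operator": "and",  # assumption made for this sketch
                "fuzziness": fuzziness,
            }
        })
    return clauses


if __name__ == "__main__":
    for clause in build_clauses('"Lightroom Develop" sunset', ["title", "description"]):
        print(clause)
```

Here the quoted `Lightroom Develop` becomes a `phrase` clause with no `fuzziness` key, while the unquoted `sunset` goes into the fuzzy clause.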
Note that the introduction of fuzziness changes the search structure from 'cross-fields' to 'best-field'. This means that all the words/tokens searched for must appear together in one of the searched fields, rather than being able to appear across the range of searched fields.
If the document is:
{
title: 'red fox',
description: 'jumped over the dog'
}
a cross-field search for red dog will match this document (red is in the title, dog is in the description), but a best-field search will not, because no single field contains both words. The document would need to have the description 'red fox jumped over the dog' for a match to be found via best-field.
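The difference between the two modes can be simulated without Elasticsearch. The sketch below is a simplified Python model, assuming every term is required and ignoring fuzziness, analysis and scoring. The function names and the search terms `red dog` (one word from each field) are chosen for illustration only.

```python
from typing import Dict, List, Set


def tokens(text: str) -> Set[str]:
    """Very rough stand-in for an analyzer: lowercase and split on whitespace."""
    return set(text.lower().split())


def cross_fields_match(doc: Dict[str, str], terms: List[str]) -> bool:
    """Every term must appear somewhere, in any of the fields."""
    combined: Set[str] = set().union(*(tokens(value) for value in doc.values()))
    return all(term.lower() in combined for term in terms)


def best_fields_match(doc: Dict[str, str], terms: List[str]) -> bool:
    """Every term must appear together in at least one single field."""
    return any(
        all(term.lower() in tokens(value) for term in terms)
        for value in doc.values()
    )


if __name__ == "__main__":
    doc = {"title": "red fox", "description": "jumped over the dog"}
    extended = {"title": "red fox", "description": "red fox jumped over the dog"}
    terms = ["red", "dog"]

    print(cross_fields_match(doc, terms))       # True: "red" in title, "dog" in description
    print(best_fields_match(doc, terms))        # False: no single field holds both terms
    print(best_fields_match(extended, terms))   # True: the description now holds both terms
```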
How should a reviewer test this change?
Ensure that the search results match up as expected given the chosen search parameters
Who should look at this?
Tested? Documented?