Add pre-configured “lowercase” normalizer #53882
Pinging @elastic/es-search (:Search/Mapping)
This seems like a really helpful addition. Overall, I think it would be good to add a test to make sure everything is 'wired up' as expected. One option is to add a test case to `KeywordFieldMapperTests` that exercises the built-in `lowercase` normalizer.
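As a rough illustration of what such a test would exercise, here is a minimal, self-contained sketch of lower-casing applied to both the indexed keyword value and the query value. The class `LowercaseNormalizerSketch` and its `normalize` method are hypothetical stand-ins, not Elasticsearch code, and `String.toLowerCase(Locale.ROOT)` is only assumed to approximate what the built-in normalizer does.

```java
import java.util.Locale;

// Hypothetical stand-in for a keyword field with a lower-casing normalizer.
// This is NOT Elasticsearch code; it only illustrates the expected effect.
class LowercaseNormalizerSketch {

    // Lower-case with a fixed locale so the result does not depend on the JVM default locale.
    static String normalize(String value) {
        return value.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // The same normalization is applied to the indexed value and to the query value,
        // so values that differ only in case end up as the same term.
        String indexed = normalize("AbC");
        String query = normalize("ABC");
        System.out.println(indexed);               // abc
        System.out.println(indexed.equals(query)); // true
    }
}
```

A real test in `KeywordFieldMapperTests` would go through the mapper and index settings rather than calling a helper directly; the sketch only shows the behaviour the test should assert.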
It's possible that users already have a normalizer named `lowercase` configured in the settings. (Note that in the future we plan to ban users from defining analysis components with the same names as built-in ones, but we currently allow this behavior: #22263.) Some suggestions to help start the discussion on how this should be handled:
- We should make sure that we at least don't error out in this case, since it could be a common set-up. Ideally, I think we'd prefer the user-defined normalizer so that there aren't any surprising changes in behavior during an upgrade.
- We can add an entry to the migration documentation encouraging users to remove their custom-defined 'lowercase' normalizer in favor of using the built-in one, or to rename it.
server/src/main/java/org/elasticsearch/index/analysis/LowercaseNormalizer.java
Thanks for the review, @jtibshirani!
I checked this with a pre-existing "lowercase" field that didn't actually lowercase (only ascii-folding). The behaviour was what I would have hoped for:
37a3d02 to 9f07520
> I checked this with a pre-existing "lowercase" field that didn't actually lowercase (only ascii-folding). The behaviour was what I would have hoped for
That behavior makes sense to me too. A couple last comments:
- We could add a test to verify the behavior that we always prefer a user's custom analyzer definition. This would guard against accidental changes to the upgrade behavior that we want. Perhaps `AnalysisRegistryTests` would be a good place to add a check.
- I think we should mention the change in the migration documentation. Otherwise users won't know that they can clean up the index settings and remove a custom `lowercase` analyzer.
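The precedence rule discussed here (a user-defined `lowercase` wins over the built-in one) could be checked along these lines. This is a self-contained sketch under stated assumptions: `NormalizerPrecedenceSketch`, `resolve`, and `asciiFold` are hypothetical names, the maps stand in for the real analysis registry, and the folding is a crude approximation of ASCII folding. It mirrors the scenario tested earlier in this thread, where the custom "lowercase" only folds and does not lower-case.

```java
import java.text.Normalizer;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical model of name resolution; NOT the Elasticsearch AnalysisRegistry.
class NormalizerPrecedenceSketch {

    // Built-in normalizers, keyed by name.
    static final Map<String, UnaryOperator<String>> BUILT_IN = new HashMap<>();
    static {
        BUILT_IN.put("lowercase", s -> s.toLowerCase(Locale.ROOT));
    }

    // Prefer the user's definition; fall back to the built-in one only if none exists.
    static UnaryOperator<String> resolve(String name, Map<String, UnaryOperator<String>> userDefined) {
        UnaryOperator<String> custom = userDefined.get(name);
        return custom != null ? custom : BUILT_IN.get(name);
    }

    // Crude ASCII folding: decompose characters, then strip combining marks.
    static String asciiFold(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD).replaceAll("\\p{M}", "");
    }

    public static void main(String[] args) {
        Map<String, UnaryOperator<String>> none = new HashMap<>();
        Map<String, UnaryOperator<String>> custom = new HashMap<>();
        // A pre-existing "lowercase" that only folds accents and keeps case.
        custom.put("lowercase", NormalizerPrecedenceSketch::asciiFold);

        String input = "Caf\u00e9";
        // Without a custom definition the built-in lower-casing applies.
        System.out.println(resolve("lowercase", none).apply(input).equals("caf\u00e9"));
        // With a custom definition the user's behaviour is kept unchanged.
        System.out.println(resolve("lowercase", custom).apply(input).equals("Cafe"));
    }
}
```

Both lines print `true` in this sketch. A check in `AnalysisRegistryTests` would assert the same two outcomes against the real registry, so an accidental change to the upgrade behaviour fails the build.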
Finally, I wonder if it's worth checking with the team that we're happy with this approach. It would set a precedent for adding future built-in analyzers (or perhaps there's already a precedent I don't know about?)
Thanks for the comments, Julie.
I left one comment. Other than that it looks good to me, thanks for all the iterations.
server/src/test/java/org/elasticsearch/index/mapper/KeywordFieldMapperTests.java
de9f454 to 5387a51
A pre-configured normalizer for lower-casing. Closes #53872
Simplify the common scenario of wanting to lower-case values.
Closes #53872