-
Notifications
You must be signed in to change notification settings - Fork 25k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add preserve_original setting in edge ngram token filter #55766
Add preserve_original setting in edge ngram token filter #55766
Conversation
…c and new test class
1c4cf4e
to
ba598da
Compare
Pinging @elastic/es-search (:Search/Analysis) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
HI @amitmbm, thanks for opening this PR, looks great. I only left a few very minor remarks around formatting etc., the rest is okay. I will enabling running the tests so everything should be run past CI once you push another commit.
package org.elasticsearch.analysis.common; | ||
|
||
import org.apache.lucene.analysis.Tokenizer; | ||
import org.apache.lucene.analysis.core.WhitespaceTokenizer; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: this seems unused, our checkstyle rules will complain about unused imports, so better to remove it now before running the tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My intelliJ removed unused import wasn't configured for elasticsearch project, enabled it now :)
@@ -173,6 +173,10 @@ See <<analysis-edgengram-tokenfilter-max-gram-limits>>. | |||
(Optional, integer) | |||
Minimum character length of a gram. Defaults to `1`. | |||
|
|||
`preserve_original`:: | |||
(Optional, boolean) | |||
If set to true then it would also emit the original token. Defaults to `false`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: wording might be better sth like "Emits original token then set to true
. Defaults to false
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changed to Emits original token when set to
true. Defaults to
false.
--> notice changed to when
from then
in the suggested edit.
/** | ||
* Test class for edge_ngram token filter. | ||
* | ||
* @author Amit Khandelwal |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: we usually don't add @author tags to classes or test classes but rely on the commit history rather than code comments to track authors
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, removed it.
* @author Amit Khandelwal | ||
*/ | ||
public class EdgeNGramTokenFilterFactoryTests extends ESTokenStreamTestCase { | ||
public void testDefault() throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: maybe add newline befor first test method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just observed this in so many other test classes and copy-pasted the initial test setup :). nvm removed this.
@elasticmahine ok to test |
@elasticmachine run elasticsearch-ci/bwc |
@cbuescher thanks for kicking another test try for |
I agree, we'd still like to get a clean test run. Lets try this again. |
@elasticmachine update branch |
…gram-token-filter
@cbuescher looks like merging master into my feature branch fixed the test failures. Let me know if you can merge it if all looks OK. |
Hi @amitmbm, I merged your change to master and will also port it to the latest 7.x branch. Thanks for picking this up. |
The Lucene `preserve_original` setting is currently not supported in the `edge_ngram` token filter. This change adds it with a default value of `false`. Closes #55767
@cbuescher I'm really glad as it's my first commit merged to Elastic code base, I had raised another similar PR #55432 which is almost reviewed by your colleague Mark Harwood, but then there is no update on this PR from last 4 days. Hope he is safe and if you get time please look into this. After this, I want to pick some more changes and one of them is deprecating |
Thanks, great to hear you enjoyed working on the PR. We try to review user PRs in a timely manner but please don't expect anyone to respond to new commits etc... immediately because we all handle this differently and asynchronously. |
@cbuescher I understand that Elastic as a whole company work in async mode and my intent is not to push my PRs for review, it was stuck so I thought to bring this to you notice. Anyway thanks a lot for explaining this and I would keep this in mind. Also, reg. the deprecation changes, As you pointed out it requires more discussion, I would open a new issue and will discuss it there. Have a great day ahead 😄 |
Do we have any update on this PR? Did you translate the discussion? |
@mmoreram I raised PR but as @cbuescher mention in this comment , AS it's related to deprecation, hence he wanted to discuss this more with his team, I am also waiting for his update. @cbuescher let me know if you have any update. |
Raised this PR to address the bug #55767