Support ignore_keywords flag for word delimiter graph token filter #59563

Merged
merged 4 commits into from
Jul 21, 2020

malpani
Contributor

@malpani malpani commented Jul 14, 2020

This commit allows customizing the word delimiter token filters to skip processing tokens tagged as keywords, through the `ignore_keywords` flag that Lucene's WordDelimiterGraphFilter already exposes.

Fix for #59491
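
As a rough illustration of how the flag might be used, here is a minimal sketch that builds an index-settings body in Python. It assumes a `keyword_marker` filter runs first to tag the protected terms, so that the `word_delimiter_graph` filter with `ignore_keywords` set leaves them unsplit. The filter names `protect_terms` and `split_words`, the analyzer name `my_analyzer`, and the example terms are hypothetical and not part of this PR.

```python
import json


def build_index_settings(protected_terms):
    """Return a settings body for a custom analyzer (illustrative only).

    Assumption: terms tagged by the keyword_marker filter are the ones the
    word_delimiter_graph filter should skip when ignore_keywords is true.
    """
    return {
        "settings": {
            "analysis": {
                "filter": {
                    # Hypothetical filter name; tags the listed terms as keywords.
                    "protect_terms": {
                        "type": "keyword_marker",
                        "keywords": list(protected_terms),
                    },
                    # Hypothetical filter name; uses the flag added by this PR.
                    "split_words": {
                        "type": "word_delimiter_graph",
                        "ignore_keywords": True,
                    },
                },
                "analyzer": {
                    # Hypothetical analyzer name.
                    "my_analyzer": {
                        "type": "custom",
                        "tokenizer": "whitespace",
                        # Order matters: tag keywords before splitting.
                        "filter": ["protect_terms", "split_words"],
                    }
                },
            }
        }
    }


body = build_index_settings(["wi-fi", "PowerShot"])
print(json.dumps(body, indent=2))
```

The printed JSON could serve as the body of a create-index request. The keyword tagging has to come earlier in the filter chain than the word delimiter filter, otherwise there is nothing for `ignore_keywords` to act on.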

…lter

Support ignore_keywords flag for word delimiter graph token filter

Lucene's WordDelimiterGraphFilter allows skipping the processing of tokens tagged as keywords. However, the Elasticsearch word delimiter graph token filter does not support this yet. I would like to update the Elasticsearch implementation to incorporate the ignore_keywords flag to enable better customization of token filters.

Fix for elastic#59491
@cbuescher cbuescher added the :Search Relevance/Analysis How text is split into tokens label Jul 15, 2020
@elasticmachine
Collaborator

Pinging @elastic/es-search (:Search/Analysis)

@elasticmachine elasticmachine added the Team:Search Meta label for search team label Jul 15, 2020
@cbuescher cbuescher added >enhancement v8.0.0 and removed Team:Search Meta label for search team labels Jul 15, 2020
@jrodewig jrodewig requested a review from romseygeek July 16, 2020 13:00
@romseygeek
Contributor

@elasticmachine ok to test

@malpani
Contributor Author

malpani commented Jul 16, 2020

Thanks for running this through the tests. The failures look unrelated:

  • The BWC test failed with: `Building 7.9.0 didn't generate expected file /dev/shm/elastic+elasticsearch+pull-request-bwc/distribution/bwc/minor/build/bwc/checkout-7.x/distribution/archives/oss-linux-tar/build/distributions/elasticsearch-oss-7.9.0-SNAPSHOT-linux-x86_64.tar.gz`
  • Default-distro failed along similar lines: `Caused by: org.gradle.api.InvalidUserDataException: Building 7.9.0 didn't generate expected file /dev/shm/elastic+elasticsearch+pull-request+default-distro/distribution/bwc/minor/build/bwc/checkout-7.x/distribution/archives/linux-tar/build/distributions/elasticsearch-7.9.0-SNAPSHOT-linux-x86_64.tar.gz`

@romseygeek
Contributor

Yes, we've had an internal version bump so all the BWC tests are expecting different versions. Can you merge in the master branch and push again?

This looks good, I think we also need to update the relevant docs (see word-delimiter-graph-filter.asciidoc for where to make changes).

@malpani
Contributor Author

malpani commented Jul 19, 2020

I have added docs (reusing wording from the Lucene docs) and also merged in the latest upstream/master.

Contributor

@romseygeek romseygeek left a comment

LGTM, thanks @malpani! @jrodewig if you're happy with the docs change I'll merge.

Contributor

@jrodewig jrodewig left a comment

LGTM. Made a small change to the docs.

@romseygeek romseygeek merged commit 08de504 into elastic:master Jul 21, 2020
@romseygeek
Contributor

Thanks @malpani !

romseygeek pushed a commit that referenced this pull request Jul 21, 2020
…59563)

This commit allows customizing the word delimiter token filters to skip processing 
tokens tagged as keyword through the `ignore_keywords` flag Lucene's 
WordDelimiterGraphFilter already exposes.

Fix for #59491
@malpani
Contributor Author

malpani commented Jul 21, 2020

thanks @romseygeek for reviewing and merging this in!

stevejgordon added a commit to elastic/elasticsearch-net that referenced this pull request Nov 25, 2020
This introduces a new property for the word delimiter graph token filter
to configure ignoring of keywords.

It relates to this change elastic/elasticsearch#59563
stevejgordon added a commit to elastic/elasticsearch-net that referenced this pull request Nov 26, 2020
This introduces a new property for the word delimiter graph token filter
to configure ignoring of keywords.

It relates to this change elastic/elasticsearch#59563
github-actions bot pushed a commit to elastic/elasticsearch-net that referenced this pull request Nov 26, 2020
This introduces a new property for the word delimiter graph token filter
to configure ignoring of keywords.

It relates to this change elastic/elasticsearch#59563
github-actions bot pushed a commit to elastic/elasticsearch-net that referenced this pull request Nov 26, 2020
This introduces a new property for the word delimiter graph token filter
to configure ignoring of keywords.

It relates to this change elastic/elasticsearch#59563
stevejgordon added a commit to elastic/elasticsearch-net that referenced this pull request Nov 30, 2020
This introduces a new property for the word delimiter graph token filter
to configure ignoring of keywords.

It relates to this change elastic/elasticsearch#59563

Co-authored-by: Steve Gordon <[email protected]>
stevejgordon added a commit to elastic/elasticsearch-net that referenced this pull request Nov 30, 2020
This introduces a new property for the word delimiter graph token filter
to configure ignoring of keywords.

It relates to this change elastic/elasticsearch#59563

Co-authored-by: Steve Gordon <[email protected]>