
Documentation for match_only_text field #6041

Merged
merged 5 commits into from
Feb 1, 2024

Conversation

rishabhmaurya
Contributor

@rishabhmaurya rishabhmaurya commented Jan 5, 2024

Description

Document the usage of match_only_text field introduced as part of opensearch-project/OpenSearch#11039

Issues Resolved

Closes #5427

Checklist

  • By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and subject to the Developers Certificate of Origin.
    For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@rishabhmaurya rishabhmaurya marked this pull request as ready for review January 5, 2024 00:21
@rishabhmaurya rishabhmaurya self-assigned this Jan 5, 2024
@rishabhmaurya
Contributor Author

@andrross @harshavamsi @msfroh Please let me know if I missed anything that you think could be relevant to users.

@kolchfa-aws kolchfa-aws self-assigned this Jan 5, 2024
@hdhalter hdhalter added v2.12.0 release-notes PR: Include this PR in the automated release notes 3 - Tech review PR: Tech review in progress labels Jan 8, 2024
Contributor

@msfroh msfroh left a comment

Looks good! I think it covers the essentials of the field type. Just added a couple of minor comments/questions.


## Parameters

While `match_only_text` supports most parameters available for text fields, modifying most of them can actually be counterproductive. This field type thrives on simplicity and efficiency, minimizing data stored in the index to optimize storage costs. Therefore, sticking with the default settings is generally the best approach. Any adjustments beyond analyzer settings could reintroduce overhead and negate the efficiency benefits of `match_only_text`.
Contributor

Do the various parameters work? Like, could you theoretically coerce a match_only_text field to behave more like a text field? (If so, I don't think we should support those parameters. Could we just ignore them, maybe with a deprecation warning?)

Contributor Author

All unsupported parameters, and unsupported values for a parameter, will result in an error. However, there are other parameters, like stored and term_vectors, that are still supported but shouldn't be enabled because doing so defeats the purpose.
I like your idea of being more precise here, so I will list the ones that are supported.

Contributor Author

done, thanks
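
As a rough illustration of the "stick with the defaults, adjust only the analyzer" guidance discussed in this thread, the following Python sketch builds an index-creation body that maps a field as `match_only_text`. The field name `message`, the helper name `match_only_text_mapping`, and the choice of the `english` analyzer are assumptions made for the example and do not come from the PR.

```python
import json


def match_only_text_mapping(field_name, analyzer=None):
    """Build an index-creation body mapping one field as match_only_text.

    Hypothetical helper for illustration only. The analyzer is the one
    setting the thread treats as reasonable to adjust, so it is the only
    optional argument here.
    """
    field = {"type": "match_only_text"}
    if analyzer is not None:
        field["analyzer"] = analyzer
    return {"mappings": {"properties": {field_name: field}}}


# Default settings: nothing but the field type.
default_body = match_only_text_mapping("message")

# Same field with an explicit analyzer ("english" is an assumed example).
custom_body = match_only_text_mapping("message", analyzer="english")

print(json.dumps(default_body, indent=2))
print(json.dumps(custom_body, indent=2))
```

Either body could be sent when creating the index. The sketch deliberately exposes no other mapping parameters, mirroring the point above that enabling extra options works against the storage savings.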


## Migrating from text field

Reindex API can be used to migrate from text field to `match_only_text` by updating the right mapping in the target index.
Contributor

Maybe add an example of this?

Contributor Author

sure
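
The migration requested in this thread might look roughly like the sketch below: create a destination index whose mapping uses `match_only_text`, then copy documents into it with the Reindex API. The index names `logs-text` and `logs-match-only-text` and the field name `message` are hypothetical, and the code only builds and prints the request bodies rather than contacting a cluster.

```python
import json

# Hypothetical names, chosen only for this example.
SOURCE_INDEX = "logs-text"
DEST_INDEX = "logs-match-only-text"
FIELD = "message"

# Step 1: body for creating the destination index (PUT /<DEST_INDEX>),
# with the field mapped as match_only_text instead of text.
dest_index_body = {
    "mappings": {"properties": {FIELD: {"type": "match_only_text"}}}
}

# Step 2: body for the Reindex API (POST /_reindex), copying documents
# from the source index into the newly mapped destination index.
reindex_body = {
    "source": {"index": SOURCE_INDEX},
    "dest": {"index": DEST_INDEX},
}

# The requests in the order they would be sent.
requests_to_send = [
    ("PUT", f"/{DEST_INDEX}", dest_index_body),
    ("POST", "/_reindex", reindex_body),
]

for method, path, body in requests_to_send:
    print(method, path)
    print(json.dumps(body, indent=2))
```

The destination index has to exist with the new mapping before reindexing starts; otherwise the field type in the destination would be inferred rather than set to `match_only_text`.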

- Supports most query types, but not interval/span queries.


Choosing `match_only_text` means prioritizing efficient full-text search over complex ranking and positional queries, while also optimizing storage costs. It excels when you need to quickly find documents containing specific terms without the overhead of storing frequencies and positions, leading to significantly smaller indexes. This translates to lower storage costs, especially for large datasets. However, it's not the best choice for ranking results based on relevance or for queries that hinge on term proximity or order, like interval/span queries. While it does support phrase queries, their performance isn't as efficient as with text field type. So, if pinpointing exact phrases or their locations within documents is essential, consider using text field type.
Member

While it does support phrase queries, their performance isn't as efficient as with text field type. So, if pinpointing exact phrases or their locations within documents is essential, consider using text field type.

Does this maybe downplay it too much? match_only_text supports phrase queries but performance will likely be very very bad, right? I'd maybe tweak the last sentence:

"So, if pinpointing exact phrases or their locations within documents is essential, use the text field type instead."

Contributor Author

Makes sense! In addition to what you're suggesting, maybe I can add a line on how phrase queries work here and how bad/optimal they could be depending on the query and workload.

Contributor Author

done
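
To make the distinction in this thread concrete, the sketch below builds two search bodies against a `match_only_text` field: a `match` query, which only needs to know which documents contain the terms, and a `match_phrase` query, which depends on term order and is the case the reviewers describe as supported but slow on this field type. The field name `message`, the sample text, and the helper names are assumptions for the example, and no performance numbers are implied.

```python
import json


def match_query(field, text):
    """Term-matching search body: the workload match_only_text targets."""
    return {"query": {"match": {field: text}}}


def match_phrase_query(field, text):
    """Phrase search body: depends on term order, so per the thread it is
    supported on match_only_text but expected to perform worse than on text."""
    return {"query": {"match_phrase": {field: text}}}


# Hypothetical field name and search text.
term_search = match_query("message", "connection timeout")
phrase_search = match_phrase_query("message", "connection timeout")

print(json.dumps(term_search, indent=2))
print(json.dumps(phrase_search, indent=2))
```

If most of a workload looks like the second body, the advice in the thread applies: use the `text` field type instead.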

@kolchfa-aws
Collaborator

@rishabhmaurya Let me know when you've made the changes and are ready for me to review this documentation. Thanks a lot!

@rishabhmaurya
Contributor Author

@kolchfa-aws I somehow missed this one, and it was long overdue. I have addressed the comments now.

Signed-off-by: Fanit Kolchina <[email protected]>
@kolchfa-aws
Collaborator

@rishabhmaurya Thanks for putting up the PR! I reviewed and pushed my changes in this commit. Could you review to make sure it preserves technical accuracy? Then we'll move to editorial review. Thanks!

@rishabhmaurya
Contributor Author

@kolchfa-aws looks good to me, thanks for putting this up.

Collaborator

@natebower natebower left a comment

@kolchfa-aws Please see my comments and changes and let me know if you have any questions. Thanks!


A `match_only_text` field is different from a `text` field in the following ways:

- Omits storing positions, frequencies, and norms, reducing storage.
Collaborator

"reducing storage requirements"?

@@ -18,4 +18,5 @@ Field data type | Description
:--- | :---
[`keyword`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/keyword/) | A string that is not analyzed. Useful for exact-value search.
[`text`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/text/) | A string that is analyzed. Useful for full-text search.
[`match_only_text`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/match-only-text/) | A space-optimized version of a text field.
Collaborator

Should "text" be in code font?

_field-types/supported-field-types/text.md
@@ -12,12 +12,15 @@ redirect_from:

# Text field type

A text field type contains a string that is analyzed. It is used for full-text search because it allows partial matches. Searches with multiple terms can match some but not all of them. Depending on the analyzer, results can be case insensitive, stemmed, stopwords removed, synonyms applied, etc.
A text field type contains a string that is analyzed. It is used for full-text search because it allows partial matches. Searches with multiple terms can match some but not all of them. Depending on the analyzer, results can be case insensitive, stemmed, stopwords removed, synonyms applied, and so on.
Collaborator

Should "text" be in code font?



If you need to use a field for exact-value search, map it as a [`keyword`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/keyword/) instead.
{: .note }

The [`match_only_text`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/match-only-text/) field is a space-optimized version of the text field. If you don't need to query phrases or use positional queries, map the field as `match_only_text` instead of `text`. Positional queries are queries for which the position of the term in the phrase matters, such as interval or span queries.
Collaborator

Should "text" be in code font?

kolchfa-aws and others added 2 commits January 30, 2024 13:56
Co-authored-by: Nathan Bower <[email protected]>
Signed-off-by: kolchfa-aws <[email protected]>
Co-authored-by: Nathan Bower <[email protected]>
Signed-off-by: kolchfa-aws <[email protected]>
@kolchfa-aws kolchfa-aws added 6 - Done but waiting to merge PR: The work is done and ready to merge and removed 3 - Tech review PR: Tech review in progress labels Jan 30, 2024
@hdhalter hdhalter merged commit de8eb4e into opensearch-project:main Feb 1, 2024
4 checks passed
@hdhalter hdhalter added 3 - Done Issue is done/complete and removed 6 - Done but waiting to merge PR: The work is done and ready to merge labels Feb 1, 2024
Labels
3 - Done Issue is done/complete release-notes PR: Include this PR in the automated release notes v2.12.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[DOC] match_only_text handler
6 participants