Documentation for match_only_text field #6041
@andrross @harshavamsi @msfroh Please let me know if I missed anything that you think could be relevant to users.
9e5eb02
to
5a350a5
Compare
5a350a5
to
c2cabaa
Compare
Looks good! I think it covers the essentials of the field type. Just added a couple of minor comments/questions.
## Parameters

While `match_only_text` supports most parameters available for text fields, modifying most of them can actually be counterproductive. This field type thrives on simplicity and efficiency, minimizing data stored in the index to optimize storage costs. Therefore, sticking with the default settings is generally the best approach. Any adjustments beyond analyzer settings could reintroduce overhead and negate the efficiency benefits of `match_only_text`.
Do the various parameters work? Like, could you theoretically coerce a `match_only_text` field to behave more like a `text` field? (If so, I don't think we should support those parameters. Could we just ignore them, maybe with a deprecation warning?)
All unsupported params, or unsupported values for a param, will result in an error. But there are other parameters like `stored`, `term_vectors` which are still supported but shouldn't be enabled, as enabling them defeats the purpose.
I like your idea to be more precise here, so I will list the ones which are supported.
done, thanks
## Migrating from text field

Reindex API can be used to migrate from text field to `match_only_text` by updating the right mapping in the target index.
Maybe add an example of this?
sure
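For reference, a minimal sketch of what such a migration example could look like, written as Python that only builds the two request bodies and prints them (nothing is sent to a cluster). The index names `books-text` and `books-match-only`, the field name `title`, and both helper functions are hypothetical and are not taken from the PR; the `_reindex` body uses the standard `source.index` / `dest.index` shape.

```python
import json


def match_only_text_mapping(field_names, analyzer=None):
    """Build a create-index body mapping each named field as match_only_text.

    Only `analyzer` is exposed here, since the proposed docs recommend
    leaving the other parameters at their defaults.
    """
    properties = {}
    for name in field_names:
        field = {"type": "match_only_text"}
        if analyzer is not None:
            field["analyzer"] = analyzer
        properties[name] = field
    return {"mappings": {"properties": properties}}


def reindex_body(source_index, dest_index):
    """Build a Reindex API body that copies documents from source to dest."""
    return {"source": {"index": source_index}, "dest": {"index": dest_index}}


# Hypothetical names, for illustration only.
create_body = match_only_text_mapping(["title"])
copy_body = reindex_body("books-text", "books-match-only")

# Step 1: create the target index with the new mapping.
print("PUT /books-match-only")
print(json.dumps(create_body, indent=2))

# Step 2: copy the documents from the old index into the new one.
print("POST /_reindex")
print(json.dumps(copy_body, indent=2))
```

The target index has to exist with the `match_only_text` mapping before the reindex runs; otherwise the destination field would presumably be mapped dynamically rather than as `match_only_text`.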
- Supports most query types, but not interval/span queries.

Choosing `match_only_text` means prioritizing efficient full-text search over complex ranking and positional queries, while also optimizing storage costs. It excels when you need to quickly find documents containing specific terms without the overhead of storing frequencies and positions, leading to significantly smaller indexes. This translates to lower storage costs, especially for large datasets. However, it's not the best choice for ranking results based on relevance or for queries that hinge on term proximity or order, like interval/span queries. While it does support phrase queries, their performance isn't as efficient as with text field type. So, if pinpointing exact phrases or their locations within documents is essential, consider using text field type.
> While it does support phrase queries, their performance isn't as efficient as with text field type. So, if pinpointing exact phrases or their locations within documents is essential, consider using text field type.

Does this maybe downplay it too much? `match_only_text` supports phrase queries but performance will likely be very very bad, right? I'd maybe tweak the last sentence:
"So, if pinpointing exact phrases or their locations within documents is essential, use the `text` field type instead."
Makes sense! In addition to what you're suggesting, maybe I can add a line on how phrase queries work here and how bad/optimal they could be depending on the query and workload.
done
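To make the distinction in this thread concrete, here is a small sketch of the two request bodies being compared: a term-based `match` query, which is the case `match_only_text` is meant for, and a `match_phrase` query, which the thread says is supported but expected to be slow on this field type. The index name `books-match-only`, the field name `title`, and the helper functions are hypothetical; the code only builds and prints the JSON bodies and makes no performance measurement.

```python
import json


def match_query(field, text):
    """Term-based full-text query; does not depend on term positions."""
    return {"query": {"match": {field: text}}}


def match_phrase_query(field, text):
    """Phrase query; the terms must appear adjacent and in order."""
    return {"query": {"match_phrase": {field: text}}}


# Hypothetical field name and search text.
term_search = match_query("title", "wind rises")
phrase_search = match_phrase_query("title", "wind rises")

for body in (term_search, phrase_search):
    print("GET /books-match-only/_search")
    print(json.dumps(body, indent=2))
```

If a workload is dominated by the second kind of request, that matches the reviewer's suggestion above to use the `text` field type instead.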
@rishabhmaurya Let me know when you've made the changes and are ready for me to review this documentation. Thanks a lot! |
Signed-off-by: Rishabh Maurya <[email protected]>
Signed-off-by: Rishabh Maurya <[email protected]>
c2cabaa
to
526ef40
Compare
@kolchfa-aws I somehow missed this one, and it was long overdue. I have addressed the comments now.
Signed-off-by: Fanit Kolchina <[email protected]>
@rishabhmaurya Thanks for putting up the PR! I reviewed it and pushed my changes in this commit. Could you review to make sure it preserves technical accuracy? Then we'll move to editorial review. Thanks!
@kolchfa-aws looks good to me, thanks for putting this up. |
@kolchfa-aws Please see my comments and changes and let me know if you have any questions. Thanks!
A `match_only_text` field is different from a `text` field in the following ways:

- Omits storing positions, frequencies, and norms, reducing storage.
"reducing storage requirements"?
@@ -18,4 +18,5 @@ Field data type | Description
:--- | :---
[`keyword`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/keyword/) | A string that is not analyzed. Useful for exact-value search.
[`text`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/text/) | A string that is analyzed. Useful for full-text search.
[`match_only_text`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/match-only-text/) | A space-optimized version of a text field.
Should "text" be in code font?
@@ -12,12 +12,15 @@ redirect_from:

# Text field type

- A text field type contains a string that is analyzed. It is used for full-text search because it allows partial matches. Searches with multiple terms can match some but not all of them. Depending on the analyzer, results can be case insensitive, stemmed, stopwords removed, synonyms applied, etc.
+ A text field type contains a string that is analyzed. It is used for full-text search because it allows partial matches. Searches with multiple terms can match some but not all of them. Depending on the analyzer, results can be case insensitive, stemmed, stopwords removed, synonyms applied, and so on.
Should "text" be in code font?
If you need to use a field for exact-value search, map it as a [`keyword`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/keyword/) instead.
{: .note }

The [`match_only_text`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/match-only-text/) field is a space-optimized version of the text field. If you don't need to query phrases or use positional queries, map the field as `match_only_text` instead of `text`. Positional queries are queries for which the position of the term in the phrase matters, such as interval or span queries.
Should "text" be in code font?
Co-authored-by: Nathan Bower <[email protected]> Signed-off-by: kolchfa-aws <[email protected]>
Co-authored-by: Nathan Bower <[email protected]> Signed-off-by: kolchfa-aws <[email protected]>
Description
Document the usage of match_only_text field introduced as part of opensearch-project/OpenSearch#11039
Issues Resolved
Closes #5427
Checklist
For more information on following Developer Certificate of Origin and signing off your commits, please check here.