Introduce a `constant_keyword` field. #49713
This field is a specialization of the `keyword` field for the case when all documents have the same value. It typically performs more efficiently than keywords at query time by figuring out whether all or none of the documents match at rewrite time, like `term` queries on `_index`.

The name is up for discussion. I liked including `keyword` in it, so that we still have room for a `singleton_numeric` in the future. However I'm unsure whether to call it `singleton`, `constant` or something else, any opinions?

For this field there is a choice between

1. accepting values in `_source` when they are equal to the value configured in mappings, but rejecting mapping updates
2. rejecting values in `_source` but then allowing updates to the value that is configured in the mapping

This commit implements option 1, so that it is possible to reindex from/to an index that has the field mapped as a keyword with no changes to the source.
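To make option 1 and the rewrite-time shortcut concrete, here is a minimal Python sketch. It is a toy model only, not the Elasticsearch implementation: the class `ConstantKeywordField` and its methods `index` and `rewrite_term_query` are hypothetical names invented for illustration, and the strings `"match_all"` and `"match_none"` merely stand in for the rewritten queries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConstantKeywordField:
    """Hypothetical toy model of option 1 (not Elasticsearch code)."""

    # The single value configured in the mapping. The dataclass is frozen
    # to mirror "rejecting mapping updates" under option 1.
    value: str

    def index(self, source_value: str) -> None:
        # Option 1: a value in _source is accepted only when it equals
        # the value configured in the mapping.
        if source_value != self.value:
            raise ValueError(
                f"value [{source_value}] differs from the configured "
                f"value [{self.value}]"
            )

    def rewrite_term_query(self, term: str) -> str:
        # Every document holds the same value, so a term query matches
        # either all documents or none; no per-document check is needed.
        return "match_all" if term == self.value else "match_none"


field = ConstantKeywordField(value="debug")
field.index("debug")                      # accepted: equals the mapping value
print(field.rewrite_term_query("debug"))  # match_all
print(field.rewrite_term_query("info"))   # match_none
```

In this model, indexing a document whose value differs from the configured one raises an error, which corresponds to the rejection behaviour discussed later in the thread.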
Pinging @elastic/es-search (:Search/Mapping)
This looks like a helpful addition! My first impression is that there is some overlap with existing mapping options:
Related to the above, would you be able to summarize the motivating use case? I think I understand it from looking over internal discussions, but it would be nice to verify (and a summary would be helpful for our external contributors taking a look).
Thanks for looking @jtibshirani. I added more documentation that should help answer your questions. Queries on I have a preference for a separate field over an option on
Thanks @jpountz, I understand the motivation better now. I could also see this field type being useful when dealing with the 'document type' migration: perhaps a user had a 5.x index containing two document types, then they separated each type into its own index in 6.x. Modelling the type information as a Some other high-level thoughts before I work on a detailed review:
Hmm I thought I had raised this question on this PR, but I forgot. I agree this is a good question. One way to handle this would be to let clients know that a
I raised this question to @clintongormley. For the use-cases we started envisioning for this field, like a
@timroes mentioned to me that it doesn't matter much to Kibana whether this field is exposed as a Thinking more about this, I wonder whether we will want to handle
My vote would be for this option. The reason being, I may decide to split my data so I have all my So I think we need the field to accept values in `_source` when they are equal to the value configured, otherwise it might make the feature somewhat hard to use in practice. Another benefit this gives is that if I accidentally point BarBeat at the same index I will get rejections and know that I've made a mistake.
We only use it to introduce new fields today indeed, though it's not so different in that an introduction of a new field is a modification of the parent
@jtibshirani You said the change looks good to you, but didn't approve it. Let me know if there are other things that you would like to discuss regarding this change?
I had wanted to double-check one more thing (which I just did). Approved!
Thanks!
@elasticmachine run elasticsearch-ci/2
@jpountz Thanks for making this happen. Looking forward to playing around with it!
@ruflin You're welcome. Note that the backport is still pending in case you plan to play with a 7.x snapshot.
Backport of #49713
These tests can be enabled now that the change has been backported.
Relates: elastic/elasticsearch#49713 This commit adds the ConstantKeyword property to the client. Value is exposed as type Object as it can be a string or numeric value.
I hope a late comment is better than none: to me 'constant' means the value can't change, like read-only after the document was indexed, not that all documents of an index have the same value. In that case why store it at all?
Do you have suggestions for a better name?
I'm not a native English speaker, but 'equal' and 'same' are the terms that come up in translators that I would understand.