sub keyword field to string dynamic mappings - name and intent discussion #18195
In the original issue (#12394) I went into great detail to explain the reasoning behind this change, but to address your questions here:
In the past, the We can't deduce which use case a user intends when we receive a string field - it could be either. The solution for this is to provide a main The benefit of this is that, without any config, you get both access patterns for string fields out of the box. The downside is that you index string values twice. This is exactly the same pattern that Logstash has used for string fields for a long time, so users of Logstash are unlikely to see any change. It is very easy to optimize disk space usage here: just map your fields as
No we aren't. This field is not named after the And For me, the only debate is whether this sub-field should be called
+1 to what Clinton said. The fact that we did not map strings both for text search and keyword search/aggs in the past caused bad out-of-the-box experiences since you almost certainly had to reindex once you realized that you could not aggregate on whole string values. Regarding disk usage, it will be higher with default mappings for sure, but the problem is mitigated by the use of However I'm also open to changing the name to either
Discussed it in Fix it Friday - we prefer the I will improve the docs to explain that we're optimising for the OOB experience, but disk usage can be improved with some simple mappings.
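To make the trade-off discussed above concrete, here is a minimal sketch of the two mapping styles, written as plain Python dictionaries that could be serialized to JSON. The field names (message, status, title) and the helper names are hypothetical, chosen only for illustration. The ignore_above value of 256 is an assumption about the shipped default and should be checked against the docs for your version.

```python
import json


def default_string_mapping(sub_field_name="keyword", ignore_above=256):
    """Shape of a string field mapped for both access patterns.

    The main field is analyzed for full-text search; the sub-field keeps
    the whole value for sorting and aggregations. The sub-field name is a
    parameter because that name is exactly what this thread debates.
    """
    return {
        "type": "text",
        "fields": {
            sub_field_name: {"type": "keyword", "ignore_above": ignore_above}
        },
    }


def explicit_mappings():
    """Hypothetical explicit mappings that avoid indexing every string twice."""
    return {
        "properties": {
            # Only searched as full text: no sub-field needed.
            "message": {"type": "text"},
            # Only filtered or aggregated on: no analyzed main field needed.
            "status": {"type": "keyword"},
            # Needs both access patterns: keep the two-field layout.
            "title": default_string_mapping(),
        }
    }


if __name__ == "__main__":
    print(json.dumps(explicit_mappings(), indent=2))
```

Calling default_string_mapping("raw") yields the same layout under the Logstash-style name, which shows that the naming choice affects only the path users query, not how the data is indexed.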
Much of the road to 5.0 has been a theme of consistency. We've used That said, for me personally, With the hands-on-workshop, I teach people about analyzers/tokenizers by showing what happens to a
And raw is a shorter name :) I think consistency is a good point here. But I'd like to be able to apply some token filters on this type of field at some point, so I don't think that having "raw" + an analyzer would make sense in terms of meaning. I think we should mark this discussion as a blocker for the next release because it will be hard to change after we release the beta.
I've been thinking the past few days how to find a way to convince myself that I thought In this model, I was telling Elasticsearch what the data is, and trying to distinguish strings vs keyword vs text was not fitting my mental model. The Elasticsearch documentation on mappings says this:
In this description, it seems that the mapping is presented as how Elasticsearch uses the data, not what the data is. If I view things with the how in mind, instead of the what, I think The above explanation may be confusing, but I think I can use this model -- how instead of what -- to tell stories in trainings, etc, about reasons for using I am still nervous about the difficult schema change this will require on the Logstash side; in the battle for consistency, Logstash will want to change the multifield
If this proves to be a challenge to logstash, I'd personally be ok with keeping the field called
There are a lot of users with massive amounts of data ingested through Logstash where the current .raw field convention is used. Changing the default from .raw has the potential to unnecessarily break a lot of systems and cause problems for users using the default templates or custom index templates based on these. Please take this into consideration before deciding to change the existing .raw field naming convention.
@cdahlqvist We're discussing the options and impacts of I have a rough draft of a proposal here: logstash-plugins/logstash-output-elasticsearch#462 (comment)
@jpountz I'd be OK having ES's default to
@jordansissel I agree with the conclusion you reached in #18195 (comment) and I think that While I'm not completely against keeping the field as All that said, I obviously recognise that this makes for a painful transition in Logstash. I don't have great suggestions for how to make this easier, but the options are probably as follows:
+1 clint's comments and keeping 'keyword'. I think we can help users through this period of transition. It may be
As discussed with @jpountz in #17188 (comment) opening up a separate ticket for discussion here.
Some items for consideration:
By using keyword for the multi-field name we are tightly coupling it to what tokenizer is used. For example, if we ever rename the keyword tokenizer to noop (which I would love to see, since it more accurately describes what it does and is also how we tend to explain it to folks), then the multi-field option.
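The "noop" description of the keyword tokenizer can be illustrated with a small simulation. This is not Elasticsearch code: both functions are hypothetical stand-ins, and the text tokenizer is a deliberately simplified approximation (split on word characters, lowercase) rather than the real standard analyzer.

```python
import re


def keyword_tokenize(value):
    """Simulated keyword tokenizer: emit the whole input as one token."""
    return [value]


def simple_text_tokenize(value):
    """Simplified stand-in for an analyzed text field (assumption, not the
    real standard analyzer): split on word characters and lowercase."""
    return [token.lower() for token in re.findall(r"\w+", value)]


if __name__ == "__main__":
    sample = "New York"
    print(keyword_tokenize(sample))
    print(simple_text_tokenize(sample))
```

The first function leaves the value untouched, which is why an aggregation on such a field buckets by the whole string, while the second produces separate terms suited to full-text matching.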