Track when fields contain multi-value arrays and expose in field_caps or mappings #64077
Comments
Pinging @elastic/es-search (:Search/Mapping) |
Pinging @elastic/es-analytics-geo (:Analytics/Aggregations) |
I just noticed this one! I've needed to test whether a field has more than one value deep inside a query implementation I'm working on. I'm fairly sure this information is available in Lucene for numbers, but I haven't checked if it is available for |
If this is something to be exposed in field caps or mappings then we don't want to look at values in the index itself for performance reasons. We've previously discussed adding flags to mappers to forbid multiple values in #58523 |
@nik9000 It's not relevant for the use cases I was considering where The reasoning behind this is shown in the example above: it's important to know the denominator of the terms aggregation compared to the total number of values. |
You can ask |
Don't you have this information in the response already? The sum is greater than the doc count, so you have some documents that have multiple values. Can you clarify what you'd like to do with this information at the field caps level? It's hard to see why it would be helpful to know globally and what actions you can trigger from this info. |
@jimczi We don't always have this information, for example when the top terms cover fewer documents than the total count, or when we aren't tracking total hits. The important part of the example above is that it's often ambiguous in the UI whether we are showing a "count of documents" or a "count of values in an array". It would help us provide a less ambiguous UI if we knew for sure that a field is a singleton. Unlike #58523, what I'm asking for in this issue is to assume that fields are singletons until they aren't. By having a flag that indicates that a field is expected to contain more than one value, we can guide users away from some bad practices:
We can't guide users in this way unless we have this extra information. We could work around it by analyzing individual documents with the |
Multi-value arrays cause problems in Kibana because we default to single-value fields, while multi-value fields create a higher doc_count than the total doc_count on the index. So instead of showing numbers like "100% of documents" we might end up showing "200% of documents", which looks completely wrong.
The request here is for Elasticsearch to make it easier to know that a field is expected to contain multiple values, in the mapping and field_caps responses. This would help us generate the right queries from Kibana.
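To illustrate the inflated percentages described above, here is a minimal Python sketch. It does not call Elasticsearch. It only mimics, under simplifying assumptions, how a terms-style count places a document in one bucket per distinct value it holds. The `docs` list, the `tags` field, and the `terms_bucket_counts` helper are hypothetical names made up for this illustration.

```python
from collections import Counter

# Hypothetical documents; "tags" holds an array of values in each one.
docs = [
    {"tags": ["a", "b"]},
    {"tags": ["a"]},
    {"tags": ["b", "c"]},
    {"tags": ["a", "c"]},
]


def terms_bucket_counts(documents, field):
    """Count, per term, how many documents contain that term.

    A document with several distinct values is counted once in each
    matching bucket, which is what inflates the summed doc_count.
    """
    counts = Counter()
    for doc in documents:
        values = doc.get(field, [])
        if not isinstance(values, list):
            values = [values]  # treat a scalar as a single-value array
        for term in set(values):
            counts[term] += 1
    return counts


buckets = terms_bucket_counts(docs, "tags")
total_docs = len(docs)
summed = sum(buckets.values())
ratio = summed / total_docs

print(dict(buckets))
print(f"{summed} bucket hits over {total_docs} documents = {ratio:.0%}")
```

With only four documents, the summed bucket counts come to seven, so a client that divides by the document count would display 175% of documents.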
On field_caps, we might get a response like this:
Example of the problem
In this example we have the Terms aggregation reporting 7,409 as the overall doc_count, while there are only 4,675 total hits in the query. This is confusing and needs to be handled by the client code:
I would expect that the overall value in the response would equal the total number of documents. But this is not the case:
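The original response body is not reproduced on this page. As a rough sketch of the kind of client-side handling this forces, the check below compares a summed doc_count against the total hits, using the two figures quoted in this example. The function name `has_multi_value_evidence` is hypothetical, and the check is only a heuristic: it cannot prove that a field is single-valued, and it is unavailable when total hits are not tracked.

```python
from typing import Optional


def has_multi_value_evidence(summed_doc_count: int,
                             total_hits: Optional[int]) -> bool:
    """Return True only when the bucket counts exceed the total hits.

    False means "no evidence", not "single-valued": the top terms may
    cover only part of the documents, or total hits may be untracked.
    """
    if total_hits is None:
        return False
    return summed_doc_count > total_hits


# Figures quoted in the example: 7,409 overall doc_count, 4,675 total hits.
result = has_multi_value_evidence(7409, 4675)
share = 7409 / 4675

print(result)
print(f"{share:.1%} of documents")
```

A naive percentage built from these two numbers comes out well above 100%, which is the confusing display the issue describes.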