Support flattened field type from Elasticsearch #25820
Comments
Pinging @elastic/kibana-app |
Thanks @Bargs for taking a look at the branch! I had a couple of thoughts/questions. First, from talking to the Beats team, I think it would be valuable to add support for terms aggregations, both on the root field (like Next, we had discussed if there was a way to expose the list of subfields/keys that are available. I don't think it makes sense to return this as part of the mappings or field capabilities, because there may be a huge number of distinct subfields (and the number of field mappings is assumed to be bounded to a reasonable number). A more sensible approach might be to index the subfield names into a separate Lucene field, and allow for a terms aggregation that returns the most popular subfields. However, this doesn't fit perfectly with the current API around |
Sorry for the delay in response, I was out all last week. From a technical standpoint I think storing the subfield names in an index and doing a terms agg on them would work for autocompleting the field names in Kibana. But while chatting with @lukasolson I realized that even with the field names we would still be missing type information, and as a result would not be able to intelligently suggest query types. If we go down the path of trying to make these feel like regular searchable fields for average users, I think we need to go all the way, so we would need that type information too. I have a feeling that probably complicates things even more. If so, we might be able to do without autocomplete on JSON fields for the time being. The most important thing to me is that the autocomplete doesn't appear broken or unpredictable to a normal user, but I think we might be able to solve that by adding some warnings in the UI if the user is searching on a JSON field. |
I've now started to pick up work on JSON fields again and have a couple of updates. First, we worked out how to support keyword-style aggregations like Second, after thinking about it more, I think it could be valuable to provide access to the possible keys in the JSON field. Even beyond Kibana, this seems generally useful as part of a search workflow on these fields: a client could first retrieve the common keys in the JSON field, present them to the user, then allow for searches on these keys. Otherwise these keys must be known in advance, or can only be discovered by encountering them within documents returned from another search. To support this, we could index the keys into a separate field, and the common keys would then be retrieved through a
@Bargs @jpountz @jimczi I was curious about your thoughts on the above. The downsides are that this API isn't as elegant, and it may involve indexing more information. Note that this relates to @jimczi's question here about whether keyed JSON fields should use the |
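As a rough illustration of the "retrieve the common keys first" workflow discussed above, here is a minimal Python sketch. It is not the API that Elasticsearch ended up with: the helper names and the idea of a dedicated keys field (passed in as `keys_field`) are assumptions made for illustration. The sketch extracts the dotted key paths that would be indexed into such a field, builds a standard terms aggregation request over it, and simulates the result in memory.

```python
from collections import Counter
from typing import Any, Dict, Iterator, List


def iter_key_paths(obj: Dict[str, Any], prefix: str = "") -> Iterator[str]:
    """Yield the dotted path of every leaf key in a nested JSON object."""
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            yield from iter_key_paths(value, path)
        else:
            yield path


def common_keys_request(keys_field: str, size: int = 10) -> Dict[str, Any]:
    """Build a search body asking for the most frequent values of keys_field.

    keys_field is a hypothetical field holding the indexed key names.
    """
    return {
        "size": 0,  # only the aggregation is needed, not the hits
        "aggs": {"common_keys": {"terms": {"field": keys_field, "size": size}}},
    }


def top_keys_locally(docs: List[Dict[str, Any]], size: int = 10) -> List[str]:
    """Simulate the aggregation in memory: count documents per key path."""
    counts: Counter = Counter()
    for doc in docs:
        counts.update(set(iter_key_paths(doc)))  # one count per document
    return [key for key, _ in counts.most_common(size)]
```

Because the aggregation only returns the top `size` buckets, a client using this approach would see the most popular keys rather than all of them, which matches the trade-off raised later in the thread.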
Sounds good to me! We'd love to have a way to retrieve a list of the keys, whatever form the API takes. |
How would this work in practice? Is my understanding correct that Kibana would first retrieve fields from the field capabilities APIs, and as a second step for each At first sight this sounds like a good idea to me, but I'd like to double check that we are ok with the complexity that it introduces in Kibana as well as the trade-off: because there is no upper bound on the number of fields, and because collecting all fields would be too slow anyway, most of the time we would only collect a subset of the sub fields that exist in a |
I would like to highlight something @Bargs already mentioned earlier. Just knowing the field names (via |
@timroes Yes they would behave almost exactly like a |
@jpountz I am a bit worried about the "almost" in that sentence :D Could you tell what are the actual differences? Because we need to know if we are able to simply treat them as "keyword" fields (but that would then apply to all places), or if we can't, in which case we would need to know about that type difference somehow. |
@timroes Here are the differences to my knowledge (please review @jtibshirani):
Other than that, they should support the same set of queries and aggregations. |
Okay that sounds fine to me. We don't mind too much about the score (especially not about specific values) and we're not using So the plans here sound rather reasonable. I would just suggest that while creating that index pattern, we give the user a flag, if any JSON fields are contained, for whether or not they want "to use those in Kibana", since it sounds to me like we could otherwise bloat the index pattern quite a lot, and maybe users don't actually want to use them. |
@timroes In addition to the aggregation limitation that @jpountz mentioned (which we hope to address), I tried to list the restrictions here: https://github.com/elastic/elasticsearch/blob/object-fields/docs/reference/mapping/types/embedded-json.asciidoc#supported-operations. You'll notice that certain query types like
I'm also hoping to understand this better, would it be possible to walk through how Kibana would load + display the keys, given the current proposal of running a |
If I understand the purpose of Do we expect (can we assume) these fields are known ahead of time? Or do they need to be discovered via autocomplete (which may not even be possible to do well if the number of keys is large). Say we expect only a handful of embedded fields are of interest, and that handful doesn't change much - these are two big assumptions - then how about defining the embedded json fields as part of the index pattern (similar to how we do now for scripted fields)? It would not be as nice to work with as automatically discovered fields via autocomplete, but then the field type can be defined and there can be as many or as few as you want. |
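To make the "declare the subfields on the index pattern" idea concrete, here is a small hypothetical sketch in Python. Nothing here is an existing Kibana structure: `DeclaredSubfield`, its attributes, and the type names are assumptions for illustration only. The point is that each declared key carries a user-asserted type, so autocomplete would not need to discover keys or guess types.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class DeclaredSubfield:
    """One user-declared key inside an embedded JSON field (hypothetical)."""

    parent: str  # name of the JSON field in the mapping
    key: str     # dotted key inside that field
    type: str    # type the user asserts, e.g. "string" or "number"

    @property
    def name(self) -> str:
        """Full dotted name, as it would be used in a query."""
        return f"{self.parent}.{self.key}"


def to_field_list(declared: List[DeclaredSubfield]) -> List[Dict[str, str]]:
    """Turn the declarations into name/type entries for a field list."""
    return [{"name": d.name, "type": d.type} for d in declared]
```

For example, declaring `DeclaredSubfield("labels", "user.followers", "number")` would yield a field entry named `labels.user.followers` with type `number`.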
Pinging @elastic/kibana-app-arch (Team:AppArch) |
Not being able to create visualizations in Kibana against fields in a Is there any chance this feature gets prioritized in the near future? |
Any news about creating visualizations in Kibana? |
Just chiming in: I agree completely with what @andrewkcarter mentioned above, and would really like to adopt the |
Stumbled across this issue looking to create visualizations with the same field type. |
Hi folks - I think I've got an update for this thread that gets us some progress here. Elasticsearch now supports Runtime Fields, and Kibana supports the ability to add runtime fields to your index pattern. The particular relevant part of the documentation is the support of automatically pulling from
Examples/Screenshots:
Let's say I have a flattened
Now I need to access a value to do some kind of aggregation, like the user's reported location. I add a new runtime mapping with the direct path within my flattened object,
Now I can run an aggregation to find my top location hits
Neat, but what about numeric values? Set the runtime field to directly access the user's current follower count at
I do the same thing as prior for the user name field that I care about,
Now show me the top users ranked by their number of followers:
Sweet! Filters work too, here I've got some custom data that contains a value I care about. Same setup as prior for the
From my testing so far, there does seem to be a performance impact, but I'm not sure whether it comes from the resources I've thrown at my cluster and the data volume or from the runtime field itself. I have not tested the other runtime data types yet (like geo or ip), but keywords and numeric data seem to be working in my initial testing and I just wanted to share. |
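For readers who want to try the runtime field approach described above outside the Kibana UI, a minimal sketch of an equivalent search request body follows. It relies on the documented Elasticsearch behavior that a runtime field defined without a script reads the value at the same path from _source. The field path `tweet.user.location` and the helper name are assumptions, since the original screenshots and field names did not survive; whether the request-level form behaves identically to the index pattern setup in the comment has not been verified here.

```python
from typing import Any, Dict


def runtime_terms_search(
    path: str, runtime_type: str = "keyword", size: int = 10
) -> Dict[str, Any]:
    """Build a search body that exposes one key of a flattened field as a
    script-less runtime field and runs a terms aggregation on it.

    path is the full dotted path inside the flattened object (hypothetical
    in the usage below); runtime_type is a runtime field type such as
    "keyword" or "long".
    """
    return {
        "size": 0,  # aggregation results only
        # No "script": the value is loaded from _source at the same path.
        "runtime_mappings": {path: {"type": runtime_type}},
        "aggs": {"top_values": {"terms": {"field": path, "size": size}}},
    }


# Hypothetical usage: top reported locations from a flattened "tweet" field.
location_body = runtime_terms_search("tweet.user.location", "keyword", size=5)
```

Since the values are resolved from _source at query time rather than from an index structure, this pattern is expected to cost more than querying a mapped field, which is consistent with the performance reports later in the thread.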
@madisonb Amazing, thank you. |
Pinging @elastic/kibana-data-discovery (Team:DataDiscovery) |
I wanted to poke this issue to see if there has been any discussion/plan for this. Many Elastic integrations (https://github.com/search?q=repo%3Aelastic%2Fintegrations+%22%7C+flattened+%7C%22&type=code&p=1) (~63 as of searching) have adopted the flattened field type, and many of these are useful fields you'd want to search/aggregate on. Not having the ability to do this natively in Kibana is extremely limiting. While you can use something like runtime fields, this severely degrades performance and doesn't provide the best user experience for people unfamiliar with runtime fields/Painless. |
We attempted to use runtime fields along with the flattened type and it caused severe degradation in performance (CPU util went from 10% -> 90% across all 9 nodes). We are also in need of some update on this ticket. (Enterprise ECK customer) |
With the release of 8.11 and the addition of ES|QL, I wonder if this can be used as an alternative. I haven't actually tested this, and oddly, the limitations section doesn't mention whether flattened is supported or not, but maybe it is? |
@nicholas-r-king this sounds like a problem with Elasticsearch, or are these Kibana nodes? |
@BenB196 I guess flattened fields were accidentally left out of the list of what's not supported. Currently, they are not supported in ES|QL. |
A new object field type is coming to ES. I played around with the feature branch a bit today and collected some thoughts and findings. From what I've seen so far, there are some small updates to Kibana we'll definitely want to make and some things we should discuss.
We need to add a JSON type to Kibana (or whatever name ES lands on for this field type). Currently it shows up in the index pattern as unknown.
I only spent about an hour with it, so there may be more things I'm missing; it would definitely be good to get more eyes on it.
Feature branch here if anyone else wants to check it out: https://github.com/elastic/elasticsearch/tree/object-fields