Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[data views] async and partial loading of field list #152159

Closed
mattkime opened this issue Feb 25, 2023 · 13 comments
Closed

[data views] async and partial loading of field list #152159

mattkime opened this issue Feb 25, 2023 · 13 comments
Assignees
Labels
discuss Feature:Data Views Data Views code and UI - index patterns before 8.0 impact:high Addressing this issue will have a high level of impact on the quality/strength of our product. loe:large Large Level of Effort Team:DataDiscovery Discover, search (e.g. data plugin and KQL), data views, saved searches. For ES|QL, use Team:ES|QL.

Comments

@mattkime
Copy link
Contributor

mattkime commented Feb 25, 2023

Currently, the data views api loads the full field list when a data view is loaded and subsequent references to the field list are synchronous. This leads to slower performance with data views larger than 1k fields and can cause big problems above 6k fields.

Changing how data view fields are accessed touches nearly all kibana apps. The work to integrate the new api will be larger than that to create it.

To address this, the data views api should return a new type of data view called DataViewAsyncFields its just like a data view but the fields array is replaced with a getFields method. DataViewAsyncFields.getDataView would return an old style DataView to allow for migration. It would be removed once migration is complete.

The goal for the getFields api is to provide api consumers with a way of describing the fields they want so they don't need to iterate over the results to find what they need.

getFields params
  fields: string or array of field names. Wildcards `*` supported
  includeUnmapped: boolean. defaults to false // maybe this should be mapping type - mapped / unmapped / runtime / scripted / meta
  types: types of fields to be returned. es types (long, int, etc) with kbn types (Number, string, etc) shim
  perPage: number of fields to return (default 500)
  page: which page of fields to return
  searchable: boolean
  aggregatable: boolean
  isMetadataField:boolean // maybe this isn't needed if there's a more general tool that does the same.
  indexFilter: query object

Todo

  • caching
  • stream based parsing - do we need it if we're loading smaller batches of fields?

Addresses #147484

Similar / related issues:
#134306
#139340
#151249

@mattkime mattkime added Feature:Data Views Data Views code and UI - index patterns before 8.0 Team:DataDiscovery Discover, search (e.g. data plugin and KQL), data views, saved searches. For ES|QL, use Team:ES|QL. labels Feb 25, 2023
@elasticmachine
Copy link
Contributor

Pinging @elastic/kibana-data-discovery (Team:DataDiscovery)

@thomasneirynck
Copy link
Contributor

thomasneirynck commented Mar 23, 2023

After some offline discussion with @mattk, we thought it be useful to relay that discussion here:

Using async/paging is a re-envisioning of how we retrieve fields in the stack. It may impact both Elasticsearch (with possible new requirements for API), and also Kibana. Besides new API-requirements for Elasticsearch, the impact on Kibana will certainly have two components:

  • new internal field-fetching logic, to handle the paging/async request
  • new UX-patterns in relevant apps (Dsicover, ...) to deal with "incomplete" field-states

This can certainly be the best long-term way forward. That said, I wonder how much of "Kibana not being able to deal with large field-lists" issue is related to lower-level implementation details of how Kibana handles large field_caps responses.

Generally speaking, browsers should be able to handle response sizes in the order of 10s to 100s of megabytes, which is from what I understand the range of field_caps respones for 1000s of fields.

(1) Let's take the following assumption: large field_caps responses are not an issue for Elasticsearch. The request neither times-out, nor does the cluster raise any errors/crashes/... (This is just the working assumption (!) @DaveCTurner may confirm/deny :) (?))

(2) On Kibana-side, I'd investigate following bottlenecks:

In Kibana-server:

  • there's known endpoints where kibana decodes and re-encodes an ES-response (_bsearch is an example). Kibana-server holds an entire JSON-response in-memory server-side.
  • We need to ensure that for field-caps, the Kibana endpoint:
    • only streams the raw ES-response straight to browser (ie. it doesn't buffer the full response)
    • do any decoding/encoding on it (e.g. some kibana endpoints re-encode to base64. I don't think this is the case, but let's ensure)

In Kibana-browser:

  • parsing the response. For large JSON-docs, we should not be using JSON.parse. JSON.parse is suboptimal because of two reasons:
    • it requires the entire response to be serialized into a string-object. This string-object has no useful relevance whatsoever, yet it creates a ton of duplicate cruft.
    • Calling JSON.parse blocks all other threads while the data is unmarshalled. This is a common place where browsers "hang" or may even crash.
  • retaining field-list in memory
    • I don't think there should be an issue of heap-allocating thousands of fields as thousands of JS-objects, even in the order of 100s of megs. (I don't rule it out, but I don't see a big issue given the state of browsers and machines today).
    • Partically speaking, it should mean that DataView#getFields certainly can respond with 10000s of fields to the client.
  • DOM insertion
    • at some point, all these fields will be inserted somewhere in the DOM (e.g. as nodes in Discover left field-list sidebar). I am uncertain if this is really causing issues. I would expect "no", given that thousands of nodes is not that strange, but React may struggle with it nonetheless. If this is an issue (ie. browser borks because DOM gets too large), we can think of client-side only issues on how to resolve (e.g. a client-side tabbing or paging mechanism).

So I would first confirm we're not introducing bottlenecks on these lower levels. If we still have performance issues after addressing those, I would look into more sophisticated field-loading strategies like paging and/or async search.


All that said; I am certainly in favor of:

  • Apps should only load the fields they need (e.g. Lens chart on dashboard does not need all 10000s fields)
  • DataView APIs are expanded to have more fine-grained use-cases.

Those just impose more design requirements, more coordination-effort, and imho could divert attention from lower hanging fruit.

@mattkime
Copy link
Contributor Author

@thomasneirynck

We should come up with a definition of what it means to support a large field list. We're seeing 10k field lists more routinely. I think we had an SDH that involved a 100k fields. Its definitely a judgement call to decide what we need to support. I'd like to make sure we support 250k fields.

src/plugins/data_views/server/fetcher/index_patterns_fetcher.ts does do some processing of the ES field_caps response. I think this is worth investigating if we find a delay in receiving the response.

Regarding dom insertion - I think anything dealing with a big field list is already using a virtualized rendering of some sort - I should verify this.


My action items -

  1. Check in with security solution devs to get a better problem definition from their perspective. I think they have a better idea regarding the number of fields and the specific impact.

  2. I should produce some ballpark numbers for the relationship between field count, payload size, and parsing time.

@davismcphee
Copy link
Contributor

at some point, all these fields will be inserted somewhere in the DOM (e.g. as nodes in Discover left field-list sidebar). I am uncertain if this is really causing issues.

Regarding dom insertion - I think anything dealing with a big field list is already using a virtualized rendering of some sort - I should verify this.

@thomasneirynck @mattkime I don't have much to add to this discussion overall, but just wanted to confirm that we use virtualization in the Unified Field List currently (both Discover and Lens) since it was brought up, so large field lists should perform well there at least.

@DaveCTurner
Copy link

DaveCTurner commented Mar 24, 2023

Let's take the following assumption: large field_caps responses are not an issue for Elasticsearch. The request neither times-out, nor does the cluster raise any errors/crashes/... (This is just the working assumption (!) @DaveCTurner may confirm/deny :) (?))

We're pretty robust in this area in recent versions, but there's a few parts worthy of clarification:

  • re. crashes, we would consider it an ES bug if field caps (or any other) requests could be harmful to the cluster in this way.

  • re. errors, we may return 429 Too Many Requests if we are overloaded, and we may be unable to respond (or may only partially respond) if the cluster is unhealthy for some other reason. But it's an ES bug if this happens for no obvious reason (e.g. while the load on the cluster is within expectations and the cluster reports no health problems). Users will always find ways to push their clusters too hard and end up in these situations, so we will always need Kibana to do its best to react gracefully to errors.

  • re. timeouts, Elasticsearch does not itself have any timeouts in this area, but there may be timeouts imposed by intermediary proxies or by Kibana itself over which Elasticsearch has no control. We sometimes need to fan-out the field caps computation across tens-of-thousands of shards in hundreds of remote clusters so we're not offering any real-time guarantees here. As with errors, it's an ES bug if we take too long for no obvious reason, and likewise we still need Kibana to do its best to react gracefully to slow responses.

I thought of a couple of other things we could do in ES to help too:

  • I believe it's difficult to parse a pure JSON response in a streaming fashion within Kibana, but we are free to choose a different and more streamable format (e.g NDJSON) if that would help.

  • I think we don't have any caching in ES around field caps at the moment, but that's definitely an option if we encounter further performance constraints. IMO it's good to push caching into ES because this allows for a cache to be shared across multiple clients in ways that a Kibana-side cache cannot.

Finally, to repeat my comments from earlier in the week, it's always going to be faster (and cheaper, and more scalable and robust) if ES is computing smaller responses, so it would be great if we could try and find ways to avoid needing to extract a list of All The Fields from ES wherever possible. I think we don't consider field caps to be an "interactive" API today and accept that it might take several hundred milliseconds to respond, but maybe we need to change our thinking in this area.

@kertal kertal added the discuss label Mar 28, 2023
@mattkime
Copy link
Contributor Author

mattkime commented Apr 6, 2023

I'm currently investigating options for increasing the speed of loading fields in the current format.

@mattkime
Copy link
Contributor Author

@aarju and the Infosec Detections and Analytics team has a use case where field caps takes 9s to resolve -
Screenshot 2023-04-25 at 17 13 37 (1)
kibana-app.whitesector.inf.elasticnet.co (2).har.zip

I spoke to @dnhatn and he explained

If a field-caps hits many indices and these indices don’t have identical mappings and contain multiple fields, the request will be slow. It is preferable to only ask for fields that we will actually use. This reduces the merging cost and response size. In some cases, requesting all fields might be faster, but that typically occurs when the response is very fast. In most cases, requesting specific fields will reduce the worst-case scenario.


As best I can tell this makes an argument for loading fields as needed.

@kertal
Copy link
Member

kertal commented May 22, 2023

As best I can tell this makes an argument for loading fields as needed.

Definitely! I have a follow up question to @dnhatn, would it actually help if we had an option in the fieldAPI request to just get fields that actually have values?

@dnhatn
Copy link
Member

dnhatn commented May 22, 2023

would it actually help if we had an option in the fieldAPI request to just get fields that actually have values?

@kertal I am not sure if that option would help. ES will need to perform checks to determine if a field has values or not. It will also have to handle cases where one shard has values for a field while other shards do not

@e40pud
Copy link
Contributor

e40pud commented May 31, 2023

At security solution we would benefit from any kind of improvements in fields fetching a lot.

In 8.7 we introduced the feature where we show to user warnings about field conflicts and when filed is unmapped in some of the indices. For that we have to fetch extended information about each field using includeUnmapped flag. Apparently this slowed down dramatically the usage of Rule Exception flyout as we need to load all fields on opening. One of the first who came back to us were InfoSec folks as they have huge amount of fields in their indices.

Screenshot 2023-05-31 at 17 10 49

After some research, it seems like there is no way around this at the moment. We are fully in for any changes that could improve that for us.

@e40pud
Copy link
Contributor

e40pud commented Jun 28, 2023

We’ve got couple more SDH related to heavy _field_caps responses. This time it happens with EQL queries https://github.com/elastic/security-team/issues/6962

@davismcphee davismcphee added impact:needs-assessment Product and/or Engineering needs to evaluate the impact of the change. impact:high Addressing this issue will have a high level of impact on the quality/strength of our product. loe:large Large Level of Effort and removed impact:needs-assessment Product and/or Engineering needs to evaluate the impact of the change. labels Sep 8, 2023
@dhurley14
Copy link
Contributor

Another SDH came in for large field list loading issues: https://github.com/elastic/sdh-security-team/issues/737

@mattkime
Copy link
Contributor Author

closed by #173948

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
discuss Feature:Data Views Data Views code and UI - index patterns before 8.0 impact:high Addressing this issue will have a high level of impact on the quality/strength of our product. loe:large Large Level of Effort Team:DataDiscovery Discover, search (e.g. data plugin and KQL), data views, saved searches. For ES|QL, use Team:ES|QL.
Projects
None yet
Development

No branches or pull requests

10 participants