Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Reduces field capabilities event loop block times by scaling linearly…
… using hashes (#75718) (#75805) ## Summary On the `security_solution` project from different customers we have been getting reports of scaling issues and excessive NodeJS event blocking times. After in-depth tracing through some of the index and field capabilities calls we identified two of the "hot paths" running through `field_capabilities` to where it is using double looped arrays rather than hashes. By switching these two hot spots out for hashes we are now able to reduce the event loop block times by an order of magnitude. Before this PR you can see event loop block times as high as: ```ts field_cap: 575.131ms ``` And after this PR you will see event loop block times drop by an order of magnitude to: ```ts field_cap: 31.783ms ``` when you're calling into indexes as large as `filebeat-*`. This number can be higher if you're concatenating several large indexes together trying to get capabilities from each one all at once. We already only call `getFieldCapabilities` with one index at a time to spread out event block times. The fix is to use a hash within two key areas within these two files: ```ts src/plugins/data/server/index_patterns/fetcher/lib/field_capabilities/field_capabilities.ts src/plugins/data/server/index_patterns/fetcher/lib/field_capabilities/field_caps_response.ts ``` This effect happens during the query of `SourceQuery`/`IndexFields` within `security_solution` but you should be able to trigger it with any application code who calls into those code paths with large index sizes such as `filebeat-*` anywhere in Kibana. An explanation of how to see the block times for before and after --- Add, `console.time('field_cap');` and `console.timeEnd('field_cap');` to where the synchronize code is for testing the optimizations of before and after. For example around lines 45 with the original code: ```ts const esFieldCaps: FieldCapsResponse = await callFieldCapsApi(callCluster, indices); console.time('field_cap'); // <--- start timer const fieldsFromFieldCapsByName = keyBy(readFieldCapsResponse(esFieldCaps), 'name'); const allFieldsUnsorted = Object.keys(fieldsFromFieldCapsByName) .filter((name) => !name.startsWith('_')) .concat(metaFields) .reduce(concatIfUniq, [] as string[]) .map<FieldDescriptor>((name) => defaults({}, fieldsFromFieldCapsByName[name], { name, type: 'string', searchable: false, aggregatable: false, readFromDocValues: false, }) ) .map(mergeOverrides); const sorted = sortBy(allFieldsUnsorted, 'name'); console.timeEnd('field_cap'); // <--- outputs the end timer return sorted; ``` And around lines 45 with this pull request: ```ts const esFieldCaps: FieldCapsResponse = await callFieldCapsApi(callCluster, indices); console.time('field_cap'); // <--- start timer const fieldsFromFieldCapsByName = keyBy(readFieldCapsResponse(esFieldCaps), 'name'); const allFieldsUnsorted = Object.keys(fieldsFromFieldCapsByName) .filter((name) => !name.startsWith('_')) .concat(metaFields) .reduce<{ names: string[]; hash: Record<string, string> }>( (agg, value) => { // This is intentionally using a "hash" and a "push" to be highly optimized with very large indexes if (agg.hash[value] != null) { return agg; } else { agg.hash[value] = value; agg.names.push(value); return agg; } }, { names: [], hash: {} } ) .names.map<FieldDescriptor>((name) => defaults({}, fieldsFromFieldCapsByName[name], { name, type: 'string', searchable: false, aggregatable: false, readFromDocValues: false, }) ) .map(mergeOverrides); const sorted = sortBy(allFieldsUnsorted, 'name'); console.timeEnd('field_cap'); // <--- outputs the end timer return sorted; ``` And then reload the security solutions application web page or generically anything that is going to call filebeat-* index or another large index or you could concatenate several indexes together as well to test out the performance. For security solutions we can just visit any page such as this one below which has a filebeat-* index: ``` http://localhost:5601/app/security/timelines/ ``` Be sure to load it _twice_ for testing as NodeJS will sometimes report better numbers the second time as it does optimizations after the first time it encounters some code paths. You should begin to see numbers similar to this in the before: ```ts field_cap: 575.131ms ``` This indicates that it is blocking the event loop for around half a second before this fix. If an application adds additional indexes on-top of `filebeat`, or if it tries to execute other code after this (which we do in security solutions) then the block times will climb even higher. However, after this fix, the m^n are changed to use hashing so this only climb by some constant * n where n is your fields and for filebeat-* it will should very low around: ```ts field_cap: 31.783ms ``` ### Checklist Unit tests already present, so this shouldn't break anything 🤞 . - [x] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios
- Loading branch information