Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize saved objects getScopedClient and HTTP API #68221

Merged
merged 5 commits into from
Jun 5, 2020

Conversation

rudolf
Copy link
Contributor

@rudolf rudolf commented Jun 4, 2020

Summary

When the spaces plugin is enabled to secure the saved objects client and the HTTP API we create several saved object repositories for every request. Because we're re-calculating the active mappings for every saved objects repository that's created this creates a performance bottleneck.

This PR optimizes the performance by caching the active mappings making it faster to create repositories and also by re-using the same repository in the spaces client's getScopedClient method.

It further uses shareReplay for the the config$ observable we provide to plugins to prevent running config path validation every time a plugin reads from it's config observable.

Note: There's also a lot of time spent in legacy code that augments the hapi request object with uisettings which is unnecessary, but I'm leaving that as is for now

const savedObjectsClient = request.getSavedObjectsClient();

Screenshot 2020-06-04 at 10 27 54

Thanks for the profiling work @nchaulet and the fleet team.

Checklist

Delete any items that are not applicable to this PR.

For maintainers

@jfsiii
Copy link
Contributor

jfsiii commented Jun 4, 2020

I took a look at this locally and it looks outstanding. I'm happy to provide more details about recreating the test or viewing the cpu profiles but here are some notes

Test

1000 connections to /api/features [1] for 60secs

TL;DR: Huge improvements in key areas

RPS: 34 -> 178
Event loop delay: 3000ms -> 200ms

Master

Completed 2K requests. 4K errors
Median response time ~8.5s
Event loop delays of 3000ms (3 seconds)

Screen Shot 2020-06-04 at 5 54 31 AM

Screen Shot 2020-06-04 at 6 07 02 AM

PR

Completes 11K requests. 4K errors
Median response time ~1.1s
Event loop delays of 150ms-200ms (0.15-0.2 seconds)

Screen Shot 2020-06-04 at 6 15 13 AM

Screen Shot 2020-06-04 at 6 45 57 AM

[1] Chose /api/features since it's outside the Ingest Manager plugin and, IIUC, doesn't do any Saved Object or much other route. We've been using this as a realistic example of Kibana perf/overhead

@jfsiii
Copy link
Contributor

jfsiii commented Jun 4, 2020

I initially showed tests against /api/features to demonstrate that the changes should benefit all Kibana, but here's a Fleet specific example.

This is hitting /api/enroll with 10, 50, 100, 150, and finally 250 connections. Initially in the PR branch, then on master.

Sorry for the "after"/"before" arrangement but the event loop improvements are still present

enroll command
autocannon -c 100 -d 60 --method POST --header 'kbn-xsrf: xxx' --header 'Authorization: ApiKey cDl1LWZISUJ2d3RqeklLdGhUVnY6RHdHRmhMbFdSVEt2VDFUamtTZ0NKQQ==' --header 'Content-Type: application/json' --body '{
    "type": "PERMANENT",
    "metadata": {
        "local": {
            "host": "localhost",
            "ip": "127.0.0.1",
            "system": "Darwin 18.7.0",
            "memory": 34359738368
        },
        "user_provided": {
            "dev_agent_version": "0.0.1",
            "region": "us-east"
        }
    }
}' http://localhost:5601/api/ingest_manager/fleet/agents/enroll

Screen Shot 2020-06-04 at 10 22 45 AM

@rudolf
Copy link
Contributor Author

rudolf commented Jun 4, 2020

I've pushed some more changes and I think we've now removed all the low hanging fruit.

Screenshot 2020-06-04 at 16 45 12

Most of the time is now spent in validating the request which takes 0.4ms or ~10% of the time

return schema.maybe(schema.nullable(schema.any({})));

This will probably be a lot more with more complex validation, but we will have to profile the fleet API again to see what specifically is taking up time in that route.

@nchaulet
Copy link
Member

nchaulet commented Jun 4, 2020

I did some tests also and CPU profile while enrolling 250 agents, and the improvments seems to be great, next bottleneck are on our side :)

cpu-profile-after.cpuprofile.zip

These one still seems to be improved, I think it's related to used filter in a saved object query
Screen Shot 2020-06-04 at 10 54 03 AM

@rudolf
Copy link
Contributor Author

rudolf commented Jun 4, 2020

These one still seems to be improved, I think it's related to used filter in a saved object query

Yeah it looks like we're hitting performance bottleneck in the KQL parser, fun times

create() {
return coreContext.configService.atPath(pluginManifest.configPath);
create<T>() {
return coreContext.configService.atPath<T>(pluginManifest.configPath).pipe(shareReplay(1));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The spaces plugin takes a config value for every getScopedClient call (every incoming request). This does config validation for this config path each time.

const getScopedClient = async (request: KibanaRequest) => {
const [coreStart] = await getStartServices();
const internalRepository = await internalRepositoryPromise;
return config$
.pipe(
take(1),
map((config) => {

We could optimize the plugin to not take a new config value every time but only if a new value was emitted, but this felt like a better fix that will probably benefit other places.

@@ -136,7 +136,7 @@ export class SavedObjectsRepository {
injectedConstructor: any = SavedObjectsRepository
): ISavedObjectsRepository {
const mappings = migrator.getActiveMappings();
const allTypes = Object.keys(getRootPropertiesObjects(mappings));
const allTypes = typeRegistry.getAllTypes().map((t) => t.name);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

apparently Object.entries() which gets called from inside getRootPropertiesObjects is very slow.

Pierre already refactored many places where we use this round about getRootPropertiesObjects way to get all the types, but there's still a few remaining.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some getRootPropertiesObjects were kept because of the config 'virtual' type that wasn't properly registered to the registry. But this has been resolved since, so we're safe to remove all the remaining usages.

Very surprised Object.keys is a bottleneck, even if it makes sense that typeRegistry.getAllTypes() is faster as the registry is based on a Map.

Copy link
Contributor

@mshustov mshustov Jun 5, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably it's not Object.keys but getRootPropertiesObjects mutating an object?


v8 cannot optimizie this case https://richardartoul.github.io/jekyll/update/2015/04/26/hidden-classes.html

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I misread and thought it was just a call to getRootProperties

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was the line that showed up in the profiler and took almost 5ms:

return Object.entries(rootProperties).reduce((acc, [key, value]) => {

const getScopedClient = async (request: KibanaRequest) => {
const [coreStart] = await getStartServices();
const internalRepository = await internalRepositoryPromise;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

createScopedRepository already creates a new repository for every request, so after the other improvements it's probably not that critical that we re-use this repository instead of creating a new one. But since this is on the hot path it felt like it doesn't hurt to keep this minor improvement around.

@rudolf rudolf marked this pull request as ready for review June 4, 2020 16:41
@rudolf rudolf requested review from a team as code owners June 4, 2020 16:41
@rudolf rudolf added Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc v7.9.0 v8.0.0 labels Jun 4, 2020
@elasticmachine
Copy link
Contributor

Pinging @elastic/kibana-platform (Team:Platform)

@rudolf rudolf added the release_note:skip Skip the PR/issue when compiling release notes label Jun 4, 2020
@kibanamachine
Copy link
Contributor

💚 Build Succeeded

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

Copy link
Contributor

@pgayvallet pgayvallet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@@ -136,7 +136,7 @@ export class SavedObjectsRepository {
injectedConstructor: any = SavedObjectsRepository
): ISavedObjectsRepository {
const mappings = migrator.getActiveMappings();
const allTypes = Object.keys(getRootPropertiesObjects(mappings));
const allTypes = typeRegistry.getAllTypes().map((t) => t.name);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some getRootPropertiesObjects were kept because of the config 'virtual' type that wasn't properly registered to the registry. But this has been resolved since, so we're safe to remove all the remaining usages.

Very surprised Object.keys is a bottleneck, even if it makes sense that typeRegistry.getAllTypes() is faster as the registry is based on a Map.

@mshustov
Copy link
Contributor

mshustov commented Jun 5, 2020

Note: There's also a lot of time spent in legacy code that augments the hapi request object with uisettings which is unnecessary, but I'm leaving that as is for now

We are removing this code in v7.10 anyway

Copy link
Member

@legrego legrego left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@rudolf rudolf merged commit 2559dbe into elastic:master Jun 5, 2020
@rudolf rudolf deleted the optimize-saved-objects-api branch June 5, 2020 23:05
@ruflin
Copy link
Contributor

ruflin commented Jun 8, 2020

@rudolf Really excited to see this performance improvement. I wonder if there is a way that we keep "track" of these improvements to make sure not a follow up change in KB accidentally invalidates this optimisation again.

gmmorris added a commit to gmmorris/kibana that referenced this pull request Jun 8, 2020
* master: (57 commits)
  Add app arch team as owner of datemath package (elastic#66880)
  [Observability] Landing page for Observability (elastic#67467)
  [SIEM] Fix timeline buildGlobalQuery (elastic#68320)
  Optimize saved objects getScopedClient and HTTP API (elastic#68221)
  [Maps] Fix mb-style interpolate style rule (elastic#68413)
  update script to always download node (elastic#68421)
  [SECURITY SOLEIL] Fix selection of event type when no siem index signal created (elastic#68291)
  [DOCS] Adds note about configuring File Data Visualizer (elastic#68407)
  [DOCS] Adds link from remote clusters to index patterns (elastic#68406)
  [QA] slack notify on failure (elastic#68126)
  upgrade eslint-plugin-react-hooks from 2.3.0 to 4.0.4 (elastic#68295)
  moving to jira to a gold license (elastic#67178)
  [DOCS] Revises doc on adding data (elastic#68038)
  [APM] Add ThemeProvider to support dark mode (elastic#68242)
  Make welcome screen disabling first action in loginIfPrompted (elastic#68238)
  [QA] Code coverage: unskip tests, collect tests results, exclude bundles from report (elastic#64477)
  [ML] Functional tests - disable flaky regression and classification creation test
  [Alerting] change eventLog ILM requests to absolute URLs (elastic#68331)
  Report page load asset size (elastic#66224)
  [SIEM][CASE] Change SIEM to Security (elastic#68365)
  ...
rudolf added a commit that referenced this pull request Jun 8, 2020
* Create a single repository to be shared by all calls to getScopedClient

* Cache migrator.getActiveMappings to improve createRepository

* Use typeregistry.getAllTypes instead of getRootPropertiesObjects(mappings)

* Don't validate plugin's config every time it's read

* Fix saved_objects_mixin

Co-authored-by: Elastic Machine <[email protected]>
@mshustov
Copy link
Contributor

mshustov commented Jun 9, 2020

@ruflin there was a discussion on GAH to automate perf checks #63848
@LeeDr Do we have a dedicated issue for this? I cannot find any in Kibana-QA Team Roadmap.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release_note:skip Skip the PR/issue when compiling release notes Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc v7.9.0 v8.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants