[Fleet] Fix agent policy selection and creation when there are a large number of agent policies ( > 1k) #151119

Merged

hop-dev
Contributor

@hop-dev hop-dev commented Feb 14, 2023

Summary

Closes #150605

When adding an integration, the agent policy selector (create or select existing agent policy) does not work if there are too many agent policies.

In an environment with a large number of agent policies (e.g. 1000+), the following API request times out or takes 30-60 seconds to respond:

http://localhost:5601/mark/api/fleet/agent_policies?page=1&perPage=10000&sortField=name&sortOrder=asc&full=true

That is because, for each agent policy, we get the saved object twice, get ALL package policies, and count the agents, so for 1000 agent policies we run 4000 queries, using pmap in some places.
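
The arithmetic behind that figure can be sketched as follows. The per-policy breakdown (two saved object reads, one package policy fetch, one agent count) comes from the description above; the constant and function names are made up for illustration and are not Fleet code.

```typescript
// Illustrative only: names are hypothetical, numbers come from the PR description.
const QUERIES_PER_AGENT_POLICY = {
  savedObjectReads: 2, // the saved object is fetched twice
  packagePolicyFetch: 1, // all package policies for the agent policy
  agentCount: 1, // count of agents assigned to the agent policy
};

// Estimate how many backend queries a single list request triggers.
function estimateListQueries(agentPolicyCount: number): number {
  const perPolicy =
    QUERIES_PER_AGENT_POLICY.savedObjectReads +
    QUERIES_PER_AGENT_POLICY.packagePolicyFetch +
    QUERIES_PER_AGENT_POLICY.agentCount;
  return perPolicy * agentPolicyCount;
}
```

With 1000 agent policies this gives 4 × 1000 = 4000 queries, which matches the number quoted above.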

This PR changes it so that we do not get the agent count or the full agent policy for every agent policy when populating the agent policy select. Instead, we only get the full agent policy for the selected one.

This has meant that I have to separately query package policies in order to do the limited packages and APM output checks, but the component now loads instantly on an environment with 3k agent policies.

Key changes:

  • add a noAgentCount query param to the agent policies API, so that agents are not counted for every agent policy
  • populate agents in the get-one agent policy API (its absence was an inconsistency in our agent policy API)
  • when selecting an existing agent policy, only get the full agent policy for the selected agent policy
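
A minimal sketch of the pattern these changes describe: one cheap list call for the selector options, plus a single detail call for the selected agent policy. Everything here (the types, `createFakePolicyApi`, `loadSelectorData`) is a hypothetical in-memory stand-in, not the actual Fleet implementation, and it is synchronous only to keep the example short.

```typescript
// Hypothetical, simplified shapes; the real Fleet types have many more fields.
interface AgentPolicySummary {
  id: string;
  name: string;
}

interface FullAgentPolicy extends AgentPolicySummary {
  agents: number;
}

// In-memory stand-in for the agent policies API that counts lookups.
// Synchronous for brevity; real requests would be asynchronous.
function createFakePolicyApi(policies: FullAgentPolicy[]) {
  let lookups = 0;
  return {
    // Cheap list call: ids and names only, no agent counts (cf. noAgentCount).
    listSummaries(): AgentPolicySummary[] {
      lookups += 1;
      return policies.map((p) => ({ id: p.id, name: p.name }));
    },
    // Expensive call, made once for the selected agent policy only.
    getFull(id: string): FullAgentPolicy | undefined {
      lookups += 1;
      return policies.filter((p) => p.id === id)[0];
    },
    lookupCount(): number {
      return lookups;
    },
  };
}

// Populate the selector: one list call plus at most one detail call.
function loadSelectorData(
  api: ReturnType<typeof createFakePolicyApi>,
  selectedId?: string
) {
  const options = api.listSummaries();
  const selected = selectedId === undefined ? undefined : api.getFull(selectedId);
  return { options, selected };
}
```

In this sketch the number of lookups stays at two regardless of how many agent policies exist, which is the property the PR is after.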

Testing

No behaviour should have changed for the agent policy select. Some niche behaviour of the selector to check:

  • if adding the APM integration, any agent policy with a logstash output configured for integration data should be disabled
  • if adding a limited integration, e.g. apm or defend, any agent policy already containing that integration should be disabled
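
The two rules above could be expressed roughly like this. It is a sketch under assumptions: the field names (`dataOutputType`, `agentPolicyId`, `packageName`) and the function are hypothetical, and the separately queried package policies are passed in as a plain array.

```typescript
// Hypothetical, simplified shapes for illustration.
interface AgentPolicyOption {
  id: string;
  // Type of the output used for integration data, if known (assumed field).
  dataOutputType?: string;
}

interface PackagePolicyRef {
  agentPolicyId: string;
  packageName: string;
}

// Decide whether an agent policy should be disabled in the selector.
function isAgentPolicyDisabled(
  policy: AgentPolicyOption,
  packageName: string,
  isLimitedPackage: boolean,
  packagePolicies: PackagePolicyRef[]
): boolean {
  // Rule 1: when adding APM, disable policies whose data output is logstash.
  if (packageName === 'apm' && policy.dataOutputType === 'logstash') {
    return true;
  }
  // Rule 2: when adding a limited integration, disable policies that already contain it.
  if (isLimitedPackage) {
    return packagePolicies.some(
      (pp) => pp.agentPolicyId === policy.id && pp.packageName === packageName
    );
  }
  return false;
}
```

Keeping the check as a pure function over the package policy list means it does not need the full agent policy for every option.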

Screenshot 2023-02-14 at 10 45 03

@hop-dev hop-dev self-assigned this Feb 14, 2023
@hop-dev hop-dev added the release_note:fix, Team:Fleet, v8.7.0 and v8.8.0 labels Feb 14, 2023
@kibana-ci
Collaborator

💚 Build Succeeded

Metrics [docs]

Async chunks

Total size of all lazy-loaded chunks that will be downloaded as the user navigates the app

| id | before | after | diff |
| --- | --- | --- | --- |
| fleet | 927.0KB | 927.3KB | +288.0B |

Page load bundle

Size of the bundles that are downloaded on every page load. Target size is below 100kb

| id | before | after | diff |
| --- | --- | --- | --- |
| fleet | 124.3KB | 124.3KB | +38.0B |

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @hop-dev

@hop-dev hop-dev marked this pull request as ready for review February 14, 2023 12:57
@hop-dev hop-dev requested a review from a team as a code owner February 14, 2023 12:57
@elasticmachine
Contributor

Pinging @elastic/fleet (Team:Fleet)

- schema:
    type: boolean
  in: query
  name: noAgentCount
Contributor


I would find it more intuitive to have a flag when we want to include agentCount, rather than having a flag to disable. Though this might be a bigger refactor if the API is already used in many places.

Contributor Author


I agree, I was so tempted to do it, but this change is already larger than I expected. It would be a breaking change to the API.

Contributor


You're right, we shouldn't do a breaking change.
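
For reference, a client-side sketch of the opt-out flag discussed in this thread. The path and the page, perPage, sortField and sortOrder parameters come from the URL in the PR description, and noAgentCount comes from the schema above; the helper itself and the exact parameter combination the selector sends are assumptions.

```typescript
type QueryValue = string | number | boolean;

// Hypothetical helper: serialises query parameters onto the agent policies path.
function buildAgentPoliciesPath(params: Record<string, QueryValue>): string {
  const query = Object.keys(params)
    .map((key) => encodeURIComponent(key) + '=' + encodeURIComponent(String(params[key])))
    .join('&');
  return '/api/fleet/agent_policies?' + query;
}

// List request for the selector: skip the per-policy agent count.
const selectorListPath = buildAgentPoliciesPath({
  page: 1,
  perPage: 10000,
  sortField: 'name',
  sortOrder: 'asc',
  noAgentCount: true,
});
```

Because the flag is opt-out, callers that omit it keep receiving agent counts, which is why adding it does not break existing API consumers.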

Contributor

@juliaElastic juliaElastic left a comment


LGTM, great improvement.

@hop-dev hop-dev merged commit 9a52ef4 into elastic:main Feb 14, 2023
@hop-dev hop-dev deleted the 150605-limit-agent-policies-when-selecting branch February 14, 2023 13:36
kibanamachine pushed a commit to kibanamachine/kibana that referenced this pull request Feb 14, 2023
…e number of agent policies ( > 1k) (elastic#151119)

(cherry picked from commit 9a52ef4)
@kibanamachine
Contributor

💚 All backports created successfully

Status Branch Result
8.7

Note: Successful backport PRs will be merged automatically after passing CI.

Questions ?

Please refer to the Backport tool documentation

kibanamachine added a commit that referenced this pull request Feb 14, 2023
…a large number of agent policies ( > 1k) (#151119) (#151131)

# Backport

This will backport the following commits from `main` to `8.7`:
- [[Fleet] Fix agent policy selection and creation when there are a
large number of agent policies ( > 1k)
(#151119)](#151119)

<!--- Backport version: 8.9.7 -->

### Questions ?
Please refer to the [Backport tool
documentation](https://github.com/sqren/backport)


Co-authored-by: Mark Hopkin <[email protected]>
justinkambic pushed a commit to justinkambic/kibana that referenced this pull request Feb 23, 2023
…e number of agent policies ( > 1k) (elastic#151119)

Development

Successfully merging this pull request may close these issues.

[Fleet] Adding integration to agent policy does not work when there are a large number of agent policies
5 participants