
[Elastic Agent] Refactor general Beats settings #20679

Closed
mostlyjason opened this issue Aug 18, 2020 · 8 comments

mostlyjason commented Aug 18, 2020

Today Beats offers several global and general configuration options (see https://www.elastic.co/guide/en/beats/metricbeat/current/configuration-general-options.html). As we move to a single Elastic Agent policy, we want to refactor these settings and decide what makes sense to have at the Agent level, the process level, and the input level. I've combined input from @urso and @ruflin in the description below:

Let's take seccomp as an example: will seccomp be per process, or do we have a global seccomp? seccomp rules must be enforced at the process level. If we create input configurations with different seccomp settings, then the Agent would need to create a process per 'profile'.
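
To make that concrete, here is a hypothetical policy fragment. The per-input placement of seccomp is invented for illustration (it is not an existing schema); only the default_action/syscalls structure mirrors the existing Beats seccomp policy format:

```yaml
# Hypothetical Agent policy fragment -- per-input seccomp is not an existing schema.
# Two inputs requesting different seccomp profiles cannot share a process,
# because seccomp rules are enforced process-wide.
inputs:
  - type: system/metrics
    seccomp:
      default_action: allow      # profile A: permissive
      syscalls:
        - action: errno
          names: [ptrace]
  - type: logfile
    seccomp:
      default_action: errno      # profile B: deny by default
      syscalls:
        - action: allow
          names: [read, write, openat, close, fstat]
```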

Another example for Metricbeat is timeseries.enabled. This seems to be a very input-specific config, but if one metrics input has it on and another has it off, we need to run two processes. Why is this setting enforced at a global level already? The setting just changes how events are reported. Do we really need this, or will an add_fields processor do the trick?

Overall, I think we need a general approach for how we handle these. I think a few, like seccomp, must become a global feature in the Agent. For the others, which are process-specific, we probably need an expert mode where someone can specify these in the config and name the actual process. Something like:

process.metric.timeseries.enabled: true, where process.metric can be mapped to either metricbeat or the collector in the future. This will allow us to migrate the backend to the collector without breaking changes.
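
A minimal sketch of how such an expert-mode setting could look in a policy, assuming a process.<name> namespace that the Agent resolves to the concrete binary; the keys below are illustrative only and do not exist today:

```yaml
# Illustrative only -- a process.* expert-mode namespace does not exist yet.
# "metric" is a logical process name the Agent would map to metricbeat today
# and to the collector in the future.
process:
  metric:
    timeseries.enabled: true
```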

Besides global settings we have 'internal' behavior of Beats that we might want to expose as settings. E.g. Filebeat sets up infinite retry for its inputs by default. Inputs can actually override this, but we don't expose any setting to do so. Creating a PR to introduce a per-input setting like send_mode: guaranteed/retry would be quite straightforward for Filebeat. Are we interested in a setting like this for Metricbeat as well?
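
For illustration, such a per-input setting could look like this in a Filebeat config; send_mode and its values are only the suggestion above, not an implemented option, and the exact semantics of each value would still need to be defined:

```yaml
# Sketch of the suggested per-input setting -- send_mode is not implemented.
filebeat.inputs:
  - type: log
    paths:
      - /var/log/app/*.log
    send_mode: guaranteed   # keep retrying until every event is acknowledged (today's default behavior)
  - type: log
    paths:
      - /var/log/noisy/*.log
    send_mode: retry        # hypothetical bounded-retry mode; events may eventually be dropped
```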

Suggestions for global settings we have for all Beats:

  • tags/fields/fields_under_root: Let's remove them. Same goes for the per-input fields setting. We already have the add_fields processor, which, when used directly, also makes it obvious where the custom fields are added (see the example after this list).
  • name: I'd prefer to remove it. On the other hand it is used to configure 'host.name'. But in general it would be better to use a generated ID (generated by Fleet) to uniquely identify the agent the event has been collected from.
  • processors: I'm in favor of removing global processor settings. The drawback would be that we'd need to add the add_x_metadata processors per input then (which can be an advantage, as not all inputs/integrations want these to be enabled by default). Currently one reason to have them global is also efficiency (multiple instances would mean multiple lookups). But this is more or less pointing out a problem in our architecture and should not be solved at the configuration-file level.
  • max_procs: This we should keep. Maybe even set it to 1 or 2 by default. For the collector I'm thinking of introducing a 'limits' namespace for global settings that configure resource limitations (a sketch follows this list). Some possible settings we would add to the 'limits' namespace are (better names are welcome):
    • "memory":Configure limit of MemoryCircuiteBreaker, that blocks inputs once enqueued events have reached an estimated memory usage
    • "harvesters" (not sure about a good name or how broad we want to apply the setting): Log file collection is "dynamic" and itself discovers files to collect. In order to protect resource usage we need a global limit that is applied to all active log inputs (especially regarding auto-discovery for docker/k8s)
    • scheduler: Heartbeat has a task scheduler that can limit the number of concurrent active tasks. Long term it would be nice to limit concurrent Metricbeat tasks as well. This is also where the start_delay setting might be moved to.
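
To make the idea concrete, here is a sketch of what such a 'limits' namespace could look like; every key name and value is a placeholder, not an agreed-on schema:

```yaml
# Placeholder names only -- the `limits` namespace is a proposal, not an existing schema.
limits:
  max_procs: 2                 # GOMAXPROCS-style cap per process
  memory: 200MiB               # memory circuit breaker: block inputs above this estimated usage
  harvesters: 100              # global cap on concurrently active log inputs
  scheduler:
    max_concurrent_tasks: 16   # cap on concurrently running tasks (Heartbeat/Metricbeat)
    start_delay: 5s            # where the existing start_delay setting might move
```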

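And, as referenced in the first bullet, replacing the global tags/fields/fields_under_root options with the existing add_fields processor on the input that needs the extra fields could look like this (standard Beats processor syntax; the field values are made up):

```yaml
# Instead of global tags/fields/fields_under_root, attach fields where needed
# using the existing add_fields processor.
filebeat.inputs:
  - type: log
    paths:
      - /var/log/app/*.log
    processors:
      - add_fields:
          target: ""           # empty target puts the fields at the event root
          fields:
            env: production
            team: payments
```
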
For the others, which are process-specific, we probably need an expert mode where someone can specify these in the config and name the actual process, something like process.metricbeat.timeseries.enabled: true. I'd prefer to introduce an expert mode like this. On the other hand, I'm not sure where/how to apply resource limit settings. Interestingly, our output settings define/translate to named 'pipelines' internally. Although the namespace is output, to me it feels quite natural to apply the limit settings per pipeline.
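
If limits were attached to those internal pipelines, the per-output configuration might look roughly like this; the pipeline.limits block is purely illustrative:

```yaml
# Purely illustrative -- per-pipeline limits are only an idea at this point.
outputs:
  default:
    type: elasticsearch
    hosts: ["https://localhost:9200"]
    pipeline:
      limits:
        memory: 100MiB
        max_in_flight_events: 2048
```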

@mostlyjason added the Team:Services (Deprecated) and Team:Ingest Management labels Aug 18, 2020
@elasticmachine

Pinging @elastic/ingest-management (Team:Ingest Management)

@elasticmachine

Pinging @elastic/integrations-services (Team:Services)

@mostlyjason changed the title from "[Elastic Agent] Refactor global and general Beats settings" to "[Elastic Agent] Refactor general Beats settings" Aug 18, 2020
@ruflin unassigned urso and ruflin Aug 19, 2020
@urso

urso commented Aug 19, 2020

process.metricbeat.timeseries.enabled: true

Why do we need this to be a global setting? Why is it a global setting in metricbeat? Is the future to have it always enabled?

@ruflin self-assigned this Sep 18, 2020
@ruflin

ruflin commented Sep 22, 2020

I've been thinking about this again and I'm hesitant to implement a generic way to support the global configs. I think we should tackle them one by one when we have a need/request for a certain feature, and if we see a pattern after 2-3 changes, we should find a generic solution instead of doing it upfront.

I went through all the existing config options again and I think there are cases where we will even come up with new solutions. Variables in the config and the keystore are two good examples here. With the "dynamic input" change, both of these are likely to be supported in the future.

So my proposal is to close this issue and open specific issues for configs that are not supported but need to be, so we can discuss them one by one.

@ph

ph commented Sep 22, 2020

@ruflin I agree with that.

So +1 to close it and let it be driven by user requests/needs. This also reduces our cognitive load on the way to MVP/GA.

@ruflin

ruflin commented Sep 23, 2020

@exekias @andresrc WDYT? Closing ok for you?

@exekias

exekias commented Sep 23, 2020

SGTM

@andresrc

+1
