Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for processors in light modules #14740

Closed
jsoriano opened this issue Nov 25, 2019 · 9 comments
Closed

Add support for processors in light modules #14740

jsoriano opened this issue Nov 25, 2019 · 9 comments
Assignees
Labels
enhancement Metricbeat Metricbeat Team:Integrations Label for the Integrations team Team:Services (Deprecated) Label for the former Integrations-Services team

Comments

@jsoriano
Copy link
Member

jsoriano commented Nov 25, 2019

Extend light modules implemented in #12270 to support processors in their manifests. These processors could be used to modify the events before sending them to the outputs. This can be useful in many use cases where some extra enrichment or mapping would need to be done.

Processors would be defined like that in the manifest.yml:

default: true
input:
  module: prometheus
  metricset: collector
  defaults:
    metrics_path: /_status/vars
processors:
  - rename:
      fields:
        - from: prometheus.labels.instance
          to: cockroachdb.instance
      ignore_missing:true

And they would need to be executed before processors defined in the configuration.

Some insights on possible ways to implementing this:

When a new module is going to be created from a configuration (in metricbeat/mb/module/factory.go), a runner is created from a new connector and a new wrapper. Light modules are instantiated in the wrapper, but processors are initialized in the new connector, so there is no way at the moment to add processors where the light modules are loaded.

There are some ways to approach the solution of this:

  • Expose a way to add processors to the connector from the module factory, this way the connection would be created first, and it would expose something to add processors, this something would be passed to the module creator that could prepend its processors. This solution can add an artificial dependency between modules initialization and pipelines.
  • Where we do all the overrides in metricbeat/mb/lightmetricset.go we could create a metricset wrapper that applies the processors to any event generated before reporting it. This may require to create a wrapper for each one of the reporter interfaces we have. This may also force to do normalization of events twice, as processors need to work on normalized events.
  • A variant of the previous point, that could be useful in other metricsets would be to add a Transformer interface, with a Transform(event) method that can be implemented by metricsets, in the case of light metricsets with processors it could be implemented by a wrapper and would execute the processors for the given event. Metricsets implementing this interface would receive a different reporter in metricSetWrapper.run(), one that applies the transformations before actually sending the event to the output.
@urso
Copy link

urso commented Jan 23, 2020

fyi; Filebeat uses the connector approach (inputs must connect to the pipeline). See: https://github.com/elastic/beats/blob/master/filebeat/channel/connector.go#L50

The Pipeline/PipelineConnector are interfaces and can be easily wrapped. The connector in filebeat merges global settings with the connection settings given by the inputs.

The connector approach guarantees that processors are merged with libbeat internal other other configured processors into a sane 'pipeline'. E.g. event normalization is guaranteed to run first.

@jsoriano
Copy link
Member Author

jsoriano commented Jan 23, 2020

Metricbeat also uses a pipeline connector, that configures the processors provided in the configuration.

But this connector doesn't have visibility on the configuration provided in the lightweight module manifest, nor the module wrapper that has access to the manifest has visibility on the pipeline.

Connector and wrapper are initialized here: https://github.com/elastic/beats/blob/v7.5.2/metricbeat/mb/module/factory.go#L46

We would need to somehow connect the wrapper with the connector there. Either exposing a way to add processors from the wrapper, or initializing first the wrapper, and making it expose additional configurations that would be merged with the configuration passed to the connector.

Another different approach would be to create a different processors pipeline in the wrapping we do for light modules here: https://github.com/elastic/beats/blob/v7.5.2/metricbeat/mb/lightmetricset.go#L61 We could wrap the Fetch so it consists on fetching using the real module/input and then run the processors before sending the event to the actual pipeline.

@urso
Copy link

urso commented Jan 23, 2020

Another different approach would be to create a different processors pipeline in the wrapping we do for light modules here: /metricbeat/mb/[email protected]#L61 We could wrap the Fetch so it consists on fetching using the real module/input and then run the processors before sending the event to the actual pipeline.

This would change how processors are merged with internal processors (execution order might change and fields could be overwritten in an unexpected order). Plus event normalization is supposed to run before the actual processors. Running processors before passing to the pipeline requires you to add event normalization as well.

@jsoriano
Copy link
Member Author

Another different approach would be to create a different processors pipeline in the wrapping we do for light modules here: /metricbeat/mb/[email protected]#L61 We could wrap the Fetch so it consists on fetching using the real module/input and then run the processors before sending the event to the actual pipeline.

This would change how processors are merged with internal processors (execution order might change and fields could be overwritten in an unexpected order). Plus event normalization is supposed to run before the actual processors. Running processors before passing to the pipeline requires you to add event normalization as well.

This is a good point, doing normalization twice doesn't sound good.

@mtojek mtojek self-assigned this Jan 24, 2020
@andresrc andresrc added Team:Services (Deprecated) Label for the former Integrations-Services team [zube]: Inbox [zube]: In Progress and removed [zube]: In Progress [zube]: Inbox labels Jan 27, 2020
@mtojek
Copy link
Contributor

mtojek commented Jan 28, 2020

I looked into the mb source code and here is the overview of changes to be applied (to my mind):

  1. Define processors (if needed) in manifest.yml for light metricsets.
  2. metricbeat/mb/module/factory.go: Wrapper reads all processors defined for metricsets.
  3. metricbeat/mb/module/factory.go: Create connector per metricset (3 metricsets = 3 connectors). Also, pass processors specific to the metricset.
  4. metricbeat/mb/module/factory.go: Create clients for every connector.
  5. metricbeat/mb/module/runner.go: Pass multiple clients to PublishChannels()
  6. metricbeat/mb/module/publish.go: Based on the event.Fields.metricset route the event to the proper client.

I think this is the shortest path to support processors in light modules and it doesn't look like a huge refactoring.

@jsoriano is it consistent with your approach or do you prefer to discuss it?

@jsoriano
Copy link
Member Author

@mtojek it LGTM, it'd be good to have a POC, if you start working on it please ping me when you some working code.

Create connector per metricset (3 metricsets = 3 connectors). Also, pass processors specific to the metricset.

This is a good point, each metricset can have different processors.

Thanks!

@mtojek
Copy link
Contributor

mtojek commented Jan 31, 2020

All PRs merged. Resolving.

@sorantis
Copy link
Contributor

@jsoriano does everything described here apply to light modules as well?

@jsoriano
Copy link
Member Author

@sorantis yes, light modules can use any of these processors after this change.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Metricbeat Metricbeat Team:Integrations Label for the Integrations team Team:Services (Deprecated) Label for the former Integrations-Services team
Projects
None yet
Development

No branches or pull requests

5 participants