[Fleet] Add support for custom ingest pipeline to integrations #133740

Closed
16 of 18 tasks
joshdover opened this issue Jun 7, 2022 · 22 comments
Labels
enhancement New value added to drive a business result QA:Validated Issue has been validated by QA Team:Fleet Team label for Observability Data Collection Fleet team

@joshdover
Contributor

joshdover commented Jun 7, 2022

To support user customizations to how data from integrations is processed, we will add support to Integration packages for an optional, custom ingest pipeline that is executed after the package's data stream pipeline on each data stream. This custom pipeline can contain processors directly or use the pipeline processor to call other pipelines that can be shared across integrations.

In this initial phase, we'll also add some UX entry points in the default integration policy editor for accessing the pipeline and custom mappings editors. This will lay the groundwork for future workflow enhancements around data customization, enrichment, and processing.

Implementation plan

Add support for @custom pipelines to all integration data streams

Depends on:

In this initial step, we'll update the EPM installation code to append a new pipeline processor to the end of any ingest pipeline that is installed by the integration. This processor will reference a pipeline with the naming convention <type>-<dataset>@custom with the ignore_missing_pipeline: true option set.

We should also install a default pipeline that only includes this pipeline processor for every data stream that does not define an ingest pipeline at all.

The <type>-<dataset>@custom pipeline that is referenced should not be created during installation, removed during upgrades, or removed during uninstallation. In other words, EPM does not directly touch these pipelines at all.
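As a rough illustration of the installation step described above, here is a minimal Python sketch that works on plain dictionaries. It is not the EPM code: the function names and the sample package pipeline are hypothetical, and only the `<type>-<dataset>@custom` naming convention and the `ignore_missing_pipeline: true` option come from the plan.

```python
import copy


def custom_pipeline_name(data_stream_type, dataset):
    """Name of the user-owned pipeline, following <type>-<dataset>@custom."""
    return f"{data_stream_type}-{dataset}@custom"


def with_custom_pipeline_hook(pipeline, data_stream_type, dataset):
    """Return a copy of the pipeline with the @custom hook appended last.

    Passing None models a data stream that ships no ingest pipeline; the
    result is then a default pipeline containing only the hook.
    """
    result = copy.deepcopy(pipeline) if pipeline else {"processors": []}
    result.setdefault("processors", []).append(
        {
            "pipeline": {
                "name": custom_pipeline_name(data_stream_type, dataset),
                # A missing @custom pipeline must not fail ingestion.
                "ignore_missing_pipeline": True,
            }
        }
    )
    return result


# Hypothetical package pipeline, for illustration only.
package_pipeline = {
    "description": "Pipeline for parsing Nginx access logs.",
    "processors": [{"set": {"field": "event.kind", "value": "event"}}],
}

installed = with_custom_pipeline_hook(package_pipeline, "logs", "nginx.access")
default_only = with_custom_pipeline_hook(None, "logs", "nginx.access")
```

Note that the sketch only ever references the `@custom` pipeline by name. It never creates or deletes it, which matches the requirement that EPM does not touch these pipelines.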

Example of what this would look like for the logs-nginx.access-* data streams' pipeline:
Screen Shot 2022-06-03 at 15 04 53

Enhancements to Stack Management

There are some changes we need to make to improve the UX of editing mappings and these custom ingest pipelines. These may or may not be done by the Fleet UI team, as these are currently owned by the @elastic/platform-deployment-management team.

Ingest Pipeline UI

Component template editor

Add entry points to pipeline and mapping editors from integration policy editor

We will guide the user towards creating or editing these pipelines and their associated mappings from the integration policy editor, under each data stream's "Advanced options" section. This includes adding a table that displays the data stream's pipeline and mappings defined by the package as well as the custom pipeline (if created) and custom component template.

These new components should be built in a way that can also be reused in other custom policy editors, like APM and Endpoint.

image

Detailed Requirements:

General changes

  • Add a warning UX if the user tries to navigate away from the policy editor without saving the changes. This should use the history.block helper from Core's application service

Ingest pipelines table

PR is here #134760

  • When no custom pipeline is defined, show a table with a single row for the data stream's default pipeline. This should include a link to view the pipeline by redirecting to /app/management/ingest/ingest_pipelines/?pipeline=<pipeline name>
  • When no custom pipeline is defined, there should be a link to "Add custom pipeline" which redirects to /app/management/ingest/ingest_pipelines/create?name=<type>-<dataset>@custom&redirect_path=/app/fleet/policies/<uuid>/edit-integration/<uuid>
  • When a custom pipeline is already defined, the edit button should redirect to /app/management/ingest/ingest_pipelines/edit/<type>-<dataset>@custom?redirect_path=/app/fleet/policies/<uuid>/edit-integration/<uuid>
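The three redirects listed above could be assembled roughly as follows. This is a sketch in Python rather than the Kibana implementation: the helper names and sample IDs are hypothetical, and percent-encoding the query values with `urlencode` is an assumption (the requirements show the raw, unencoded form).

```python
from urllib.parse import urlencode

INGEST_PIPELINES_APP = "/app/management/ingest/ingest_pipelines"


def policy_editor_path(policy_id, package_policy_id):
    """Path the user returns to after saving the pipeline."""
    return f"/app/fleet/policies/{policy_id}/edit-integration/{package_policy_id}"


def view_pipeline_url(pipeline_name):
    """Link used by the row for the data stream's default pipeline."""
    return f"{INGEST_PIPELINES_APP}/?{urlencode({'pipeline': pipeline_name})}"


def custom_pipeline_url(data_stream_type, dataset, custom_exists, redirect_path):
    """Edit link when the @custom pipeline exists, create link otherwise."""
    name = f"{data_stream_type}-{dataset}@custom"
    if custom_exists:
        query = urlencode({"redirect_path": redirect_path})
        return f"{INGEST_PIPELINES_APP}/edit/{name}?{query}"
    query = urlencode({"name": name, "redirect_path": redirect_path})
    return f"{INGEST_PIPELINES_APP}/create?{query}"


# Hypothetical IDs and pipeline name, for illustration only.
back = policy_editor_path("policy-1", "package-policy-1")
view_url = view_pipeline_url("logs-nginx.access-1.2.0")
create_url = custom_pipeline_url("logs", "nginx.access", False, back)
edit_url = custom_pipeline_url("logs", "nginx.access", True, back)
```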

Mappings table

It's important to note that today, we always create @custom component templates for mappings and settings overrides. We should explore changing this in the future, but it is not considered in scope for this change.

  • Show a table with a row for the data stream's @package component template and the @custom component template. These should include a link to view the template by redirecting to /app/management/data/index_management/component_templates/<template name>
  • The edit button on the @custom template should redirect to /app/management/data/index_management/edit_component_template/<type>-<dataset>@custom?tab=mappings&redirect_path=/app/fleet/policies/<uuid>/edit-integration/<uuid>

Optional enhancements

  • For the "view" buttons, it'd be nice to show the same flyout that is displayed in the Ingest Pipelines table and Component Templates table in Stack Management, rather than link out to a separate app. This likely requires refactoring.

Deferred

These features will further enhance custom ingest pipelines, but are planned to be implemented separately from this initial effort:

Questions

  • How do we handle "top-level" pipelines? What should the name be? whatever-the-pipename-is@custom?
    The top-level pipeline is only used by an ML integration and is not related to any data stream, so it probably makes no sense to add a custom pipeline there.
@joshdover joshdover added enhancement New value added to drive a business result Team:Fleet Team label for Observability Data Collection Fleet team labels Jun 7, 2022
@elasticmachine
Contributor

Pinging @elastic/fleet (Team:Fleet)

@ruflin
Contributor

ruflin commented Jun 7, 2022

Few comments

  • As there might be @custom pipelines idling around, it would be nice to provide some tooling for users to find these and do a cleanup
  • "How do we handle "top-level" pipelines?": What are these used for?
  • Integrations without pipeline: ++ on installing one by default with the linking to custom inside.
  • Even if we don't have the UI at first, this feature could still be shipped
  • Hide custom pipelines UI by default: Most users should not have to use this, let's make sure it is as hidden as possible

@nchaulet
Member

nchaulet commented Jun 8, 2022

@joshdover How do we plan the migration from 8.3.0 to 8.4.0 to work with that?

  • do we want to reinstall all the ingest pipelines
  • or is it acceptable to document that users will have to reinstall the integration to get that feature (we should have a reinstall button in the UI in 8.4 :) )

@jen-huang
Contributor

@nchaulet We do have the latter as a papercut item :) #129318

@joshdover
Contributor Author

How do we plan the migration from 8.3.0 to 8.4.0 to work with that?

I think we can rely on the reinstall workaround + documentation until we solve #121099. Maybe we list #121099 as an optional stretch goal?

@ruflin
Contributor

ruflin commented Jun 9, 2022

I like the idea of the "manual" upgrade. We could also use this in other places. Instead of magically rolling over / upgrading, we could show users a manual upgrade button on these packages.

@cjcenizal
Contributor

@nchaulet The only bit here that sounds odd to me is this requirement for the Component Template editor:

(UX definition needed, may be deferred) Add some UX for helping the user apply their mappings or setting changes to their data stream on save.

There is no direct connection between a component template and a data stream at the ES level. A component template's settings etc. are applied to a new data stream when it's created, and then that link is discarded -- you can't inspect a data stream to determine which component template created it (AFAIK). So I believe this direct link between component templates and data streams is a new concept that's been introduced at the integration level.

If all of this is correct so far, then I suggest the following requirements, in order to clarify this relationship:

  • Provide the user with some copy to describe the link, e.g. "This component template is part of the XYZ integration, and was used to create the A, B, and C data streams."
  • Explain the consequences of not applying the changes. For example, "Subsequent data streams that are created might diverge from A, B, and C, which might prevent you from searching or visualizing across them."

@adriansr

adriansr commented Jun 16, 2022

custom ingest pipeline that is executed after the package's data stream pipeline on each data stream

Out of curiosity, shouldn't the index.final_pipeline setting be used for this instead of injecting a pipeline processor in the existing pipeline?

@joshdover
Contributor Author

joshdover commented Jun 16, 2022

@nchaulet Whenever we open the docs issue for this feature, let's have the docs for this added to or linked from this page https://www.elastic.co/guide/en/fleet/current/data-streams.html

@nchaulet
Member

Out of curiosity, shouldn't the index.final_pipeline setting be used for this instead of injecting a pipeline processor in the existing pipeline?

@adriansr we already use a final pipeline shared between all the data streams created by Fleet (it sets event.ingested and verifies the agent ID against the API key)

@joshdover
Contributor Author

@cjcenizal

There is no direct connection between a component template and a data stream at the ES level. A component template's settings etc. are applied to a new data stream when it's created, and then that link is discarded -- you can't inspect a data stream to determine which component template created it (AFAIK). So I believe this direct link between component templates and data streams is a new concept that's been introduced at the integration level.

I think there is a way to do this in a generic way, though it'd likely require some combination of:

  • First determine all index templates that this component template is used by (this logic already exists in the /api/index_management/component_templates route)
  • Using the GET /_data_stream/<template> to find matching data streams for each template
  • Leveraging the template field on the GET /_data_stream API to determine which template created them
  • Verifying that the template priority isn't overridden by another template that doesn't use this component template
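To make the lookup concrete, here is a minimal Python sketch of the first and third steps, run against in-memory data instead of live API calls. The input shapes are simplified assumptions: `index_templates` maps a template name to a dict with `composed_of`, and `data_streams` mirrors the entries of the `data_streams` array returned by GET /_data_stream, keeping only `name` and `template`. The priority check from the last step is deliberately left out.

```python
def data_streams_for_component_template(component_template, index_templates, data_streams):
    """Return the names of data streams created from templates using the component template."""
    # Step 1: index templates that are composed of the component template.
    using = {
        name
        for name, template in index_templates.items()
        if component_template in template.get("composed_of", [])
    }
    # Step 3: data streams whose `template` field names one of those templates.
    return sorted(ds["name"] for ds in data_streams if ds.get("template") in using)


# Hypothetical sample data, for illustration only.
index_templates = {
    "logs-nginx.access": {
        "composed_of": ["logs-nginx.access@package", "logs-nginx.access@custom"],
    },
    "logs-nginx.error": {
        "composed_of": ["logs-nginx.error@package", "logs-nginx.error@custom"],
    },
}
data_streams = [
    {"name": "logs-nginx.access-default", "template": "logs-nginx.access"},
    {"name": "logs-nginx.access-prod", "template": "logs-nginx.access"},
    {"name": "logs-nginx.error-default", "template": "logs-nginx.error"},
]

matches = data_streams_for_component_template(
    "logs-nginx.access@custom", index_templates, data_streams
)
```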

It feels like duplicating quite a bit of logic that ES maintains and you'd need to be careful about the priority and wildcard resolution to be sure that it doesn't diverge from what ES does during the matching process.

All that said, I think we can take advantage of our well-known structure here for integration templates and only apply the mappings (and rollover, if required) to the data streams that we know should be associated to the component template. This seems safer in this first iteration over building a more generic feature.

If all of this is correct so far, then I suggest the following requirements, in order to clarify this relationship:

  • Provide the user with some copy to describe the link, e.g. "This component template is part of the XYZ integration, and was used to create the A, B, and C data streams."

Here's the current mockup we're proposing:
image

I think we can improve this with something like what you suggested around including the integration name, which we can pull directly from the _meta fields we add on the template.

  • Explain the consequences of not applying the changes. For example, "Subsequent data streams that are created might diverge from A, B, and C, which might prevent you from searching or visualizing across them."

The new proposal will always try to apply the changes when saving the template, unless a rollover is required, which will require explicit user action. Maybe we should improve this modal with more explanation of that tradeoff if they choose not to rollover?

@jen-huang
Contributor

@nchaulet Is there anything left for this feature before we close this issue out?

@nchaulet
Member

@jen-huang No, I just closed the last PR for that today, and there is already an issue for the docs (where I need to add more info)

@amolnater-qasource

Hi @nchaulet @joshdover

Could you please share some more detailed information about the feature, such as the use case and how it will enhance the user experience?

This custom pipeline can contain processors directly or use the pipeline processor to call other pipelines that can be shared across integrations.

Could you please explain to us how the pipeline works, as mentioned in the above statement?

Further, we would be requiring more guidelines for feature validation.

Thanks

@nchaulet
Member

Could you please share some more detailed information about the feature, such as the use case and how it will enhance the user experience?

Hi @amolnater-qasource, this feature allows a user to edit the custom ingest pipeline and custom mappings for a data stream from the package policy editor, under the advanced section.

I recorded a small demo of the feature here; maybe it can help you better understand the feature. Let me know if you have more questions.

Loom.Message.-.19.July.2022.mp4

@amolnater-qasource

Hi @nchaulet

Thank you for all the information and sharing a demo recording for the feature testing.

We have revalidated this feature on the latest 8.4 snapshot and have the following observations:

  • We are able to create a custom pipeline.
  • We attempted to modify custom mappings; however, the custom pipeline we added does not show up there.

Could you please confirm if this is an issue?

Screen Recording:

Edit.integration.-.Agent.policy.1.-.Agent.policies.-.Fleet.-.Elastic.-.Google.Chrome.2022-07-25.17-34-17.mp4

Build details:
BUILD: 54789
COMMIT: af3a3cb

Please let us know if we are missing anything here.
Thank You!

@nchaulet
Member

Hi @amolnater-qasource, yes, you should not get anything in custom mappings. It's up to the user to add custom mappings after adding a custom pipeline.

@amolnater-qasource

Hi @nchaulet
Thank you for the confirmation on the shared scenario.
We will be creating our test content on the basis of all the information shared above.

Thanks

@amolnater-qasource

Hi Team
We have created 04 testcases for this feature under our Fleet Test Suite at links:

Please let us know if any other scenario is required to be covered from our end.
Thanks

@amolnater-qasource

Hi Team

We have executed 04 testcases for this feature under our Fleet Test run at link:

Build details:

VERSION: 8.4.0-BC3
BUILD: 55281
COMMIT: e42c547d7ab545472fd978383c2c43fa203a5b06

As the testing is completed on this feature, we are marking it as QA:Validated.

Thanks

@llermaly

llermaly commented Nov 4, 2022

@joshdover how do @custom mappings interact with existing index templates for the integration index pattern? Will these custom mappings be applied at the beginning or the end of the components chain?

@joshdover
Contributor Author

joshdover commented Nov 4, 2022

@llermaly The @custom component templates come after the @package template in the components list, so they can override any mappings or index settings that are supplied by the package. That said, overriding the package's mappings is not generally recommended as it will likely break dashboards and other features (e.g. security alerts) that are shipped with the package.
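The effect of that ordering can be modeled with a small Python sketch. It is a deliberately simplified stand-in for what Elasticsearch does when it composes templates: it only handles flat field names under `properties`, ignores nested objects and index settings, and the field names are hypothetical. Templates are applied in `composed_of` order, and a later definition of a field replaces an earlier one.

```python
def resolve_field_mappings(composed_of, component_templates):
    """Combine flat field mappings in composed_of order; later templates win."""
    resolved = {}
    for name in composed_of:
        # A component template that does not exist contributes nothing in this model.
        properties = component_templates.get(name, {}).get("properties", {})
        # A field defined again later replaces the earlier definition as a whole.
        resolved.update(properties)
    return resolved


# Hypothetical templates and fields, for illustration only.
component_templates = {
    "logs-nginx.access@package": {
        "properties": {
            "http.response.status_code": {"type": "long"},
            "url.original": {"type": "keyword"},
        }
    },
    "logs-nginx.access@custom": {
        "properties": {
            "http.response.status_code": {"type": "keyword"},  # overrides the package
            "team.owner": {"type": "keyword"},  # adds a new field
        }
    },
}

effective = resolve_field_mappings(
    ["logs-nginx.access@package", "logs-nginx.access@custom"], component_templates
)
```

In this model the override of `http.response.status_code` silently changes a type that package dashboards may rely on, which is the kind of breakage the comment above warns about.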
