Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Change Proposal] Add processor descriptions to Elastic-provided ingest node pipelines #807

Closed
cjcenizal opened this issue Mar 24, 2021 · 11 comments
Labels

Comments

@cjcenizal
Copy link

Problem

There's a new-ish feature in the Ingest Node Pipelines UI that allows you to define a description for each individual processor in a pipeline (elastic/elasticsearch#57906). A Kibana user has recently complained that none of the pipelines we ship use this feature, which makes it difficult for them to understand what a pipeline is doing (elastic/kibana#94432). We have an opportunity to provide a more pleasant and productive UX to our users by updating our Elastic-provided pipelines with processor descriptions. Here's a visual example of the opportunity:

image

Solution

This solution requires updating the pipelines provided by Elastic with description content for each processor. This description will help users understand the purpose of a processor in the context of its pipeline.

@cjcenizal
Copy link
Author

cjcenizal commented Mar 24, 2021

Related work is tracked in the ES repo: elastic/elasticsearch#70442

@cjcenizal
Copy link
Author

CC @ycombinator

@ycombinator
Copy link
Contributor

ycombinator commented Mar 24, 2021

In the context of Elastic Packages, this change would mean requiring that any ingest node pipelines defined by a package must specify descriptions for all their processors. Technically, this isn't difficult to do so I will only highlight the challenges we need to manage:

  1. At the moment the package specification does not define a specification for ingest pipelines. It only goes as far as saying, "if a package needs to define ingest pipelines, they must be placed under the <package root>/data_stream/<data stream>/elasticsearch/ingest_pipeline/ folder" and "either yml-formatted or json-formatted pipeline definitions are allowed".

    Notably, there are no specifications for the actual ingest pipeline structure itself. This is deliberate because we want to treat stack asset structure as black boxes as much as possible, allowing stack components to vary the structure definitions freely without requiring the package specification to know about it and stay in sync.

    To implement this proposal, we will need to make an exception and define at least a partial structure for the actual contents of an ingest pipeline, enough to provide a "required" validation check for ingest pipeline processor descriptions.

    But that's okay IMO. With this change, we would be coupling ourselves a bit more tightly with Elasticsearch; if the structure of an ingest pipeline is changed or enhanced in the future, we will need to keep track of such changes and apply the equivalent updates to the package spec as well. Practically speaking, though, I don't foresee this to be a big deal — the structure of ingest pipelines has not changed often IME.

  2. By making ingest pipeline processor descriptions required, existing packages that define ingest pipelines are likely to break validation. This is to be expected so we'll have to take care of rolling out this change carefully, giving package maintainers plenty of heads up and time to define descriptions for their ingest pipeline processors. We may want to periodically run the validation with this proposal's PR against existing packages and report out on which packages need updating to the respective owners. We would keep doing this until all packages pass validation; then we can safely merge the PR for this change.

cc: @mtojek @exekias @masci @ruflin

@ruflin
Copy link
Contributor

ruflin commented Mar 25, 2021

As far as I understand, this is an optional field in Elasticsearch ingest pipeline so I think it should also be an optional field in the package spec. It would be overkill to require every package creator to have a description for each processor especially in the context of community package. Otherwise we will see there a lot of - as descriptions.

But I agree with the goal, that our packages should make use of it. It will not only help users to understand what a processor does but also share knowledge across contributors. It is interesting that we switched over to yaml for many ingest pipelines to be able to have "comments" in our pipelines on what they do. Seems like some of these could be ported over to description now.

What I suggest is that instead of start a discussion in "integrations" on how we should add this to the general flow of creating integrations and start using it. I would not make any change in the package-spec.

@exekias
Copy link

exekias commented Mar 25, 2021

Having that we are not checking the content of pipelines I think it's fine to keep things as they are in the spec, as adding descriptions is supported 😄 . Long term I could see how a linter could throw a warning for missing descriptions, instead of an error.

For now I'm +1 on moving this conversation to integrations, perhaps we can update the guidelines and raise awareness?

@ycombinator
Copy link
Contributor

Thanks for weighing in @ruflin and @exekias. Moving this issue to the integrations repo...

@ycombinator ycombinator transferred this issue from elastic/package-spec Mar 25, 2021
@ycombinator
Copy link
Contributor

I'm ++ for the immediate action to be adding a recommendation in the package contributing documentation, specifically this section, about adding descriptions to ingest pipeline processors.

@mtojek as part of working on elastic/elastic-package#3, do you plan to create a sample/fake Elasticsearch ingest pipeline? If so, maybe that could include a sample processor with a description field that says "TODO" or something?

Also ++ on longer term having elastic-package lint be able to show warnings (right now it either passes or fails), and one of these warnings could be when it detects ingest pipeline processors with no descriptions. I've created elastic/elastic-package#298 to track this.

@andrewkroh
Copy link
Member

I think for the majority of the processors it would be possible to generate a useful description if one is not specified in the pipeline. For example, a convert processor for source.port to long could show "convert source.port to long" which provides a lot of context when viewing the pipeline. As package developers I think we would find it difficult to maintain descriptions like this if we put them into every pipeline.

In pipelines we should do more to provide descriptions for things like script processors that can be doing anything.

@cjcenizal
Copy link
Author

I've created elastic/kibana#95486 to track the enhancement of dynamically generating simple descriptions for processors where one hasn't been defined.

@andrewkroh
Copy link
Member

Kibana now has the generated descriptions for processors.

Also ++ on longer term having elastic-package lint be able to show warnings (right now it either passes or fails), and one of these warnings could be when it detects ingest pipeline processors with no descriptions.

If we get the warning capability I think we should add a warning for missing script processor description fields since it just has a generic "Runs a script on incoming documents." value.

@botelastic
Copy link

botelastic bot commented Sep 19, 2022

Hi! We just realized that we haven't looked into this issue in a while. We're sorry! We're labeling this issue as Stale to make it hit our filters and make sure we get back to it as soon as possible. In the meantime, it'd be extremely helpful if you could take a look at it as well and confirm its relevance. A simple comment with a nice emoji will be enough :+1. Thank you for your contribution!

@botelastic botelastic bot added the Stalled label Sep 19, 2022
@botelastic botelastic bot closed this as completed Mar 18, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

5 participants