Bulk create multiple ingest pipelines #35889

Closed
ycombinator opened this issue Nov 26, 2018 · 6 comments
Labels
:Data Management/Ingest Node (Execution or management of Ingest Pipelines including GeoIP)
>enhancement
Team:Data Management (Meta label for data/management team)

Comments

@ycombinator
Contributor

In the same vein as #35495, @ruflin wondered if Elasticsearch could support creating multiple ingest pipelines at once, in bulk, in an all-or-nothing fashion. Filebeat may soon support the loading of multiple ingest pipelines, so this would be a useful feature to have in Elasticsearch.
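To make the "all-or-nothing" semantics concrete, here is a minimal client-side sketch of what a caller has to approximate without a bulk API. Everything in it is an assumption for illustration: `IngestPipelineStore`, `PipelineSpec`, and `createPipelinesAllOrNothing` are hypothetical names, and the `put`/`delete` methods stand in for whatever wraps `PUT _ingest/pipeline/<id>` and `DELETE _ingest/pipeline/<id>`. It is not the API proposed in this issue.

```typescript
// Hypothetical abstraction over per-pipeline create/delete calls
// (e.g. PUT and DELETE on _ingest/pipeline/<id>). Not a real client type.
interface IngestPipelineStore {
  put(id: string, body: object): Promise<void>;
  delete(id: string): Promise<void>;
}

// Hypothetical shape for one pipeline to create.
interface PipelineSpec {
  id: string;
  body: object;
}

// Create pipelines one by one; if any creation fails, delete the ones
// already created in this call and rethrow the original error.
async function createPipelinesAllOrNothing(
  store: IngestPipelineStore,
  pipelines: PipelineSpec[]
): Promise<void> {
  const created: string[] = [];
  try {
    for (const pipeline of pipelines) {
      await store.put(pipeline.id, pipeline.body);
      created.push(pipeline.id);
    }
  } catch (err) {
    // Best-effort rollback, newest first. Failures here are swallowed so
    // the caller still sees the error that caused the rollback.
    for (const id of created.reverse()) {
      try {
        await store.delete(id);
      } catch {
        // Ignored: rollback is best effort only.
      }
    }
    throw err;
  }
}
```

This is only an approximation: the rollback is not atomic, other clients can observe the partially created set, and a pipeline that was overwritten rather than newly created is deleted instead of restored. Those gaps are the kind of thing a server-side bulk operation could close.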

@ycombinator ycombinator added the >enhancement and :Data Management/Ingest Node labels Nov 26, 2018
@elasticmachine
Collaborator

Pinging @elastic/es-core-features

@ruflin
Contributor

ruflin commented Nov 27, 2018

I think this is especially valuable in the context of the pipeline processors where multiple pipelines are needed: https://www.elastic.co/guide/en/elasticsearch/reference/master/pipeline-processor.html

@rjernst rjernst added the Team:Data Management label May 4, 2020
@dakrone
Member

dakrone commented May 17, 2024

This has been open for quite a while, and we haven't made much progress on this due to focus in other areas. For now I'm going to close this as something we aren't planning on implementing. We can re-open it later if needed.

@dakrone dakrone closed this as not planned May 17, 2024
@ruflin
Contributor

ruflin commented May 21, 2024

@amitkanfer @kpollich For awareness. This feature would also have been useful for Fleet, which sometimes installs quite a few ingest pipelines. It could speed up and simplify the package installation process.

@amitkanfer

@kpollich how useful would this be for Fleet? I mean, what's the heaviest integration this would be useful for, and how much time would it save during installation?
Also, do we have an estimate of the complexity/effort of implementing this in ES?

@kpollich
Member

> @kpollich how useful would this be for Fleet? I mean, what's the heaviest integration this would be useful for, and how much time would it save during installation?
> Also, do we have an estimate of the complexity/effort of implementing this in ES?

Heaviest integrations are probably things like AWS with many, many data streams available. I ran a quick installation of the AWS integration with every single input + data streams enabled and here's what I see in our APM traces:

image

Total time spent installing is ~4.5s.

image

Of that, ~1.3s is spent installing ingest pipelines.

Today, we queue up an individual ES request per ingest pipeline and execute those requests concurrently, e.g.

https://github.com/elastic/kibana/blob/05e3db182c07ea3862b81767bcb95dfd8b032744/x-pack/plugins/fleet/server/services/epm/elasticsearch/ingest_pipeline/install.ts#L198-L216

It's hard to measure what difference a bulk API being available for this in ES would make without actually having the API available to benchmark, so I'm not sure we can answer that at this time. At a minimum we'd go from having to make many concurrent requests to Elasticsearch in this case to making a single request, which will be noticeable in situations where a cluster is under heavy load during package installation.
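For readers who don't want to follow the link, the "one request per pipeline, executed concurrently" pattern described above looks roughly like the sketch below. This is not the Kibana code; `PipelineDefinition`, `PutPipeline`, and `installPipelinesConcurrently` are hypothetical names, and `putPipeline` is assumed to wrap a single `PUT _ingest/pipeline/<id>` request.

```typescript
// Hypothetical shape for a pipeline to install; not a Kibana or ES client type.
interface PipelineDefinition {
  id: string;
  body: object;
}

// Assumed to issue exactly one request for one pipeline.
type PutPipeline = (pipeline: PipelineDefinition) => Promise<void>;

interface InstallResult {
  installed: string[];
  failed: { id: string; reason: string }[];
}

// Start one request per pipeline without awaiting in between, then wait
// for all of them. Each request's error is captured so one failure does
// not hide the outcome of the others.
async function installPipelinesConcurrently(
  pipelines: PipelineDefinition[],
  putPipeline: PutPipeline
): Promise<InstallResult> {
  const outcomes = await Promise.all(
    pipelines.map(async (pipeline) => {
      try {
        await putPipeline(pipeline);
        return { id: pipeline.id, error: null as string | null };
      } catch (err) {
        return { id: pipeline.id, error: String(err) };
      }
    })
  );

  const result: InstallResult = { installed: [], failed: [] };
  for (const outcome of outcomes) {
    if (outcome.error === null) {
      result.installed.push(outcome.id);
    } else {
      result.failed.push({ id: outcome.id, reason: outcome.error });
    }
  }
  return result;
}
```

With N pipelines this sends N requests, and a partial failure leaves some pipelines installed and others not, which the caller has to handle itself. A bulk endpoint would reduce this to one request and one outcome.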

7 participants