ingest: Support multiple pipelines from _simulate without creating actual pipelines #35495

Closed
jakelandis opened this issue Nov 13, 2018 · 5 comments
Labels
:Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP Team:Data Management Meta label for data/management team

Comments

@jakelandis
Contributor

In 6.5, the ingest node introduced the pipeline processor, which allows one pipeline to call another. The called pipeline must be a named pipeline, i.e. a real pipeline that has been defined in the cluster.

The _simulate API only supports a single nameless pipeline, which makes simulating pipeline-to-pipeline calls awkward. To use the ingest _simulate API with multiple pipelines, you must create real pipelines for the simulation. For example:

PUT _ingest/pipeline/pipeline1
{
  "processors": [
    {
      "set": {
        "field": "1",
        "value": "b"
      }
    },
    {
      "pipeline": {
        "name": "pipeline2"
      }
    }
  ]
}

PUT _ingest/pipeline/pipeline2
{
  "processors": [
    {
      "set": {
        "field": "2",
        "value": "c"
      }
    },
    {
      "pipeline": {
        "name": "pipeline3"
      }
    }
  ]
}

PUT _ingest/pipeline/pipeline3
{
  "processors": [
    {
      "set": {
        "field": "3",
        "value": "d"
      }
    }
  ]
}

POST _ingest/pipeline/pipeline1/_simulate?verbose
{
  "docs": [
    {
      "_source": {
        "0" : "a"
      }
    }
  ]
}

The above simulation works without issue, however:

  • It is not well documented
  • It requires creating real pipelines
    • It can pollute cluster state with in-progress pipelines
    • Others can execute the pipelines outside the scope of simulate
    • It is too easy to forget to delete the in-progress pipelines
    • Any future tooling to help build pipelines would not want to have to create real pipelines just for simulation.
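To make the cleanup cost concrete: after the simulation above, each temporary pipeline has to be removed with its own call to the delete pipeline API. A sketch, using the three pipeline names from the example:

```console
DELETE _ingest/pipeline/pipeline1
DELETE _ingest/pipeline/pipeline2
DELETE _ingest/pipeline/pipeline3
```

Until these calls run, the pipelines stay in cluster state and any indexing request can reference them by name.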

The proposal here is to add a new top-level "pipelines" array to the _simulate API, which can be used instead of the top-level "pipeline" object.
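As a rough illustration only, the request might look something like the sketch below. The exact shape of "pipelines" is not specified in this issue: the "id" field on each entry and the convention that the first entry is the entry point are assumptions made for the example, and no released version accepts this request.

```console
# Hypothetical request: "pipelines", "id", and first-entry-as-entry-point are assumptions
POST _ingest/pipeline/_simulate?verbose
{
  "pipelines": [
    {
      "id": "pipeline1",
      "processors": [
        { "set": { "field": "1", "value": "b" } },
        { "pipeline": { "name": "pipeline2" } }
      ]
    },
    {
      "id": "pipeline2",
      "processors": [
        { "set": { "field": "2", "value": "c" } }
      ]
    }
  ],
  "docs": [
    { "_source": { "0": "a" } }
  ]
}
```

Under this sketch the named pipelines would exist only for the duration of the request, so nothing would be written to cluster state.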

@jakelandis jakelandis added the :Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP label Nov 13, 2018
@elasticmachine
Collaborator

Pinging @elastic/es-core-infra

@ycombinator
Contributor

ycombinator commented Nov 21, 2018

Hey @jakelandis thanks for creating this issue. We might have a need for this feature in elastic/beats#8914. Without mentioning specific releases, any ideas if this is something that'll be worked on "soon" or "later" 😄? Thanks.

@jakelandis
Contributor Author

"soon" or "later"

@ycombinator later. I took a quick look, and while it is not terribly difficult, it is not a trivial change either.

@rjernst rjernst added the Team:Data Management Meta label for data/management team label May 4, 2020
@dakrone
Member

dakrone commented May 17, 2024

This has been open for quite a while, and we haven't made much progress on this due to focus in other areas. For now I'm going to close this as something we aren't planning on implementing. We can re-open it later if needed.

@dakrone dakrone closed this as not planned May 17, 2024
@ruflin
Contributor

ruflin commented May 21, 2024

Mentioning the new end-to-end simulate API here that was built recently: #101409. One of the problems it solves is testing multiple pipelines that already exist.
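For reference, a sketch of how the example from this issue might be run with that API. It assumes the `POST _ingest/_simulate` endpoint with its `pipeline` query parameter and `pipeline_substitutions` body field, as documented for recent 8.x releases; the index name `my-index` and the document `_id` are placeholders, and the exact parameters should be checked against the docs for the version in use.

```console
# Assumes the simulate ingest API from recent 8.x releases; "my-index" is a placeholder
POST _ingest/_simulate?pipeline=pipeline1
{
  "docs": [
    {
      "_index": "my-index",
      "_id": "1",
      "_source": { "0": "a" }
    }
  ],
  "pipeline_substitutions": {
    "pipeline1": {
      "processors": [
        { "set": { "field": "1", "value": "b" } },
        { "pipeline": { "name": "pipeline2" } }
      ]
    },
    "pipeline2": {
      "processors": [
        { "set": { "field": "2", "value": "c" } }
      ]
    }
  }
}
```

The substituted definitions are meant to apply only to the simulation, so no pipelines need to be created or deleted afterwards.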
