[meta] Pipeline Java Execution important issues #11175

Closed
7 of 8 tasks
colinsurprenant opened this issue Sep 30, 2019 · 14 comments

@colinsurprenant
Contributor

colinsurprenant commented Sep 30, 2019

This is a meta issue about some important Pipeline Java Execution issues.

Priority Issues

Followup Issues

Other issues

@colinsurprenant
Contributor Author

colinsurprenant commented Dec 20, 2019

I have looked into the "Java execution startup is potentially slower for some configurations #11105" issue and, as I commented there, the slowdown is not merely potential: it is in fact a factor of the number of workers. See #11105 (comment).

I will continue investigating this.

@colinsurprenant
Contributor Author

I have created two PRs with alternate implementations to solve the class compilation caching problem that leads to recompiling everything per worker. The first, #11479, uses the existing global cache but fixes the cache key so that compiled classes are correctly reused. This implementation has some potential problems with pipeline reloading and cache invalidation. The second, #11482, uses a different strategy: it moves the class caching to the pipeline level, which solves both the potential invalidation and pipeline reloading problems.
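
To make the pipeline-level caching idea concrete, here is a minimal sketch in Python. It is not the code from either PR: `PipelineClassCache`, `fake_compile`, and `start_workers` are hypothetical names, and keying the cache on the generated source text is an assumption made for illustration. It only shows that when every worker of a pipeline shares one cache, each distinct source is compiled once instead of once per worker.

```python
import threading


class PipelineClassCache:
    """Hypothetical cache owned by one pipeline and shared by its workers."""

    def __init__(self, compiler):
        self._compiler = compiler
        self._classes = {}
        self._lock = threading.Lock()

    def get_or_compile(self, source):
        # Assumption: the generated source text is the cache key, so
        # identical code always maps to the same compiled class.
        with self._lock:
            if source not in self._classes:
                self._classes[source] = self._compiler(source)
            return self._classes[source]


compile_calls = []


def fake_compile(source):
    # Stand-in for the expensive compilation step; records every call.
    compile_calls.append(source)
    return ("compiled", source)


def start_workers(cache, sources, worker_count):
    # Each worker asks for every class it needs, as in a per-worker setup.
    def worker():
        for source in sources:
            cache.get_or_compile(source)

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


cache = PipelineClassCache(fake_compile)
start_workers(cache, ["filter_a", "filter_b"], worker_count=8)
print(len(compile_calls))  # 2: one compilation per source, not per worker
```

In this sketch the cache is an attribute of whatever object represents the pipeline, so discarding that object on reload discards its cached classes too, and no separate invalidation step is needed.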

@colinsurprenant
Contributor Author

Good news on the inputs starting before the workers are initialized: it is easier to fix than I thought. We now just need to decide whether to make that configurable. I will push a PR soon.

@colinsurprenant
Contributor Author

#11482 was merged and should considerably improve Java execution pipeline compilation time, as it will no longer be multiplied by the number of workers. The fix will be included in 7.6.0 and 7.5.2.

@colinsurprenant
Contributor Author

PR to fix worker initialization sequence in #11492. Should we make that behaviour optional?

@colinsurprenant
Contributor Author

#11492 is merged and will be included in 7.6.0 and 7.5.2. We decided that it was indeed a bug fix, and that adding an option to eagerly start inputs before workers are fully initialized is a feature to evaluate in #11493.
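
For readers unfamiliar with the sequencing problem, the intended order can be sketched in Python as follows. This is an illustration under assumptions, not the Logstash implementation: `run_pipeline` and the log messages are hypothetical, and a `threading.Barrier` stands in for whatever synchronization the real fix uses. The point is only that the step that starts the inputs waits until every worker has finished its initialization.

```python
import threading


def run_pipeline(worker_count):
    log = []
    log_lock = threading.Lock()
    # One party per worker plus one for the thread that starts the inputs.
    ready = threading.Barrier(worker_count + 1)

    def record(message):
        with log_lock:
            log.append(message)

    def worker(index):
        # Stand-in for per-worker setup such as compiling the pipeline.
        record(f"worker {index} initialized")
        ready.wait()
        # The event-processing loop would run here.

    workers = [
        threading.Thread(target=worker, args=(i,)) for i in range(worker_count)
    ]
    for w in workers:
        w.start()

    ready.wait()  # blocks until every worker has reported in
    record("inputs started")

    for w in workers:
        w.join()
    return log


log = run_pipeline(4)
print(log[-1])  # inputs started
```

An eager-start option such as the one discussed for #11493 would correspond to skipping the wait before starting the inputs.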

@roaksoax added this to the v7.7.0 milestone on Jan 21, 2020

@colinsurprenant
Contributor Author

New solution proposal for the event ordering: #11524

@colinsurprenant
Contributor Author

#11482 addressed the multiple workers slowdown but introduced a regression in 7.5.2 for the multiple pipelines use case. That PR will be reverted and replaced by #11564, which should correctly solve both problems.

@colinsurprenant
Contributor Author

About

Document event ordering guarantees or lack thereof with multiple workers versus a single worker

I am wondering where in the docs we should introduce/document this subject. It does not feel right IMO to simply add a snippet in the settings file. /cc @karenzone @jsvd
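
As an illustration of what such documentation would need to convey, here is a small Python sketch of the general effect. It is not Logstash code: `run`, the batch size, and the upper-casing "filter" are all made up for the example. With a single worker, batches are processed one after another and output order matches input order. With several workers, batches complete concurrently, so only the set of events is guaranteed, not their order.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def run(events, workers, batch_size=2):
    # Split the input into batches, as a pipeline would before handing
    # them to its workers.
    batches = [
        events[i:i + batch_size] for i in range(0, len(events), batch_size)
    ]
    output = []
    lock = threading.Lock()

    def process(batch):
        filtered = [event.upper() for event in batch]  # stand-in filter
        with lock:
            output.extend(filtered)  # stand-in output

    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(process, batches))
    return output


events = [f"e{i}" for i in range(10)]

# Single worker: batches are handled in submission order.
print(run(events, workers=1) == [e.upper() for e in events])  # True

# Multiple workers: the same events come out, but batch order may vary.
print(sorted(run(events, workers=4)) == sorted(e.upper() for e in events))  # True
```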

@karenzone
Contributor

We're building out two general sections: Troubleshooting, and Tips and Best Practices. This new info seems similar in nature, even though it doesn't fit neatly under either of those current categories. If we renamed Tips and Best Practices or created a new section describing how things work, I'll bet we'd find other stuff to add.

@colinsurprenant
Contributor Author

Yeah, I like the idea of having a more general "concepts" section. We could then reference that in the settings docs, for example.

@karenzone
Contributor

karenzone commented Feb 4, 2020

I like the idea of keeping the more general info (Troubleshooting, Tips and Best Practices, and a new concepts section) together. On the other hand, it seems like a new conceptual section makes more sense in "How Logstash Works." If I expanded that topic, I'd expect to see information such as what you want to add.

Under "How Logstash Works" (Pros and Cons):
+ Higher in book structure
+ Seems like a natural fit as part of "How Logstash Works"
- Level 2 topic nested under a Level 1 topic ("How Logstash Works"), and won't be visible unless somebody clicks to expand the Level 1 topic

A new section grouped with Troubleshooting and Tips and Best Practices (Pros and Cons):
+ Keeps general, conceptual information grouped together
+ New section could be a Level 1 heading, always visible
- The title "How Logstash Works" is already taken, so we'd need to come up with something else

WDYT?

UPDATED:
Here's a DRAFT PR for consideration/discussion: #11581

@colinsurprenant
Contributor Author

The rebatching after filters (#11710), which improves pipeline.ordered performance, has been merged and will be available starting in 7.7.0.

@colinsurprenant
Contributor Author

Closing this: all important work is done, and one remaining testing issue has been created as #12476.
