-
Notifications
You must be signed in to change notification settings - Fork 548
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[utilities,plugins] speedup journal collection (v2) #3879
base: main
Are you sure you want to change the base?
Conversation
Congratulations! One of the builds has completed. 🍾 You can install the built RPMs by following these steps:
Please note that the RPMs should be used only in a testing environment. |
cc1c0a8
to
b4534fd
Compare
I can confirm VERY evident speedup on a tested RHEL8 system with:
where current upstream executes Some test is failing - can you @champtar review it or do you need some help? I might get to the failure later today /tomorrow. |
I'm off until the 27th so no rush for the review, and I was thinking about even implementing 'head' in Python |
needs to be updated to reflect changed command exec The
And With this PR, I see:
That suggests the |
Hi @arif-ali and @TurboTurtle , is this approach worth to try? My answer is "carefully yes". Hi @champtar , do you need some help with updating the tests? |
b4534fd
to
63c2a52
Compare
b23de87
to
04f3361
Compare
sos/utilities.py
Fixed
) | ||
else: | ||
# pylint: disable=consider-using-with | ||
_output = open(to_file, 'wb') |
Check warning
Code scanning / CodeQL
File is not always closed Warning
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto - why binary mode for tac=False
?
89fc059
to
2ff50e2
Compare
2ff50e2
to
5fb7009
Compare
@pmoravec cirrus-ci is broken right now, but this should be ready for review |
Thanks for the new push. CI jobs restarted, seems good. I review the code now, will test functionality later (today or tomorrow, I assume). |
Not in the PR, but I was thinking about making TailReader work with to_file at some point, as I think we don't use the 'output' that often |
So far the PR survived various my corner-case tests I invented. I shall review the code tomorrow.. |
All up to one nitpick is good. I confirm great performance gain (and no visible performance penalty for journals having 100MB where the new approach additionally has to I like the idea, but would welcome another profound review of others as well - since we touch a frequently used code. |
tac_logs() reverses the order of the logs in a text file but keeps multiline logs in order. It is aimed at reversing 'journalctl --reverse'. This avoids depending on GNU coreutils version of tac (busybox tac is missing some flags) and is way faster. Testing with 100MB of logs: - tac -brs '^[^ ]' (handle multiline): ~30s - tac_logs (our python implementation): ~0.7s - tac (not handling multiline logs): ~0.3s If we are concerned about disk usage, we could call mmap.resize() to truncate the input file while processing. Signed-off-by: Etienne Champetier <[email protected]>
We are going to introduce HeadReader. Signed-off-by: Etienne Champetier <[email protected]>
Will be used to keep the first sizelimit MB from a program stdout. Signed-off-by: Etienne Champetier <[email protected]>
Using sizelimit with to_file now keep the start of the command output instead of being ignored. You can use tac with and without sizelimit. Signed-off-by: Etienne Champetier <[email protected]>
Instead of generating all the logs and tailing the last 100M, we get the first 100M of 'journalctl --reverse' that we then reverse again using tac_logs(). On journalctl timeout we now get the most recents logs where previously we were getting some random old logs. During collection, logs are now buffered on disk, so we use 2xsizelimit. Previously buffering was in RAM (also 2xsizelimit). On my test server, logs plugin runtime goes from 34s to 9.5s. Signed-off-by: Etienne Champetier <[email protected]>
5fb7009
to
060f50f
Compare
I rerun the failed test - it should succeed, I rather smell some infra issue (and logs dont show any reason). |
Thanks for the review @pmoravec ! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it makes sense to squash some commits, but I am ok also with this granularity.
Looking forward for either @arif-ali or @TurboTurtle review.
Instead of generating all the logs and tailing the last 100M, we get the first 100M of 'journalctl --reverse' that we then reverse again using our own implementation of tac.
To handle multiline logs we would need to use "tac -brs '^[^ ]'" that takes ~30s on 100M of logs when plain 'tac' takes ~0.3s. Our simple implementation in python takes 0.7s, and avoid an extra dependency.
On journalctl timeout we now get the most recents logs.
During collection logs are now buffered on disk, so we use 2xsizelimit. While running our tac we could actually truncate the source file to limit disk usage. Previously buffering was in RAM (also 2xsizelimit).
On my test server, logs plugin runtime goes from 34s to 9.5s.
Fixes #3615
Please place an 'X' inside each '[]' to confirm you adhere to our Contributor Guidelines