Status service logs huge messages, blowing up filebeat indexing #100202
Comments
Pinging @elastic/kibana-core (Team:Core)
Yeah, I'm wondering if we should just reduce the granularity in the log messages, though. It may cause issues when indexed by Filebeat, but it could be valuable information from a debugging point of view. WDYT @joshdover?
Maybe we shouldn't spam the logs, but instead provide the extended information via an HTTP endpoint? That way the information is available on demand.
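A rough sketch of the "on demand" alternative: Kibana already serves status detail from `GET /api/status` (the exact response shape varies by version), so the per-plugin detail doesn't have to be written into every debug log line. The base URL below is an assumption for a local default setup.

```ts
// Minimal sketch: fetch Kibana's existing status endpoint instead of relying on
// debug log output. Assumes a locally running Kibana on the default port and no
// basePath; authentication is omitted.
async function fetchKibanaStatus(baseUrl = 'http://localhost:5601'): Promise<unknown> {
  const res = await fetch(`${baseUrl}/api/status`);
  if (!res.ok) {
    throw new Error(`status endpoint returned ${res.status}`);
  }
  return res.json();
}

fetchKibanaStatus().then((body) => console.log(JSON.stringify(body, null, 2)));
```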
The logging happens in kibana/src/core/server/status/status_service.ts, lines 77 to 81 (at 48523e5).
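The embedded code excerpt isn't preserved here. As a purely hypothetical sketch of the pattern those lines follow (names and shapes below are illustrative, not the actual source), the problem is that the full status object is attached as structured meta to every debug message, so any large nested plugin `meta` gets serialised on every recalculation:

```ts
// Hypothetical illustration only, not the real status_service.ts code.
interface ServiceStatus {
  level: string;
  summary: string;
  meta?: Record<string, unknown>;
}

const log = {
  debug: (message: string, meta?: Record<string, unknown>) =>
    console.debug(message, meta ? JSON.stringify(meta) : ''),
};

function logOverallStatus(statuses: Map<string, ServiceStatus>): void {
  const unavailable = [...statuses.values()].filter((s) => s.level !== 'available');
  // The summary text is short, but attaching the full per-plugin detail as
  // structured meta is what produces multi-megabyte log entries.
  log.debug(`[${unavailable.length}] services are unavailable`, {
    kibana: { status: Object.fromEntries(statuses) },
  });
}

logOverallStatus(
  new Map([['taskManager', { level: 'degraded', summary: 'Tasks are delayed', meta: { /* large nested object */ } }]])
);
```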
One thing to keep in mind here is that Kibana very well might be down when this data is needed. If that's the case, it would be nice to still have this as a (trimmed-down) log message, since the endpoint would be unreachable.
I ran into this recently while looking at a log file with "[36] services are unavailable" spammed repeatedly through the logs. Every single message was 3.4 MB 😱 The log file covered an 8-minute timeframe and totaled 316.4 MB. That's absolutely insane, and it crashed VS Code multiple times while I was inspecting it. When I pretty-print a single one of these log lines with 2-space indents, it blows up to 262,501 lines! Example attached below (compact and pretty-printed): sample-services-unavailable.log.zip
I think the issue with the "recursive" large Task Manager status objects was resolved in #98265. What version are you seeing this in? It should have been fixed in 7.13.0, and I think it may have only been a problem during 7.12.x.
The support case I was helping with was running 7.12.0, thanks for linking this here! |
Though this was fixed for the Task Manager case, I went ahead and removed this recursive meta field, which really isn't useful and, in some cases, actually leaks status information across plugin dependency boundaries. This should help prevent this from happening again in the future: #106286
When `logLevel` is set to `debug`, the status service writes out a log message with the summary status serialised. However, this object can be huge, possibly due to how Task Manager reconstructs its own service status.

When these log files are written to disk and indexed with Filebeat with `json.keys_under_root` enabled, a mapping explosion occurs. From a local run, 859 fields were added under `kibana.status` before running into the 10k field limit.

This also causes performance issues in Discover; I've seen it crash while I was on a support call.
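As a rough illustration of why the nested status object inflates the index mapping: with `json.keys_under_root`, every leaf path in the logged JSON becomes its own field for Elasticsearch's dynamic mapping to track. The helper below is not part of Kibana or Filebeat; it just counts leaf paths in a sample log line to approximate how many fields a single message contributes.

```ts
// Count distinct leaf field paths in a JSON value (e.g. "kibana.status.core.elasticsearch.level").
function countLeafPaths(value: unknown, prefix = ''): string[] {
  if (value === null || typeof value !== 'object') {
    return [prefix];
  }
  return Object.entries(value as Record<string, unknown>).flatMap(([key, child]) =>
    countLeafPaths(child, prefix ? `${prefix}.${key}` : key)
  );
}

// A trivially small sample; a real status dump yields hundreds of leaf paths.
const sample = { kibana: { status: { core: { elasticsearch: { level: 'available' } } } } };
console.log(countLeafPaths(sample).length);
```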
Log message example