[8.x](backport #42398) Handle leak of process info in hostfs provider for add_session_metadata #42792
Proposed commit message
Fixes #42317
So, it turns out that the processDB used by the procfs provider in `add_session_metadata` expects events to arrive in order, which won't always be the case under load. If we get an exit event before the exec event, we'll drop the exit event, and the process event will then remain in the `db.processes` map indefinitely. In addition, auditbeat is configured to tell netlink to drop events, so under load we can lose either the exec or the exit event. That can lead to a leak if we can never pair the two up for a given process.

This alters the DB so we don't drop orphaned exit events. Instead, the DB reaper will wait a few iterations of `reapProcs()` to try to match each orphaned exit. We also optionally reap process exec events. I've tested this under load, and it does prevent the process DB from growing indefinitely.

There are a few caveats to this as-is:
- … `db.removalMap`, which means we'll be using more memory until those exit events are reaped. I can't really think of a good way around this.
- … `/proc`.

There are also a few smaller changes to the process DB:
I'm still running performance tests on this, as the behavior is a bit bursty and hard to measure without some proper scripts. Will update when I have results.
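The orphaned-exit handling described above can be sketched as a toy Go model. This is not the actual Beats process DB code: `processDB`, `InsertExec`, `InsertExit`, and `maxOrphanPasses` are hypothetical names, pid-only maps stand in for the real process records, and the limit of 3 passes is an assumed value for "a few iterations". Only `db.processes`, `db.removalMap`, and `reapProcs()` are taken from the description.

```go
package main

import "fmt"

// maxOrphanPasses is a hypothetical limit (assumption): how many reaper passes
// an exit event without a matching exec is kept before it is discarded.
const maxOrphanPasses = 3

// processDB is a simplified, hypothetical model of the process DB.
type processDB struct {
	processes  map[uint32]struct{} // pids with a recorded exec event
	removalMap map[uint32]int      // pid -> reaper passes spent waiting for a match
}

func newProcessDB() *processDB {
	return &processDB{
		processes:  make(map[uint32]struct{}),
		removalMap: make(map[uint32]int),
	}
}

// InsertExec records a process exec event.
func (db *processDB) InsertExec(pid uint32) {
	db.processes[pid] = struct{}{}
}

// InsertExit queues the pid for removal even if its exec has not arrived yet,
// so an out-of-order exit event is not dropped.
func (db *processDB) InsertExit(pid uint32) {
	if _, queued := db.removalMap[pid]; !queued {
		db.removalMap[pid] = 0
	}
}

// reapProcs runs one reaper pass: an exit paired with a known exec removes the
// process, while an orphaned exit is retried for up to maxOrphanPasses passes.
func (db *processDB) reapProcs() {
	for pid, passes := range db.removalMap {
		if _, ok := db.processes[pid]; ok {
			delete(db.processes, pid) // deleting during range is safe in Go
			delete(db.removalMap, pid)
			continue
		}
		passes++
		if passes >= maxOrphanPasses {
			delete(db.removalMap, pid) // exec never showed up (e.g. dropped by netlink)
			continue
		}
		db.removalMap[pid] = passes
	}
}

func main() {
	db := newProcessDB()
	db.InsertExit(42) // exit arrives before exec under load
	db.reapProcs()    // orphaned exit is kept, waiting for its exec
	db.InsertExec(42)
	db.reapProcs() // exit is now paired with its exec; both entries are removed
	fmt.Println(len(db.processes), len(db.removalMap)) // prints "0 0"
}
```

Holding orphaned exits in the removal map trades some memory for not leaking entries in `db.processes`; the bounded retry count keeps an exit whose exec was lost from lingering forever.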
How to test
Run auditbeat with the following:
Grep for the `REAPER:` log line to examine the state of the various DB maps.

Checklist
- `CHANGELOG.next.asciidoc` or `CHANGELOG-developer.next.asciidoc`.

Data: ProcessDB under load
Data: Memory during and after load
This is an automatic backport of pull request #42398 done by [Mergify](https://mergify.com).