Cherry-pick #22673 to 7.x: [Auditbeat] Recover from errors in audit monitoring routine #22724
Cherry-pick of PR #22673 to 7.x branch. Original message:
The auditd module spawns a monitoring goroutine that fetches the auditd status every 15s. Because this routine uses a single audit client, if an update fails (because a netlink message is late, or for other reasons), the audit client can get out of sync with the stream, and all subsequent requests fail.
For reasons that aren't 100% clear to me at the moment, this error condition leads to a lot of [audit_send_repl] (2.6.x) / [audit_send_reply] (3.x+) kernel threads being created. (Reproduced in 2.6.32, no other versions tested.)
While in this state, an error appears every 15s and ps -ef shows a lot of audit_send_repl threads.
This patch updates the error-handling logic to create a new audit client when a status update fails, allowing the routine to recover and preventing the proliferation of audit_send_repl kernel threads.
Checklist
CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.
How to test this PR locally
It's easy to reproduce this issue by modifying the code at beats/auditbeat/module/auditd/audit_linux.go (lines 159 to 182 in bb973c4) to call client.GetStatusAsync(false) outside of the polling loop.
A similar change can be used to validate this fix, ideally sending an async getstatus every few iterations of the loop.
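The recovery behaviour described in this PR (replace the audit client when a status update fails) can be sketched roughly as below. This is only an illustration under stated assumptions, not the code in audit_linux.go: statusClient, fakeClient and pollOnce are hypothetical names, and the fake client merely simulates a client that has fallen out of sync with the netlink stream.

```go
package main

import (
	"errors"
	"fmt"
)

// statusClient is a hypothetical stand-in for the audit client used by the
// monitoring routine. It is NOT the real audit client API.
type statusClient interface {
	GetStatus() (lost uint32, err error)
	Close() error
}

// fakeClient simulates a client that can be out of sync with the stream,
// in which case every status request fails.
type fakeClient struct {
	outOfSync bool
	lost      uint32
}

func (c *fakeClient) GetStatus() (uint32, error) {
	if c.outOfSync {
		return 0, errors.New("unexpected sequence number")
	}
	return c.lost, nil
}

func (c *fakeClient) Close() error { return nil }

// pollOnce performs a single status poll. On failure it closes the current
// client and returns a newly created one, so the next poll starts clean.
// The original error is still returned so the caller can log it.
func pollOnce(client statusClient, newClient func() (statusClient, error)) (statusClient, uint32, error) {
	lost, err := client.GetStatus()
	if err == nil {
		return client, lost, nil
	}
	_ = client.Close() // discard the desynchronized client
	fresh, newErr := newClient()
	if newErr != nil {
		return nil, 0, fmt.Errorf("status failed (%v) and client could not be recreated: %w", err, newErr)
	}
	return fresh, 0, err
}

func main() {
	factory := func() (statusClient, error) {
		return &fakeClient{lost: 7}, nil
	}
	// Start with a client that is already out of sync.
	var client statusClient = &fakeClient{outOfSync: true}
	for i := 0; i < 3; i++ {
		var lost uint32
		var err error
		client, lost, err = pollOnce(client, factory)
		if client == nil {
			fmt.Println("giving up:", err)
			return
		}
		if err != nil {
			fmt.Println("poll", i, "failed, client replaced:", err)
			continue
		}
		fmt.Println("poll", i, "lost:", lost)
	}
}
```

In this sketch only the first poll fails. Later polls run against the replacement client, whereas reusing the original client would fail on every iteration.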