detect consecutive timeouts without events and alert accordingly to a configurable value #1622
Merged
…uts without an event is greater than a given threshold. The rationale is that if Falco receives a large number of consecutive timeouts without a valid event, something is going wrong. This is because the libs normally send timeouts to Falco (also) to signal events to discard. In such cases, which are the majority, `ev` exists and is not `null`. Signed-off-by: Leonardo Di Donato <[email protected]>
Signed-off-by: Leonardo Di Donato <[email protected]>
Co-authored-by: Lorenzo Fontana <[email protected]> Signed-off-by: Leonardo Di Donato <[email protected]>
Falco uses a buffer shared between the kernel and userspace to receive events (e.g., system call information) in userspace. However, the underlying libraries can also time out for various reasons. For example, there could have been an issue while reading an event, or the particular event needs to be skipped. Normally, it is very unlikely that Falco receives no events across many consecutive timeouts. Falco is able to detect such an uncommon situation. Here you can configure the maximum number of consecutive timeouts without an event after which you want Falco to alert. By default this value is set to 1000 consecutive timeouts without any event. Signed-off-by: Leonardo Di Donato <[email protected]>
…ication gets emitted Also, print out the time of the last processed event in the output fields of the notification. Signed-off-by: Leonardo Di Donato <[email protected]>
…usly processed event Signed-off-by: Leonardo Di Donato <[email protected]>
leodido force-pushed the feature/detect-consecutive-timeouts branch from 71b2ba0 to 7bcd4a3 (April 16, 2021 10:40)
/milestone 0.28.1
/cc @fntlnz
leogr approved these changes Apr 19, 2021
LGTM label has been added. Git tree hash: c4d81e7bfbf8b63ea3dd5ee7986a910b15138a84
fntlnz approved these changes Apr 19, 2021
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: fntlnz, leogr
The full list of commands accepted by this bot can be found here.
The pull request process is described here
What type of PR is this?
/kind feature
Any specific area of the project related to this PR?
NONE
What this PR does / why we need it:
This PR makes Falco able to detect a very uncommon situation and alert the user about it.
Falco receives events from the drivers through the libraries.
Not all the events that the libraries emit are of interest to Falco.
For this reason, and for other more complex ones (e.g., a timeout while reading an event from the ring buffer), Falco receives timeouts (`SCAP_TIMEOUT`).
In the majority of cases, when Falco receives a timeout it also receives an event, which it discards.
But if Falco receives too many consecutive timeouts without an event, it is likely that something is going wrong at a lower level.
These code changes let the user configure how such an unlikely situation is detected and alerted on.
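A minimal sketch of the detection idea described here, not the code from this PR. The names `Result`, `Event`, and `TimeoutTracker` are hypothetical stand-ins (the enum mimics the libs' success and timeout return codes). Two behaviors are assumptions made for illustration: a timeout that carries an event resets the streak, and the alert fires once per streak.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical stand-ins for the libs' return codes (success / timeout).
enum class Result { Success, Timeout };

// Hypothetical event type: only a timestamp is needed for this sketch.
struct Event {
    std::uint64_t ts_ns;
};

// Counts consecutive timeouts that arrive without an event.
class TimeoutTracker {
public:
    explicit TimeoutTracker(std::uint32_t max_consecutives)
        : max_(max_consecutives) {}

    // Feed one result from the event loop.
    // Returns true when an alert should be emitted for this call.
    bool observe(Result res, const Event* ev) {
        if (res == Result::Timeout && ev == nullptr) {
            ++count_;
            // Alert when the streak exceeds the threshold
            // (assumption: only once per streak).
            if (count_ > max_ && !alerted_) {
                alerted_ = true;
                return true;
            }
            return false;
        }
        // Assumption: anything else (a success, or a timeout carrying
        // an event to discard) ends the streak.
        count_ = 0;
        alerted_ = false;
        if (res == Result::Success && ev != nullptr) {
            last_ts_ = ev->ts_ns;
            has_last_ = true;
        }
        return false;
    }

    // Time of the last successfully processed event, or "none".
    std::string last_event_time() const {
        return has_last_ ? std::to_string(last_ts_) : "none";
    }

    std::uint32_t count() const { return count_; }

private:
    std::uint32_t max_;
    std::uint32_t count_ = 0;
    bool alerted_ = false;
    bool has_last_ = false;
    std::uint64_t last_ts_ = 0;
};
```

In an event loop, one would call `observe()` after each read from the libs and emit the notification, including `last_event_time()`, whenever it returns true.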
Through the `syscall_event_timeouts.max_consecutives` config field, the user can tell Falco after how many consecutive timeouts without an event to emit an alert (with DEBUG priority).
Besides the message, the alert contains the current time and the time of the last processed (`SCAP_SUCCESS`) event (if available, otherwise "none").
Which issue(s) this PR fixes:
NONE
Special notes for your reviewer:
On my CPU, the default value of 1000 for the `max_consecutives` config value corresponds to roughly 30-40 seconds (depending on the system load too).
In this scenario, Falco alerts if it has not processed any event for 30-40 seconds.
I'm wondering whether this value is fine or whether we should double it.
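As a rough back-of-the-envelope reading of these figures, assuming the delay scales linearly with the threshold (an assumption, not something measured in this PR), each timeout accounts for about 30 to 40 ms, so doubling the threshold would roughly double the time before an alert:

$$
\frac{30 \text{ to } 40\ \text{s}}{1000} \approx 30 \text{ to } 40\ \text{ms per timeout},
\qquad
2000 \times (30 \text{ to } 40\ \text{ms}) \approx 60 \text{ to } 80\ \text{s}
$$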
Reproduce
The only simple way to reproduce such an unlikely situation and test out this PR is the following one.
sudo ./build/userspace/falco/falco -r rules/falco_rules.yaml -u
sudo ./userspace-example // don't pay attention to the sudo for now, it's an example tool
`renameat` event
10:00:44.036991000: Warning Shell history had been deleted or renamed (user=<NA> user_loginuid=-1 type=renameat command=<NA> fd.name=<NA> name=<NA> path=<NA> oldpath=/tmp/bash_history host (id=host))
Does this PR introduce a user-facing change?: