Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: asynchronous outputs and slow outputs detection #1451

Merged
merged 19 commits into from
Dec 1, 2020

Conversation

leogr
Copy link
Member

@leogr leogr commented Oct 19, 2020

What type of PR is this?

/kind feature

Any specific area of the project related to this PR?

/area engine

What this PR does / why we need it:

This PR aims to introduce a non-blocking Falco's outputs processing, as per #1417. In details:

  • f449316 simplified (again) the outputs interface by using a single struct as param and melting together the two output methods (i.e., output_event() and output_msg());
  • d59d2a4 introduced a bounded concurrent message queue (used for control commands too) that sits between:
    • the handlers for incoming events/messages (i.e., the producers) that run in the main thread, and
    • the worker (i.e., the consumer) that runs in another thread

Incoming messages are being formatted in the main thread, then send to the worker thru the queue. Finally, the worker receives the formatted messages, then fanouts them to the enabled outputs.

Since the new implementation forwards both the usual messages (i.e., those that happen when a rule is matched) and other kinds of messages (i.e., drop alerts) to the same method (falco::outputs::abstract_output::output(const message *msg)), now the gRPC output inherits the ability to subscribe to dropped events.

Finally, d6f2a3a and 4e520fa introduced a watchdog for slow output channels. Basically, when the consumer blocks an output channel for a while, an error is logged (that signals a misconfiguration or a problem that the user should fix).

Which issue(s) this PR fixes:

Fixes #531
Fixes #1417
Fixes #884 (drop alerts for gRPC)

Also, reduce the likelihood of dropped events, see #1403.

Special notes for your reviewer:

  1. There is no guard to avoid that the message queue grows indefinitely. However, the message queue growth is limited by the embedded notification rate limiter. There is also no guard for the gRPC queue, but it's a different issue (in this case depends on the ability/presence of a client to consume notifications). It's not a goal of this PR solving these issues.

  2. I'm assuming 2 seconds is a sane default to detect slow output channel consumers. You can simulate a slow output by configuring the program_output as following:

program_output:
  enabled: false
  keep_alive: false
  program: "sleep 5"
  1. Regarding gRPC.
    AFAIK one blocker for adding drop alerts was the decision about using the existing API or a newer one (see gRPC outputs: stream drop alerts #884).
    Since even with drop alert we will not have any empty field, I believe it's acceptable to re-use the same proto also in case of a drop alert message.

However, I have added a new source: internal. It's needed because Falco internal messages (e.g., drop alerts) did not have their own source, and the proto requires that (we cannot just leave it empty).

Below an example of how a drop alert will look like to a gRPC client:

time: <                                                                                                          
  seconds:1603121860 nanos:517807808                                                                             
>                                                                                                                
priority: CRITICAL                                                                                               
source: INTERNAL                                                                                                 
rule: "Falco internal: syscall event drop"                                                                       
output: "17:37:40.517807808: Critical Falco internal: syscall event drop. 1167 system calls dropped in last secon
d. (ebpf_enabled=0 n_drops=1167 n_drops_buffer=1167 n_drops_bug=0 n_drops_pf=0 n_evts=8549)"                     
output_fields: <                                                                                                 
  key: "ebpf_enabled"                                                                                            
  value: "0"                                                                                                     
>                                                                                                                
output_fields: <                                                                                                 
  key: "n_drops"                                                                                                 
  value: "1167"                                                                                                  
>                                                                                                                
output_fields: <                                                                                                 
  key: "n_drops_buffer"                                                                                          
  value: "1167"                                                                                                  
>                                                                                                                
output_fields: <                                                                                                 
  key: "n_drops_bug"                                                                                             
  value: "0"                                                                                                     
>                                                                                                                
output_fields: <                                                                                                 
  key: "n_drops_pf"                                                                                              
  value: "0"                                                                                                     
>                                                                                                                
output_fields: <                                                                                                 
  key: "n_evts"                                                                                                  
  value: "8549"                                                                                                  
>                                                                                                                
hostname: "x1"

Does this PR introduce a user-facing change?:

new: asynchronous outputs implementation, outputs channels will not block event processing anymore
new: slow outputs detection
new: `output_timeout` config option for slow outputs detection
update: gRPC clients can now subscribe to drop alerts via gRCP API

@leogr leogr force-pushed the feat/non-blocking-outputs branch from afcbd9d to 0478ec9 Compare October 28, 2020 10:14
@leogr leogr changed the title wip: feat: non-blocking outputs feat: non-blocking outputs Nov 6, 2020
@leogr
Copy link
Member Author

leogr commented Nov 6, 2020

I believe it's ready for review now.

/cc @fntlnz
/cc @leodido

@poiana poiana requested a review from leodido November 6, 2020 16:57
@leogr leogr changed the title feat: non-blocking outputs feat: non-blocking outputs and slow outputs detection Nov 6, 2020
@poiana
Copy link
Contributor

poiana commented Nov 30, 2020

LGTM label has been added.

Git tree hash: ac6f76b0f8c7db45259b70b4dddcf70c056bf372

@poiana
Copy link
Contributor

poiana commented Nov 30, 2020

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: fntlnz, leodido

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@leodido
Copy link
Member

leodido commented Dec 1, 2020

Since the release of Falco 0.27 has been postponed I let this go in! 🥳

/hold cancel

@leodido
Copy link
Member

leodido commented Dec 1, 2020

/milestone 0.27.0

@poiana poiana modified the milestones: 0.28.0, 0.27.0 Dec 1, 2020
@poiana poiana merged commit 6a35233 into master Dec 1, 2020
@poiana poiana deleted the feat/non-blocking-outputs branch December 1, 2020 09:18
leogr added a commit to falcosecurity/charts that referenced this pull request Jan 18, 2021
leogr added a commit to falcosecurity/charts that referenced this pull request Jan 19, 2021
leogr added a commit to falcosecurity/charts that referenced this pull request Jan 19, 2021
poiana pushed a commit to falcosecurity/charts that referenced this pull request Jan 19, 2021
mstemm added a commit that referenced this pull request Aug 26, 2021
Modify falco_formats to only be responsible for resolving a rule's
output string or coming up with a map of field name->field values from
a given output string.

It relies on the changes in
falcosecurity/libs#77 to use generic
formatters for a given source.

Remove lua bindings to create a formatter/free a formatter. Those were
unused as of the changes in
#1451, so finally remove
them now.

Signed-off-by: Mark Stemm <[email protected]>
mstemm added a commit that referenced this pull request Oct 4, 2021
Modify falco_formats to only be responsible for resolving a rule's
output string or coming up with a map of field name->field values from
a given output string.

It relies on the changes in
falcosecurity/libs#77 to use generic
formatters for a given source.

Remove lua bindings to create a formatter/free a formatter. Those were
unused as of the changes in
#1451, so finally remove
them now.

Signed-off-by: Mark Stemm <[email protected]>
mstemm added a commit that referenced this pull request Oct 4, 2021
Modify falco_formats to only be responsible for resolving a rule's
output string or coming up with a map of field name->field values from
a given output string.

It relies on the changes in
falcosecurity/libs#77 to use generic
formatters for a given source.

Remove lua bindings to create a formatter/free a formatter. Those were
unused as of the changes in
#1451, so finally remove
them now.

Signed-off-by: Mark Stemm <[email protected]>
mstemm added a commit that referenced this pull request Oct 5, 2021
Modify falco_formats to only be responsible for resolving a rule's
output string or coming up with a map of field name->field values from
a given output string.

It relies on the changes in
falcosecurity/libs#77 to use generic
formatters for a given source.

Remove lua bindings to create a formatter/free a formatter. Those were
unused as of the changes in
#1451, so finally remove
them now.

Signed-off-by: Mark Stemm <[email protected]>
mstemm added a commit that referenced this pull request Oct 5, 2021
Modify falco_formats to only be responsible for resolving a rule's
output string or coming up with a map of field name->field values from
a given output string.

It relies on the changes in
falcosecurity/libs#77 to use generic
formatters for a given source.

Remove lua bindings to create a formatter/free a formatter. Those were
unused as of the changes in
#1451, so finally remove
them now.

Signed-off-by: Mark Stemm <[email protected]>
mstemm added a commit that referenced this pull request Oct 5, 2021
Modify falco_formats to only be responsible for resolving a rule's
output string or coming up with a map of field name->field values from
a given output string.

It relies on the changes in
falcosecurity/libs#77 to use generic
formatters for a given source.

Remove lua bindings to create a formatter/free a formatter. Those were
unused as of the changes in
#1451, so finally remove
them now.

Signed-off-by: Mark Stemm <[email protected]>
poiana pushed a commit that referenced this pull request Oct 12, 2021
Modify falco_formats to only be responsible for resolving a rule's
output string or coming up with a map of field name->field values from
a given output string.

It relies on the changes in
falcosecurity/libs#77 to use generic
formatters for a given source.

Remove lua bindings to create a formatter/free a formatter. Those were
unused as of the changes in
#1451, so finally remove
them now.

Signed-off-by: Mark Stemm <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants