Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

auditd: Give up writing error if channel full. #87

Merged
merged 1 commit into from
Jun 28, 2023

Conversation

stephen-fox
Copy link
Contributor

@stephen-fox stephen-fox commented Jun 2, 2023

Prior to this change, I noticed TestAuditd_Read_AuditEventWriterError would occasionally fail with the error: "panic: test timed out after 10m0s". This can be seen in the linked GitHub CI run. [1]

I was able to reproduce this failure by re-running the test several times locally. [2] I am not entirely sure what the failure is caused by, as adding logging caused the issue to stop occurring). Looking at the CI log, it seems like the ReassemblyComplete method was called by two different Go routines. The Go routines became blocked trying to write to the errors channel. [1]

I noticed that the reassemblerCB callback does not bound the errors channel write with any other event. As a result, a blocked write will cause a deadlock.

There are at least two ways to deal with this:

  1. Bound the errors channel write with a read that indicates when the callback should stop doing work
  2. Give up writing to the channel if there is no capacity in the channel (this would be acceptable simply because we set a capacity of one error for the errors channel, thus one error will always be successfully written)

I opted for the second option because it is the simplest.

References:

  1. https://github.com/metal-toolbox/audito-maldito/actions/runs/5158110846/jobs/9291366073
while /bin/sh -c 'timeout 4 go test -run TestAuditd_Read_AuditEventWriterError || (echo error: timed-out && return 1)'; do :; done
PASS ok  	github.com/metal-toolbox/audito-maldito/processors/auditd	0.012s
PASS ok  	github.com/metal-toolbox/audito-maldito/processors/auditd	0.012s
PASS ok  	github.com/metal-toolbox/audito-maldito/processors/auditd	0.012s
error: timed-out

Prior to this change, I noticed TestAuditd_Read_AuditEventWriterError
would occasionally fail with the error: "panic: test timed out after
10m0s". This can be seen in the linked GitHub CI run. [1]

I was able to reproduce this failure by re-running the test several
times locally. [2] I am not entirely sure what the failure is caused
by, as adding logging caused the issue to stop occurring). Looking at
the CI log, it seems like the ReassemblyComplete method was called
by two different Go routines. The Go routines became blocked trying
to write to the errors channel. [1]

I noticed that the reassemblerCB callback does not bound the errors
channel write with any other event. As a result, a blocked write
will cause a deadlock.

There are at least two ways to deal with this:

1. Bound the errors channel write with a read that indicates
   when the callback should stop doing work
2. Give up writing to the channel if there is no capacity in
   the channel (this would be acceptable simply because we
   set a capacity of one error for the errors channel, thus
   one error will always be successfully written)

I opted for the second option because it is the simplest.

References:

1. https://github.com/metal-toolbox/audito-maldito/actions/runs/5158110846/jobs/9291366073
2. while /bin/sh -c 'timeout 4 go test \
  -run TestAuditd_Read_AuditEventWriterError \
  || (echo error: timed-out && return 1)'; do :; done
  PASS
  ok  	github.com/metal-toolbox/audito-maldito/processors/auditd	0.012s
  PASS
  ok  	github.com/metal-toolbox/audito-maldito/processors/auditd	0.012s
  PASS
  ok  	github.com/metal-toolbox/audito-maldito/processors/auditd	0.012s
  error: timed-out
@codecov
Copy link

codecov bot commented Jun 2, 2023

Codecov Report

Merging #87 (8a7ee7b) into main (bdff86d) will decrease coverage by 0.24%.
The diff coverage is 66.66%.

@@            Coverage Diff             @@
##             main      #87      +/-   ##
==========================================
- Coverage   67.83%   67.60%   -0.24%     
==========================================
  Files          30       30              
  Lines        1968     1963       -5     
==========================================
- Hits         1335     1327       -8     
- Misses        566      569       +3     
  Partials       67       67              
Flag Coverage Δ
unittests 67.60% <66.66%> (-0.24%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
processors/auditd/reassembler_callback.go 88.88% <66.66%> (-11.12%) ⬇️

... and 1 file with indirect coverage changes

@stephen-fox stephen-fox force-pushed the reassembler-cb-check-cancel branch 2 times, most recently from 20d264d to 8a7ee7b Compare June 2, 2023 21:05
@stephen-fox stephen-fox changed the title auditd: Check context done in reassemblerCB. auditd: Give up writing error if channel full. Jun 2, 2023
@stephen-fox stephen-fox marked this pull request as ready for review June 28, 2023 18:24
@stephen-fox stephen-fox requested review from a team as code owners June 28, 2023 18:24
@stephen-fox stephen-fox merged commit d5cd997 into main Jun 28, 2023
@stephen-fox stephen-fox deleted the reassembler-cb-check-cancel branch June 28, 2023 19:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants