Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add signal/wait model for TestAuditlogImpl #1758

Merged
merged 4 commits into from
Apr 13, 2022

Conversation

peternied
Copy link
Member

Description

I have been tracking test failures with testRestMethod very often
showing failures. My theory is that the execution environment can
impact the order of operations sometimes causing the audit log not to
contain messages before it is checked.

Adding a new method doThenWaitForMessages(...) this ensures the
log queue is fresh, the triggering action completes, and the expected
number of messages were recieved. There is a second long time window
that allows for the messages to be flushed, this is likely more than
enough - if the messages are recieved the count down latch immediately
continues execution so the tests will not wait if they are ready to
proceed.

While this new method is much more reliable not all tests were
encountering such issues, so I've keep the original convention. This
can be migrated in one-offs or all at once if we see more troublesome
behavoir. The previous methods/fields are depreciated to push future
tests to follow the new pattern.

Modifications to the rest helper not to throw exceptions were needed to
keep the Runnable declaration clean and small.

Check List

  • New functionality includes testing
  • New functionality has been documented
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

I have been tracking test failures with testRestMethod very often
showing failures.  My theory is that the execution environment can
impact the order of operations sometimes causing the audit log not to
contain messages before it is checked.

Adding a new method `doThenWaitForMessages(...)` this ensures the
log queue is fresh, the triggering action completes, and the expected
number of messages were recieved.  There is a second long time window
that allows for the messages to be flushed, this is likely more than
enough - if the messages are recieved the count down latch immediately
continues execution so the tests will not wait if they are ready to
proceed.

While this new method is much more reliable not all tests were
encountering such issues, so I've keep the original convention.  This
can be migrated in one-offs or all at once if we see more troublesome
behavoir.  The previous methods/fields are depreciated to push future
tests to follow the new pattern.

Modifications to the rest helper not to throw exceptions were needed to
keep the Runnable declaration clean and small.

Signed-off-by: Peter Nied <[email protected]>
@peternied peternied added bug Something isn't working test fix test fixes labels Apr 11, 2022
@peternied peternied requested a review from a team April 11, 2022 22:55
@codecov-commenter
Copy link

codecov-commenter commented Apr 12, 2022

Codecov Report

Merging #1758 (affcfa7) into main (74618eb) will increase coverage by 0.01%.
The diff coverage is 100.00%.

@@             Coverage Diff              @@
##               main    #1758      +/-   ##
============================================
+ Coverage     60.42%   60.44%   +0.01%     
- Complexity     3196     3199       +3     
============================================
  Files           253      253              
  Lines         18093    18095       +2     
  Branches       3245     3245              
============================================
+ Hits          10933    10937       +4     
+ Misses         5579     5578       -1     
+ Partials       1581     1580       -1     
Impacted Files Coverage Δ
...pensearch/security/auditlog/impl/AuditMessage.java 73.46% <100.00%> (+0.27%) ⬆️
...a/org/opensearch/security/tools/SecurityAdmin.java 37.74% <0.00%> (+0.24%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 74618eb...affcfa7. Read the comment docs.

davidlago
davidlago previously approved these changes Apr 12, 2022
Copy link

@davidlago davidlago left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for adding this! Let me know if this proves/disproves your hypothesis after the change merges and it runs for a few days :)

@peternied
Copy link
Member Author

Thank you for adding this! Let me know if this proves/disproves your hypothesis after the change merges and it runs for a few days :)

I've run this in loops on my dev machine, and its been solid as a rock so far, it will be interesting to see how it looks inside of GitHub actions.

Signed-off-by: Peter Nied <[email protected]>
@cliu123
Copy link
Member

cliu123 commented Apr 13, 2022

There're whole bunch of Assert.assertTrue(TestAuditlogImpl.sb.toString().contains(...) in the codebase. Now that a new method is being introduced, can we migrate all of them with the new method? Using the same method for exactly same purpose in all places would retain consistency in the codebase.

return true;
}

/** Unneeded after switching to `doThenWaitForMessages(...)` as data is automatically flushed */
@Deprecated
Copy link
Member

@cliu123 cliu123 Apr 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any reason for keeping 2 different methods for the exactly same purpose? If not, replacing the old method with the new method would retain consistency.(Same question as the one above.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The testRestMethod has been the source of trouble, lets incrementally invest in the space.

If none of these other tests need to be updated, it will save us time/trouble. Each assert that was updated required investigation to confirm the correct behavior. With the api deprecation, this will help ensure new tests use the more reliable technique.

@peternied
Copy link
Member Author

There're whole bunch of Assert.assertTrue(TestAuditlogImpl.sb.toString().contains(...) in the codebase. Now that a new method is being introduced, can we migrate all of them with the new method? Using the same method for exactly same purpose in all places would retain consistency in the codebase.

There are 2,375 instances of this pattern in the code base, I've created Reduce the amount of assertTrue/False statements so we don't rewrite the test infrastructure with this change.

@peternied peternied merged commit e213e9c into opensearch-project:main Apr 13, 2022
@peternied peternied deleted the TestAuditlogImpl branch April 13, 2022 19:12
@peternied peternied added backport 1.x backport to 1.x branch backport 1.3 backport to 1.3 branch labels Apr 26, 2022
@opensearch-trigger-bot
Copy link
Contributor

The backport to 1.x failed:

The process '/usr/bin/git' failed with exit code 1

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-1.x 1.x
# Navigate to the new working tree
cd .worktrees/backport-1.x
# Create a new branch
git switch --create backport/backport-1758-to-1.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 e213e9c23124418f6f3f2b4a35568ab89ae25232
# Push it to GitHub
git push --set-upstream origin backport/backport-1758-to-1.x
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-1.x

Then, create a pull request where the base branch is 1.x and the compare/head branch is backport/backport-1758-to-1.x.

@opensearch-trigger-bot
Copy link
Contributor

The backport to 1.3 failed:

The process '/usr/bin/git' failed with exit code 1

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-1.3 1.3
# Navigate to the new working tree
cd .worktrees/backport-1.3
# Create a new branch
git switch --create backport/backport-1758-to-1.3
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 e213e9c23124418f6f3f2b4a35568ab89ae25232
# Push it to GitHub
git push --set-upstream origin backport/backport-1758-to-1.3
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-1.3

Then, create a pull request where the base branch is 1.3 and the compare/head branch is backport/backport-1758-to-1.3.

peternied added a commit that referenced this pull request Apr 26, 2022
* Add signal/wait model for TestAuditlogImpl

I have been tracking test failures with testRestMethod very often
showing failures.  My theory is that the execution environment can
impact the order of operations sometimes causing the audit log not to
contain messages before it is checked.

Adding a new method `doThenWaitForMessages(...)` this ensures the
log queue is fresh, the triggering action completes, and the expected
number of messages were recieved.  There is a second long time window
that allows for the messages to be flushed, this is likely more than
enough - if the messages are recieved the count down latch immediately
continues execution so the tests will not wait if they are ready to
proceed.

While this new method is much more reliable not all tests were
encountering such issues, so I've keep the original convention.  This
can be migrated in one-offs or all at once if we see more troublesome
behavoir.  The previous methods/fields are depreciated to push future
tests to follow the new pattern.

Modifications to the rest helper not to throw exceptions were needed to
keep the Runnable declaration clean and small.

Signed-off-by: Peter Nied <[email protected]>
(cherry picked from commit e213e9c)
peternied added a commit that referenced this pull request Apr 26, 2022
* Add signal/wait model for TestAuditlogImpl

I have been tracking test failures with testRestMethod very often
showing failures.  My theory is that the execution environment can
impact the order of operations sometimes causing the audit log not to
contain messages before it is checked.

Adding a new method `doThenWaitForMessages(...)` this ensures the
log queue is fresh, the triggering action completes, and the expected
number of messages were recieved.  There is a second long time window
that allows for the messages to be flushed, this is likely more than
enough - if the messages are recieved the count down latch immediately
continues execution so the tests will not wait if they are ready to
proceed.

While this new method is much more reliable not all tests were
encountering such issues, so I've keep the original convention.  This
can be migrated in one-offs or all at once if we see more troublesome
behavoir.  The previous methods/fields are depreciated to push future
tests to follow the new pattern.

Modifications to the rest helper not to throw exceptions were needed to
keep the Runnable declaration clean and small.

Signed-off-by: Peter Nied <[email protected]>
(cherry picked from commit e213e9c)
peternied added a commit that referenced this pull request Apr 26, 2022
* Add signal/wait model for TestAuditlogImpl

I have been tracking test failures with testRestMethod very often
showing failures.  My theory is that the execution environment can
impact the order of operations sometimes causing the audit log not to
contain messages before it is checked.

Adding a new method `doThenWaitForMessages(...)` this ensures the
log queue is fresh, the triggering action completes, and the expected
number of messages were recieved.  There is a second long time window
that allows for the messages to be flushed, this is likely more than
enough - if the messages are recieved the count down latch immediately
continues execution so the tests will not wait if they are ready to
proceed.

While this new method is much more reliable not all tests were
encountering such issues, so I've keep the original convention.  This
can be migrated in one-offs or all at once if we see more troublesome
behavoir.  The previous methods/fields are depreciated to push future
tests to follow the new pattern.

Modifications to the rest helper not to throw exceptions were needed to
keep the Runnable declaration clean and small.

Signed-off-by: Peter Nied <[email protected]>
(cherry picked from commit e213e9c)
@peternied peternied removed the backport 1.x backport to 1.x branch label Apr 27, 2022
@peternied
Copy link
Member Author

Note; we aren't likely to make a 1.4 so I'm not backporting this change to that branch at this time.

peternied added a commit to peternied/security that referenced this pull request Dec 2, 2022
* Add signal/wait model for TestAuditlogImpl

I have been tracking test failures with testRestMethod very often
showing failures.  My theory is that the execution environment can
impact the order of operations sometimes causing the audit log not to
contain messages before it is checked.

Adding a new method `doThenWaitForMessages(...)` this ensures the
log queue is fresh, the triggering action completes, and the expected
number of messages were received.  There is a second long time window
that allows for the messages to be flushed, this is likely more than
enough - if the messages are received the count down latch immediately
continues execution so the tests will not wait if they are ready to
proceed.

While this new method is much more reliable not all tests were
encountering such issues, so I've keep the original convention.  This
can be migrated in one-offs or all at once if we see more troublesome
behavior.  The previous methods/fields are depreciated to push future
tests to follow the new pattern.

Modifications to the rest helper not to throw exceptions were needed to
keep the Runnable declaration clean and small.

Signed-off-by: Peter Nied <[email protected]>
peternied added a commit that referenced this pull request Dec 6, 2022
Windows build and test support for 1.3

- Use most recent format of CI workflows from main
- Backport #2206
- Supply workarounds for JDK8 incompatible APIs for Encoding / Pattern matching (Thanks @cwperks!)
- Backport only freeport logic from #1638
- Backport #1758
- All updates to TestAuditlogImpl.java from main
  - #2180
  - #1935 
  - #1920
  - #1914
  - #1829 
  - And Targeted test updates for ComplianceAuditlogTest and BasicAuditlogTest to keep up with TestAuditlogImpl.java updates

Signed-off-by: Peter Nied <[email protected]>
Signed-off-by: Stephen Crawford <[email protected]>
Signed-off-by: Stephen Crawford <[email protected]>
Co-authored-by: Stephen Crawford <[email protected]>
wuychn pushed a commit to ochprince/security that referenced this pull request Mar 16, 2023
* Add signal/wait model for TestAuditlogImpl

I have been tracking test failures with testRestMethod very often
showing failures.  My theory is that the execution environment can
impact the order of operations sometimes causing the audit log not to
contain messages before it is checked.

Adding a new method `doThenWaitForMessages(...)` this ensures the
log queue is fresh, the triggering action completes, and the expected
number of messages were recieved.  There is a second long time window
that allows for the messages to be flushed, this is likely more than
enough - if the messages are recieved the count down latch immediately
continues execution so the tests will not wait if they are ready to
proceed.

While this new method is much more reliable not all tests were
encountering such issues, so I've keep the original convention.  This
can be migrated in one-offs or all at once if we see more troublesome
behavoir.  The previous methods/fields are depreciated to push future
tests to follow the new pattern.

Modifications to the rest helper not to throw exceptions were needed to
keep the Runnable declaration clean and small.

Signed-off-by: Peter Nied <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport 1.3 backport to 1.3 branch bug Something isn't working test fix test fixes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants