Fix unit test flaky for IPsec controller #4016

Merged
merged 1 commit into from
Jul 27, 2022

Conversation

xliuxu
Contributor

@xliuxu xliuxu commented Jul 15, 2022

Wait one second as the fake clientset doesn't support watching with specific resourceVersion.
Otherwise the update event would be missed by the watcher used in csrutil.WaitForCertificate()
if it happens to be generated in-between the List and Watch calls.

Fixes: #3851

Signed-off-by: Xu Liu [email protected]

@xliuxu xliuxu added the action/backport Indicates a PR that requires backports. label Jul 15, 2022
@xliuxu xliuxu added this to the Antrea v1.8 release milestone Jul 15, 2022
@codecov

codecov bot commented Jul 15, 2022

Codecov Report

Merging #4016 (5251bd0) into main (e5a98dc) will increase coverage by 6.24%.
The diff coverage is n/a.


@@            Coverage Diff             @@
##             main    #4016      +/-   ##
==========================================
+ Coverage   61.51%   67.75%   +6.24%     
==========================================
  Files         294      297       +3     
  Lines       43726    43820      +94     
==========================================
+ Hits        26897    29691    +2794     
+ Misses      14578    11806    -2772     
- Partials     2251     2323      +72     
| Flag | Coverage Δ |
|---|---|
| integration-tests | 35.97% <ø> (?) |
| kind-e2e-tests | 51.32% <ø> (+6.78%) ⬆️ |
| unit-tests | 44.21% <ø> (-0.05%) ⬇️ |

| Impacted Files | Coverage Δ |
|---|---|
| pkg/apiserver/handlers/endpoint/handler.go | 56.52% <0.00%> (-13.05%) ⬇️ |
| pkg/controller/ipam/antrea_ipam_controller.go | 76.41% <0.00%> (-2.63%) ⬇️ |
| pkg/controller/externalippool/controller.go | 83.03% <0.00%> (-1.79%) ⬇️ |
| ...gent/controller/noderoute/node_route_controller.go | 56.12% <0.00%> (-1.47%) ⬇️ |
| pkg/ovs/openflow/ofctrl_packetout.go | 79.71% <0.00%> (-0.87%) ⬇️ |
| pkg/agent/cniserver/ipam/testing/utils.go | 92.85% <0.00%> (ø) |
| ...icluster/controllers/multicluster/common/helper.go | 58.00% <0.00%> (ø) |
| pkg/agent/cniserver/testing/utils.go | 100.00% <0.00%> (ø) |
| ...ntroller/networkpolicy/networkpolicy_controller.go | 82.32% <0.00%> (+0.31%) ⬆️ |
| pkg/agent/controller/networkpolicy/reconciler.go | 68.90% <0.00%> (+0.48%) ⬆️ |
... and 75 more

}
}
}()
go newFakeSigner(t,
Member

The fake watcher registers its watches synchronously, so it should be impossible for a watcher to miss an event for an object created after the watch started.

I suspect the issue is that WaitForCertificate missed the update event triggered by the signer because it uses a ListWatch. The fake clientset doesn't support watching from a specific resource version, so a fake ListWatch actually sends two individual requests: a List request and a subsequent Watch request. If the signer happens to sign the CSR in between the two requests, WaitForCertificate would never receive the update event.

This can be confirmed by adding a log to the signer to check whether the signer signs a CSR when the failure happens.

My previous comment about the defect of fake clientset:

// Must wait for cache sync, otherwise resource creation events will be missing if the resources are created
// in-between list and watch call of an informer. This is because fake clientset doesn't support watching with
// resourceVersion. A watcher of fake clientset only gets events that happen after the watcher is created.
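
To make the suspected race concrete, here is a minimal, stdlib-only sketch of the List-then-Watch gap. It does not use client-go or Antrea code: `fakeStore`, `waitForSigned`, and the two hook parameters are hypothetical names invented for this illustration, and the only behavior modeled is the one described above, namely that a watcher sees only events sent after it was registered. Unlike the real WaitForCertificate, which would block until its timeout, the sketch checks the channel once and returns.

```go
package main

import "fmt"

// event is a simplified stand-in for a watch event.
type event struct{ signed bool }

// fakeStore is a hypothetical model of the fake clientset behavior under
// discussion: Watch has no resourceVersion support, so there is no replay.
type fakeStore struct {
	signed   bool
	watchers []chan event
}

// List returns the current state (whether the CSR is signed).
func (s *fakeStore) List() bool { return s.signed }

// Watch registers a watcher; it only receives events sent from now on.
func (s *fakeStore) Watch() <-chan event {
	ch := make(chan event, 1)
	s.watchers = append(s.watchers, ch)
	return ch
}

// Sign updates the object and notifies the watchers registered so far.
func (s *fakeStore) Sign() {
	s.signed = true
	for _, ch := range s.watchers {
		select {
		case ch <- event{signed: true}:
		default: // drop the event if the buffer is full
		}
	}
}

// waitForSigned does a List followed by a Watch. The hooks let the caller
// inject work in the gap between the two calls or after the Watch.
func waitForSigned(s *fakeStore, betweenListAndWatch, afterWatch func()) bool {
	if s.List() {
		return true
	}
	if betweenListAndWatch != nil {
		betweenListAndWatch()
	}
	ch := s.Watch()
	if afterWatch != nil {
		afterWatch()
	}
	select {
	case ev := <-ch:
		return ev.signed
	default:
		return false // no event arrived: the update was missed
	}
}

func main() {
	missed := &fakeStore{}
	// Signing in the gap: List saw "unsigned", Watch started too late.
	fmt.Println("signed in the gap, observed:", waitForSigned(missed, missed.Sign, nil))

	seen := &fakeStore{}
	// Signing after the watch is registered: the event is delivered.
	fmt.Println("signed after Watch, observed:", waitForSigned(seen, nil, seen.Sign))
}
```

In this model the first call reports `false` and the second `true`, which matches the explanation above: the update is lost only when it falls between the two requests.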

Contributor Author

You are right; I misunderstood the goroutine in the Watch API. I think the issue does exist in WaitForCertificate when using the fake clientset, but I could not come up with a simple and clean solution for it. How about just waiting for one second before updating the CSR in the tests?

Member

I can't think of a better solution, so I'm fine with fixing it this way first. If it still fails in the future, perhaps we could update the CSR multiple times to make sure the watcher receives the updates.

@xliuxu xliuxu force-pushed the fix_ipsec_controller_ut_flaky branch 2 times, most recently from 9cb00ed to 4e7f1e9 Compare July 21, 2022 23:16
Member

@tnqn tnqn left a comment

PR description needs update

@xliuxu xliuxu force-pushed the fix_ipsec_controller_ut_flaky branch from 4e7f1e9 to 5251bd0 Compare July 27, 2022 03:37
Member

@tnqn tnqn left a comment

LGTM

@tnqn
Member

tnqn commented Jul 27, 2022

/skip-all

@tnqn tnqn merged commit 684dca3 into antrea-io:main Jul 27, 2022
@xliuxu xliuxu deleted the fix_ipsec_controller_ut_flaky branch July 27, 2022 05:29
hjiajing pushed a commit to hjiajing/antrea that referenced this pull request Jul 28, 2022
Labels
action/backport Indicates a PR that requires backports.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Unit test for IPsec certificate is flaky
2 participants