Flaky integration test case "TestIPAssigner" #6890

Closed
wenyingd opened this issue Dec 30, 2024 · 1 comment · Fixed by #6898
@wenyingd

Describe the bug

The TestIPAssigner integration test is flaky. It fails intermittently with the following output:

=== RUN   TestIPAssigner
I1230 09:37:38.897460   13109 ip_assigner_linux.go:487] "Creating VLAN sub-interface" interface="antrea-ext.20" parent="eth0" vlan=20
I1230 09:37:38.898176   13109 ip_assigner_linux.go:100] "Assigned IP to interface" ip="10.10.20.11" interface="antrea-ext.20"
I1230 09:37:38.905726   13109 ip_assigner_linux.go:487] "Creating VLAN sub-interface" interface="antrea-ext.30" parent="eth0" vlan=30
I1230 09:37:38.906408   13109 ip_assigner_linux.go:100] "Assigned IP to interface" ip="10.10.30.10" interface="antrea-ext.30"
I1230 09:37:38.908090   13109 ip_assigner_linux.go:100] "Assigned IP to interface" ip="10.10.10.10" interface="antrea-dummy0"
I1230 09:37:38.925500   13109 ip_assigner_linux.go:100] "Assigned IP to interface" ip="10.10.10.11" interface="antrea-dummy0"
I1230 09:37:38.933412   13109 ip_assigner_linux.go:100] "Assigned IP to interface" ip="2021:124:6020:1006:250:56ff:fea7:36c2" interface="antrea-dummy0"
I1230 09:37:38.935295   13109 ip_assigner_linux.go:100] "Assigned IP to interface" ip="10.10.20.10" interface="antrea-ext.20"
I1230 09:37:38.965013   13109 ip_assigner_linux.go:100] "Assigned IP to interface" ip="2021:124:6020:1006:250:56ff:fea7:36c4" interface="antrea-dummy0"
I1230 09:37:38.965141   13109 ip_assigner_linux.go:145] "Deleted IP from interface" ip="2021:124:6020:1006:250:56ff:fea7:36c2" interface="antrea-dummy0"
I1230 09:37:38.965228   13109 ip_assigner_linux.go:145] "Deleted IP from interface" ip="10.10.20.11" interface="antrea-ext.20"
I1230 09:37:38.965306   13109 ip_assigner_linux.go:145] "Deleted IP from interface" ip="10.10.30.10" interface="antrea-ext.30"
I1230 09:37:38.965322   13109 ip_assigner_linux.go:406] "Deleting VLAN sub-interface" interface="antrea-ext.30" vlan=30
    ip_assigner_linux_test.go:101: 
        	Error Trace:	/usr/src/antrea.io/antrea/test/integration/agent/ip_assigner_linux_test.go:101
        	Error:      	Not equal: 
        	            	expected: sets.Set[string]{"10.10.20.10/24":sets.Empty{}}
        	            	actual  : sets.Set[string]{}
        	            	
        	            	Diff:
        	            	--- Expected
        	            	+++ Actual
        	            	@@ -1,4 +1,2 @@
        	            	-(sets.Set[string]) (len=1) {
        	            	- (string) (len=14) "10.10.20.10/24": (sets.Empty) {
        	            	- }
        	            	+(sets.Set[string]) {
        	            	 }
        	Test:       	TestIPAssigner
        	Messages:   	Actual IPs don't match
I1230 09:37:38.975390   13109 ip_assigner_linux.go:145] "Deleted IP from interface" ip="10.10.10.11" interface="antrea-dummy0"
I1230 09:37:38.975464   13109 ip_assigner_linux.go:145] "Deleted IP from interface" ip="2021:124:6020:1006:250:56ff:fea7:36c4" interface="antrea-dummy0"
I1230 09:37:38.975543   13109 ip_assigner_linux.go:142] "IP does not exist on interface" ip="10.10.20.10" interface="antrea-ext.20"
I1230 09:37:38.975557   13109 ip_assigner_linux.go:145] "Deleted IP from interface" ip="10.10.20.10" interface="antrea-ext.20"
I1230 09:37:38.975566   13109 ip_assigner_linux.go:406] "Deleting VLAN sub-interface" interface="antrea-ext.20" vlan=20
I1230 09:37:38.987030   13109 ip_assigner_linux.go:145] "Deleted IP from interface" ip="10.10.10.10" interface="antrea-dummy0"
--- FAIL: TestIPAssigner (0.13s)

Link to the failing run: https://github.com/antrea-io/antrea/actions/runs/12544675222/job/34977664966?pr=6889

@wenyingd wenyingd added the kind/bug Categorizes issue or PR as related to a bug. label Dec 30, 2024
@antoninbas antoninbas added the kind/failing-test Categorizes issue or PR as related to a consistently or frequently failing test. label Jan 3, 2025
@antoninbas antoninbas self-assigned this Jan 3, 2025
@antoninbas

After investigation, this turns out to be a bug in the IPAssigner rather than in the integration test.

The test fails whenever 10.10.20.11 is assigned before 10.10.20.10, which happens at random because Go randomizes the iteration order over a map. In that case, 10.10.20.11/24 becomes the primary IP on the interface and 10.10.20.10/24 becomes a secondary IP, since both are in the same subnet. When a primary address is deleted, Linux automatically removes all secondary IP addresses in the same subnet, unless the promote_secondaries sysctl variable is set to 1 for the interface.

The default value of promote_secondaries appears to vary with the distribution or Linux kernel version. On my Linux VM it defaults to 1, which is why I initially couldn't reproduce the test failure, no matter how many times I ran the test. When I run the test inside a Docker container using the Colima VM on my Mac, it defaults to 0 and I can reproduce the issue consistently.
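To make the ordering dependency concrete, here is a minimal sketch in Go that models the primary/secondary behavior described above. It is a toy model only: it does not talk to the kernel, and the `iface` type and the `remainingAfterDelete` helper are hypothetical names invented for this illustration, not part of Antrea.

```go
package main

import (
	"fmt"
	"net/netip"
)

// iface is a toy model of the IPv4 addresses on one Linux interface.
// It is an illustration only and does not talk to the kernel.
type iface struct {
	promoteSecondaries bool
	addrs              []netip.Prefix // kept in assignment order
}

func (i *iface) add(p netip.Prefix) {
	i.addrs = append(i.addrs, p)
}

// del removes p from the modeled interface. If p is the primary address of
// its subnet (the first one assigned) and promoteSecondaries is off, every
// other address in that subnet is dropped as well.
func (i *iface) del(p netip.Prefix) {
	subnet := p.Masked()
	isPrimary := false
	for _, a := range i.addrs {
		if a.Masked() == subnet {
			// The first address found in the subnet is its primary.
			isPrimary = a == p
			break
		}
	}
	kept := make([]netip.Prefix, 0, len(i.addrs))
	for _, a := range i.addrs {
		if a == p {
			continue
		}
		if isPrimary && !i.promoteSecondaries && a.Masked() == subnet {
			continue // secondary removed together with its primary
		}
		kept = append(kept, a)
	}
	i.addrs = kept
}

// remainingAfterDelete assigns the addresses in the given order, deletes
// one of them, and returns what is left on the modeled interface.
func remainingAfterDelete(promote bool, del string, order ...string) []string {
	i := &iface{promoteSecondaries: promote}
	for _, s := range order {
		i.add(netip.MustParsePrefix(s))
	}
	i.del(netip.MustParsePrefix(del))
	out := make([]string, 0, len(i.addrs))
	for _, a := range i.addrs {
		out = append(out, a.String())
	}
	return out
}

func main() {
	const a10, a11 = "10.10.20.10/24", "10.10.20.11/24"
	// .11 assigned first (the failing order), promote_secondaries off.
	fmt.Println(remainingAfterDelete(false, a11, a11, a10))
	// .10 assigned first (the passing order), promote_secondaries off.
	fmt.Println(remainingAfterDelete(false, a11, a10, a11))
	// .11 assigned first, promote_secondaries on.
	fmt.Println(remainingAfterDelete(true, a11, a11, a10))
}
```

In this model, only the first call loses 10.10.20.10/24, which matches the empty "actual" set in the test output. The real kernel logic has more cases than this sketch covers.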

The best fix in Antrea is to set promote_secondaries to 1 whenever the IPAssigner creates an interface. I am submitting a patch to do that.

antoninbas added a commit to antoninbas/antrea that referenced this issue Jan 3, 2025
The IPAssigner should ensure that the promote_secondaries sysctl
variable is always set when creating an interface. This ensures that
when the primary IP address is removed from the interface, a secondary
IP address is promoted, instead of removing all the corresponding
secondary IP addresses. Otherwise, the deletion of one IP address can
trigger the automatic removal of all other IP addresses in the same
subnet, if the deleted IP happens to be the primary (first one assigned
chronologically). For example, this can affect the Egress feature (when
EgressSeparateSubnet is used).

Fixes antrea-io#6890

Signed-off-by: Antonin Bas <[email protected]>
antoninbas added a commit that referenced this issue Jan 6, 2025
antoninbas added a commit to antoninbas/antrea that referenced this issue Jan 8, 2025
antoninbas added a commit that referenced this issue Jan 10, 2025