[calico] don't enable ipip encapsulation by default and use vxlan in CI #8434
At my job we switched to vxlan crosssubnet by default a long time ago because IPIP offload is broken with many driver / firmware combinations. I think RHEL started to enable IPIP offload by default (if supported) in 8.3 or 8.4, and I found at least 3 different broken cases. I think nobody uses IPIP except Calico, so it's not well tested, whereas vxlan is used by OpenStack and many clouds. Also CrossSubnet, because why encapsulate when it's not needed?
Thanks for the extra information @champtar! It seems to put one more nail on the
I opened an issue on the Calico project side to track this: projectcalico/calico#5449
The eBPF test case is actually manual, so I wrote the above conclusion too soon. Moving this back to draft.
Sorry for the late response, and thanks for updating. /lgtm
The reason IPIP was not working in Calico was an internal change to the IPIP code in the kernel. We identified the change and fixed our code to respect the new behaviour. The fix (projectcalico/calico#5846) will be available in Calico v3.23.0.
…CI (kubernetes-sigs#8434)

* [calico] make vxlan encapsulation the default
* don't enable ipip encapsulation by default
* set calico_network_backend by default to vxlan
* update sample inventory and documentation
* [CI] pin default calico parameters for upgrade tests to ensure proper upgrade
* [CI] improve netchecker connectivity testing
* [CI] show logs for tests
* [calico] tweak task name
* [CI] Don't run the provisioner from vagrant since we run it in testcases_run.sh
* [CI] move kube-router tests to vagrant to avoid network connectivity issues during netchecker check
* service proxy mode still fails connectivity tests so keeping it manual mode
* [kube-router] account for containerd use-case
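For readers who want to pin the encapsulation explicitly rather than rely on the default, the change in the commit message above could be expressed as inventory overrides roughly like the following. This is a minimal sketch, not the exact diff from this PR: `calico_network_backend` is named in the commit message, while `calico_ipip_mode`, `calico_vxlan_mode`, and the file location are assumptions based on the kubespray sample inventory and should be checked against the version in use.

```yaml
# Sketch of inventory overrides, e.g. in
# inventory/<cluster>/group_vars/k8s_cluster/k8s-net-calico.yml (assumed location).

# VXLAN encapsulation, matching the new default described in this PR.
calico_network_backend: vxlan
calico_ipip_mode: Never        # assumed variable name; disables IPIP
calico_vxlan_mode: Always      # assumed variable name; CrossSubnet encapsulates only across subnets

# To keep the previous IPIP behaviour instead (assumed values):
# calico_network_backend: bird
# calico_ipip_mode: Always
# calico_vxlan_mode: Never
```

Pinning these values in the inventory before upgrading mirrors what the CI does for its upgrade tests, so an existing cluster is not switched to a different encapsulation by the changed defaults.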
What type of PR is this?
/kind bug
/kind failing-test
What this PR does / why we need it:
While debugging calico ebpf failures in CI for AlmaLinux 8 we discovered an issue with Alma/CentOS 8.5 and `ipip` encapsulation preventing pod-to-pod communication. Upon further investigation and discussions with the Calico team, their recommendation is to use `vxlan` encapsulation on newer kernels instead of `ipip` for performance reasons, even though both should work in both `iptables` and `ebpf` mode.

For now this PR changes the default to `ipip: false`, disabling `ipip` encapsulation, and we continue to investigate the reason for the `ipip` failure.

During the development of this fix I discovered that the kubespray CI needs encapsulation for multi-node deployments, and I had to enable `vxlan` for calico-based CI tests.

Additionally, to properly troubleshoot the CI failures I had to modify the network connectivity test and actually make it fail when `netchecker` agents could not connect to the `netchecker-server`. This new change exposed connectivity issues with the `kube-router` CNI, which were worked around by moving these tests to `vagrant` instead of `packet`. There is still a known failure when running `kube-router` in `service proxy` mode, which I was not able to fully address. Given that vagrant jobs in general are allowed to fail and this test is left on manual, this will not block future CI jobs.

Which issue(s) this PR fixes:
Fixes #8456
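To illustrate the stricter connectivity check described in the PR description, a test task along the following lines would fail the play when the agents have not reported to the server. This is only a sketch and not the task from this PR: the `netchecker_host` and `netchecker_port` variables are hypothetical, and the `/api/v1/connectivity_check` endpoint and its status codes are assumptions about the netchecker server.

```yaml
# Hypothetical test task: fail unless netchecker-server reports a successful check.
- name: Check that netchecker agents reported to netchecker-server
  ansible.builtin.uri:
    # Host, port and endpoint path are assumptions for illustration.
    url: "http://{{ netchecker_host }}:{{ netchecker_port }}/api/v1/connectivity_check"
    status_code: 200           # any other status makes the task fail
    return_content: true
  register: connectivity_check
  retries: 3                   # give agents time to report before failing
  delay: 10
  until: connectivity_check.status == 200
```

Because the task has no `ignore_errors`, exhausting the retries fails the job, which is the behaviour that surfaced the `kube-router` connectivity issues.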
Special notes for your reviewer:
Given previous experience with changing defaults, such as when changing the `container_manager` to `containerd`, I'm somewhat conflicted about the actual change. An alternative would be to just change the settings in the CI, but for now the change as-is seems like the best course of action to me. I welcome input here.

Does this PR introduce a user-facing change?: