Calico unconditionally drops VXLAN packets that are not related to Calico #6752
Comments
Hey @yoheiueda, I'm trying to understand your use case a bit better here. Why do you need a second VXLAN tunnel? It's not a use case we normally see, so I'm trying to get a better grasp of your issue.
I am working on the development of a new mechanism called Peer Pod VMs. With this mechanism, a new VM instance is created for each pod, and a VXLAN tunnel is established between the worker node VM and the new VM instance for the pod. Please see this architecture diagram.
This behaviour was a security fix: one way you might try to inject packets into a (VXLAN) pod network would be to send VXLAN packets directly to a node, so I don't think we can back it out completely. However, I guess it might be possible to make the rule more specific, only dropping VXLAN packets that match the VNI that Calico's using? A possible option without a code change: as the rule says, it drops packets from host IPs it doesn't know about. It may be possible to add your IPs to the list? Using the
It wasn't clear to me from your diagram whether the VXLAN packets were coming from a pod or not. If so, you might use
@lwr20 Thank you very much for the info. I'll try
Yes, I understand the purpose of the feature. I am saying that the commit cce5446 I mentioned previously was not a security fix or a security enhancement. Each VXLAN tunnel has a VXLAN ID (VNI), so we can distinguish multiple VXLAN tunnels in a single system. Calico uses 4096 as its VXLAN ID, so VXLAN tunnels with other IDs are not relevant to Calico. The original Calico implementation respected the VXLAN ID and did not interfere with VXLAN packets that are not relevant to Calico. Commit cce5446 removed the condition that checks VXLAN IDs, so Calico now unconditionally drops VXLAN packets that are not relevant to it. I think this behavior is overkill and should be fixed so that it does not interfere with VXLAN packets that are not relevant to Calico.
From Calico's viewpoint, the VXLAN packets are not coming from a pod. The VXLAN tunnels used in my project are not relevant to the Calico network, so Calico should not interfere with them.
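As a small illustration of how the VXLAN ID distinguishes tunnels, the sketch below reads the VNI out of the 8-byte VXLAN header as laid out in RFC 7348 (one flags byte, three reserved bytes, a 24-bit VNI, one reserved byte). The function name `parseVNI` and the sample header are assumptions made for this example; this is not how Calico or the kernel implements the check.

```go
package main

import (
	"errors"
	"fmt"
)

const (
	vxlanHeaderLen    = 8    // bytes in a VXLAN header (RFC 7348)
	vxlanFlagVNIValid = 0x08 // the "I" flag: the VNI field is valid
)

// parseVNI is a hypothetical helper that extracts the 24-bit VNI from a
// raw VXLAN header (the bytes that follow the outer UDP header).
func parseVNI(hdr []byte) (uint32, error) {
	if len(hdr) < vxlanHeaderLen {
		return 0, errors.New("VXLAN header too short")
	}
	if hdr[0]&vxlanFlagVNIValid == 0 {
		return 0, errors.New("VNI flag not set")
	}
	// Bytes 4..6 hold the VNI in network byte order; byte 7 is reserved.
	return uint32(hdr[4])<<16 | uint32(hdr[5])<<8 | uint32(hdr[6]), nil
}

func main() {
	// Sample header carrying VNI 4096 (0x001000), the ID Calico uses
	// according to this thread.
	hdr := []byte{0x08, 0, 0, 0, 0x00, 0x10, 0x00, 0}
	vni, err := parseVNI(hdr)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("VNI:", vni, "relevant to Calico:", vni == 4096)
}
```

Two tunnels can share the same UDP port and still be told apart by this one field, which is the condition the removed match used to check.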
Doesn't change any of your points, but just to note that this is configurable.
Yes, I think we should do better here. The linked commit wasn't a security fix, although the dropping of VXLAN packets from unknown hosts is intended for security reasons. The linked commit was to fix an issue where we were using an iptables match criterion that was causing crashes on a number of systems. We should reinstate it behind feature auto-detection and/or a configuration option in order to enable use cases like this one with multiple VXLAN networks. I believe Felix already has a feature gate / auto-detection mechanism we might be able to leverage for this.
I've been working on a fix for moby/moby#43382 and noticed the mention link back to here. Red Hat decided to deprecate the
I'd like to suggest that this is reclassified as a bug, not an enhancement - I just spent several hours tracking down why our VXLAN setup (separate from Calico) is broken in one deployment, and it turns out that that deployment has a newer Calico version which now drops all the packets. To me, this seems like a clear regression, even though it might not affect most typical setups. |
Yep. I think that is fair. Reclassified.
If anyone wants to take a stab at reverting that commit and adding a feature toggle for whether or not the offending match is used, I'd be very happy to review.
I'm interested in this. In summary, I think there are two ways:
Which one is better? I prefer option 2. What do you think?
Just an FYI, @corhere's fix has been merged as moby/moby@105b983, and it looks like
@yoheiueda that's true, but only to avoid switching up required dependencies on unaffected systems in a security patch release. I am currently cooking up a moby/moby PR which drops the
Looks like nftables has a native vxlan match: https://manpages.debian.org/unstable/nftables/nftables.8.en.html#VXLAN_HEADER_EXPRESSION
The problem with RH is that they are also in the process of dropping iptables and fully switching to nftables, which does not support the bpf programs. So it would be a short-lived fix for RH :-(
Sounds like using the native nftables match and enabling that is the right path forward for now.
I set up a Kubernetes cluster using Calico with the following configuration.
I also configured a VXLAN tunnel with the same port number but a different VXLAN ID.
Then, Calico drops all packets for the second VXLAN tunnel, even though I am using a different VXLAN ID for it.
Expected Behavior
Calico does not interfere with packets for the second VXLAN tunnel.
Current Behavior
Calico drops all packets for the second VXLAN tunnel, since Calico adds the following iptables rule.
This rule is generated here.
calico/felix/rules/static.go
Lines 243 to 249 in c041eec
Possible Solution
Revert cce5446 if possible. The commit was introduced to fix errors that occur when the u32 extension is used with iptables-nft.
According to this comment moby/moby#43382 (comment), iptables-nft also supports the u32 extension when the xt_u32 kernel module is loaded. This kernel module is available by default in Ubuntu, and is available in the kernel-modules-extra package on RHEL.
If reverting the commit is not possible, providing it as an optional feature is another possible solution.
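To show the kind of offset arithmetic a u32 match has to do to reach the VXLAN ID, here is a hedged Go sketch that mirrors an expression of the form 0>>22&0x3C@12>>8. That expression is an assumption chosen for illustration and may differ from the one the reverted commit removed. The function `vniFromIPv4` and the sample packet are hypothetical.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// vniFromIPv4 is a hypothetical helper that walks the same offsets as the
// illustrative u32 expression "0>>22&0x3C@12>>8", starting at the outer
// IPv4 header of a VXLAN packet.
func vniFromIPv4(pkt []byte) (uint32, error) {
	if len(pkt) < 4 {
		return 0, errors.New("packet too short for an IPv4 header")
	}
	// "0>>22&0x3C": take the first 32-bit word and turn the IHL field
	// into the IPv4 header length in bytes.
	ihl := int(binary.BigEndian.Uint32(pkt[0:4]) >> 22 & 0x3C)
	// "@12": skip the IPv4 header, then 8 bytes of UDP header and the
	// first 4 bytes of the VXLAN header (flags and reserved).
	off := ihl + 12
	if len(pkt) < off+4 {
		return 0, errors.New("packet too short for a VXLAN header")
	}
	// ">>8": the word holds a 24-bit VNI followed by one reserved byte.
	return binary.BigEndian.Uint32(pkt[off:off+4]) >> 8, nil
}

func main() {
	// Minimal made-up packet: 20-byte IPv4 header, 8-byte UDP header,
	// 8-byte VXLAN header. Only the fields the sketch reads are filled in.
	pkt := make([]byte, 36)
	pkt[0] = 0x45  // IPv4, IHL = 5 words (20 bytes)
	pkt[28] = 0x08 // VXLAN flags: VNI valid
	pkt[33] = 0x10 // VNI = 0x001000 = 4096
	vni, err := vniFromIPv4(pkt)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("VNI:", vni)
}
```

The point of the sketch is only that the VNI sits at a fixed, computable offset, which is why a packet filter can match on it when a suitable match extension is available.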
Steps to Reproduce (for bugs)
ip command at a worker node
Context
Your Environment