iptables contention when creating MASQUERADE rule #988

Closed
dtshepherd opened this issue Apr 25, 2018 · 7 comments · Fixed by #1529

Comments

@dtshepherd
Contributor

Random flannel pods keep locking up when trying to ensure the MASQUERADE iptables rule is in place.

Expected Behavior

Flannel should retry if adding the iptables rule fails or iptables returns an error code.

Current Behavior

Flannel stops working on the node with iptables contention until the pod is forcefully restarted.

I0424 21:18:09.062732       1 main.go:488] Using interface with name em1 and address 172.16.1.103
I0424 21:18:09.062906       1 main.go:505] Defaulting external address to interface address (172.16.1.103)
I0424 21:18:11.078235       1 kube.go:131] Waiting 10m0s for node controller to sync
I0424 21:18:11.078278       1 kube.go:294] Starting kube subnet manager
I0424 21:18:12.078449       1 kube.go:138] Node controller sync successful
I0424 21:18:12.078497       1 main.go:235] Created subnet manager: Kubernetes Subnet Manager - engine-2.xxx.yyy
I0424 21:18:12.078504       1 main.go:238] Installing signal handlers
I0424 21:18:12.078643       1 main.go:353] Found network config - Backend type: vxlan
I0424 21:18:12.078698       1 vxlan.go:120] VXLAN config: VNI=1 Port=0 GBP=false DirectRouting=false
I0424 21:18:12.079450       1 main.go:300] Wrote subnet file to /run/flannel/subnet.env
I0424 21:18:12.079471       1 main.go:304] Running backend.
I0424 21:18:12.079484       1 main.go:322] Waiting for all goroutines to exit
I0424 21:18:12.079516       1 vxlan_network.go:60] watching for new subnet leases
E0424 21:47:24.568958       1 iptables.go:97] Failed to ensure iptables rules: Error checking rule existence: failed to check rule existence: running [/sbin/iptables -t nat -C POSTROUTING -s 172.20.0.0/16 ! -d 224.0.0.0/4 -j MASQUERADE --wait]: exit status 4: iptables: Resource temporarily unavailable.

Possible Solution

Maybe #935 helps work around the problem; however, flannel doesn't have a release that includes this fix yet. It would also be nice if flannel retried with exponential backoff to check/ensure that the iptables rule is in place. Right now, it seems that once it fails, flannel on that specific Kubernetes node never recovers.
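
A minimal sketch of the kind of retry I have in mind (this is not flannel's actual code; ensureMasqueradeRule is a hypothetical stand-in for whatever logic checks and re-adds the rule):

    // Hypothetical sketch: keep ensuring the MASQUERADE rule, backing off
    // exponentially on failure instead of giving up after the first error.
    package main

    import (
        "log"
        "os/exec"
        "time"
    )

    // ensureMasqueradeRule is a stand-in for flannel's rule check/insert logic.
    func ensureMasqueradeRule() error {
        // -C exits non-zero if the rule is missing (or the xtables lock is held).
        err := exec.Command("/sbin/iptables", "-w", "-t", "nat", "-C", "POSTROUTING",
            "-s", "172.20.0.0/16", "!", "-d", "224.0.0.0/4", "-j", "MASQUERADE").Run()
        if err == nil {
            return nil
        }
        return exec.Command("/sbin/iptables", "-w", "-t", "nat", "-A", "POSTROUTING",
            "-s", "172.20.0.0/16", "!", "-d", "224.0.0.0/4", "-j", "MASQUERADE").Run()
    }

    func main() {
        backoff := time.Second
        for {
            if err := ensureMasqueradeRule(); err != nil {
                log.Printf("ensure iptables rule failed, retrying in %v: %v", backoff, err)
                time.Sleep(backoff)
                if backoff < time.Minute {
                    backoff *= 2 // exponential backoff, capped at one minute
                }
                continue
            }
            backoff = time.Second // reset after a successful pass
            time.Sleep(5 * time.Second)
        }
    }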

Steps to Reproduce (for bugs)

Not easily reproducible, as it seems to be a race condition. Maybe create a second process that is also locking/modifying the tables fairly often? A rough sketch of that idea follows.
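
Something like this could generate the contention (a rough sketch under the assumption that hammering the nat table from another process is enough; omitting -w makes each call fail fast instead of queueing behind other callers):

    // Rough, unverified repro sketch: repeatedly take the xtables lock from a
    // second process so flannel's periodic checks occasionally collide with it.
    package main

    import (
        "log"
        "os/exec"
    )

    func main() {
        for i := 0; ; i++ {
            // Listing the nat table acquires the xtables lock; without -w this
            // command exits with status 4 when another caller holds the lock.
            out, err := exec.Command("/sbin/iptables", "-t", "nat", "-L", "-n").CombinedOutput()
            if err != nil {
                log.Printf("iteration %d: %v: %s", i, err, out)
            }
        }
    }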

Context

Reliably recover the flannel network without user intervention. Isolated clusters need to self-recover without manually deleting pods.

Your Environment

  • Flannel version: 0.10.0
  • Backend used (e.g. vxlan or udp): vxlan
  • Etcd version: 3.1.0
  • Kubernetes version (if used): 1.9.2
  • Operating System and version: 7.3.1611
@zq-david-wang

zq-david-wang commented Jul 21, 2018

The log indicates that flanneld is being killed/stopped somehow, which would cause some goroutines to return something strange. I do not think it is about iptables. Most likely, your pod was killed during startup.

I0424 21:18:12.079484 1 main.go:322] Waiting for all goroutines to exit
I0424 21:18:12.079516 1 vxlan_network.go:60] watching for new subnet leases
E0424 21:47:24.568958 1 iptables.go:97] Failed to ensure iptables rules: Error checking rule existence: failed to check rule existence: running [/sbin/iptables -t nat -C POSTROUTING -s 172.20.0.0/16 ! -d 224.0.0.0/4 -j MASQUERADE --wait]: exit status 4: iptables: Resource temporarily unavailable.

Based on the timestamps, the error message showed up about 30 minutes after flanneld started to shut down; in my experience this can indicate a resource shortage on the system (e.g. a fork bomb).

@dtshepherd
Contributor Author

There are no error messages in syslog indicating a resource shortage, and everything else in the system seems to be functional. The flannel DaemonSet pods are still running as expected. Once that error shows up, flannel never recovers without bouncing the pods. We have some other code that manages iptables rules, and we suspect both it and flannel are trying to make changes at the same time (hence why I pointed to #935 as a workaround).

@zq-david-wang

@dtshepherd sorry about my hasty statements above. I run flannel via systemd, where the log line "Waiting for all goroutines to exit" is not printed until flanneld breaks out of its loop, and if flanneld is killed right after it starts I always see error logs about iptables. I rushed to the conclusion that what I experienced was similar to yours. But after checking the code, it turns out that when --kube-subnet-mgr is used, the lease-monitoring loop is skipped and that log line is always printed...

Checking the flannel code, it seems that it retries every iptables-resync seconds (the default is 5s):
https://github.com/coreos/flannel/blob/8a083a890a4820fe97fa315dc1ecaa739c1d14db/network/iptables.go#L94-L100

	for {
		// Ensure that all the iptables rules exist every 5 seconds
		if err := ensureIPTables(ipt, rules); err != nil {
			log.Errorf("Failed to ensure iptables rules: %v", err)
		}
		time.Sleep(time.Duration(resyncPeriod) * time.Second)
	}

Sorry again... Just tried to help....

@dtshepherd
Contributor Author

No problem! I thought it should retry as well, but the pod hangs and doesn't do anything. I haven't had a chance to dig into the code to figure out why it isn't working.

@cehoffman

We saw similar problems with what I wouldn't consider a large cluster yet, but one that had fairly frequent pod activity changing the service IP landscape.

In our investigation, we found that a few core Kubernetes services in our cluster (kube-proxy and Calico) had bad pod mount configurations, and that there was a general incompatibility between the host iptables command and the one inside the flannel container.

We run Container Linux as our host OS. In the most recent version, 1967.3.0, the host iptables is very outdated: it ships version 1.4.21, released in 2013, with the first iteration of locking that used unix domain sockets. Later, in 2015, the locking mechanism changed to the current flock on /run/xtables.lock. This means that if you have any host-based iptables configuration, it cannot work reliably alongside the iptables in the flannel, Calico, or hyperkube base images, because they all use the more recent flock mechanism.
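
To illustrate the flock mechanism described above (a minimal sketch, not anything flannel-specific): a process holding an exclusive flock on /run/xtables.lock makes iptables invocations without --wait fail immediately and makes ones with --wait block, which is why the host and every container that touches iptables need to see the same lock file.

    // Minimal sketch of the flock-based xtables lock: hold /run/xtables.lock
    // exclusively and observe how other iptables callers behave.
    package main

    import (
        "log"
        "os"
        "syscall"
        "time"
    )

    func main() {
        // Modern iptables opens (creating if needed) this file and takes an
        // exclusive flock on it for the duration of each command.
        f, err := os.OpenFile("/run/xtables.lock", os.O_CREATE|os.O_RDWR, 0600)
        if err != nil {
            log.Fatal(err)
        }
        defer f.Close()

        if err := syscall.Flock(int(f.Fd()), syscall.LOCK_EX); err != nil {
            log.Fatal(err)
        }
        log.Println("holding the xtables lock; 'iptables -w ...' will block, 'iptables ...' will exit 4")
        time.Sleep(2 * time.Minute)

        // Dropping the lock (or exiting) lets queued -w callers proceed.
        _ = syscall.Flock(int(f.Fd()), syscall.LOCK_UN)
    }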

Our main issue came down to missing mounts for /run/xtables.lock in our kube-proxy deployment. This was the deployment configuration that came from standard Tectonic. We updated it and have not seen the referenced error from flannel since.

@dtshepherd
Contributor Author

@cehoffman Wow, good catch. We are using hyperkube, and I was just looking at our kube-proxy manifest: it does not have the /run/xtables.lock mount point. When I find some time, I'll try that configuration to see if it fixes the problem. For now, we've been running with Weave...

@dtshepherd
Contributor Author

I'm pretty sure the YAML for installing flannel needs to be updated to include the /run/xtables.lock mount: https://github.com/coreos/flannel/blob/master/Documentation/kube-flannel.yml#L212-L226

It appears kube-proxy has the mount, so it should handle any iptables contention with the host, but the flannel DaemonSet does not mount the same file.

dtshepherd pushed a commit to dtshepherd/flannel that referenced this issue Dec 5, 2019
This prevents iptables contention with kube-proxy and the host OS.

Fixes flannel-io#988.
manuelbuil pushed a commit to manuelbuil/flannel that referenced this issue Jan 24, 2022
This prevents iptables contention with kube-proxy and the host OS.

Fixes flannel-io#988.