iptables contention when creating MASQUERADE rule #988

Closed
dtshepherd opened this issue Apr 25, 2018 · 7 comments · Fixed by #1529

Comments

@dtshepherd
Contributor

Random flannel pods keep locking up when trying to ensure the MASQUERADE iptables rule is in place.

Expected Behavior

Flannel should retry if adding the iptables rule fails or iptables returns an error code.

Current Behavior

Flannel stops working on the node with iptables contention until the pod is forcefully restarted.

I0424 21:18:09.062732       1 main.go:488] Using interface with name em1 and address 172.16.1.103
I0424 21:18:09.062906       1 main.go:505] Defaulting external address to interface address (172.16.1.103)
I0424 21:18:11.078235       1 kube.go:131] Waiting 10m0s for node controller to sync
I0424 21:18:11.078278       1 kube.go:294] Starting kube subnet manager
I0424 21:18:12.078449       1 kube.go:138] Node controller sync successful
I0424 21:18:12.078497       1 main.go:235] Created subnet manager: Kubernetes Subnet Manager - engine-2.xxx.yyy
I0424 21:18:12.078504       1 main.go:238] Installing signal handlers
I0424 21:18:12.078643       1 main.go:353] Found network config - Backend type: vxlan
I0424 21:18:12.078698       1 vxlan.go:120] VXLAN config: VNI=1 Port=0 GBP=false DirectRouting=false
I0424 21:18:12.079450       1 main.go:300] Wrote subnet file to /run/flannel/subnet.env
I0424 21:18:12.079471       1 main.go:304] Running backend.
I0424 21:18:12.079484       1 main.go:322] Waiting for all goroutines to exit
I0424 21:18:12.079516       1 vxlan_network.go:60] watching for new subnet leases
E0424 21:47:24.568958       1 iptables.go:97] Failed to ensure iptables rules: Error checking rule existence: failed to check rule existence: running [/sbin/iptables -t nat -C POSTROUTING -s 172.20.0.0/16 ! -d 224.0.0.0/4 -j MASQUERADE --wait]: exit status 4: iptables: Resource temporarily unavailable.

Possible Solution

Maybe #935 helps work around the problem; however, flannel doesn't have a release that includes this fix yet. It would also be nice if flannel retried with exponential backoff to check/ensure that the iptables rule is in place. Right now, it seems that once it fails, flannel on that specific Kubernetes node never recovers.
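
A minimal sketch of the kind of retry I have in mind (this is not flannel's actual code; ensureMasqueradeRule is a hypothetical stand-in for whatever logic checks and re-adds the rule):

    // Hypothetical sketch: keep ensuring the MASQUERADE rule, backing off
    // exponentially on failure instead of giving up after the first error.
    package main

    import (
        "log"
        "os/exec"
        "time"
    )

    // ensureMasqueradeRule is a stand-in for flannel's rule check/insert logic.
    func ensureMasqueradeRule() error {
        // -C exits non-zero if the rule is missing (or the xtables lock is held).
        err := exec.Command("/sbin/iptables", "-w", "-t", "nat", "-C", "POSTROUTING",
            "-s", "172.20.0.0/16", "!", "-d", "224.0.0.0/4", "-j", "MASQUERADE").Run()
        if err == nil {
            return nil
        }
        return exec.Command("/sbin/iptables", "-w", "-t", "nat", "-A", "POSTROUTING",
            "-s", "172.20.0.0/16", "!", "-d", "224.0.0.0/4", "-j", "MASQUERADE").Run()
    }

    func main() {
        backoff := time.Second
        for {
            if err := ensureMasqueradeRule(); err != nil {
                log.Printf("ensure iptables rule failed, retrying in %v: %v", backoff, err)
                time.Sleep(backoff)
                if backoff < time.Minute {
                    backoff *= 2 // exponential backoff, capped at one minute
                }
                continue
            }
            backoff = time.Second // reset after a successful pass
            time.Sleep(5 * time.Second)
        }
    }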

Steps to Reproduce (for bugs)

Not easily reproducible, as it seems to be a race condition. Maybe create a second process that is also locking/modifying the tables fairly often? A rough sketch of that idea follows.
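
Something like this could generate the contention (a rough sketch under the assumption that hammering the nat table from another process is enough; omitting -w makes each call fail fast instead of queueing behind other callers):

    // Rough, unverified repro sketch: repeatedly take the xtables lock from a
    // second process so flannel's periodic checks occasionally collide with it.
    package main

    import (
        "log"
        "os/exec"
    )

    func main() {
        for i := 0; ; i++ {
            // Listing the nat table acquires the xtables lock; without -w this
            // command exits with status 4 when another caller holds the lock.
            out, err := exec.Command("/sbin/iptables", "-t", "nat", "-L", "-n").CombinedOutput()
            if err != nil {
                log.Printf("iteration %d: %v: %s", i, err, out)
            }
        }
    }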

Context

Reliably recover the flannel network without user intervention. Isolated clusters need to self-recover without manually deleting pods.

Your Environment

  • Flannel version: 0.10.0
  • Backend used (e.g. vxlan or udp): vxlan
  • Etcd version: 3.1.0
  • Kubernetes version (if used): 1.9.2
  • Operating System and version: 7.3.1611
@zq-david-wang

zq-david-wang commented Jul 21, 2018

The log indicates that flanneld is being killed/stopped somehow, which would cause some goroutines to return something strange. I do not think it is about iptables. Most likely, your pod was killed during startup.

I0424 21:18:12.079484 1 main.go:322] Waiting for all goroutines to exit
I0424 21:18:12.079516 1 vxlan_network.go:60] watching for new subnet leases
E0424 21:47:24.568958 1 iptables.go:97] Failed to ensure iptables rules: Error checking rule existence: failed to check rule existence: running [/sbin/iptables -t nat -C POSTROUTING -s 172.20.0.0/16 ! -d 224.0.0.0/4 -j MASQUERADE --wait]: exit status 4: iptables: Resource temporarily unavailable.

Based on the timestamps, the error message showed up about 30 minutes after flanneld started to shut down; in my experience this can indicate a resource shortage on the system (e.g. a fork bomb).

@dtshepherd
Contributor Author

There are no error messages in syslog indicating a resource shortage, and everything else in the system seems to be functional. The flannel DaemonSet pods are still running as expected. Once that error shows up, flannel never recovers without bouncing the pods. We have some other code that manages iptables rules, and we suspect both it and flannel are trying to make changes at the same time (hence why I pointed to #935 as a workaround).

@zq-david-wang

@dtshepherd sorry about my hasty statements above. I run flannel via systemd, where the log line "Waiting for all goroutines to exit" is not printed until flanneld breaks out of its loop, and if flanneld is killed right after it starts I always see error logs about iptables. I rushed to the conclusion that what I experienced was similar to yours. But after checking the code, it turns out that when --kube-subnet-mgr is used, the lease-monitoring loop is skipped and that log line is always printed...

Checking the flannel code, it seems that it retries every iptables-resync seconds (the default is 5s):
https://github.com/coreos/flannel/blob/8a083a890a4820fe97fa315dc1ecaa739c1d14db/network/iptables.go#L94-L100

	for {
		// Ensure that all the iptables rules exist every 5 seconds
		if err := ensureIPTables(ipt, rules); err != nil {
			log.Errorf("Failed to ensure iptables rules: %v", err)
		}
		time.Sleep(time.Duration(resyncPeriod) * time.Second)
	}

Sorry again... Just tried to help....

@dtshepherd
Contributor Author

No problem! I thought it should retry as well, but the pod hangs and doesn't do anything. I haven't had a chance to dig into the code to figure out why it isn't working.

@cehoffman

We saw similar problems with what I wouldn't consider a large cluster yet, but one that had fairly frequent pod activity changing the service IP landscape.

In our investigation, we found that a few core Kubernetes services in our cluster (kube-proxy and Calico) had bad pod mount configurations, and that there was a general incompatibility between the host iptables command and the one inside the flannel container.

We run Container Linux as our host OS. In the most recent version, 1967.3.0, the host iptables is very outdated: it ships version 1.4.21, released in 2013, with the first iteration of locking that used unix domain sockets. Later, in 2015, the locking mechanism changed to the current flock on /run/xtables.lock. This means that if you have any host-based iptables configuration, it cannot work reliably alongside the iptables in the flannel, Calico, or hyperkube base images, because they all use the more recent flock mechanism.
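
To illustrate the flock mechanism described above (a minimal sketch, not anything flannel-specific): a process holding an exclusive flock on /run/xtables.lock makes iptables invocations without --wait fail immediately and makes ones with --wait block, which is why the host and every container that touches iptables need to see the same lock file.

    // Minimal sketch of the flock-based xtables lock: hold /run/xtables.lock
    // exclusively and observe how other iptables callers behave.
    package main

    import (
        "log"
        "os"
        "syscall"
        "time"
    )

    func main() {
        // Modern iptables opens (creating if needed) this file and takes an
        // exclusive flock on it for the duration of each command.
        f, err := os.OpenFile("/run/xtables.lock", os.O_CREATE|os.O_RDWR, 0600)
        if err != nil {
            log.Fatal(err)
        }
        defer f.Close()

        if err := syscall.Flock(int(f.Fd()), syscall.LOCK_EX); err != nil {
            log.Fatal(err)
        }
        log.Println("holding the xtables lock; 'iptables -w ...' will block, 'iptables ...' will exit 4")
        time.Sleep(2 * time.Minute)

        // Dropping the lock (or exiting) lets queued -w callers proceed.
        _ = syscall.Flock(int(f.Fd()), syscall.LOCK_UN)
    }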

Our main issue came down to missing mounts for /run/xtables.lock in our kube-proxy deployment. This was the deployment configuration that came from standard Tectonic. We updated it and have not seen the referenced error from flannel since.

@dtshepherd
Contributor Author

@cehoffman Wow, good catch. We are using hyperkube, and I was just looking at our kube-proxy manifest: it does not have the /run/xtables.lock mount point. When I find some time, I'll try that configuration to see if it fixes the problem. For now, we've been running with Weave...

@dtshepherd
Contributor Author

I'm pretty sure the YAML for installing flannel needs to be updated to include the /run/xtables.lock mount: https://github.com/coreos/flannel/blob/master/Documentation/kube-flannel.yml#L212-L226

It appears kube-proxy has the mount, so it should handle any iptables contention with the host, but the flannel DaemonSet does not mount the same file.

dtshepherd pushed a commit to dtshepherd/flannel that referenced this issue Dec 5, 2019
This prevents iptables contention with kube-proxy and the host OS.

Fixes flannel-io#988.
manuelbuil pushed a commit to manuelbuil/flannel that referenced this issue Jan 24, 2022
This prevents iptables contention with kube-proxy and the host OS.

Fixes flannel-io#988.