"unregister_netdevice" isn't necessarily a KernelDeadlock #47

Closed
euank opened this issue Dec 5, 2016 · 2 comments · Fixed by #81

Comments

@euank
Contributor

euank commented Dec 5, 2016

I have a node running CoreOS 1221.0.0 with kernel version 4.8.6-coreos.

The node-problem-detector marked it with "KernelDeadlock True Sun, 04 Dec 2016 18:56:20 -0800 Wed, 16 Nov 2016 00:03:33 -0800 UnregisterNetDeviceIssue unregister_netdevice: waiting for lo to become free. Usage count = 1".

If I check my kernel log, I see the following:

$ dmesg -T | grep -i unregister_netdevice -C 3
[Wed Nov 16 08:02:19 2016] docker0: port 5(vethfd2807b) entered blocking state
[Wed Nov 16 08:02:19 2016] docker0: port 5(vethfd2807b) entered forwarding state
[Wed Nov 16 08:02:19 2016] IPv6: eth0: IPv6 duplicate address fe80::42:aff:fe02:1206 detected!
[Wed Nov 16 08:03:33 2016] unregister_netdevice: waiting for lo to become free. Usage count = 1
[Wed Nov 16 08:14:35 2016] vethafecb94: renamed from eth0
[Wed Nov 16 08:14:35 2016] docker0: port 2(veth807b9e2) entered disabled state
[Wed Nov 16 08:14:35 2016] docker0: port 2(veth807b9e2) entered disabled state

Clearly, the node managed to continue to perform operations after printing that message. In addition, pods continue to function just fine and there aren't any long-term issues for me on this node.

I know that what counts as a deadlock is configurable, but perhaps the default configuration shouldn't include this pattern, or the check should be smarter about it, since as-is it can be quite confusing.
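
For reference, I believe the rule that produced this condition lives in the kernel monitor config (config/kernel-monitor.json); the sketch below is only my understanding of its shape and may not match the exact defaults shipped in this release:

```json
{
  "type": "permanent",
  "condition": "KernelDeadlock",
  "reason": "UnregisterNetDeviceIssue",
  "pattern": "unregister_netdevice: waiting for \\w+ to become free. Usage count = \\d+"
}
```

If I understand the semantics correctly, a "permanent" rule sets the KernelDeadlock node condition to True as soon as the pattern matches once, which is why a single transient log line is enough to flag the node.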

@adohe-zz
Contributor

adohe-zz commented Dec 5, 2016

Found a related issue on the Docker side:

moby/moby#5618

and a useful comment from that thread:

> We occasionally see a handful of unregister_netdevice: waiting for lo to become free. Usage count = 1 messages in syslog, but unlike before, the kernel does not crash and the message goes away. I suspect that one of the other changes introduced either in the Kernel or in Docker detects this condition and now recovers from it. For us, this now makes this message annoying but no longer a critical bug.

@Random-Liu
Member

Random-Liu commented Dec 5, 2016

@euank Thanks for filing the issue.

Yeah, I've also observed transient unregister_netdevice messages, but forgot to file an issue.
I think an event is enough then. The real issue should be caught by the docker hung detection.
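
Roughly what I have in mind (just a sketch, not the final config; the reason string here is illustrative and may differ in the actual PR): change the rule from a permanent condition to a temporary one, so a match only generates an event instead of setting the KernelDeadlock condition:

```json
{
  "type": "temporary",
  "reason": "UnregisterNetDevice",
  "pattern": "unregister_netdevice: waiting for \\w+ to become free. Usage count = \\d+"
}
```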

@Random-Liu Random-Liu added the bug label Dec 15, 2016
Random-Liu added a commit to Random-Liu/node-problem-detector that referenced this issue Jan 21, 2017
* Remove `unregister_netdevice` rule to fix kubernetes#47.
* Change `KernelPanic` to `KernelOops` because we can't handle kernel
panic currently.
* Use system boot time instead of "StartPattern" to fix kubernetes#48.
Random-Liu added a commit to Random-Liu/node-problem-detector that referenced this issue Feb 10, 2017
* Change `unregister_netdevice` to be an event to fix kubernetes#47.
* Change `KernelPanic` to `KernelOops` because we can't handle kernel
panic currently.
* Use system boot time instead of "StartPattern" to fix kubernetes#48.