"unregister_netdevice" isn't necessarily a KernelDeadlock #47

Closed
euank opened this issue Dec 5, 2016 · 2 comments · Fixed by #81

Comments

@euank
Contributor

euank commented Dec 5, 2016

I have a node running CoreOS 1221.0.0 with kernel version 4.8.6-coreos.

The node-problem-detector marked it with "KernelDeadlock True Sun, 04 Dec 2016 18:56:20 -0800 Wed, 16 Nov 2016 00:03:33 -0800 UnregisterNetDeviceIssue unregister_netdevice: waiting for lo to become free. Usage count = 1".

If I check my kernel log, I see the following:

$ dmesg -T | grep -i unregister_netdevice -C 3
[Wed Nov 16 08:02:19 2016] docker0: port 5(vethfd2807b) entered blocking state
[Wed Nov 16 08:02:19 2016] docker0: port 5(vethfd2807b) entered forwarding state
[Wed Nov 16 08:02:19 2016] IPv6: eth0: IPv6 duplicate address fe80::42:aff:fe02:1206 detected!
[Wed Nov 16 08:03:33 2016] unregister_netdevice: waiting for lo to become free. Usage count = 1
[Wed Nov 16 08:14:35 2016] vethafecb94: renamed from eth0
[Wed Nov 16 08:14:35 2016] docker0: port 2(veth807b9e2) entered disabled state
[Wed Nov 16 08:14:35 2016] docker0: port 2(veth807b9e2) entered disabled state

Clearly, the node managed to continue to perform operations after printing that message. In addition, pods continue to function just fine and there aren't any long-term issues for me on this node.

I know that what counts as a deadlock is configurable, but perhaps the default configuration shouldn't include this pattern, or the check should be smarter about it, since as-is it can be quite confusing.
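
For reference, I believe the rule that produced this condition lives in the kernel monitor config (config/kernel-monitor.json); the sketch below is only my understanding of its shape and may not match the exact defaults shipped in this release:

```json
{
  "type": "permanent",
  "condition": "KernelDeadlock",
  "reason": "UnregisterNetDeviceIssue",
  "pattern": "unregister_netdevice: waiting for \\w+ to become free. Usage count = \\d+"
}
```

If I understand the semantics correctly, a "permanent" rule sets the KernelDeadlock node condition to True as soon as the pattern matches once, which is why a single transient log line is enough to flag the node.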

@adohe-zz
Contributor

adohe-zz commented Dec 5, 2016

Found a related issue on the Docker side:

moby/moby#5618

and a useful comment from that thread:

> We occasionally see a handful of unregister_netdevice: waiting for lo to become free. Usage count = 1 messages in syslog, but unlike before, the kernel does not crash and the message goes away. I suspect that one of the other changes introduced either in the Kernel or in Docker detects this condition and now recovers from it. For us, this now makes this message annoying but no longer a critical bug.

@Random-Liu
Member

Random-Liu commented Dec 5, 2016

@euank Thanks for filing the issue.

Yeah, I've also observed transient unregister_netdevice messages, but forgot to file an issue.
I think an event is enough then. The real issue should be caught by the docker hung detection.
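
Roughly what I have in mind (just a sketch, not the final config; the reason string here is illustrative and may differ in the actual PR): change the rule from a permanent condition to a temporary one, so a match only generates an event instead of setting the KernelDeadlock condition:

```json
{
  "type": "temporary",
  "reason": "UnregisterNetDevice",
  "pattern": "unregister_netdevice: waiting for \\w+ to become free. Usage count = \\d+"
}
```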

@Random-Liu Random-Liu added the bug label Dec 15, 2016
Random-Liu added a commit to Random-Liu/node-problem-detector that referenced this issue Jan 21, 2017
* Remove `unregister_netdevice` rule to fix kubernetes#47.
* Change `KernelPanic` to `KernelOops` because we can't handle kernel
panic currently.
* Use system boot time instead of "StartPattern" to fix kubernetes#48.
Random-Liu added a commit to Random-Liu/node-problem-detector that referenced this issue Feb 10, 2017
* Change `unregister_netdevice` to be an event to fix kubernetes#47.
* Change `KernelPanic` to `KernelOops` because we can't handle kernel
panic currently.
* Use system boot time instead of "StartPattern" to fix kubernetes#48.