0.10.0 pre-run hook "network" failed #6567
Comments
Hi @djenriquez and thank you for reporting this! Just wanted to let you know that this looks very similar to #6580 and I'm investigating this currently.
Thanks @tgross. I saw it a few more times, but again, not really an issue since it seems to resolve right away on reschedule. Definitely seems to be some kind of race condition.
Ok, so as described in #6580 (comment), it turns out this is separate from the connectivity problem described there. My theory from #6580 which I'll investigate here:
tl;dr There are two things to do here:
- Fix the chain-creation race upstream in the CNI firewall and portmap plugins, so that creating a chain that already exists isn't treated as a fatal error.
- Consider having Nomad create the client-global iptables chains up front, so that concurrent plugin invocations never have to race to create them.
Nomad's Connect integration uses CNI plugins to create the network namespaces, the bridge network, container veth pairs, and iptables rules. In the happy path case, we configure CNI using the standard chain of reference plugins, which includes the firewall and portmap plugins discussed below.
The problem we're seeing is a race condition in creating iptables chains when multiple Connect-enabled task groups are starting on the same client node, which is of course the common case for folks who are trying out our Connect integration in a dev environment. In order to reproduce this, you need to flush all firewall chains/rules (it's easiest to simply restart the node). One of the allocations will fail. Then Nomad will retry the allocation a few seconds later. At that point everything proceeds as planned, and we'll never hit the race condition again on the same client node.

The error messages we get back from CNI are unhelpful, but I was able to determine what was happening by using execsnoop. Here are relevant excerpts from a log where I ran execsnoop while the two task groups were starting:

# in these logs, PID 2886 is Nomad itself
# this is the portmap execution run for task group 1
$ head -1 execsnoop.log && grep 3332 execsnoop.log
TIME(s) PCOMM PID PPID RET ARGS
2.082 portmap 3332 2886 0 "/opt/cni/bin/portmap"
2.087 iptables 3342 3332 0 "/usr/sbin/iptables --version"
2.089 iptables 3343 3332 0 "/usr/sbin/iptables -t nat -S --wait"
2.090 iptables 3345 3332 0 "/usr/sbin/iptables -t nat -N CNI-HOSTPORT-SETMARK --wait"
2.091 iptables 3346 3332 0 "/usr/sbin/iptables -t nat -C CNI-HOSTPORT-SETMARK -m comment --comment CNI portfwd masquerade mark -j MARK --set-xmark 0x2000/0x2000 --wait"
2.101 iptables 3353 3332 0 "/usr/sbin/iptables -t nat -A CNI-HOSTPORT-SETMARK -m comment --comment CNI portfwd masquerade mark -j MARK --set-xmark 0x2000/0x2000 --wait"
2.103 iptables 3356 3332 0 "/usr/sbin/iptables -t nat -N CNI-HOSTPORT-MASQ --wait"
2.105 iptables 3358 3332 0 "/usr/sbin/iptables -t nat -C CNI-HOSTPORT-MASQ -m mark --mark 0x2000/0x2000 -j MASQUERADE --wait"
2.107 iptables 3359 3332 0 "/usr/sbin/iptables -t nat -A CNI-HOSTPORT-MASQ -m mark --mark 0x2000/0x2000 -j MASQUERADE --wait"
...
# this is the portmap execution run for task group 2
$ head -1 execsnoop.log && grep 3334 execsnoop.log
2.082 portmap 3334 2886 0 "/opt/cni/bin/portmap"
2.089 iptables 3341 3334 0 "/usr/sbin/iptables --version"
2.091 iptables 3344 3334 0 "/usr/sbin/iptables -t nat -S --wait"
2.092 iptables 3347 3334 0 "/usr/sbin/iptables -t nat -C CNI-HOSTPORT-SETMARK -m comment --comment CNI portfwd masquerade mark -j MARK --set-xmark 0x2000/0x2000 --wait"
2.100 iptables 3352 3334 0 "/usr/sbin/iptables -t nat -A CNI-HOSTPORT-SETMARK -m comment --comment CNI portfwd masquerade mark -j MARK --set-xmark 0x2000/0x2000 --wait"
2.102 iptables 3355 3334 0 "/usr/sbin/iptables -t nat -S --wait"
2.104 iptables 3357 3334 0 "/usr/sbin/iptables -t nat -N CNI-HOSTPORT-MASQ --wait"
# this portmap invocation stops here!

So in this situation, we're seeing a timeline like the following:
- At 2.102, the portmap invocation for task group 2 lists the nat table and doesn't see CNI-HOSTPORT-MASQ, because it hasn't been created yet.
- At 2.103, the portmap invocation for task group 1 creates CNI-HOSTPORT-MASQ.
- At 2.104, the portmap invocation for task group 2 tries to create CNI-HOSTPORT-MASQ, which now already exists, so iptables exits with an error, the plugin invocation fails, and the allocation's network setup fails with it.
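To make the interleaving concrete, here's a minimal sketch of that check-then-create pattern, written against the go-iptables library the plugins use. This is my illustration of the pattern, not the plugins' actual code; the table and chain names are just the ones visible in the trace above.

```go
// Sketch of the check-then-create pattern: list the chains and create the
// chain only if it isn't already there. Two plugin invocations running
// concurrently can both pass the check before either has created the chain;
// the slower one then fails when iptables reports the chain already exists.
package main

import (
	"fmt"

	"github.com/coreos/go-iptables/iptables"
)

func ensureChain(ipt *iptables.IPTables, table, chain string) error {
	chains, err := ipt.ListChains(table) // roughly "iptables -t nat -S"
	if err != nil {
		return err
	}
	for _, ch := range chains {
		if ch == chain {
			return nil // chain already present, nothing to do
		}
	}
	// Another process can create the chain between the check above and this
	// call; in that case iptables exits non-zero and the whole plugin run fails.
	return ipt.NewChain(table, chain) // roughly "iptables -t nat -N <chain>"
}

func main() {
	ipt, err := iptables.New()
	if err != nil {
		panic(err)
	}
	fmt.Println(ensureChain(ipt, "nat", "CNI-HOSTPORT-MASQ"))
}
```

Run twice concurrently against a freshly flushed nat table, one of the two calls can fail even though nothing is actually wrong with the node.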
It turns out (through a bunch of experimenting) that there are two opportunities to hit the race: one in the firewall plugin and one in the portmap plugin.

Unfortunately, because we're handing off control to the chain of CNI plugins at this point, there's not really a good way for us to handle this condition inside Nomad itself. We don't have enough information at the call site to determine what's happened. Any error message that bubbles up from the plugin comes from the underlying iptables command, which isn't specific enough for us to tell that this race is what we hit.

I dug into the code for the CNI plugins and found that ignoring the error from chain creation in both plugins makes the failure go away. Here's the debugging patch I used to confirm that:

diff --git a/plugins/meta/firewall/iptables.go b/plugins/meta/firewall/iptables.go
index faae35c..ded69ce 100644
--- a/plugins/meta/firewall/iptables.go
+++ b/plugins/meta/firewall/iptables.go
@@ -43,7 +43,9 @@ func ensureChain(ipt *iptables.IPTables, table, chain string) error {
}
}
- return ipt.NewChain(table, chain)
+ // DEBUG(tgross): ignore errors
+ ipt.NewChain(table, chain)
+ return nil
}
func generateFilterRule(privChainName string) []string {
diff --git a/plugins/meta/portmap/chain.go b/plugins/meta/portmap/chain.go
index bca8214..129e1c0 100644
--- a/plugins/meta/portmap/chain.go
+++ b/plugins/meta/portmap/chain.go
@@ -41,9 +41,8 @@ func (c *chain) setup(ipt *iptables.IPTables) error {
return err
}
if !exists {
- if err := ipt.NewChain(c.table, c.name); err != nil {
- return err
- }
+ // DEBUG(tgross): ignore errors
+ ipt.NewChain(c.table, c.name)
}
// Add the rules to the chain

I'm going to try to work this up into a production-ready patch for submission to the CNI project. Also, while the iptables rules for each task group are specific to an allocation, the chains are a client-global resource. So from the Nomad side we could create the chains up front when the client starts, so that the plugin invocations never have to race to create them.

cc @nickethier @shoenig and @schmichael as a heads up and for discussion
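For illustration, a race-tolerant version of the chain creation could look roughly like the sketch below: try to create the chain, and if that fails, check whether the chain now exists because another invocation won the race. This is only my sketch of the idea, not the patch that was actually submitted upstream.

```go
// Race-tolerant chain creation: losing the race to another plugin invocation
// is treated as success instead of failing the allocation's network setup.
package main

import (
	"fmt"

	"github.com/coreos/go-iptables/iptables"
)

func ensureChainIdempotent(ipt *iptables.IPTables, table, chain string) error {
	createErr := ipt.NewChain(table, chain)
	if createErr == nil {
		return nil
	}
	// Creation failed; if the chain exists now, another process created it
	// first and there's nothing left for us to do.
	chains, listErr := ipt.ListChains(table)
	if listErr != nil {
		return createErr
	}
	for _, ch := range chains {
		if ch == chain {
			return nil
		}
	}
	return createErr
}

func main() {
	ipt, err := iptables.New()
	if err != nil {
		panic(err)
	}
	fmt.Println(ensureChainIdempotent(ipt, "nat", "CNI-HOSTPORT-MASQ"))
}
```

With a change along these lines in both the firewall and portmap plugins, two task groups starting at the same time can no longer knock each other's network setup over.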
I've opened containernetworking/plugins#408 for the upstream fix.
containernetworking/plugins#408 was merged but just missed 0.8.3, so we'll want to watch for their 0.8.4 release and then update the documentation for our Connect integration. I'll keep this issue open until that's been released and we have our docs updated.
Glad to hear, @tgross. So this fix would require an update to the CNI binary; would it require any changes to the Nomad side?
Right. No changes to Nomad.
https://github.com/containernetworking/plugins/releases/tag/v0.8.4 was just released. I'm going to get that into testing and then we'll update the docs to recommend that version.
Original issue
Nomad version
Nomad v0.10.0
Operating system and Environment details
Amazon Linux 2
Issue
An allocation being scheduled failed with a pre-run hook "network" failed error.
Reproduction steps
Nothing special: an update to a job was submitted to increase its count and change some meta to force allocation re-creation. The rest of the allocations scheduled fine; I just saw this one fail. It did reschedule just fine on the same host, but I figured this should be logged.
Nomad Client logs (if appropriate)
More logs sent via email.