-
Notifications
You must be signed in to change notification settings - Fork 881
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Overlay fix for transient IP reuse #1935
Conversation
|
||
if err := d.checkEncryption(nid, nil, 0, true, false); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
checkEncryption is done inside the peerDelete
@@ -1090,15 +1060,6 @@ func (n *network) contains(ip net.IP) bool { | |||
return false | |||
} | |||
|
|||
func (n *network) getSubnetforIPAddr(ip net.IP) *subnet { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
was unused
5d228bb
to
89b266e
Compare
Codecov Report
@@ Coverage Diff @@
## master #1935 +/- ##
=========================================
Coverage ? 37.88%
=========================================
Files ? 137
Lines ? 27325
Branches ? 0
=========================================
Hits ? 10351
Misses ? 15703
Partials ? 1271
Continue to review full report at Codecov.
|
ping @mavenugo @abhinandanpb PTAL; this one should be fixing various stability issues |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some nits, and there's some code churn between commits, but would like @mavenugo or @abhinandanpb to have a look as they are probably more familiar with the code 😄
drivers/overlay/joinleave.go
Outdated
if err := d.checkEncryption(nid, nil, 0, true, false); err != nil { | ||
logrus.Warn(err) | ||
} | ||
// if err := d.checkEncryption(nid, nil, 0, true, false); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you remove this code instead of commenting it out in this commit?
drivers/overlay/joinleave.go
Outdated
@@ -232,13 +233,9 @@ func (d *driver) Leave(nid, eid string) error { | |||
} | |||
} | |||
|
|||
n.leaveSandbox() | |||
|
|||
// if err := d.checkEncryption(nid, nil, 0, true, false); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removing this code should probably be in the previous commit (see my other comment) 😄
@@ -68,7 +68,7 @@ func (d *driver) Join(nid, eid string, sboxKey string, jinfo driverapi.JoinInfo, | |||
|
|||
ep.ifName = containerIfName | |||
|
|||
if err := d.writeEndpointToStore(ep); err != nil { | |||
if err = d.writeEndpointToStore(ep); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason you changed these? Overall I think it's clearer to just create a new variable here, as the error is not used outside the if. (It makes the code a bit easier to "grasp", because you know the result is not used outside of the if 😄)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just changed it as a result of a warning from one of the code checker tools, I can change it back but I also saw that in some other places the proper outer err was never set, so shadowing is risky
osl/neigh_linux.go
Outdated
@@ -188,7 +188,7 @@ func (n *networkNamespace) AddNeighbor(dstIP net.IP, dstMac net.HardwareAddr, fo | |||
n.Lock() | |||
n.neighbors = append(n.neighbors, nh) | |||
n.Unlock() | |||
logrus.Debugf("Neighbor entry added for IP %v, mac %v", dstIP, dstMac) | |||
logrus.Debugf("Neighbor entry added for IP %v, mac %v on ifc:%s", dstIP, dstMac, nh.linkName) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add a space after ifc:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When I print field especially names or ids, I prefer to not put space between the : in order to make it clear if there is some unprintable or transparent char. IMO the delimiter to help readability is actually the ':'. Does it make sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would quoting work? %q
? Point is that it's not done like that in many places, so if would still be ambiguous probably
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah realigning the whole project I think is an endless work, maybe I can update all the fields of this log to be consistent at least
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not a huge issue, just noticed while looking :)
drivers/overlay/joinleave.go
Outdated
@@ -224,6 +224,7 @@ func (d *driver) Leave(nid, eid string) error { | |||
return types.InternalMaskableErrorf("could not find endpoint with id %s", eid) | |||
} | |||
|
|||
logrus.Errorf("The channel is valid:%t", d.notifyCh != nil) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This error message is a bit odd "the channel is valid"; should this be "invalid"? Also missing a space after :
.
Consider using just Error
instead of Errorf
, because printing booleans doesn't need formatting;
logrus.Error("The channel is valid: ", d.notifyCh != nil)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry my bad, this is an error logs that I put for debug, I will remove it
drivers/overlay/ov_network.go
Outdated
// When we get a L3 miss check if its a valid peer and reprogram the neighbor | ||
// entry again. Rate limit it to once attempt every 500ms, just in case a faulty | ||
// container sends a flood of packets to invalid peers | ||
if !l3Miss { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're removing this check in this commit, but adding it again in the last commit?
drivers/overlay/ov_network.go
Outdated
logrus.Warnf("miss notification for peer:%+v l3Miss:%t l2Miss:%t, if the problem persist check the gc_thresholds", neigh, l3Miss, l2Miss) | ||
pKey, pEntry, err := n.driver.peerDbSearch(n.id, ip) | ||
if !pEntry.isLocal { | ||
logrus.Warnf("miss notification for peer:%+v l3Miss:%t l2Miss:%t, if the problem persist check the gc_thresh on the host pKey:%+v pEntry:%+v err:%v", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add spaces after the colons here as well? (peer: %+v
etc)
drivers/overlay/ov_network.go
Outdated
if time.Since(t) > 500*time.Millisecond { | ||
// The time limit here is to guarantee that the dbSearch is not | ||
// done too frequently causing a stall of the peerDB operations. | ||
if l3Miss && time.Since(t) > 500*time.Millisecond { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The l3Miss
check was removed in an earlier commit, and already had a description for the 500ms; perhaps restore that check and description to keep the diff smaller
if i != 1 { | ||
// Transient case, there is more than one endpoint that is using the same IP,MAC pair | ||
s, _ := pMap.mp.String(pKey.String()) | ||
logrus.Warnf("peerDbAdd transient condition - Key:%s cardinality:%d db state:%s", pKey.String(), i, s) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a space after the colons here (for readability)?
if i != 0 { | ||
// Transient case, there is more than one endpoint that is using the same IP,MAC pair | ||
s, _ := pMap.mp.String(pKey.String()) | ||
logrus.Warnf("peerDbDelete transient condition - Key:%s cardinality:%d db state:%s", pKey.String(), i, s) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a space after the colons here (for readability)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(same for other errors)
drivers/overlay/ov_network.go
Outdated
// Entries which are marked as permanent are never deleted by the garbage-collector. | ||
// The time limit here is to guarantee that the dbSearch is not | ||
// done too frequently causing a stall of the peerDB operations. | ||
if l3Miss && time.Since(t) > 500*time.Millisecond { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I recommend that we could combine the else
and if
into else if
.
Tiny point. 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure will do
57a9315
to
dc36fcb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this PR expected to be part of an upcoming release, or there's no timeline on it yet?
@nishanttotla I would like so, but still stuck on code review. |
@fcrisciani will review this today |
@abhinandanpb ping :) |
drivers/overlay/ov_network.go
Outdated
if time.Since(t) > 500*time.Millisecond { | ||
t = time.Now() | ||
n.programNeighbor(ip) | ||
} else if l3Miss && time.Since(t) > 500*time.Millisecond { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know this is existing code, I think 500ms is aggressive. we can even double it to 1second. wdyt ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that this is just a warning message without any action, we can even increase this timer to avoid excessive logs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure can bump to 1sec
|
||
// If there is still an entry into the database and the deletion went through without errors means that there is now no | ||
// configuration active in the kernel. | ||
// Restore one configuration for the <ip,mac> directly from the database, note that is guaranteed that there is one |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its guaranteed to be one or more. Correct ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep 4 lines above the if dbEntries == 0 { return nil }
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall LGTM. I have one concern about the scenario I have mentioned in the comment.
Alternate thought not necessary to be implemented: Should we treat this problem as a classic networking mac move scenario. If we detect a mac move , unicast arp for it ?
// If there is still an entry into the database and the deletion went through without errors means that there is now no | ||
// configuration active in the kernel. | ||
// Restore one configuration for the <ip,mac> directly from the database, note that is guaranteed that there is one | ||
peerKey, peerEntry, err := d.peerDbSearch(nid, peerIP) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will it have more than 1 entry for the same IP if there is continuous container add deletes ? From what I see of the internal map set implementation its a map of maps. https://github.com/deckarep/golang-set/blob/master/threadunsafe.go#L36. If thats the case the order will not be honored. Would that be a problem ? How is the peerDbdelete going to handle that ? For eg lets have a node n100,
C1 is on n1 , fdb on n100 will have c1 against 10.1.1.1, n1
C1 appears on n2 , fdb will have c1 still against 10.1.1.1 on n1 , peerdb mapset cache will have this new entry until the previous entry is deleted
now C1 cppears on n3 , fdb might still have the transient state against 10.1.1.1 on n1 so the peerdb cache will have 2 entries.
At this point we will receive the delete for the entry against n1. Now the peerdb search will fetch one of them ? For our scenario lets say we pick the entry for C1 pointing to n3. Now if we get a delete for the entry C1 pointing to n2 - how would that proceed ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Of course we are considering a transient period that should converge "pretty quickly" (according to the timing of the networkdb in delivering notification). The idea here is:
- push in the kernel the first configuration
- if any other notification arrives for the same IP, save it in the Set
- once the delete notification arrives if it deletes the current configuration in the kernel, take the next one available and push it to the kernel (note this can still be not the final one)
- the expected state at the end is to have only 1 entry in the set and that is the one that is also configured in the kernel.
For your example let's say that you receive the delete for the entry C1, n2, then nothing happens, remove it from the set and contine.
After that you receive the delete for C1, n100, that is the entry configured in the kernel, then the kernel state is going to be purged and the next entry is configured, in this case C1, n3.
The set now contains only 1 element that is actually the one configured and the only real incarnation of the container C1 with the correct peer destination
drivers/overlay/peerdb.go
Outdated
eid string | ||
vtep string | ||
peerIPMaskOnes int | ||
peerIPMaskBits int |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is peerIPMask, a net.IPMask
converted to couple of int
, while the vtep net.IP
converted to String ?
Isnt it easy to just marshal peerIPMask
as a String and that makes it consistent and readable ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's only convenience, for the net.IPMask looks like the best way to initialize it back is using that decomposition. Ref: https://golang.org/pkg/net/#IPMask
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Checked again and looks like this can be converted to string and use the ParseCIDR that will return the IPNet that contains the IPMask object internally, will change
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@fcrisciani am done with my review. Pls address the comments before we can merge this PR.
In case of IP reuse locally there was a race condition that was leaving the overlay namespace with wrong configuration causing connectivity issues. This commit introduces the use of setMatrix to handle the transient state and make sure that the proper configuration is maintained Signed-off-by: Flavio Crisciani <[email protected]>
peerDB was never being flushed on network delete leaveing behind stale entries Signed-off-by: Flavio Crisciani <[email protected]>
Signed-off-by: Flavio Crisciani <[email protected]>
Signed-off-by: Flavio Crisciani <[email protected]>
Avoid error logs in case of local peer case, there is no need for deleteNeighbor Avoid the network leave to readvertise already deleted entries to upper layer Signed-off-by: Flavio Crisciani <[email protected]>
dc36fcb
to
d93b9b0
Compare
drivers/overlay/peerdb.go
Outdated
// was never been configured into the kernel (we allow only 1 configuration at the time per <ip,mac> mapping) | ||
return nil | ||
// Local peers do not have any local configuration to delete | ||
if !localPeer { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The body internally is only indented in the new if
condition
@fcrisciani thanks for addressing the comments. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
When will this fix be released? |
And will it be backported to 17.03? We found 17.06 fairly flaky and 17.09 is pretty new. |
@kleptog Docker CE (Community Edition) 17.03 has reached end of life (Docker CE 17.06 soon); Docker EE (Enterprise Edition) has a longer support duration (12 Months currently), but I don't know if this is planned to be backported. It's also possible that 17.03 does not have the same issue (may have the same effect, but could be a different cause) |
It is possible that for a limited period of time the same IP and Mac address are used at the same time.
The previous logic was not able to handle the condition creating connectivity issues that were lasting up to 5 minutes.
Fixes: #1934