-
Notifications
You must be signed in to change notification settings - Fork 620
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding logic to restore networks in order #2571
Conversation
manager/allocator/network.go
Outdated
continue | ||
} | ||
|
||
if existingOnly && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we add a comment here stating the importance that both DriverState and IPAM are written together, else the restore logic will fail again
@@ -309,8 +309,10 @@ func (na *cnmNetworkAllocator) DeallocateService(s *api.Service) error { | |||
|
|||
// IsAllocated returns if the passed network has been allocated or not. | |||
func (na *cnmNetworkAllocator) IsAllocated(n *api.Network) bool { | |||
_, ok := na.networks[n.ID] | |||
return ok | |||
if _, ok := na.networks[n.ID]; !ok { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this does not look needed as a change
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah this isn't needed, this is the same as return ok
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right right. I had added some more logic to check for allocation. I removed it and did not revert this.
|
||
if err := a.store.Batch(func(batch *store.Batch) error { | ||
for _, n := range allocatedNetworks { | ||
if err := a.commitAllocatedNetwork(ctx, batch, n); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe we can add here a check that a network to be committed must have IPAM and DriverState both != nil, this will put more guarantees on the previous check
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am going skip this coz its used in 2 places where the networks are "rightly" allocated now.
Codecov Report
@@ Coverage Diff @@
## master #2571 +/- ##
==========================================
+ Coverage 61.42% 61.48% +0.06%
==========================================
Files 134 134
Lines 21722 21817 +95
==========================================
+ Hits 13342 13414 +72
- Misses 6945 6946 +1
- Partials 1435 1457 +22 |
LGTM |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
an unallocated network say net1 o be allocated the same vxlan id or subnet pool that was allocated to another networki net2. Because of this during restoring, when allocator tries to allocate the reallocate network net2, it would fail because it allocated network resources to net1 during restore. This would mean services,tasks and network itself would be in a
messed up state.
For the unallocated net1, is there no way to check that the vxlan id or subnet pool being assigned was already assigned to net2 and avoid re-assigning it ? @abhi
manager/allocator/network.go
Outdated
@@ -560,6 +533,59 @@ func (a *Allocator) deallocateNode(node *api.Node) error { | |||
return nil | |||
} | |||
|
|||
// allocateNetworks allocates networks in the store so far before we process |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
store so far before we process watched events
Sorry, not sure what you mean by this.
Also, please add a comment for how the existingOnly flag is used.
manager/allocator/network.go
Outdated
@@ -165,6 +135,10 @@ func (a *Allocator) doNetworkInit(ctx context.Context) (err error) { | |||
return err | |||
} | |||
|
|||
if err := a.allocateNetworks(ctx, false); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a comment on why we do it in this order i.e. existingOnly=true first and existingOnly=false later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comments are added in detail in allocateNetworks and the usage of existingOnly flags
@anshulpundir We can check and it will be a O(n^2) op and its not needed. In this case we only process networks once. |
if nc.nwkAllocator.IsAllocated(n) { | ||
continue | ||
} | ||
// Network is considered allocated only if the DriverState and IPAM are NOT nil. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have explained it here
9bdbb7a
to
165404f
Compare
@anshulpundir elaborated some comments |
Got it thx for clarifying! |
manager/allocator/network.go
Outdated
@@ -560,6 +534,60 @@ func (a *Allocator) deallocateNode(node *api.Node) error { | |||
return nil | |||
} | |||
|
|||
// allocateNetworks allocates (restores) networks in the store so far before we process | |||
// watched events. existingOnly flas is set to true to specify if only allocated |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
flas => flag
manager/allocator/network.go
Outdated
@@ -164,6 +134,11 @@ func (a *Allocator) doNetworkInit(ctx context.Context) (err error) { | |||
if err := a.allocateTasks(ctx, true); err != nil { | |||
return err | |||
} | |||
// Now allocated objects that were not previously allocated | |||
// but were present in the raft. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Thx for adding this, but it might make sense to add you description from the PR here in some form. Maybe: "we allocate previously unallocated objects after already handling the objects which have been previously partially allocated to first finish allocating those partially allocated objects (to avoid introducing conflicts) and then do the allocations for new objects.
This commits adds a fix for restore case where there might a mix of allocated and unallocated network in raft. During restore the allocator was going over the networks lexicographically which would mean that there might be a chance for an unallocated network say net1 o be allocated the same vxlan id or subnet pool that was allocated to another networki net2. Because of this during restoring, when allocator tries to allocate the reallocate network net2, it would fail because it allocated network resources to net1 during restore. This would mean services,tasks and network itself would be in a messed up state. Signed-off-by: Abhinandan <[email protected]>
This commits adds a fix for restore case where there might a mix of allocated and unallocated
network in raft. During restore the allocator was going over the networks lexicographically which would mean that there might be a chance for an unallocated network say net1 o be allocated the same vxlan id or subnet pool that was allocated to another networki net2. Because of this during restoring, when allocator tries to allocate the reallocate network net2, it would fail because it allocated network resources to net1 during restore. This would mean services,tasks and network itself would be in a
messed up state.
This commit also contains unit test and some test case modification that was not catching the issue before.
Signed-off-by: Abhinandan [email protected]