Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding logic to restore networks in order #2571

Merged
merged 1 commit into from
Mar 29, 2018
Merged

Adding logic to restore networks in order #2571

merged 1 commit into from
Mar 29, 2018

Conversation

abhi
Copy link
Contributor

@abhi abhi commented Mar 23, 2018

This commits adds a fix for restore case where there might a mix of allocated and unallocated
network in raft. During restore the allocator was going over the networks lexicographically which would mean that there might be a chance for an unallocated network say net1 o be allocated the same vxlan id or subnet pool that was allocated to another networki net2. Because of this during restoring, when allocator tries to allocate the reallocate network net2, it would fail because it allocated network resources to net1 during restore. This would mean services,tasks and network itself would be in a
messed up state.
This commit also contains unit test and some test case modification that was not catching the issue before.

Signed-off-by: Abhinandan [email protected]

@abhi
Copy link
Contributor Author

abhi commented Mar 23, 2018

continue
}

if existingOnly &&

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we add a comment here stating the importance that both DriverState and IPAM are written together, else the restore logic will fail again

@@ -309,8 +309,10 @@ func (na *cnmNetworkAllocator) DeallocateService(s *api.Service) error {

// IsAllocated returns if the passed network has been allocated or not.
func (na *cnmNetworkAllocator) IsAllocated(n *api.Network) bool {
_, ok := na.networks[n.ID]
return ok
if _, ok := na.networks[n.ID]; !ok {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this does not look needed as a change

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah this isn't needed, this is the same as return ok.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right right. I had added some more logic to check for allocation. I removed it and did not revert this.


if err := a.store.Batch(func(batch *store.Batch) error {
for _, n := range allocatedNetworks {
if err := a.commitAllocatedNetwork(ctx, batch, n); err != nil {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe we can add here a check that a network to be committed must have IPAM and DriverState both != nil, this will put more guarantees on the previous check

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am going skip this coz its used in 2 places where the networks are "rightly" allocated now.

@codecov
Copy link

codecov bot commented Mar 23, 2018

Codecov Report

Merging #2571 into master will increase coverage by 0.06%.
The diff coverage is 62.85%.

@@            Coverage Diff             @@
##           master    #2571      +/-   ##
==========================================
+ Coverage   61.42%   61.48%   +0.06%     
==========================================
  Files         134      134              
  Lines       21722    21817      +95     
==========================================
+ Hits        13342    13414      +72     
- Misses       6945     6946       +1     
- Partials     1435     1457      +22

@dperny
Copy link
Collaborator

dperny commented Mar 26, 2018

LGTM

Copy link
Contributor

@anshulpundir anshulpundir left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

an unallocated network say net1 o be allocated the same vxlan id or subnet pool that was allocated to another networki net2. Because of this during restoring, when allocator tries to allocate the reallocate network net2, it would fail because it allocated network resources to net1 during restore. This would mean services,tasks and network itself would be in a
messed up state.

For the unallocated net1, is there no way to check that the vxlan id or subnet pool being assigned was already assigned to net2 and avoid re-assigning it ? @abhi

@@ -560,6 +533,59 @@ func (a *Allocator) deallocateNode(node *api.Node) error {
return nil
}

// allocateNetworks allocates networks in the store so far before we process
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

store so far before we process watched events

Sorry, not sure what you mean by this.

Also, please add a comment for how the existingOnly flag is used.

@@ -165,6 +135,10 @@ func (a *Allocator) doNetworkInit(ctx context.Context) (err error) {
return err
}

if err := a.allocateNetworks(ctx, false); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a comment on why we do it in this order i.e. existingOnly=true first and existingOnly=false later.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comments are added in detail in allocateNetworks and the usage of existingOnly flags

@abhi
Copy link
Contributor Author

abhi commented Mar 26, 2018

@anshulpundir We can check and it will be a O(n^2) op and its not needed. In this case we only process networks once.

if nc.nwkAllocator.IsAllocated(n) {
continue
}
// Network is considered allocated only if the DriverState and IPAM are NOT nil.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have explained it here

@abhi abhi force-pushed the master branch 3 times, most recently from 9bdbb7a to 165404f Compare March 27, 2018 16:49
@abhi
Copy link
Contributor Author

abhi commented Mar 27, 2018

@anshulpundir elaborated some comments

@anshulpundir
Copy link
Contributor

We can check and it will be a O(n^2) op and its not needed. In this case we only process networks once.

Got it thx for clarifying!

@@ -560,6 +534,60 @@ func (a *Allocator) deallocateNode(node *api.Node) error {
return nil
}

// allocateNetworks allocates (restores) networks in the store so far before we process
// watched events. existingOnly flas is set to true to specify if only allocated
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

flas => flag

@@ -164,6 +134,11 @@ func (a *Allocator) doNetworkInit(ctx context.Context) (err error) {
if err := a.allocateTasks(ctx, true); err != nil {
return err
}
// Now allocated objects that were not previously allocated
// but were present in the raft.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Thx for adding this, but it might make sense to add you description from the PR here in some form. Maybe: "we allocate previously unallocated objects after already handling the objects which have been previously partially allocated to first finish allocating those partially allocated objects (to avoid introducing conflicts) and then do the allocations for new objects.

This commits adds a fix for restore case where
there might a mix of allocated and unallocated
network in raft. During restore the allocator
was going over the networks lexicographically which
would mean that there might be a chance for an
unallocated network say net1 o be allocated the same vxlan
id or subnet pool that was allocated to another networki net2.
Because of this during restoring, when allocator tries
to allocate the reallocate network net2, it would fail
because it allocated network resources to net1 during restore.
This would mean services,tasks and network itself would be in a
messed up state.

Signed-off-by: Abhinandan <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants