watch for Kubernetes node delete events and reclaim removed peers IP space on delete event #3399
Conversation
One thought: triggering a reclaim on all peers immediately will give a "thundering herd" effect with lots of contention over the ConfigMap we use as a lock. Maybe have each peer sleep a random number of seconds before beginning, to avoid this?
Agree. I will add a random delay.
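A rough sketch of the random-delay idea is below; the helper name, the jitter bound, and the reclaim callback are illustrative placeholders, not the actual PR code.

```go
package main

import (
	"math/rand"
	"time"
)

// reclaimWithJitter sleeps for a random interval below maxJitter before
// running the reclaim, so peers reacting to the same node-delete event do
// not all contend on the ConfigMap lock at once.
// reclaim stands in for the real reclaimRemovedPeers call.
func reclaimWithJitter(maxJitter time.Duration, reclaim func() error) error {
	if maxJitter > 0 {
		time.Sleep(time.Duration(rand.Int63n(int64(maxJitter))))
	}
	return reclaim()
}
```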
Following up on the comments from the Slack conversation:
@brb Are you suggesting to move |
Just the reclaim parts from kube-utils. It shouldn't require a lot of effort - move to a separate pkg and start the reclaiming process in a goroutine.
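A minimal sketch of what starting the reclaim loop in a background goroutine could look like; runWatcher is a hypothetical name for the code that would be moved out of kube-utils, not something that exists in the repo.

```go
package main

import "log"

// startReclaimWatcher launches the node-delete watcher in a background
// goroutine so the main process stays the container's entry point.
// runWatcher is a placeholder for the reclaim loop moved into its own package.
func startReclaimWatcher(runWatcher func(stop <-chan struct{}) error) chan struct{} {
	stopCh := make(chan struct{})
	go func() {
		if err := runWatcher(stopCh); err != nil {
			log.Printf("[kube-peers] reclaim watcher exited: %v", err)
		}
	}()
	return stopCh
}
```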
Chatting with Murali IRL yesterday, I felt the sidecar idea is bad because we will have to ask users to fetch two logs for troubleshooting. So, right now, I'm ok with running it as a background process from the |
If so, then we need to ask k8s to pass |
Can you expand? We already run stuff in the background in that shell script, and I hadn't noticed an issue.
The |
That identifies a problem where the parent dies and leaves the child. Since in our case the parent is the main process of the container, if it dies the entire container will go away. Or did I miss something?
If the shell process (parent) is terminated, the child will receive |
At least for the function this background process performs, graceful shutdown is not critical.
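For completeness, this is roughly all it would take to stop the watcher cleanly on a termination signal, should that ever be wanted; the function and channel names are illustrative.

```go
package main

import (
	"os"
	"os/signal"
	"syscall"
)

// stopOnSignal closes the watcher's stop channel when the process receives
// SIGTERM or SIGINT. Graceful shutdown is optional here, as noted above.
func stopOnSignal(stopCh chan struct{}) {
	sigCh := make(chan os.Signal, 1)
	signal.Notify(sigCh, syscall.SIGTERM, syscall.SIGINT)
	go func() {
		<-sigCh
		close(stopCh)
	}()
}
```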
Force-pushed from 865fde7 to 19e26fa
I will add that. Testing done: on an AWS cluster, I deleted a couple of instances in an ASG and verified that on node delete the IP space allocated to the node is reclaimed, for all deleted nodes. Let me know if it's worth writing an integration test by doing |
Thanks for the PR.
I still think that we should do it properly and run the reclaimer from the |
prog/kube-utils/main.go (Outdated)
common.Log.Debugln("registering for updates for node delete events")
nodeInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
	DeleteFunc: func(obj interface{}) {
		node := obj.(*v1core.Node)
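For context around this excerpt, a self-contained sketch of how such a Node informer with a DeleteFunc can be wired up with client-go; the resync period and function names are illustrative, not the PR's actual code.

```go
package main

import (
	"time"

	v1core "k8s.io/api/core/v1"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/cache"
)

// watchNodeDeletes registers a DeleteFunc on a shared Node informer and hands
// each deleted node to onDelete.
func watchNodeDeletes(client kubernetes.Interface, onDelete func(*v1core.Node), stopCh <-chan struct{}) {
	factory := informers.NewSharedInformerFactory(client, 30*time.Second)
	nodeInformer := factory.Core().V1().Nodes().Informer()
	nodeInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		DeleteFunc: func(obj interface{}) {
			if node, ok := obj.(*v1core.Node); ok {
				onDelete(node)
			}
		},
	})
	factory.Start(stopCh)
}
```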
prog/kube-utils/main.go (Outdated)
	common.Log.Fatalf("[kube-peers] Could not make Kubernetes connection: %v", err)
}
cml := newConfigMapAnnotations(configMapNamespace, configMapName, client)
weave := weaveapi.NewClient(os.Getenv("WEAVE_HTTP_ADDR"), common.Log)
couple more thoughts
prog/kube-utils/main.go (Outdated)
	}
}
common.Log.Debugln("[kube-peers] Nodes deleted:", nodeObj.Name)
config, err := rest.InClusterConfig()
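The excerpt builds an in-cluster config inside the delete handler; a minimal sketch of constructing the Kubernetes client once at startup and reusing it in the handler (the helper name is illustrative):

```go
package main

import (
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

// newInClusterClient builds the Kubernetes client once from the pod's
// service-account credentials; the delete handler can then reuse it rather
// than calling rest.InClusterConfig() on every event.
func newInClusterClient() (*kubernetes.Clientset, error) {
	config, err := rest.InClusterConfig()
	if err != nil {
		return nil, err
	}
	return kubernetes.NewForConfig(config)
}
```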
prog/kube-utils/main.go (Outdated)
		common.Log.Fatalf("[kube-peers] Tombstone contained object that is not a Node: %#v", obj)
	}
}
common.Log.Debugln("[kube-peers] Nodes deleted:", nodeObj.Name)
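For reference, the tombstone case the Fatalf above guards against comes from client-go delivering a cache.DeletedFinalStateUnknown when the informer misses the actual delete; a small sketch of unwrapping it (function name is illustrative):

```go
package main

import (
	v1core "k8s.io/api/core/v1"
	"k8s.io/client-go/tools/cache"
)

// nodeFromDeleteEvent recovers the *v1core.Node from a delete notification,
// unwrapping a cache.DeletedFinalStateUnknown tombstone if the informer
// missed the actual delete event.
func nodeFromDeleteEvent(obj interface{}) (*v1core.Node, bool) {
	if node, ok := obj.(*v1core.Node); ok {
		return node, true
	}
	tombstone, ok := obj.(cache.DeletedFinalStateUnknown)
	if !ok {
		return nil, false
	}
	node, ok := tombstone.Obj.(*v1core.Node)
	return node, ok
}
```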
I have included the suggested changes. On node delete, the reclaim is now triggered.
Please take a look.
Changes to invoke reclaimRemovedPeers in kube-utils on Kubernetes node delete event