
Kubevirt: Defer eve reboot/shutdown/update until drain completes #4494

Open · wants to merge 2 commits into base: master

Conversation

@andrewd-zededa (Contributor) commented Dec 23, 2024:

As part of kubevirt-eve we have multiple cluster nodes, each hosting app workloads and volume replicas.
This change defers EVE management config operations that would make storage replicas unavailable until
the cluster volume is no longer running on a single replica.

An example:

  1. Node 1 has an outage and recovers.
  2. Before the volumes finish rebuilding on node 1, node 2 has an outage and recovers.
  3. Volumes begin rebuilding replicas on nodes 1 and 2; the only available rebuild source is on node 3.
  4. A user initiates a request to reboot/shutdown/update eve-os on node 3.
  5. That config request is deferred until replicas are rebuilt on the other nodes.

At a high level the eve-side workflow looks like this:

  1. EVE config is received requesting a reboot/shutdown/baseos-image-change for node 1.
  2. A drain is requested for node 1.
  3. zedkube cordons node 1 so that new workloads are blocked from scheduling on that node.
  4. zedkube initiates a Kubernetes drain of that node, removing workloads (see the sketch after this description).
  5. As part of the drain, the PDB (Pod Disruption Budget) at the Longhorn level determines whether the local replica is the last one online.
  6. The drain waits for volume replicas to rebuild across the cluster.
  7. The drain completes and a NodeDrainStatus message is sent to continue the original config request.
  8. On the next boot event, zedkube's nodeOnBootHealthStatusWatcher() waits until the local Kubernetes node comes online/ready for the first time and then uncordons it, allowing workloads to be scheduled again.

Note: For EVE baseos image updates, this path waits until the new baseos image is fully available locally (LOADED or INSTALLED) and activated before beginning the drain.

This PR implements the API originally added in #4366.
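For reference, here is a minimal, hypothetical sketch of the cordon/drain sequence (steps 3 and 4 above) using k8s.io/kubectl/pkg/drain, which this PR vendors. It is not the PR's zedkube code; the kubeconfig path, node name, and timeout are illustrative assumptions.

```go
// Hypothetical sketch only, not the PR's zedkube implementation: cordon a node
// and then drain it with k8s.io/kubectl/pkg/drain (one of the vendored deps).
package main

import (
	"context"
	"log"
	"os"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
	"k8s.io/kubectl/pkg/drain"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", "/etc/rancher/k3s/k3s.yaml") // assumed kubeconfig path
	if err != nil {
		log.Fatal(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	nodeName := "node1" // assumed node name
	node, err := client.CoreV1().Nodes().Get(context.Background(), nodeName, metav1.GetOptions{})
	if err != nil {
		log.Fatal(err)
	}

	helper := &drain.Helper{
		Ctx:                 context.Background(),
		Client:              client,
		IgnoreAllDaemonSets: true,
		DeleteEmptyDirData:  true,
		Timeout:             24 * time.Hour, // the drain may wait on Longhorn replica rebuilds
		Out:                 os.Stdout,
		ErrOut:              os.Stderr,
	}

	// Step 3: cordon, so new workloads cannot schedule on this node.
	if err := drain.RunCordonOrUncordon(helper, node, true); err != nil {
		log.Fatal(err)
	}
	// Step 4: evict workloads; Longhorn's PDBs block eviction of the last
	// online replica until rebuilds complete elsewhere in the cluster.
	if err := drain.RunNodeDrain(helper, nodeName); err != nil {
		log.Fatal(err)
	}
}
```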

@andrewd-zededa (Contributor, Author) commented:

  ./out has been created
  Modes: GitHubActions Robot InContainer ResetRepo UnitTests
  Processing: GH:4494
  GITHUB PR #4494 is being downloaded from
  https://api.github.com/repos/lf-edge/eve/pulls/4494
    JSON data at Mon Dec 23 11:09:31 PM UTC 2024
    Patch data at Mon Dec 23 11:09:32 PM UTC 2024
  ERROR: Unsure how to process GH:4494. Permissions missing?

An odd yetus failure; I'm not sure what to make of this.

Commit messages:

(The first commit message repeats the PR description above.)
Signed-off-by: Andrew Durbin <[email protected]>

Main dependencies listed here, others pulled in for version conflicts:
k8s.io/kubectl/pkg/drain
github.com/longhorn/longhorn-manager

Signed-off-by: Andrew Durbin <[email protected]>
@andrewd-zededa (Contributor, Author) commented:

yetus checking seems broken:

2025-01-18T02:42:21.7383936Z ./out has been created
2025-01-18T02:42:22.0153173Z Modes: GitHubActions Robot InContainer ResetRepo UnitTests
2025-01-18T02:42:22.0153688Z Processing: GH:4494
2025-01-18T02:42:22.0158046Z GITHUB PR #4494 is being downloaded from
2025-01-18T02:42:22.0158544Z https://api.github.com/repos/lf-edge/eve/pulls/4494
2025-01-18T02:42:22.0172328Z JSON data at Sat Jan 18 02:42:22 AM UTC 2025
2025-01-18T02:42:22.5937865Z Patch data at Sat Jan 18 02:42:22 AM UTC 2025
2025-01-18T02:42:23.0465375Z ERROR: Unsure how to process GH:4494. Permissions missing?

@andrewd-zededa marked this pull request as ready for review on January 20, 2025 16:15.
@@ -136,6 +139,8 @@ func Run(ps *pubsub.PubSub, loggerArg *logrus.Logger, logArg *base.LogObject, ar
}
log.Functionf("user containerd ready")

initNodeDrainPubSub(ps, &ctx)
Reviewer comment (Contributor):
The style in this file is to call the init functions around line 104, so it makes sense to follow that pattern for this additional init call.

@@ -32,6 +32,17 @@ func baseOsHandleStatusUpdateUUID(ctx *baseOsMgrContext, id string) {
return
}

// We want to wait to drain until we're sure we actually have a usable image local.
Reviewer comment (Contributor):
s/local/locally/ ?

})
if err != nil {
log.Fatalf("initNodeDrainPubSub pubNodeDrainRequest err:%v", err)
return
Reviewer comment (Contributor):
Misplaced return?
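A minimal sketch of the fix being suggested, based on the quoted hunk: log.Fatalf exits the process, so the trailing return is unreachable and can simply be dropped.

```go
if err != nil {
	// log.Fatalf does not return, so no explicit return is needed after it.
	log.Fatalf("initNodeDrainPubSub pubNodeDrainRequest err:%v", err)
}
```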

if drainStatus.Status == kubeapi.NOTREQUESTED ||
drainStatus.Status == kubeapi.FAILEDCORDON ||
drainStatus.Status == kubeapi.FAILEDDRAIN {
ctx.deferredBaseOsID = id
Reviewer comment (Contributor):
Since the drain can take a while, there is a possibility that the user changes their mind about which new EVE version to use. Does the logic handle that by updating deferredBaseOsID if the user/API provides a new one while the update is deferred?
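One possible way to address this, sketched under the assumption that this handler runs again for each new baseos config: always record the most recent id so the latest user intent wins when the drain completes (names taken from the quoted hunk).

```go
if drainStatus.Status == kubeapi.NOTREQUESTED ||
	drainStatus.Status == kubeapi.FAILEDCORDON ||
	drainStatus.Status == kubeapi.FAILEDDRAIN {
	// Overwrite any previously deferred id, so a user who changes their mind
	// mid-drain ends up updating to the newest requested EVE version.
	ctx.deferredBaseOsID = id
	// ... request/continue the drain as in the existing code ...
}
```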

Comment on lines +57 to +58
ctx.ph.Print("INFO: Node Drain -> %s at %v\n", newStatus.Status.String(), ts)
ctx.ph.Flush()
Reviewer comment (Contributor):
This doesn't follow the pattern for printing in diag.
When there is an update to something which is printed, the handlers should call triggerPrintOutput, and printOutput is where the actual information is printed.

Also, if the Status indicates some failure, wouldn't you want to print that with an ERR prefix?
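A rough sketch of the pattern described, with an assumed context field name and call signature; triggerPrintOutput and printOutput are the existing diag helpers referenced in the comment.

```go
// The handler only records the new state and requests a reprint; printOutput
// (the existing diag print routine) emits the text, and could use an ERR
// prefix when the status indicates a failure.
func handleNodeDrainStatusUpdate(ctx *diagContext, newStatus kubeapi.NodeDrainStatus) {
	ctx.nodeDrainStatus = newStatus // assumed field on diagContext
	triggerPrintOutput(ctx)         // signature assumed for illustration
}
```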

"github.com/lf-edge/eve/pkg/pillar/pubsub"
)

func handleNodeDrainStatusCreateNA(ctxArg interface{}, key string,
Reviewer comment (Contributor):
Why the "NA" suffix in the function names? We reuse the names across all of the agents so it is odd to have one agent add the agentname to some (but not all) of these functions.

configArg interface{}, _ interface{}) {
newStatus, ok := configArg.(kubeapi.NodeDrainStatus)
if !ok {
log.Errorf("handleNodeDrainStatusImpl invalid type in configArg: %v", configArg)
Reviewer comment (Contributor):
In other agents you log.Fatal for this condition.
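A minimal sketch of the change being asked for, keeping the message from the quoted hunk: treat the failed type assertion as fatal, matching the other agents.

```go
newStatus, ok := configArg.(kubeapi.NodeDrainStatus)
if !ok {
	// A wrong pubsub type is a programming error, so fail hard as other agents do.
	log.Fatalf("handleNodeDrainStatusImpl invalid type in configArg: %v", configArg)
}
```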

configArg interface{}, oldConfigArg interface{}) {
newStatus, ok := configArg.(kubeapi.NodeDrainStatus)
if !ok {
log.Errorf("handleNodeDrainStatusImpl invalid type in configArg: %v", configArg)
Reviewer comment (Contributor):
Ditto

@@ -77,6 +77,7 @@ type getconfigContext struct {
configGetStatus types.ConfigGetStatus
updateInprogress bool
readSavedConfig bool // Did we already read it?
drainInProgress bool
Reviewer comment (Contributor):
The code reads as if this field has the semantics of waitForDrainComplete (or deferForDrain). Are those the semantics in zedagent?

if ctx.poweroffCmdDeferred &&
drainInProgress && !status.DrainInProgress {
log.Functionf("Drain complete and deferred poweroff")
ctx.poweroffCmdDeferred = false
Reviewer comment (Contributor):
I don't see the code which sets poweroffCmdDeferred due to a drain.
Did you miss a function which sets the various deferred fields when the local profile server API is used? (I don't recall whether that is a separate function from the controller API.)

Comment on lines +535 to +543
if newConfigItemValueMap.GlobalValueInt(types.KubevirtDrainTimeout) != 0 &&
newConfigItemValueMap.GlobalValueInt(types.KubevirtDrainTimeout) !=
currentConfigItemValueMap.GlobalValueInt(types.KubevirtDrainTimeout) {
log.Functionf("handleGlobalConfigImpl: Updating drainTimeoutHours from %d to %d",
currentConfigItemValueMap.GlobalValueInt(types.KubevirtDrainTimeout),
newConfigItemValueMap.GlobalValueInt(types.KubevirtDrainTimeout))
z.drainTimeoutHours = newConfigItemValueMap.GlobalValueInt(types.KubevirtDrainTimeout)
}
}
Reviewer comment (Contributor):
This code would be more readable if you put the current and new values in local variables instead of these very long lines with repeated calls to GlobalValueInt.
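A sketch of the suggested refactor, with the repeated GlobalValueInt calls hoisted into local variables (logic unchanged from the quoted hunk):

```go
curTimeout := currentConfigItemValueMap.GlobalValueInt(types.KubevirtDrainTimeout)
newTimeout := newConfigItemValueMap.GlobalValueInt(types.KubevirtDrainTimeout)
if newTimeout != 0 && newTimeout != curTimeout {
	log.Functionf("handleGlobalConfigImpl: Updating drainTimeoutHours from %d to %d",
		curTimeout, newTimeout)
	z.drainTimeoutHours = newTimeout
}
```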

}

log.Noticef("handleNodeDrainStatusImplNA to:%v", newStatus)
// NodeDrainStatus Failures here should keep drainInProgress set.
Reviewer comment (Contributor):
If something catastrophic happens to a cluster (e.g., two out of three nodes die due to hardware failures), is there a way to recover from that and make the single remaining node usable, e.g., as a one-node cluster or as a standalone device?

It seems like a reboot might not be usable for recovery since (based on this comment) you can't reboot until the drain has succeeded, and that is presumably impossible if the two other nodes died while you were trying to drain.
What happens when the drain timer fires?

Is there a recovery path which does not involve a truck roll and a reinstall of EVE in such a case?

@eriknordmark (Contributor) left a review comment:

In addition to the detailed comments, I think the text in the PR description should go in a markdown file.

Also, at a high level, what are the pre-conditions for the reboot/shutdown/update?
Is it that all three nodes are healthy?
(That would ensure that if there is a hardware failure or crash of another node while this one is rebooting/updating, there is still one node left.)

If, while the drain is in progress, one of the other nodes in the cluster dies, should the reboot/update be deferred until that node is back?

if override != nil {
zedkubeCtx.pubNodeDrainStatus.Publish("global", override)
}
zedkubeCtx.drainOverrideTimer.Reset(5 * time.Minute)
Reviewer comment (Contributor):
Shouldn't you Stop the timer here as well, like in the initial case?
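One reading of this comment, sketched purely as an assumption about the intended behavior: once the override has been published there is nothing left to poll for, so the timer could be stopped instead of re-armed, mirroring the initial setup.

```go
if override != nil {
	zedkubeCtx.pubNodeDrainStatus.Publish("global", override)
	zedkubeCtx.drainOverrideTimer.Stop() // assumed: stop once the override is published
} else {
	zedkubeCtx.drainOverrideTimer.Reset(5 * time.Minute)
}
```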

// Just print the transitions which are linked to lengthy operations or errors
return
}
ctx.ph.Print("INFO: Node Drain -> %s at %v\n", newStatus.Status.String(), ts)
Reviewer comment (Contributor):
Should this also go in the new console TUI? Check with @rucoder
