Ensure that DeleteMachine call is made even in case of a NotFound error from Driver.GetMachineStatus in the deletion flow
#936
Labels: area/control-plane, area/robustness, kind/enhancement, priority/2, status/closed
How to categorize this issue?
/area control-plane
/area robustness
/kind enhancement
/priority 2
What would you like to be added:
Remove getVMStatus from triggerDeletionFlow. As a consequence, no step in the deletion flow is skipped and the call to Driver.DeleteMachine is always made. This ensures that no orphan resources are left behind and that we do not have to rely on the orphan collection logic of MCM.
Why is this needed:
In Azure, the VM and the NIC are created separately (they cannot be created together because the cloud provider does not offer this functionality). In this case, the NIC was created but the VM was not.
After 20 minutes, MCM marks this machine as Failed.
In the deletion flow, we make a GetMachineStatus call (see machine-controller-manager/pkg/util/provider/machinecontroller/machine_util.go, line 955 in 724056d). Because the VM does not exist, this call returns a NotFound error and the call to Driver.DeleteMachine is skipped.
The orphan collection logic of MCM then comes into the picture and tries to remove the NIC.
MCM was terminated at 2024-08-08 06:08:41, so the orphan collector had only about 2 minutes, which was insufficient.
Hence we should remove the getVMStatus check in the deletion flow and always call the Driver.DeleteMachine method.
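To illustrate the difference, here is a minimal Go sketch of the two flows as this issue describes them. It is not MCM code: fakeDriver, ErrNotFound, deleteWithStatusCheck, deleteAlways and newFailedCreate are hypothetical names, and the method signatures are heavily simplified compared to the real Driver interface. It also assumes that DeleteMachine is idempotent and cleans up the NIC even when the VM is missing.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrNotFound is a stand-in for the NotFound error a driver returns
// when the VM does not exist (hypothetical, not the MCM error type).
var ErrNotFound = errors.New("machine not found")

// fakeDriver is a hypothetical in-memory provider that tracks VMs and
// NICs separately, mirroring the Azure case from the issue.
type fakeDriver struct {
	vms  map[string]bool
	nics map[string]bool
}

// GetMachineStatus only looks at the VM, so a leftover NIC is invisible to it.
func (d *fakeDriver) GetMachineStatus(name string) error {
	if !d.vms[name] {
		return ErrNotFound
	}
	return nil
}

// DeleteMachine removes every resource that belongs to the machine.
// Assumption: it is idempotent and succeeds even if the VM is absent.
func (d *fakeDriver) DeleteMachine(name string) error {
	delete(d.vms, name)
	delete(d.nics, name)
	return nil
}

// deleteWithStatusCheck models the flow with the status check:
// a NotFound from GetMachineStatus ends the flow before DeleteMachine runs.
func deleteWithStatusCheck(d *fakeDriver, name string) error {
	if err := d.GetMachineStatus(name); err != nil {
		if errors.Is(err, ErrNotFound) {
			return nil // VM is absent, so deletion is skipped
		}
		return err
	}
	return d.DeleteMachine(name)
}

// deleteAlways models the proposed flow: no status check, DeleteMachine
// is called unconditionally.
func deleteAlways(d *fakeDriver, name string) error {
	return d.DeleteMachine(name)
}

// newFailedCreate builds the state after a failed creation:
// the NIC exists, the VM does not.
func newFailedCreate(name string) *fakeDriver {
	return &fakeDriver{
		vms:  map[string]bool{},
		nics: map[string]bool{name: true},
	}
}

func main() {
	withCheck := newFailedCreate("m1")
	_ = deleteWithStatusCheck(withCheck, "m1")
	fmt.Println("NIC left after status-check flow:", withCheck.nics["m1"])

	proposed := newFailedCreate("m1")
	_ = deleteAlways(proposed, "m1")
	fmt.Println("NIC left after always-delete flow:", proposed.nics["m1"])
}
```

In this sketch the first flow leaves the NIC behind for the orphan collector, while the second removes it as part of the machine deletion itself.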