
failover feature-gate cannot be disabled correctly #5375

Open
kubepopeye opened this issue Aug 15, 2024 · 14 comments
Labels
kind/question Indicates an issue that is a support question.

Comments

@kubepopeye

Please provide an in-depth description of the question you have:

I don't want Karmada to trigger a failover when a cluster is unreachable. I tried to disable the feature-gate directly in the karmada-controller, but found that failover still occurs!

What do you think about this question?:
I went and looked at the Karmada implementation: the cluster-controller does check the failover feature-gate in its monitor logic, but the taintClusterByCondition method lacks that check, so the taint is still applied, which ultimately leads to failover behaviour even though the feature-gate is disabled.
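A minimal sketch of the kind of check I expected to see there; the failoverEnabled flag and the helper signature below are hypothetical stand-ins, not the actual Karmada code:

```go
// Hypothetical, simplified version of the guard described above.
package main

import "fmt"

// taintClusterByCondition (sketch): taint a not-ready cluster, but only when
// the Failover feature gate is enabled. The real method and gate lookup in
// Karmada differ; this only illustrates the expectation in the report.
func taintClusterByCondition(clusterName string, clusterReady, failoverEnabled bool) {
	if clusterReady {
		return
	}
	if !failoverEnabled {
		// Expected behaviour per the report: with the gate disabled,
		// no taint-driven migration should be set in motion.
		fmt.Printf("failover disabled, leaving cluster %s untainted\n", clusterName)
		return
	}
	fmt.Printf("tainting not-ready cluster %s\n", clusterName)
}

func main() {
	taintClusterByCondition("member1", false, false)
}
```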

Environment:

  • Karmada version: v1.8.0
  • Kubernetes version: 1.25
  • Others:
@kubepopeye kubepopeye added the kind/question label Aug 15, 2024
@whitewindmills
Member

taintClusterByCondition only adds NoSchedule taints, which only affect scheduling.
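For illustration, here is a small sketch using the upstream k8s.io/api types of the difference between the two taint effects; the taint key shown is an assumption, not copied from the Karmada source:

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

func main() {
	noSchedule := corev1.Taint{
		Key:    "cluster.karmada.io/not-ready", // illustrative key
		Effect: corev1.TaintEffectNoSchedule,   // only filters clusters at scheduling time
	}
	noExecute := corev1.Taint{
		Key:    "cluster.karmada.io/not-ready",
		Effect: corev1.TaintEffectNoExecute, // would additionally trigger eviction of existing work
	}
	fmt.Printf("taint added here: %+v\n", noSchedule)
	fmt.Printf("taint NOT added by taintClusterByCondition: %+v\n", noExecute)
}
```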

@kubepopeye
Author

[WeCom screenshot attached]

@kubepopeye
Author

So what's causing this problem, and can you help answer? It's true that the taint is NoSchedule, but it's still triggering the cleanup of orphaned work.

@whitewindmills

@whitewindmills
Member

Orphaned works can have multiple causes; I can't find the root cause from these comments. Can you paste the scheduler logs here?

@kubepopeye
Author

Scheduler logs? I found that the cluster seems to be removed from the ResourceBinding's spec only when the taint is present, which ends up causing findOrphan to pick up the work there.

@whitewindmills
Member

You have disabled the failover feature, but karmada-scheduler might still change its scheduling result.

@kubepopeye
Author

You have disabled the failover feature, but karmada-scheduler might still change its scheduling result.

Is this the expected behaviour? I've found that in some cases it can lead to an empty list in the Cluster's APIEnablements, which ends up having disastrous consequences!

@whitewindmills
Member

no, but we're fixing it.
ref:
#5325
#5216

@whitewindmills
Member

I've found that in some cases it can lead to an empty list in the Cluster's APIEnablements, which ends up having disastrous consequences!

Have you confirmed that's your root cause?

@kubepopeye
Author

I've found that in some cases it can lead to an empty list in the Cluster's APIEnablements, which ends up having disastrous consequences!

Have you confirmed that's your root cause?

Yes. It seems to me that disabling failover means no migration across availability zones should take place, but there seem to be some behaviours here that cause failover-like migration nonetheless.

@kubepopeye
Author

kubepopeye commented Aug 16, 2024

I've found that in some cases it can lead to an empty list in the Cluster's APIEnablements, which ends up having disastrous consequences!

Have you confirmed that's your root cause?

If we are unlucky, the cluster-status-controller clears the apiEnablements in the cluster status when the cluster goes offline. The scheduler then steps in, finds no matching APIs, and clears the ResourceBinding's spec.clusters. Finally, the binding controller's removeOrphan deletes our downstream resources. This is the complete chain, so we still consider the failover implementation to be incomplete.
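A simplified sketch of that chain, assuming (this is an assumption, not the actual scheduler code) that the scheduler filters clusters by looking the requested resource up in cluster.Status.APIEnablements:

```go
package main

import "fmt"

type APIEnablement struct {
	GroupVersion string
	Resources    []string
}

// supportsResource mimics an "is this API installed?" filter: an empty
// enablement list matches nothing, so the cluster would be rejected.
func supportsResource(enablements []APIEnablement, groupVersion, resource string) bool {
	for _, e := range enablements {
		if e.GroupVersion != groupVersion {
			continue
		}
		for _, r := range e.Resources {
			if r == resource {
				return true
			}
		}
	}
	return false
}

func main() {
	healthy := []APIEnablement{{GroupVersion: "apps/v1", Resources: []string{"deployments"}}}
	wiped := []APIEnablement{} // list cleared while the cluster was offline

	fmt.Println(supportsResource(healthy, "apps/v1", "deployments")) // true: cluster stays in spec.clusters
	fmt.Println(supportsResource(wiped, "apps/v1", "deployments"))   // false: cluster dropped, work later treated as orphaned
}
```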

@whitewindmills
Member

First, this has nothing to do with failover.
Did you see your cluster failing? It's important: wrong APIEnablements are usually caused by network errors or a failing APIService.

To do:
grep for these logs in your karmada-controller-manager:

  • Failed to get any APIs installed in Cluster
  • Maybe get partial

If you find them, that's the case.

@kubepopeye
Author

kubepopeye commented Aug 16, 2024

  • Failed to get any APIs installed in Cluster

Yes, "Failed to get any APIs installed in Cluster".

[WeCom screenshot attached showing the log message]

@XiShanYongYe-Chang
Member

Hi @kubepopeye, thanks for your response.

According to the log information, your analysis is correct. We noticed this problem and fixed it in v1.12, just as @whitewindmills said: in karmada-controller-manager we added a CompleteAPIEnablements condition to the cluster status, and on the scheduler side we handle that CompleteAPIEnablements condition.

This problem should now be fixed. Can you help confirm?
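For reference, a rough sketch of how a consumer could check such a condition before trusting the reported APIEnablements; the condition type string comes from this thread, and the exact way the fixed scheduler uses it is an assumption, not a quote of the fix:

```go
package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/api/meta"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func main() {
	// Example cluster status conditions; the Reason value is illustrative.
	conditions := []metav1.Condition{
		{Type: "CompleteAPIEnablements", Status: metav1.ConditionFalse, Reason: "CollectFailed"},
	}

	if !meta.IsStatusConditionTrue(conditions, "CompleteAPIEnablements") {
		// Treat the collected APIEnablements as untrustworthy instead of
		// rescheduling against a possibly empty list.
		fmt.Println("API enablements incomplete; keep the previous scheduling result")
	}
}
```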
