Startup Dependency No CSINode Data #1030
/assign
I was able to reproduce this by following https://docs.aws.amazon.com/eks/latest/userguide/ebs-csi.html. I'm noticing that my node has:
I'm curious why the EBS CSI driver doesn't rely on the standard labels and the standard instance type label.
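For context, the standard labels are applied by the kubelet as soon as the node registers, while the driver's own topology label only appears after the ebs-csi-node pod starts. A rough illustration (node name, zone, and instance type here are made up):

```yaml
# Labels present immediately after kubelet registration (illustrative values).
apiVersion: v1
kind: Node
metadata:
  name: ip-10-0-0-1.us-west-2.compute.internal   # hypothetical node name
  labels:
    topology.kubernetes.io/zone: us-west-2a
    node.kubernetes.io/instance-type: m5.large
    # topology.ebs.csi.aws.com/zone only shows up once the ebs-csi-node
    # DaemonSet pod has registered the driver on this node
```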
We should be able to avoid this race condition by using those properties of the node, which are set correctly by the kubelet on startup. Minimally, it should probably just retry so the pod doesn't get stuck in ContainerCreating:
Do you have logs from the controller (its ebs-plugin container or csi-provisioner container)? This bugfix looks kind of suspect, but I'm not sure if it would fix this issue: kubernetes-csi/external-provisioner#617. Can you try the more recent external-provisioner:v2.2.2, which includes this bugfix, and see if it works (by replacing https://github.com/kubernetes-sigs/aws-ebs-csi-driver/blob/master/charts/aws-ebs-csi-driver/values.yaml#L16)? The logic for watching CSINode objects and retrying is handled by external-provisioner, so I strongly suspect the issue is that we are bundling a version of external-provisioner with a bug in it, if not the one I linked. The controller side of the driver is otherwise ignorant of Kubernetes CSI API objects like CSINode.
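For anyone installing via the Helm chart, an override pinning the provisioner sidecar might look roughly like this; the key layout below is an assumption and varies between chart versions, so check the values.yaml you actually deploy:

```yaml
# Sketch of a Helm values override pinning external-provisioner to v2.2.2.
# The sidecars.provisioner.image layout is assumed, not confirmed; verify it
# against your chart version's values.yaml before using.
sidecars:
  provisioner:
    image:
      repository: k8s.gcr.io/sig-storage/csi-provisioner
      tag: "v2.2.2"
```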
The topology labeling is an open issue. At the moment it's simplest for all CSI drivers to use their own labels, which predate the v1 standard labels. Otherwise things get complicated in situations where CSI migration is toggled on and off (the translation lib expects to find CSI label A on CSI volumes and standard label B on in-tree volumes) or in driver upgrade/downgrade scenarios (a PV provisioned with driver version A has label A and then can't get scheduled anywhere after I upgrade to a driver version where nodes have standard label B). Ref: #962.
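To make the upgrade/downgrade concern concrete: a dynamically provisioned EBS PV carries node affinity on the driver's own topology key, so a node that only exposes the standard zone label would not satisfy it. An illustrative (not real) PV:

```yaml
# Illustrative dynamically provisioned EBS PV; names and IDs are made up.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pvc-0123456789abcdef            # hypothetical PV name
spec:
  capacity:
    storage: 4Gi
  accessModes:
    - ReadWriteOnce
  csi:
    driver: ebs.csi.aws.com
    volumeHandle: vol-0123456789abcdef  # hypothetical EBS volume ID
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: topology.ebs.csi.aws.com/zone   # driver-specific key, not the standard label
              operator: In
              values:
                - us-west-2a
```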
Our driver uses providerID to populate the CSINode nodeID in cases where instance metadata is unavailable. The csi.volume.kubernetes.io/nodeid annotation is deprecated and made redundant by the CSINode nodeID field, so it can be ignored: https://github.com/kubernetes/community/pull/2034/files#diff-eab491c51f66885ca6fa1f76254d53d01c39e09dca6939ba890cdfdeaac21fe0R106
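For reference, this is roughly what the CSINode object looks like once the node plugin has registered (node name and instance ID are illustrative); the topologyKeys entry is what the external-provisioner consults when provisioning:

```yaml
# Illustrative CSINode object after the ebs-csi-node pod has registered.
apiVersion: storage.k8s.io/v1
kind: CSINode
metadata:
  name: ip-10-0-0-1.us-west-2.compute.internal   # matches the Node name
spec:
  drivers:
    - name: ebs.csi.aws.com
      nodeID: i-0123456789abcdef0                # hypothetical EC2 instance ID
      topologyKeys:
        - topology.ebs.csi.aws.com/zone
```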
Thanks @wongma7. I've made some progress with v2.2.2, though I'm not sure if it resolves the issue in the original bug report (I haven't tried 1.2.0). I don't have particularly interesting logs, but I may not be looking at the right component; I'm not deep in this code. There are two cases:
Case 1: This appears to be working as long as the pod is scheduled in the same zone as the volume.
Success!
Case 2: No PV is created, and the PVC is stuck in Pending:
The pod is stuck in Pending.
How does a PV get allocated a zone at creation time? Is the difference between case 1 and case 2 the ...? Does (and should) the scheduler set annSelectedNode on PVCs after their respective pods get scheduled? https://github.com/kubernetes-sigs/sig-storage-lib-external-provisioner/blob/master/controller/controller.go#L1150-L1159 This is just from reading code, but I think CSI drivers rely on that annotation being set by the scheduler, and I don't think sig-storage-lib-external-provisioner has accounted for custom schedulers. Otherwise, the first time the driver (specifically the external-provisioner sidecar running sig-storage-lib-external-provisioner) "sees" the PVC, it decides there is nothing to do with it (since it doesn't know in what zone the PV needs to be created). The PVC is then never updated again, so the driver doesn't receive any update events and the PVC sits idle. Logs from the driver won't show any of this.
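In code, annSelectedNode corresponds to the volume.kubernetes.io/selected-node annotation. After the default scheduler picks a node for the consuming pod, the PVC ends up looking roughly like this (names and sizes are illustrative); without that annotation, a WaitForFirstConsumer PVC is left alone by the external-provisioner:

```yaml
# Illustrative PVC after the default scheduler has chosen a node for its pod.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data
  annotations:
    # Set by the scheduler's volume binding plugin; this is what tells the
    # external-provisioner which node (and therefore which zone) to target.
    volume.kubernetes.io/selected-node: ip-10-0-0-1.us-west-2.compute.internal
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: ebs-sc          # hypothetical StorageClass name
  resources:
    requests:
      storage: 4Gi
```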
Formatted for clarity, the expected flow for A) the default scheduler + B) the in-tree PV controller + C) external-provisioner/sig-storage-lib-external-provisioner is:
If the scheduler doesn't set annSelectedNode and the StorageClass is WaitForFirstConsumer, then the external-provisioner will never provision a volume. (This is from reading the code, so it might be wrong; this contract between A/B/C seems to be undocumented, and I think it predates KEPs.)
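For completeness, a typical StorageClass in this flow looks like the following (the name is illustrative); with volumeBindingMode: WaitForFirstConsumer, provisioning waits until the scheduler stamps the selected-node annotation on the PVC:

```yaml
# Illustrative StorageClass using delayed (topology-aware) binding.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ebs-sc
provisioner: ebs.csi.aws.com
volumeBindingMode: WaitForFirstConsumer
```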
Hey @wongma7, is the above true re
Yes, the scheduler should reset the annotation after the pod has been rescheduled, because the "volume binding plugin" should run again: https://github.com/kubernetes/kubernetes/blob/39c76ba2edeadb84a115cc3fbd9204a2177f1c28/pkg/scheduler/framework/plugins/volumebinding/binder.go#L419
I have an implementation here that I think is working. Any chance you can review?
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
This is resolved.
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community.
/close
@k8s-triage-robot: Closing this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/kind bug
What happened?
We recently started using Karpenter to provision nodes. Since Karpenter bypasses the built-in scheduler, it is able to front-load and start the pods that were pending scheduling first; other necessary pods like ebs-csi-node and aws-node arrive about a minute later. We believe it is during this gap that the controller tries to read the CSINode object for topology properties, which it doesn't have at that moment because the startup order is delayed. It doesn't appear to retry even though the information is eventually there. Looking for some validation?
What you expected to happen?
The controller should retry provisioning the volume if the CSINode information is not available on the first attempt.
How to reproduce it (as minimally and precisely as possible)?
Remove the CSINode properties that belong to a node and launch a pod that needs a volume (see the sketch below).
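A minimal reproduction might look like the following PVC and pod (names, storage class, and size are made up for illustration); with the driver entry missing from the node's CSINode object, provisioning should stall instead of completing:

```yaml
# Illustrative reproduction: a delayed-binding PVC plus a pod that mounts it.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: repro-claim
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: ebs-sc        # hypothetical WaitForFirstConsumer class
  resources:
    requests:
      storage: 1Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: repro-pod
spec:
  containers:
    - name: app
      image: busybox
      command: ["sleep", "3600"]
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: repro-claim
```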
Anything else we need to know?:
We limit IMDSv2 access to hostNetwork pods only. As far as we can tell, the driver uses the kube-api in that case, and it appears to be setting the appropriate information on the CSINode object, just not by the time the controller expects it.
Environment
Kubernetes version (use kubectl version): 1.20