Webhook didn't catch an error in spec.blockDeviceMappings[*].ebs settings #3409
Comments
We should remove Deletion webhooks using
We should also fix this panic. |
Trying to reproduce this. Getting a hang instead of a crash
Oddly, the instance does launch -- but why does it take so long? |
Ah -- was blocked on the pending snapshot
|
Hey @andrescaroc,
|
Noticing you're running on |
Ah -- looks like this is a release issue. |
No local build of Karpenter; I am using the Helm chart for v0.24 (I think it is the latest official release at the time of writing). |
I cut #3414 for this |
@andrescaroc can you reproduce this 100% of the time with your instructions? I am unable to. |
Labeled for closure due to inactivity in 10 days. |
@ellistarn I think this is still an issue. Today I tried again to use a snapshot without defining the volume size, and Karpenter crashed and became unrecoverable. Karpenter version: 0.27.1. Logs:
AWSNodeTemplate:
Karpenter gets into an unrecoverable state, and trying to fix the AWSNodeTemplate resource hits this error:
I have to use |
Can you confirm what @ellistarn had asked previously? Seems like he had an issue reproducing. |
Taking into consideration that this is the second time I have tried without setting the |
Ah yeah, looks like I'm getting the same issue. I'll re-open:
This is my
Looks like the issue is here: https://github.com/aws/karpenter/blob/main/pkg/providers/instancetype/types.go#L150 and then here: https://github.com/aws/karpenter/blob/main/pkg/providers/instancetype/types.go#L176. When I changed my amiFamily to use AL2, I didn't see the crash, but I did still get an error:
Going to re-open so we can fix this. |
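For anyone following along, here is a minimal sketch of a nil-safe size lookup. It assumes (the thread does not confirm this) that the crash comes from dereferencing an unset volumeSize on the matching block device mapping. The types `ebsSettings` and `deviceMapping`, the function `volumeSizeGiB`, and the device name and snapshot ID are all hypothetical stand-ins, not Karpenter's actual code.

```go
package main

import "fmt"

// ebsSettings and deviceMapping are hypothetical, trimmed-down stand-ins
// for the template types. Pointer fields are nil when the user omits them.
type ebsSettings struct {
	SnapshotID *string
	VolumeSize *int64 // size in GiB; the real field type may differ
}

type deviceMapping struct {
	DeviceName string
	EBS        *ebsSettings
}

// volumeSizeGiB returns the configured size of the mapping for deviceName.
// The boolean is false when no mapping matches or the size is unset, so the
// caller can fall back to a default instead of dereferencing a nil pointer.
func volumeSizeGiB(mappings []deviceMapping, deviceName string) (int64, bool) {
	for _, m := range mappings {
		if m.DeviceName != deviceName {
			continue
		}
		if m.EBS == nil || m.EBS.VolumeSize == nil {
			return 0, false
		}
		return *m.EBS.VolumeSize, true
	}
	return 0, false
}

func main() {
	snap := "snap-0123456789abcdef0" // placeholder snapshot ID
	mappings := []deviceMapping{
		{DeviceName: "/dev/xvdb", EBS: &ebsSettings{SnapshotID: &snap}},
	}
	size, ok := volumeSizeGiB(mappings, "/dev/xvdb")
	fmt.Println(size, ok) // size is unset here, so ok is false
}
```

Returning a second boolean keeps the "size not set" case explicit at the call site, which is the case a snapshot-only mapping would hit.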
It looks like for EBS root volumes: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/block-device-mapping-concepts.html
And our validation logic just checks if one of
|
Thanks @njtran, I think you caught the error. In my case I use Bottlerocket OS, which by default comes with two volumes (root and data), and I want to use a snapshot for the data volume. Based on the AWS documentation, I should be able to provide an EBS block device mapping without having to specify the volume size. |
@njtran about the error that you are getting for the snapshot itself, I suggest you test with a snapshot of your own; I used a random SnapshotID in my description. A snapshot of the data volume of a Bottlerocket OS instance will do the job. Otherwise you may lose time trying to debug something unrelated to the main issue. |
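To make the expectation concrete, here is a rough sketch of a check that accepts a snapshot ID, a volume size, or both, and rejects a mapping that sets neither. The `EBS` type and `validateEBS` function are hypothetical and written only for illustration; they are not the webhook's real implementation, and the sketch does not model any extra condition that applies when both fields are set.

```go
package main

import (
	"errors"
	"fmt"
)

// EBS is a hypothetical, trimmed-down stand-in for the ebs section of a
// blockDeviceMappings entry. Pointer fields are nil when the user omits them.
type EBS struct {
	SnapshotID *string
	VolumeSize *int64 // size in GiB; the real field type may differ
}

// validateEBS accepts a mapping that sets a snapshot ID, a volume size, or
// both. It rejects a mapping that sets neither, and a non-positive size.
func validateEBS(ebs EBS) error {
	if ebs.SnapshotID == nil && ebs.VolumeSize == nil {
		return errors.New("ebs: set snapshotID, volumeSize, or both")
	}
	if ebs.VolumeSize != nil && *ebs.VolumeSize <= 0 {
		return fmt.Errorf("ebs: volumeSize must be positive, got %d", *ebs.VolumeSize)
	}
	return nil
}

func main() {
	snap := "snap-0123456789abcdef0" // placeholder snapshot ID
	fmt.Println(validateEBS(EBS{SnapshotID: &snap})) // snapshot only: accepted
	fmt.Println(validateEBS(EBS{}))                  // neither field: rejected
}
```

Rejecting the bad template at admission time matters here because, once the resource is stored, the controller may never get a chance to report the problem before it crashes.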
Version
Karpenter Version: v0.24.0
Kubernetes Version: v1.23.0
Expected Behavior
If there is an error in the settings of an AWSNodeTemplate, it should be caught by the Karpenter validating webhook.
In my case, it seems Karpenter is forcing me to define an ebs.volumeSize when I have already defined an ebs.snapshotID.
However, according to the AWS API documentation, only one of the two is required, and it is possible to define both under a condition:
Actual Behavior
I was trying to deploy a Provisioner referring to an AWSNodeTemplate with missing required fields (which I was not aware of) in the blockDeviceMappings section. The validating webhook let it pass and be deployed.
Right away the Karpenter pods started to crash, leaving Karpenter useless.
karpenter pods watch:
Steps to Reproduce the Problem
Define a provisioner that refers to an AWSNodeTemplate as follows:
Define an AWSNodeTemplate that will use a snapshot as follows:
I am going to focus on the blockDeviceMappings section, which was my case, but this might be happening in other nested sections under spec.
Deploy both resources:
No complaints are shown by the Karpenter webhook.
Watch the karpenter pods:
The Karpenter pods should start to restart, and eventually both crash.
You will not be able to do any other operation, like editing Karpenter resources (the node template) or deleting them, because the Karpenter webhook won't be available.
To solve it, you need to rollout restart the Karpenter deployment and patch the faulty AWSNodeTemplate right away, before the pods start to crash again.
Resource Specs and Logs
Logs were not useful to find the issue:
Community Note