-
Notifications
You must be signed in to change notification settings - Fork 222
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: Decouple label propagation #1908
base: main
Are you sure you want to change the base?
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: enxebre The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
If we are happy with the approach I'll sort out the tests. |
thanks Alberto, i'll take a look tomorrow or monday |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to restrict any of these, even as we loosen restrictions:
kubernetes.io/hostname
kubernetes.io/arch
kubernetes.io/os
node.kubernetes.io/instance-type
node.kubernetes.io/windows-build
?
That sounds reasonable. We can even make this a configurable --flag being restrictive as default if we want to. And with * allowing everything (note that in any case, now that we have decoupling, KubeletLabels will filter out everything not admitted). We are most likely moving in a similar direction in capi with the flag. Thoughts? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
597f443
to
c91b821
Compare
Thanks for feedback. I addressed comments and I'll look into sorting out the tests next. |
9a2e195
to
8fef8c6
Compare
Pull Request Test Coverage Report for Build 12755426911Details
💛 - Coveralls |
8fef8c6
to
860af35
Compare
addressed comments and tests, PTAL @elmiko @javanthropus @sftim |
860af35
to
d2630f5
Compare
i think this is generally looking good, but i'm curious about @sftim 's comment #1908 (review) i think it makes sense to consider preventing some of those labels, especially if there is overlap with what the ccm might manage. |
This is now implemented in the PR now via a74f2fd#diff-ae13a9bbcddb80fed52372de837e250e0cec57957b6b58bc06c98130d0aed84fR123-R129 |
d2630f5
to
28a7364
Compare
pkg/apis/v1/labels.go
Outdated
labelDomain := GetLabelDomain(key) | ||
for restrictedLabelDomain := range RestrictedLabelDomains { | ||
if labelDomain == restrictedLabelDomain || strings.HasSuffix(labelDomain, "."+restrictedLabelDomain) { | ||
return fmt.Errorf("label %s is restricted; specify a well known label: %v, or a custom label that does not use a restricted domain: %v", key, sets.List(WellKnownLabels), sets.List(RestrictedLabelDomains)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure that this is the right advice. Maybe just tell people that Karpenter doesn't manage these labels on Nodes because that's left to other components.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This func is actually validating requirements input and not allowing those within .karpenter.sh (except for the well known CapacityTypeLabelKey). That's afaict a Karpenter design choice.
FWIW It should never return err for the nodepool/nodeclaim apis since this is validated with cel as well. It would return error for a pod with a conflicting node selector req.
So I think the current error message is actually accurate?
IsValidToSyncCentrallyLabel
is what dictates what gets propagated to Nodes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the error message makes sense to me given the context, but i do wonder if we should disambiguate from the restricted label error that is produced in the following conditional. looking at the error message on line 117 makes me think we should alter the other message.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated the error in both places to clarify the conflicting label is not allowed because it might interfere with the internal provisioning logic. Let me know if that's still not clear.
if err != nil { | ||
panic(err) | ||
} | ||
kubeletLabelsAnnotation := map[string]string{apis.Group + "/node-restricted-labels": string(json)} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe add a comment about why we set this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Added, please let me know if that's still not clear.
return labels | ||
} | ||
|
||
// KubeletLabels filters out labels from the requirements that will be rejected by the node restriction admission. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// KubeletLabels filters out labels from the requirements that will be rejected by the node restriction admission. | |
// KubeletLabels identifies the labels that we expect the kubelet to manage | |
// Do not include labels here that the node restriction admission controller would reject. |
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Added a variation of the above, please let me know if that's still not clear.
awesome, thanks! |
28a7364
to
b35abac
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
left a couple comments
pkg/apis/v1/labels.go
Outdated
labelDomain := GetLabelDomain(key) | ||
for restrictedLabelDomain := range RestrictedLabelDomains { | ||
if labelDomain == restrictedLabelDomain || strings.HasSuffix(labelDomain, "."+restrictedLabelDomain) { | ||
return fmt.Errorf("label %s is restricted; specify a well known label: %v, or a custom label that does not use a restricted domain: %v", key, sets.List(WellKnownLabels), sets.List(RestrictedLabelDomains)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the error message makes sense to me given the context, but i do wonder if we should disambiguate from the restricted label error that is produced in the following conditional. looking at the error message on line 117 makes me think we should alter the other message.
fe07959
to
2fa417c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i like the new error text. i had thought we might want to differentiate those 2 error conditions, but thinking about it i'm not sure what would be a meaningful distinction.
@sftim @elmiko @javanthropus I think I addressed most comments. Please Let me know if there's more feedback |
looking good to me |
nodeClaim Labels are the source for core karpenter to sync Node Labels in a centralized fashion. Some bootstrap userdata provider implementations also consume this labels and pass them through kubelet self setting. That results in a coupling between a centralized and a kubelet self setting approach. This coupling results in conflicting criteria for validations and degraded UX. This PR removes the coupling. As a consequence, Node Labels are not unncessarily restricted for the centralized sync anymore. This better empowers administrators reducing the reliance on self setting kubelet and minimizing the risk of a Node steering privileged workloads to itself. A subset of labels, excluding those disallowed by the node restriction admission, is now stored in json within the "karpenter.sh/node-restricted-labels" NodeClaim annotation. This enables bootstrap userdata providers to continue utilizing the labels if needed.
2fa417c
to
1bf5f15
Compare
/assign @elmiko |
Fixes #1046
Description
nodeClaim Labels are the source for core karpenter to sync Node Labels in a centralized fashion.
Some bootstrap userdata provider implementations also consume these labels and pass them through kubelet self setting.
That results in a coupling between a centralized and a kubelet self setting approach.
This coupling results in conflicting criteria for validations and degraded UX.
This PR removes the coupling.
As a consequence, Node Labels are not unncessarily restricted for the centralized sync anymore.
This better empowers administrators reducing the reliance on self setting kubelet and minimizing the risk of a Node steering privileged workloads to itself.
A subset of labels, excluding those disallowed by the node restriction admission, is now stored in json within the "karpenter.sh/node-restricted-labels" NodeClaim annotation. This enables bootstrap userdata providers to continue utilizing the labels if needed.
How was this change tested?
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.