diff --git a/keps/prod-readiness/sig-network/3705.yaml b/keps/prod-readiness/sig-network/3705.yaml index 6e6e5a5e6b97..842890969903 100644 --- a/keps/prod-readiness/sig-network/3705.yaml +++ b/keps/prod-readiness/sig-network/3705.yaml @@ -4,3 +4,5 @@ kep-number: 3705 alpha: approver: "wojtek-t" +beta: + approver: "wojtek-t" diff --git a/keps/sig-network/3705-cloud-node-ips/README.md b/keps/sig-network/3705-cloud-node-ips/README.md index 8f92484c8519..1d1017f87e1a 100644 --- a/keps/sig-network/3705-cloud-node-ips/README.md +++ b/keps/sig-network/3705-cloud-node-ips/README.md @@ -14,18 +14,15 @@ - [Changes to the provided-node-ip annotation](#changes-to-the--annotation) - [Changes to cloud providers](#changes-to-cloud-providers) - [Example of --node-ip possibilities](#example-of--possibilities) - - [UPDATE: Late-breaking change for 1.27...](#update-late-breaking-change-for-127) - [Test Plan](#test-plan) - [Prerequisite testing updates](#prerequisite-testing-updates) - [Unit tests](#unit-tests) - [Integration tests](#integration-tests) - [e2e tests](#e2e-tests) - [Graduation Criteria](#graduation-criteria) - - [Alpha (1?)](#alpha-1) - - [Alpha 2?](#alpha-2) + - [Alpha](#alpha) - [Beta](#beta) - [GA](#ga) - - [Deprecation](#deprecation) - [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy) - [Version Skew Strategy](#version-skew-strategy) - [Production Readiness Review Questionnaire](#production-readiness-review-questionnaire) @@ -75,34 +72,14 @@ using a cloud provider. This KEP proposes to fix that. ### Goals - Allow administrators of clusters using external cloud providers to - override the cloud-selected node IPs in a more dual-stack-aware - manner: + override both node IPs on a node in a dual-stack cluster. - - Allow administrators to override a single node IP on a node in a - dual-stack cluster without causing the node to become - single-stack. - - - Allow administrators to override both node IPs on a node in a - dual-stack cluster. - - - Allow administrators to override the order of node IPs on a node - in a dual-stack cluster (ie, requesting that nodes be - IPv6-primary dual stack rather than IPv4-primary dual stack, or - vice versa). - - - Allow administrators to configure nodes to be single-stack when - the cloud provider suggests dual-stack, without needing to - provide a specific IPv4 or IPv6 node IP. - -- Define how kubelet will communicate these new intents to cloud +- Define how kubelet will communicate this new intent to cloud providers. - Update the code in `k8s.io/cloud-provider/node/helpers` to implement the needed algorithms for the new behaviors. -- Rename the `alpha.kubernetes.io/provided-node-ip` annotation, which - has not been alpha for a long time. - ### Non-Goals - Changing the behavior of nodes using legacy cloud providers. @@ -120,15 +97,15 @@ using a cloud provider. This KEP proposes to fix that. dual-stack nodes in clouds that do not themselves support dual-stack.) -- Improving the behavior of autodetecting the node IP in any other - ways. (Eg, being able to pass a CIDR rather than an IP.) There was - some discussion of ways we might do this in the abandoned [KEP-1664] - (which this KEP is a spiritual successor to), but if we want that, - it can be done later, independently of these changes. (These changes - were mostly desired in the context of non-cloud-provider clusters - anyway.) +- Improving node IP handling in any other ways. The original version + of this KEP proposed some other `--node-ip` features to help with + single-stack IPv6 and dual-stack clusters but they turned out to be + insufficient and would require a larger redesign of node IP handling + in order to fix. -[KEP-1664]: https://github.com/kubernetes/enhancements/issues/1664 +- Renaming `alpha.kubernetes.io/provided-node-ip` annotation. This was + also proposed in the original version of this KEP, but is no longer + planned as part of this feature. ## Proposal @@ -169,35 +146,12 @@ specify both IPs that you want it to use. ### Changes to `--node-ip` -The most obvious required change is that we need to allow -comma-separated dual-stack `--node-ip` values in clusters using -external cloud providers (but _not_ in clusters using legacy cloud -providers). - -Additionally, the fact that kubelet does not currently pass -"`0.0.0.0`" and "`::`" to the cloud provider creates a compatibility -problem: we would like for administrators to be able to say "use an -IPv6 node IP but I don't care which one" in cloud-provider clusters -like they can in non-cloud-provider clusters, but for compatibility -reasons, we can't change the existing behavior of "`--cloud-provider -external --node-ip ::`" (which doesn't do what it's "supposed to", but -does have possibly-useful side effects that have led some users to use -it anyway; see [kubernetes #111695].) - -So instead, we will add new syntax, and allow administrators to say -"`--node-ip IPv4`" or "`--node-ip IPv6`" if they want to explicitly -require that the cloud provider pick a node IP of a specific family. -(This also improves on the behavior of the existing "`0.0.0.0`" and -"`::`" options, because you almost never actually want the "or fall -back to the other family if there are no IPs of this family" behavior -that "`0.0.0.0`" and "`::`" imply.) - -Additionally, we will update the code to allow including "`IPv4`" and -"`IPv6`" in dual-stack `--node-ip` values as well (in both cloud and -non-cloud clusters). This code will have to check the status of the -feature gate until the feature is GA. +We will allow comma-separated dual-stack `--node-ip` values in +clusters using external cloud providers (but _not_ in clusters using +legacy cloud providers). -[kubernetes #111695]: https://github.com/kubernetes/kubernetes/issues/111695 +No other changes to `--node-ip` handling are being made as part of +this KEP. ### Changes to the `provided-node-ip` annotation @@ -208,55 +162,19 @@ annotation, and the cloud provider will see it, realize that the node IP request can't be fulfilled, log an error, and leave the node in the tainted state. -It makes sense to have the same behavior if the user passes a "new" -(eg, dual-stack) `--node-ip` value to kubelet, but the cloud provider -does not recognize the new syntax and thus can't fulfil the request. +It makes sense to have the same behavior if the user passes a +dual-stack `--node-ip` value to kubelet, but the cloud provider does +not recognize the new syntax and thus can't fulfil the request. Conveniently, we can do this just by passing the dual-stack `--node-ip` value in the existing annotation; the old cloud provider will try to parse it as a single IP address, fail, log an error, and leave the node in the tainted state, which is exactly what we wanted it to do if it can't interpret the `--node-ip` value correctly. -Therefore, we do not _need_ a new annotation for the new `--node-ip` +Therefore, we do not need a new annotation for the new `--node-ip` values; we can continue to use the existing annotation, assuming existing cloud providers will treat unrecognized values as errors. -That said, the existing annotation name is -`alpha.kubernetes.io/provided-node-ip` but it hasn't been "alpha" for -a long time. We should fix this. So: - - 1. When `--node-ip` is unset, kubelet should delete both the legacy - `alpha.kubernetes.io/provided-node-ip` annotation and the new - `kubernetes.io/provided-node-ip` annotation (regardless of - whether the feature gate is enabled or not, to avoid problems - with rollback and skew). - - 2. When the `CloudDualStackNodeIPs` feature is enabled and `--node-ip` is - set, kubelet will set both the legacy annotation and the new - annotation. (It always sets them both to the same value, even if - that's a value that old cloud providers won't understand). - - 2. When the `CloudDualStackNodeIPs` feature is enabled, the cloud provider - will use the new `kubernetes.io/provided-node-ip` annotation _if - the legacy alpha annotation is not set_. (But if both annotations - are set, it will prefer the legacy annotation, so as to handle - rollbacks correctly.) - - 3. A few releases after GA, kubelet can stop setting the legacy - annotation, and switch to unconditionally deleting it, and - setting/deleting the new annotation depending on whether - `--node-ip` was set or not. Cloud providers will also switch to - only using the new annotation, and perhaps logging a warning if - they see a node with the old annotation but not the new - annotation. - -Kubelet will preserve the existing behavior of _not_ passing -"`0.0.0.0`" or "`::`" to the cloud provider, even via the new -annotation. This is needed to preserve backward compatibility with -current behavior in clusters using those `--node-ip` values. However, -it _will_ pass "`IPv4`" and/or "`IPv6`" if they are passed as the -`--node-ip`. - ### Changes to cloud providers Assuming that all cloud providers use the `"k8s.io/cloud-provider"` @@ -281,11 +199,6 @@ Assume a node where the cloud has assigned the IPs `1.2.3.4`, | `9.10.11.12` | no | `"9.10.11.12"` | (error, because the requested IP is not available) | | `abcd::5678` | no | `"abcd::5678"` | `["abcd::5678"]` (SS IPv6) | | `1.2.3.4,abcd::1234` | yes* | `"1.2.3.4,abcd::1234"` | `["1.2.3.4", "abcd::1234"]` (DS IPv4-primary) | -| `IPv4` | yes | `"IPv4"` | `["1.2.3.4"]` (SS IPv4) | -| `IPv6` | yes | `"IPv6"` | `["abcd::1234"]` (SS IPv6) | -| `IPv4,IPv6` | yes | `"IPv4,IPv6"` | `["1.2.3.4", "abcd::1234"]` (DS IPv4-primary) | -| `IPv6,5.6.7.8` | yes | `"IPv6,5.6.7.8"` | `["abcd::1234", "5.6.7.8"]` (DS IPv6-primary) | -| `IPv4,abcd::ef01` | yes | `"IPv4,abcd::ef01"` | (error, because the requested IPv6 IP is not available) | Notes: @@ -298,28 +211,6 @@ Notes: - `--node-ip 1.2.3.4,abcd::ef01` was previously valid syntax when using no `--cloud-provider`, but was not valid for cloud kubelets. -<<[UNRESOLVED multiple-addresses ]>> - -In the examples above, `--node-ip IPv4` means "pick any one IPv4 -address", not "use all IPv4 addresses". This seemed to me to be -consistent with the behavior when specifying a specific IP address (in -which case all other IPs of the same NodeAddressType are removed from -the list), but is inconsistent with the behavior of `--node-ip -0.0.0.0` / `--node-ip ::` with legacy cloud providers and bare metal -(where that affects the _sorting_ of IPs but does not do any -_filtering_). - -In past discussions (eg https://kep.k8s.io/1664, -https://issues.k8s.io/95768) no one has ever been entirely sure what -the point of having multiple node.status.addresses values is, and -there has never been a way to get multiple node.status.addresses -values (of the same IP family) on "bare metal", so that would tend to -prevent people from designing features that depended on multiple node -addresses. So it's not clear that there's really any downside in -filtering out the "unused" node IPs... - -<<[/UNRESOLVED]>> - If the cloud only had IPv4 IPs for the node, then the same examples would look like: | `--node-ip` value | New? | Annotation | Resulting node addresses | @@ -331,22 +222,14 @@ If the cloud only had IPv4 IPs for the node, then the same examples would look l | `9.10.11.12` | no | `"9.10.11.12"` | (error, because the requested IP is not available) | | `abcd::5678` | no | `"abcd::5678"` | (error, because the requested IP is not available) | | `1.2.3.4,abcd::1234` | yes* | `"1.2.3.4,abcd::1234"` | (error, because the requested IPv6 IP is not available) | -| `IPv4` | yes | `"IPv4"` | `["1.2.3.4"]` (SS IPv4) | -| `IPv6` | yes | `"IPv6"` | (error, because no IPv6 IPs are available) | -| `IPv4,IPv6` | yes | `"IPv4,IPv6"` | (error, because no IPv6 IPs are available) | -| `IPv6,5.6.7.8` | yes | `"IPv6,5.6.7.8"` | (error, because no IPv6 IPs are available) | -| `IPv4,abcd::ef01` | yes | `"IPv4,abcd::ef01"` | (error, because the requested IPv6 IP is not available) | In this case, kubelet would be even more confused in the `--node-ip ::` case, and some things would likely not work. -By contrast, with `--node-ip IPv6`, the user would get a clear error. -### UPDATE: Late-breaking change for 1.27... - -While implementing the above behavior, it became clear that retaining -backward compatibility with old `--node-ip` values means the overall -behavior is idiosyncratic and full of loopholes. For example, given -the initial NodeAddress list: +Due to backward-compatibility constraints, it is not possible to end up +with a cluster of every type (single-stack/dual-stack, +IPv4-primary/IPv6-primary) in all cases. For example, given the initial +NodeAddress list: ```yaml addresses: @@ -366,36 +249,13 @@ You can request to get a single-stack IPv4 cluster with any of the three IPv4 IPs as the node IP (`--node-ip 10.0.0.1`, `--node-ip 10.0.0.2`, `--node-ip 192.168.0.1`); a dual-stack IPv4-primary cluster with any combination of the IPv4 and IPv6 IPs (`--node-ip -10.0.0.2,fd00::2`, etc); a dual-stack IPv6-primary cluster with any -combination of IPs (`--node-ip fd00::1,192.168.0.1`, etc); or an IPv6 -single-stack cluster using the _first_ IPv6 IP (`--node-ip IPv6`). - -But there is no way to get a single-stack IPv6 cluster using the -second IPv6 IP, because passing `--node-ip fd00::2` results in a -_dual-stack_ cluster, because the current, backward-compatible -semantics of single-valued `--node-ip` values means that the IPv4 -`ExternalIP` will be preserved. - -There are various ways this might be fixed (assuming we want to fix -it), but we might decide that the best way is something incompatible -with (or redundant with) the proposal above. Given that there is -already an UNRESOLVED section about the behavior of the `"IPv4"` and -`"IPv6"` literals anyway, I have decided to ignore them for the -(first?) alpha release, and implement _only_ the "two specific IPs" -behavior for now (since it is more straightforward, and already -implemented for non-cloud-provider clusters anyway). +10.0.0.2,fd00::2`, etc); or a dual-stack IPv6-primary cluster with any +combination of IPs (`--node-ip fd00::1,192.168.0.1`, etc). -Likewise, if we decide that the best way forward is to deprecate -`--node-ip` in favor of a new `--node-ips` option with better -semantics, then we might also have different ideas about the best way -to move forward with changing the annotation name (and semantics), so -I am ignoring the "new annotation" part of the KEP for now as well. - -<<[UNRESOLVED]>> - -Figure out the plan. - -<<[/UNRESOLVED]>> +But there is no way to get a single-stack IPv6 cluster, because passing +`--node-ip fd00::1` results in a _dual-stack_ cluster, because the +current, backward-compatible semantics of single-valued `--node-ip` +values means that the IPv4 `ExternalIP` will be preserved. ### Test Plan @@ -416,10 +276,7 @@ to implement this enhancement. ##### Prerequisite testing updates - +(None.) ##### Unit tests @@ -433,104 +290,42 @@ There will also be small changes in kubelet startup. ##### Integration tests - - -``` -<<[UNRESOLVED integration ]>> - -I don't know what existing integration testing of kubelet and cloud -providers there is. Given unit tests for the new `--node-ip` parsing -and `node.status.addresses`-setting code, and e2e tests of some sort, -we probably don't need integration tests. - -<<[/UNRESOLVED]>> -``` - -- : +Given unit tests for the new `--node-ip` parsing and +`node.status.addresses`-setting code, and e2e tests of some sort, we +probably don't need integration tests. ##### e2e tests - +There is now a `[cloud-provider-kind]` that can be used in kind-based +clusters to implement cloud-provider-based functionality. -``` -<<[UNRESOLVED e2e ]>> - -I'm not sure how we currently handle cloud-provider e2e. GCP does not -support IPv6 and `kind` does not use a cloud provider, so we cannot -test the new code/behavior in any of the "core" e2e jobs. - -@aojea suggests we _could_ use `kind` with an external cloud provider. -This would presumably have to be a bespoke dummy cloud provider, but -since we are only doing this to test the new `k/cloud-provider` code, -having a mostly-dummied-out cloud provider should not be a problem? It -could perhaps have a default `NodeAddresses` return value of -", , -, ", and then we could -pass `--node-ip ,IPv6` to kubelet. If the cloud -provider did not receive and interpret the `--node-ip` value -correctly, then we would end up with node IPs that didn't work and the -test job should fail. - -<<[/UNRESOLVED]>> -``` +By modifying this provider to allow manually overriding the default +node IPs, we should be able to create an e2e test job that brings up a +kind cluster with nodes having various IPs, and then tests different +`kubelet --node-ip` arguments to ensure that they have the expected +effect. -- : +[cloud-provider-kind]: https://github.com/kubernetes-sigs/cloud-provider-kind ### Graduation Criteria -#### Alpha (1?) +#### Alpha - Dual-stack `--node-ip` handling implemented behind a feature flag - Unit tests updated -- Initial e2e tests completed and enabled? Unless maybe the e2e tests - will take a bunch of development work on either kind or the - cloud-provider CI infrastructure, in which case we might decide to - pass on e2e testing in Alpha? - -#### Alpha 2? - -- Depending on discussions during the 1.28 cycle there may be another - Alpha release with additional changes, or we may proceed directly to - Beta... - #### Beta - Positive feedback / no bugs + - At least one release after Alpha -- No UNRESOLVED sections in the KEP -- e2e tests definitely completed and enabled if they weren't already in Alpha -- e2e test that new annotation gets cleared on rollback + +- Implement e2e test using `cloud-provider-kind`. #### GA - Positive feedback / no bugs -- Two releases after Beta to allow time for feedback - -#### Deprecation - -- Two releases after GA, kubelet will stop setting the legacy - `alpha.kubernetes.io/provided-node-ip` annotation (and start - deleting it). Cloud providers will stop honoring the legacy - annotation. - -- (Two releases after that, we can remove the code to delete the - legacy annotation when present.) ### Upgrade / Downgrade Strategy @@ -545,53 +340,23 @@ to start after downgrade. ### Version Skew Strategy -- Old kubelet / old cloud provider: Kubelet will only set the old - annotation, and the cloud provider will only read the old - annotation. Everything works (even if there's a stale new annotation - left over from a cluster rollback). - -- Old kubelet / new cloud provider: Kubelet will only set the old - annotation. The cloud provider will read the old annotation, and - will interpret it in the same way an old cloud provider would - (because all `--node-ip` values accepted by an old kubelet are - interpreted the same way by both old and new cloud providers). - Everything works (even if there's a stale new annotation left over - from a kubelet rollback, because the new cloud provider still - prefers the old annotation). +- Old kubelet / new cloud provider: Kubelet will set the annotation. + The cloud provider will read it and will interpret it in the same + way an old cloud provider would (because all `--node-ip` values + accepted by an old kubelet are interpreted the same way by both old + and new cloud providers). Everything works. - New kubelet, single-stack `--node-ip` value / old cloud provider: - Kubelet will set both annotations (to the same value). The cloud - provider will read the old annotation. Everything works, because - this is an "old" `--node-ip` value, and the old cloud provider knows - how to interpret it correctly. + Kubelet will set the annotations. The cloud provider will read it. + Everything works, because this is an "old" `--node-ip` value, and + the old cloud provider knows how to interpret it correctly. - New kubelet, dual-stack `--node-ip` value / old cloud provider: - Kubelet will set both annotations (to the same value). The cloud - provider will read the old annotation, but it will _not_ know how to - interpret it because it's a "new" value. So it will log an error and - leave the node tainted. (This is the desired result, since the cloud - provider is not able to obey the `--node-ip` value the administrator - provided.) - -- New kubelet / new cloud provider: Kubelet will set both annotations - (to the same value). The cloud provider will read the old - annotation, and will know how to interpret it regardless of whether - it's an "old" or "new" value. Everything works. - -- Future (post-GA) kubelet / GA cloud provider: Kubelet will set - the new annotation, and delete the old annotation if present. The - cloud provider will see that only the new annotation is set, and so - will use that. Everything works (even if there had been a stale old - annotation left over from the upgrade). - -- GA kubelet / Future (post-GA) cloud provider: Kubelet will set - both annotations (to the same value). The cloud provider will only - look at the new annotation. Everything works. - -- Future (post-GA) kubelet / Future (post-GA) cloud provider: Kubelet - will set the new annotation, and delete the old annotation if - present. The cloud provider will only look at the new annotation. - Everything works. + Kubelet will set the annotation. The cloud provider will read it, + but it will _not_ know how to interpret it because it's a "new" + value. So it will log an error and leave the node tainted. (This is + the desired result, since the cloud provider is not able to obey the + `--node-ip` value the administrator provided.) ## Production Readiness Review Questionnaire @@ -600,45 +365,20 @@ to start after downgrade. ###### How can this feature be enabled / disabled in a live cluster? - [X] Feature gate (also fill in values in `kep.yaml`) - - Feature gate name: CloudNodeIPs + - Feature gate name: CloudDualStackNodeIPs - Components depending on the feature gate: - kubelet - cloud-controller-manager ###### Does enabling the feature change any default behavior? -Kubelet will begin setting two annotations on each node rather than -one. However, cloud providers will still prefer the old annotation and -will only read the new annotation if the old one isn't there. - -So assuming no bugs, the only visible change after (just) enabling the -feature is the duplicate annotation. +No ###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)? -Yes, as long as you also roll back the kubelet configuration to no -longer use the new feature. - -One edge case: if a user who was not previously setting `--node-ip` -upgrades, enables the feature, starts using `--node-ip`, and then -rolls back the kubelet and its configuration but _not_ the cloud -provider, then they will end up with a node that has a stale "new" -annotation (with the `--node-ip` value they had been using while -upgraded), but no "old" annotation (because the rolled-back kubelet -configuration no longer specifies `--node-ip` so kubelet will delete -the old annotation). The (not-rolled-back) cloud provider will then -mistake this as being the future rather than the past, assuming -kubelet intentionally set only the new annotation and not the old -annotation, and so it will obey the (stale) new annotation. - -That could be prevented by stretching out the feature enablement over -a few more releases, and not having the cloud-provider switch from -"always use the old annotation" to "prefer the old annotation when -it's set but use the new annotation if it's the only one" until a few -releases later. If we were going to do that, then perhaps we should -split the annotation renaming into a separate KEP from the dual-stack -`--node-ip` handling, so that the edge cases of the annotation -renaming don't have to slow down the dual-stack `--node-ip` adoption. +Yes, as long as you also roll back the kubelet configuration (to no +longer use the new feature) either earlier or together with the feature +disablement. ###### What happens if we reenable the feature if it was previously rolled back? @@ -646,7 +386,10 @@ It works. ###### Are there any tests for feature enablement/disablement? -TBD; we should have a test that the annotation is cleared on rollback +No; enabling/disabling the feature gate has no immediate effect, and +changing between a single-stack and dual-stack `--node-ip` value is no +different, code-wise, than changing from one single-stack `--node-ip` +value to a new single-stack value. ### Rollout, Upgrade and Rollback Planning @@ -656,36 +399,45 @@ This section must be completed when targeting beta to a release. ###### How can a rollout or rollback fail? Can it impact already running workloads? - +Assuming no drastic bugs (eg, the cloud provider assigns Node X's IP +to Node Y for no obvious reason), just restarting the cloud provider +or kubelet with the new feature enabled-but-unused cannot fail. + +The cloud provider will not "re-taint" an existing working node if its +node IP annotation becomes invalid. Thus, if the administrator +accidentally rolls out a kubelet config that does something completely +wrong (eg, specifying a new secondary node IP value which is not +actually one of that node's IPs) then the only effect would be that +the cloud provider will log "Failed to update node addresses for node" +for that node. + +The most likely failure mode would be that in the process of +reconfiguring nodes to use the new feature, the administrator +reconfigures them _incorrecty_. (In particular, if nodes were +previously using auto-detected primary node IPs, and the administrator +needs to switch them to manually-specified dual-stack node IPs, they +might end up manually specifying wrong (but valid) primary IPs.) In +that case, the cloud provider would accept the new node IP value and +update the node's addresses, possibly resulting in disruption. +However, this would only happen as each kubelet was reconfigured and +restarted, so as long as the administrator did not roll out the new +configurations to every node simultaneously without any +sanity-checking, they should only break a single node. ###### What specific metrics should inform a rollback? - +There are no relevant metrics; however, the feature will only affect +nodes that have been reconfigured to use the new feature, so it should +be obvious to the administrator if the feature is not working +correctly. ###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested? - +TODO; do a manual test ###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.? - +No. ### Monitoring Requirements @@ -711,12 +463,12 @@ The Node will have the IPs they expect it to have. ###### What are the reasonable SLOs (Service Level Objectives) for the enhancement? -N/A. This changes the startup behavior of Nodes (and does not affect +N/A. This changes the startup behavior of kubelet (and does not affect startup speed). There is no ongoing "service". ###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service? -N/A. This changes the startup behavior of Nodes (and does not affect +N/A. This changes the startup behavior of kubelet (and does not affect startup speed). There is no ongoing "service". ###### Are there any missing metrics that would be useful to have to improve observability of this feature? @@ -753,12 +505,7 @@ No ###### Will enabling / using this feature result in increasing size or count of the existing API objects? -Not really. Kubelet will now create two node-IP-related annotations on -each Node rather than just one, and the value of the annotation may -be slightly larger (eg because it now contains two IP addresses rather -than one), and some users may change their `--node-ip` to take -advantage of the new functionality in a way that would cause more node -IPs to be exposed in `node.status.addresses` than before. +No ###### Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs? @@ -822,14 +569,111 @@ additional functionality. ## Alternatives +The original version of this KEP proposed further changes to +`--node-ip` handling: + +> Additionally, the fact that kubelet does not currently pass +> "`0.0.0.0`" and "`::`" to the cloud provider creates a compatibility +> problem: we would like for administrators to be able to say "use an +> IPv6 node IP but I don't care which one" in cloud-provider clusters +> like they can in non-cloud-provider clusters, but for compatibility +> reasons, we can't change the existing behavior of "`--cloud-provider +> external --node-ip ::`" (which doesn't do what it's "supposed to", but +> does have possibly-useful side effects that have led some users to use +> it anyway; see [kubernetes #111695].) +> +> So instead, we will add new syntax, and allow administrators to say +> "`--node-ip IPv4`" or "`--node-ip IPv6`" if they want to explicitly +> require that the cloud provider pick a node IP of a specific family. +> (This also improves on the behavior of the existing "`0.0.0.0`" and +> "`::`" options, because you almost never actually want the "or fall +> back to the other family if there are no IPs of this family" behavior +> that "`0.0.0.0`" and "`::`" imply.) +> +> Additionally, we will update the code to allow including "`IPv4`" and +> "`IPv6`" in dual-stack `--node-ip` values as well (in both cloud and +> non-cloud clusters). This code will have to check the status of the +> feature gate until the feature is GA. + +[kubernetes #111695]: https://github.com/kubernetes/kubernetes/issues/111695 + +As well as to the Node IP annotation: + +> That said, the existing annotation name is +> `alpha.kubernetes.io/provided-node-ip` but it hasn't been "alpha" for +> a long time. We should fix this. So: +> +> 1. When `--node-ip` is unset, kubelet should delete both the legacy +> `alpha.kubernetes.io/provided-node-ip` annotation and the new +> `kubernetes.io/provided-node-ip` annotation (regardless of +> whether the feature gate is enabled or not, to avoid problems +> with rollback and skew). +> +> 2. When the `CloudDualStackNodeIPs` feature is enabled and `--node-ip` is +> set, kubelet will set both the legacy annotation and the new +> annotation. (It always sets them both to the same value, even if +> that's a value that old cloud providers won't understand). +> +> 2. When the `CloudDualStackNodeIPs` feature is enabled, the cloud provider +> will use the new `kubernetes.io/provided-node-ip` annotation _if +> the legacy alpha annotation is not set_. (But if both annotations +> are set, it will prefer the legacy annotation, so as to handle +> rollbacks correctly.) +> +> 3. A few releases after GA, kubelet can stop setting the legacy +> annotation, and switch to unconditionally deleting it, and +> setting/deleting the new annotation depending on whether +> `--node-ip` was set or not. Cloud providers will also switch to +> only using the new annotation, and perhaps logging a warning if +> they see a node with the old annotation but not the new +> annotation. +> +> Kubelet will preserve the existing behavior of _not_ passing +> "`0.0.0.0`" or "`::`" to the cloud provider, even via the new +> annotation. This is needed to preserve backward compatibility with +> current behavior in clusters using those `--node-ip` values. However, +> it _will_ pass "`IPv4`" and/or "`IPv6`" if they are passed as the +> `--node-ip`. + +However, trying to implement this behavior turned up problems: + +> While implementing the above behavior, it became clear that retaining +> backward compatibility with old `--node-ip` values means the overall +> behavior is idiosyncratic and full of loopholes. For example, given +> the initial NodeAddress list: +> +> ```yaml +> addresses: +> - type: InternalIP +> address: 10.0.0.1 +> - type: InternalIP +> address: 10.0.0.2 +> - type: InternalIP +> address: fd00::1 +> - type: InternalIP +> address: fd00::2 +> - type: ExternalIP +> address: 192.168.0.1 +> ``` +> +> You can request to get a single-stack IPv4 cluster with any of the +> three IPv4 IPs as the node IP (`--node-ip 10.0.0.1`, `--node-ip +> 10.0.0.2`, `--node-ip 192.168.0.1`); a dual-stack IPv4-primary cluster +> with any combination of the IPv4 and IPv6 IPs (`--node-ip +> 10.0.0.2,fd00::2`, etc); a dual-stack IPv6-primary cluster with any +> combination of IPs (`--node-ip fd00::1,192.168.0.1`, etc); or an IPv6 +> single-stack cluster using the _first_ IPv6 IP (`--node-ip IPv6`). +> +> But there is no way to get a single-stack IPv6 cluster using the +> second IPv6 IP, because passing `--node-ip fd00::2` results in a +> _dual-stack_ cluster, because the current, backward-compatible +> semantics of single-valued `--node-ip` values means that the IPv4 +> `ExternalIP` will be preserved. + In the discussion around [KEP-1664] there was talk of replacing -`--node-ip` with a new `--node-ips` (plural) field, but this was -mostly because the behavior of `--node-ip` in clusters with external -cloud providers was incompatible with the behavior of `--node-ip` in -clusters with legacy or no cloud providers and I wasn't sure if we -could get away with changing it to make it consistent. However, -[kubernetes #107750] did just that already, so there's now nothing -standing in the way of synchronizing the rest of the behavior. +`--node-ip` with a new `--node-ips` (plural) field with +new-and-improved semantics, and I think this is what we're going to +have to do if we want to make this better. But that will have to wait +for another KEP. [KEP-1664]: https://github.com/kubernetes/enhancements/issues/1664 -[kubernetes #107750]: https://github.com/kubernetes/kubernetes/pull/107750 diff --git a/keps/sig-network/3705-cloud-node-ips/kep.yaml b/keps/sig-network/3705-cloud-node-ips/kep.yaml index 632936c713c0..47de78fa8850 100644 --- a/keps/sig-network/3705-cloud-node-ips/kep.yaml +++ b/keps/sig-network/3705-cloud-node-ips/kep.yaml @@ -16,19 +16,18 @@ reviewers: - "@thockin" approvers: - "@thockin" - - TBD from sig-cloud-provider see-also: - "https://github.com/kubernetes/enhancements/issues/1664" - "https://github.com/kubernetes/enhancements/pull/1665" # The target maturity stage in the current dev cycle for this KEP. -stage: alpha +stage: beta # The most recent milestone for which work toward delivery of this KEP has been # done. This can be the current (upcoming) milestone, if it is being actively # worked on. -latest-milestone: "v1.27" +latest-milestone: "v1.28" # The milestone at which this feature was, or is targeted to be, at each stage. milestone: