WIP: Support ipv6 dual stack for Azure for testing #2799
Conversation
This is a hack to not block IPv6 when doing CIDR validation. We may want to do something smarter later, like validate that IPv4 or IPv6 is used consistently, or validate based on per-platform capabilities. For now, removing this check unblocks IPv6 work.
If you run "create cluster", the VPC will have an IPv6 subnet assigned by AWS (a /56). Then, each subnet created in the VPC will get a slice of this (a /64). I can see that the VMs created all had IPv6 addresses associated with them, at least from the AWS perspective. The networking section of my current install-config looks like this:

```
networking:
  clusterNetwork:
  - cidr: fd01::/48
    hostPrefix: 64
  machineCIDR: 10.0.0.0/16
  networkType: OVNKubernetes
  serviceNetwork:
  - fd02::/112
```

I kept machineCIDR as IPv4, since on AWS we can't get IPv6 only. We get IPv4 + optionally IPv6. We use machineCIDR to specify the IPv4 address range we carve up in the VPC. For IPv6, AWS gives us the /56 that we carve up, so we don't need a subnet specified in the config. For clusterNetwork and serviceNetwork, both of these are used exclusively inside the cluster, so I changed them to be specified as private IPv6 subnets.
This allows all of our VMs to speak IPv6 out to the internet.
This adds a route on the private subnet to allow internet access through the same internet gateway used by the public subnet. In the IPv4 case, we use a NAT gateway from the private subnet. For IPv6, we don't use NAT since the addresses are publicly routable.
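For illustration, the added route has roughly this shape in Terraform; the resource and variable names here are placeholders rather than the installer's actual ones:

```hcl
# Placeholder names; a default IPv6 route from the private route table
# straight to the internet gateway (no NAT needed for IPv6).
resource "aws_route" "private_internet_ipv6" {
  route_table_id              = var.private_route_table_id
  destination_ipv6_cidr_block = "::/0"
  gateway_id                  = var.internet_gateway_id
}
```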
All security group rules that referenced an IPv4 CIDR have been duplicated to also reference the corresponding IPv6 CIDR.
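A hedged sketch of what such a duplicated pair can look like; the rule names, port, and CIDR values are invented for the example:

```hcl
# Existing IPv4 rule (illustrative values).
resource "aws_security_group_rule" "api_ipv4" {
  type              = "ingress"
  protocol          = "tcp"
  from_port         = 6443
  to_port           = 6443
  security_group_id = var.master_sg_id
  cidr_blocks       = ["10.0.0.0/16"]
}

# Duplicated IPv6 twin of the same rule.
resource "aws_security_group_rule" "api_ipv6" {
  type              = "ingress"
  protocol          = "tcp"
  from_port         = 6443
  to_port           = 6443
  security_group_id = var.master_sg_id
  ipv6_cidr_blocks  = ["fd00::/48"]
}
```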
All of the IPv4 records remain for now.
Introduce a "use_ipv6" terraform variable that is hard coded to "true". Use it to disable the IPv4 DNS records (A records), leaving only the IPv6 (AAAA) records in place.
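Roughly, the pattern looks like the following; the record names, variables, and TTL are illustrative, not the installer's actual definitions:

```hcl
variable "use_ipv6" {
  default = true # hard-coded on at this point in the branch
}

# IPv4 A record, disabled while use_ipv6 is true.
resource "aws_route53_record" "api_a" {
  count   = var.use_ipv6 ? 0 : 1
  zone_id = var.zone_id
  name    = "api.${var.cluster_domain}"
  type    = "A"
  ttl     = 60
  records = [var.api_ipv4_address]
}

# IPv6 AAAA record, enabled while use_ipv6 is true.
resource "aws_route53_record" "api_aaaa" {
  count   = var.use_ipv6 ? 1 : 0
  zone_id = var.zone_id
  name    = "api.${var.cluster_domain}"
  type    = "AAAA"
  ttl     = 60
  records = [var.api_ipv6_address]
}
```

A later commit replaces this hard-coded default with the aws_use_ipv6 variable driven from the environment.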
Back in commit 16dfbb3, the installer switched from classic load balancers to network load balancers. Unfortunately, NLBs don't support IPv6. This commit introduces conditional classic load balancers when use_ipv6 is turned on.
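A minimal sketch of the conditional classic ELB approach, with invented names and the health check omitted (note that later commits in this PR revert back to NLBs):

```hcl
# Only created when the IPv6 dev/test path is enabled.
resource "aws_elb" "api_ipv6" {
  count           = var.use_ipv6 ? 1 : 0
  name            = "${var.cluster_id}-api"
  subnets         = var.public_subnet_ids
  security_groups = [var.api_sg_id]

  listener {
    instance_port     = 6443
    instance_protocol = "tcp"
    lb_port           = 6443
    lb_protocol       = "tcp"
  }
}
```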
Apply the use_ipv6 boolean in some places that were added before the boolean existed.
RHCOS doesn't get an IPv6 address during boot before running ignition, so without this record it will fail to contact the machine config server. Turn this record back on as a temporary workaround until we can do IPv6 during ignition.
The ELB for IPv6 was allowing any source to reach the api-int load balancer, instead of only the CIDRs used by the cluster.
Much of this was written before the code merged to allow disabling public endpoints. I went through and tried to honor the new settings in various places that I missed during an earlier rebase.
Add a new terraform variable, aws_use_ipv6, for controlling the AWS IPv6 dev/test environment. It defaults to false, retaining existing behavior by default. To turn it on, you must set the TF_VAR_aws_use_ipv6 environment variable to "true". A future commit will expand on this with developer documentation.
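A sketch of the variable, not the exact installer definition:

```hcl
variable "aws_use_ipv6" {
  type        = bool
  default     = false
  description = "Enable the experimental AWS IPv6 dev/test wiring. Set via TF_VAR_aws_use_ipv6=true."
}
```

Terraform reads any environment variable of the form `TF_VAR_<name>` as the value of variable `<name>`, which is why exporting `TF_VAR_aws_use_ipv6=true` is enough to flip this on; a later commit wraps it behind an OPENSHIFT_INSTALL_ prefixed variable.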
The tests assumed IPv6 CIDRs are not allowed. A previous commit started allowing them and this updates the test cases accordingly.
Instead of telling people to set TF_VAR_aws_use_ipv6, wrap it with an OpenShift env var.
I missed a few more things when I started making the IPv6 support conditional. This was exposed by trying to run an install with IPv6 turned off (the default). With this patch, a regular install worked for me again.
For our AWS IPv6 env, we run CoreDNS with host networking so it can reach AWS DNS over IPv4. We need to allow traffic to CoreDNS via security groups when it is run this way. Otherwise, only DNS queries that happen to hit CoreDNS on the same host will work.
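The rule shape is roughly the following; the port, security group reference, and CIDR variables are placeholders:

```hcl
# Because CoreDNS listens on the node's host network here, the node security
# group must admit DNS traffic from the cluster's IPv6 ranges.
resource "aws_security_group_rule" "coredns_tcp" {
  count             = var.use_ipv6 ? 1 : 0
  type              = "ingress"
  protocol          = "tcp"
  from_port         = 53
  to_port           = 53
  security_group_id = var.master_sg_id
  ipv6_cidr_blocks  = var.cluster_ipv6_cidrs
}

resource "aws_security_group_rule" "coredns_udp" {
  count             = var.use_ipv6 ? 1 : 0
  type              = "ingress"
  protocol          = "udp"
  from_port         = 53
  to_port           = 53
  security_group_id = var.master_sg_id
  ipv6_cidr_blocks  = var.cluster_ipv6_cidrs
}
```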
This branch had configured classic load balancers for use with IPv6, because it appeared that they supported IPv6. Despite getting IPv6 addresses, they don't actually work. IPv6 with classic load balancers only works if you're using EC2-Classic, not VPC. They just rejected IPv6 connections in our case, and various forum posts seem to confirm this behavior if you try to use it this way.

The documentation for classic load balancers discusses this a bit: https://docs.aws.amazon.com/elasticloadbalancing/latest/classic/elb-internet-facing-load-balancers.html The "ipv6" hostnames exist even if you use AWS-VPC, but they just don't work. The docs seem to indicate that the hostnames only exist with EC2-Classic, but that's not true.

AWS Application Load Balancers (doing HTTPS load balancing) are supposed to support IPv6, but that would require doing cert management with AWS. The simplest approach here is to just fall back to the Network Load Balancers used with IPv4. We'll allow API access through the load balancer over IPv4. Inside the cluster, API access is still all IPv6, where it's using the kubernetes Service and Endpoints to reach the API. This seems good enough for our AWS dev/test environment for now.
We reverted back to the IPv4 NLBs, so reflect that status in this doc.
The DNS pods run on workers as well as masters, so replicate the IPv6-only rule we added to allow DNS traffic on the host network. Also remove the unnecessary "self" rule from the masters.
Use an OPENSHIFT_INSTALL_ prefix instead of just OPENSHIFT_.
Add more details that have come up in review so far.
These comments were about something to check related to a terraform upgrade. The code is correct as it is. As the comments themselves recommended, remove the TODOs now that the check has been done.
Note a todo item that I will take care of soon: changing to an egress-only internet gateway. https://aws.amazon.com/premiumsupport/knowledge-center/configure-private-ipv6-subnet/
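For reference, the planned change would look something like this (placeholder names):

```hcl
# Egress-only internet gateway for the private subnets' IPv6 traffic.
resource "aws_egress_only_internet_gateway" "private" {
  vpc_id = var.vpc_id
}

# Replace the plain internet-gateway route with one through the
# egress-only gateway.
resource "aws_route" "private_ipv6_egress" {
  route_table_id              = var.private_route_table_id
  destination_ipv6_cidr_block = "::/0"
  egress_only_gateway_id      = aws_egress_only_internet_gateway.private.id
}
```

An egress-only internet gateway allows outbound IPv6 traffic from the private subnets while blocking unsolicited inbound connections, which mirrors the NAT-gateway behavior used for IPv4.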
This variable was added in an earlier commit, but is no longer used.
Move the conditional part of the expression to only apply to the part of the formatlist() that's different between IPv4 and IPv6.
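An illustrative before/after of this refactor, with invented format strings and variable names:

```hcl
locals {
  # Before: the conditional wrapped two nearly identical formatlist() calls.
  records_before = var.use_ipv6 ? formatlist("[%s]:2380", var.ips) : formatlist("%s:2380", var.ips)

  # After: only the format string, the part that actually differs between
  # IPv4 and IPv6, is conditional.
  records_after = formatlist(var.use_ipv6 ? "[%s]:2380" : "%s:2380", var.ips)
}
```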
This patch mostly just moves the count to the beginning of a resource as a style change, as requested during review. While doing this, I noticed two IPv6 security group rules that were not conditional on IPv6 support being turned on.
Make use of the existing etcd_ip_addresses instead.
As requested in review:
- when AWS IPv6 is on, force the OVNKubernetes network type
- force IPv6 CIDRs where needed when AWS IPv6 is on
- re-add IPv4 CIDR enforcement everywhere else
- make sure the platform is AWS if the AWS IPv6 env var is set
go-cidr was fixed in the last year to correct an int64 overflow issue with CIDRs.
Three provider-related changes are needed (see the sketch after this list):
1. private_ip_address_version is not supported on load balancers in the 2018-12-01 Azure API version, but is in 2019-04-01. Specifically override that one call until we bump.
2. The provider's check that static allocation requires a particular ip_version on the load balancer is out of date for Azure: Standard SKU LBs can have both types of allocation since Azure added the preview of this capability. Remove the check, as we would for an upstream patch.
3. Add support for subnets to have multiple address prefixes, which is required for dual stack. This is an upstream change.
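Assuming a provider version where these attributes are available, the dual-stack pieces end up looking roughly like this; resource names and address values are placeholders:

```hcl
# Subnet carrying both an IPv4 and an IPv6 prefix.
resource "azurerm_subnet" "master" {
  name                 = "master-subnet"
  resource_group_name  = var.resource_group_name
  virtual_network_name = var.vnet_name
  address_prefixes     = ["10.0.0.0/24", "fd00::/64"]
}

# Standard SKU load balancer with one frontend per IP version.
resource "azurerm_lb" "internal" {
  name                = "internal-lb"
  location            = var.region
  resource_group_name = var.resource_group_name
  sku                 = "Standard"

  frontend_ip_configuration {
    name                          = "internal-lb-ip-v4"
    subnet_id                     = azurerm_subnet.master.id
    private_ip_address_allocation = "Dynamic"
    private_ip_address_version    = "IPv4"
  }

  frontend_ip_configuration {
    name                          = "internal-lb-ip-v6"
    subnet_id                     = azurerm_subnet.master.id
    private_ip_address_allocation = "Dynamic"
    private_ip_address_version    = "IPv6"
  }
}
```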
IPv4/IPv6 dual stack requires AAAA records for several subcomponents. Until we bump our Azure provider version, we can add them to the existing private DNS local module.
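For illustration only, an api-int AAAA record would look like this with the stock provider resource; per the commit message, the PR instead extends the local private DNS module because the vendored provider predates this support:

```hcl
# Hypothetical record; names and the address variable are placeholders.
resource "azurerm_private_dns_aaaa_record" "api_int" {
  name                = "api-int"
  zone_name           = var.private_dns_zone_name
  resource_group_name = var.resource_group_name
  ttl                 = 300
  records             = [var.internal_lb_ipv6_address]
}
```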
(force-pushed from fd56926 to d205955)
See the commit messages; this should be ready for testing IPv4 primary with IPv6 secondary for dual stack.
The installer can now provision a dual-stack IPv4/IPv6 (v4 primary) cluster if OPENSHIFT_INSTALL_AZURE_USE_IPV6=true is passed on install. Unlike the AWS configuration, this is currently configured to make IPv6 transparent. The machine CIDR for IPv6 is not configurable and is currently hardcoded to fd00::/48. A full dual stack configuration would in theory look like the following (barring the missing machineCIDR configurability).

```
networking:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  - cidr: fd01::/48
    hostPrefix: 64
  machineCIDR: 10.0.0.0/16
  networkType: OVNKubernetes
  serviceNetwork:
  - 10.1.0.0/16
  - fd02::/112
```

This commit will be updated to include the IPv6 primary mode (if all input addresses are IPv6) and to not require the use of the environment variable.
(force-pushed from d205955 to 377e920)
Closing in favor of the newer test.
@smarterclayton: PR needs rebase.
Looks like this is replaced by #2847
Built on top of #2555 to copy the pattern.

Azure is generally simpler about this, although there's a bug in the terraform azure provider that blocks the natural setup (opened hashicorp/terraform-provider-azurerm#5140 to see if we can get an easy fix). Currently this is not completely plumbed, since we can't create subnets with multiple addresses. Will be testing with the linked fix.