
WIP: Support ipv6 dual stack for Azure for testing #2799

Closed
wants to merge 36 commits

Conversation

smarterclayton
Contributor

Built on top of #2555 to copy the pattern.

Azure is generally simpler about this, although there's a bug in the terraform azurerm provider that blocks the natural setup (opened hashicorp/terraform-provider-azurerm#5140 to see if we can get an easy fix).

Currently this is not completely plumbed, since we can't create subnets with multiple address prefixes. Will be testing with the linked fix.

russellb and others added 30 commits December 5, 2019 13:57
This is a hack to not block IPv6 when doing CIDR validation.  We may
want to do something smarter later, like validate that IPv4 or IPv6 is
used consistently, or validate based on per-platform capabilities.
For now, removing this check unblocks IPv6 work.
If you run "create cluster", the VPC will have an IPv6 CIDR block
assigned by AWS (a /56).  Then, each subnet created in the VPC will get
a slice of this (a /64).

I can see that the VMs created all had IPv6 addresses associated with
them, at least from the AWS perspective.

The networking section of my current install-config looks like this:

  networking:
    clusterNetwork:
    - cidr: fd01::/48
      hostPrefix: 64
    machineCIDR: 10.0.0.0/16
    networkType: OVNKubernetes
    serviceNetwork:
    - fd02::/112

I kept machineCIDR as IPv4, since on AWS we can't get IPv6 only.  We
get IPv4 + optionally IPv6.  We use machineCIDR to specify the IPv4
address range we carve up in the VPC.  For IPv6, AWS gives us the /56
that we carve up, so we don't need a subnet specified in the config.

For clusterNetwork and serviceNetwork, both of these are used
exclusively inside the cluster, so I changed them to be specified as
private IPv6 subnets.
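For reference, the IPv6 carving described above looks roughly like this in Terraform (a sketch with illustrative names, not the installer's exact resources):

```
# Ask AWS to assign an IPv6 /56 to the VPC alongside the IPv4 machine CIDR.
resource "aws_vpc" "cluster" {
  cidr_block                       = "10.0.0.0/16"
  assign_generated_ipv6_cidr_block = true
}

# Carve a /64 for each subnet out of the VPC's /56 (8 extra prefix bits).
resource "aws_subnet" "private" {
  count = 3

  vpc_id                          = aws_vpc.cluster.id
  cidr_block                      = cidrsubnet(aws_vpc.cluster.cidr_block, 4, count.index)
  ipv6_cidr_block                 = cidrsubnet(aws_vpc.cluster.ipv6_cidr_block, 8, count.index)
  assign_ipv6_address_on_creation = true
}
```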
This allows all of our VMs to speak IPv6 out to the internet.
This adds a route on the private subnet to allow internet access
through the same internet gateway used by the public subnet.

In the IPv4 case, we use a NAT gateway from the private subnet.  For
IPv6, we don't use NAT since the addresses are publicly routable.
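A minimal sketch of that route, assuming an existing private route table and internet gateway (names are illustrative):

```
# IPv6 default route from the private route table straight to the internet
# gateway -- no NAT gateway, since the IPv6 addresses are publicly routable.
resource "aws_route" "private_ipv6_egress" {
  route_table_id              = aws_route_table.private.id
  destination_ipv6_cidr_block = "::/0"
  gateway_id                  = aws_internet_gateway.igw.id
}
```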
All security group rules that referenced an IPv4 CIDR have been
duplicated to reference the corresponding IPv6 CIDR.
All of the IPv4 records remain for now.
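For illustration, the security group duplication described above follows this pattern (a hypothetical API rule, not the exact installer rule):

```
# Existing IPv4 rule.
resource "aws_security_group_rule" "api_ipv4" {
  type              = "ingress"
  protocol          = "tcp"
  from_port         = 6443
  to_port           = 6443
  cidr_blocks       = ["10.0.0.0/16"]
  security_group_id = aws_security_group.master.id
}

# Duplicated rule referencing the corresponding IPv6 CIDR.
resource "aws_security_group_rule" "api_ipv6" {
  type              = "ingress"
  protocol          = "tcp"
  from_port         = 6443
  to_port           = 6443
  ipv6_cidr_blocks  = [aws_vpc.cluster.ipv6_cidr_block]
  security_group_id = aws_security_group.master.id
}
```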
Introduce a "use_ipv6" terraform variable that is hard coded to
"true".  Use it to disable the IPv4 DNS records (A records), leaving
only the IPv6 (AAAA) records in place.
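A sketch of the variable and the conditional records (the record names and variables here are illustrative):

```
# Hard coded for now; a later commit turns this into a real input.
variable "use_ipv6" {
  type    = bool
  default = true
}

# The A (IPv4) record is skipped when use_ipv6 is on...
resource "aws_route53_record" "api_a" {
  count = var.use_ipv6 ? 0 : 1

  zone_id = var.zone_id
  name    = "api.${var.cluster_domain}"
  type    = "A"
  ttl     = 60
  records = [var.api_ipv4_address]
}

# ...leaving only the AAAA (IPv6) record in place.
resource "aws_route53_record" "api_aaaa" {
  count = var.use_ipv6 ? 1 : 0

  zone_id = var.zone_id
  name    = "api.${var.cluster_domain}"
  type    = "AAAA"
  ttl     = 60
  records = [var.api_ipv6_address]
}
```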
Back in commit 16dfbb3, the installer
switched from classic load balancers to network load balancers.
Unfortunately, NLBs don't support IPv6.  This commit introduces
conditional classic load balancers when use_ipv6 is turned on.
Apply the use_ipv6 boolean in some places that were added before the
boolean existed.
RHCOS doesn't get an IPv6 address during boot before running ignition,
so without this record it will fail to contact the machine config
server.  Turn this record back on as a temporary workaround until we
can do IPv6 during ignition.
The ELB for IPv6 was allowing any source to reach the api-int load
balancer, instead of only the CIDRs used by the cluster.
Much of this was written before the code merged to allow disabling
public endpoints.  I went through and tried to honor the new settings
in various places that I missed during an earlier rebase.
Add a new terraform variable, aws_use_ipv6, for controlling the AWS
IPv6 dev/test environment.  It defaults to false, retaining existing
behavior by default.  To turn it on, you must set the
TF_VAR_aws_use_ipv6 environment variable to "true".

A future commit will expand on this with developer documentation.
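Roughly, the new variable looks like this (a sketch; the real definition lives alongside the installer's other Terraform variables):

```
# Dev/test only: enables the AWS IPv6 environment. Off by default so existing
# installs are unaffected; turn it on by exporting TF_VAR_aws_use_ipv6=true.
variable "aws_use_ipv6" {
  type        = bool
  default     = false
  description = "Enable the experimental AWS IPv6 dev/test environment."
}
```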
The tests assumed IPv6 CIDRs are not allowed.  A previous commit
started allowing them and this updates the test cases accordingly.
Instead of telling people to set TF_VAR_aws_use_ipv6, wrap it with an
OpenShift env var.
I missed a few more things when I started making the IPv6 support
conditional.  This was exposed by trying to run an install with IPv6
turned off (the default).  With this patch, a regular install worked
for me again.
For our AWS IPv6 env, we run CoreDNS with host networking so it can
reach AWS DNS over IPv4.  We need to allow traffic to CoreDNS via
security groups when it is run this way.  Otherwise, only DNS queries
that happen to hit CoreDNS on the same host will work.
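An illustrative sketch of the kind of rules needed (the source CIDR and group names are assumptions):

```
# CoreDNS runs with host networking in the AWS IPv6 environment, so DNS
# traffic must be allowed to the nodes themselves, not just to pods.
resource "aws_security_group_rule" "node_dns_tcp" {
  count = var.use_ipv6 ? 1 : 0

  type              = "ingress"
  protocol          = "tcp"
  from_port         = 53
  to_port           = 53
  cidr_blocks       = [var.machine_cidr]
  security_group_id = aws_security_group.master.id
}

resource "aws_security_group_rule" "node_dns_udp" {
  count = var.use_ipv6 ? 1 : 0

  type              = "ingress"
  protocol          = "udp"
  from_port         = 53
  to_port           = 53
  cidr_blocks       = [var.machine_cidr]
  security_group_id = aws_security_group.master.id
}
```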
This branch had configured classic load balancers for use with IPv6,
because it appeared that they supported it.  Despite getting IPv6
addresses, they don't actually work: IPv6 with classic load balancers
only works with EC2-Classic, not in a VPC.  They just rejected IPv6
connections in our case, and various forum posts seem to confirm this
behavior.

The documentation for classic load balancers discusses this a bit:

https://docs.aws.amazon.com/elasticloadbalancing/latest/classic/elb-internet-facing-load-balancers.html

The "ipv6" hostnames exist, even if you use AWS-VPC, but they just
don't work.  The docs seem to indicate that the hostnames only exist
with EC2-Classic, but that's not true.

AWS Application Load Balancers (doing HTTPS load balancing) are
supposed to support IPv6, but that would require doing cert management
with AWS.

The simplest approach here is to just fall back to the Network Load
Balancers used with IPv4.  We'll allow API access through the load
balancer over IPv4.  Inside the cluster, API access is still all IPv6,
using the Kubernetes Service and Endpoints to reach the API.  This
seems good enough for our AWS dev/test environment for now.
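The fallback is just the standard IPv4 network load balancer setup; roughly (a sketch, illustrative names):

```
# Network load balancer for the API, reached over IPv4 even in the IPv6
# dev/test environment, since NLBs don't support IPv6 in a VPC.
resource "aws_lb" "api_internal" {
  name               = "${var.cluster_id}-int"
  load_balancer_type = "network"
  internal           = true
  subnets            = aws_subnet.private[*].id
}
```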
We reverted back to the IPv4 NLBs, so reflect that status in this doc.
The DNS pods run on workers as well as masters, so replicate the
IPv6-only rule we added to allow DNS traffic on the host network.

Also remove the unnecessary "self" rule from the masters.
Use an OPENSHIFT_INSTALL_ prefix instead of just OPENSHIFT_.
Add more details that have come up in review so far.
These TODO comments were about something to check after a terraform
upgrade.  The code is right as it is, so, as the comments themselves
recommend, they can simply be removed now that the check has been done.
Note a todo item that I will take care of soon: changing to an
egress-only internet gateway.

https://aws.amazon.com/premiumsupport/knowledge-center/configure-private-ipv6-subnet/
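The planned change would look roughly like this (a sketch; resource names are illustrative):

```
# Egress-only internet gateway: lets instances in the private subnets reach
# the internet over IPv6 while blocking inbound connections, analogous to a
# NAT gateway for IPv4.
resource "aws_egress_only_internet_gateway" "private" {
  vpc_id = aws_vpc.cluster.id
}

resource "aws_route" "private_ipv6_egress_only" {
  route_table_id              = aws_route_table.private.id
  destination_ipv6_cidr_block = "::/0"
  egress_only_gateway_id      = aws_egress_only_internet_gateway.private.id
}
```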
This variable was added in an earlier commit, but is no longer used.
Move the conditional part of the expression to only apply to the part
of the formatlist() that's different between IPv4 and IPv6.
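In other words, the conditional now selects only the format string instead of duplicating the whole call; something like this (the exact format strings and variable names are assumptions):

```
# Before: two nearly identical calls selected by the conditional.
#   var.use_ipv6 ? formatlist("[%s]:2379", var.etcd_ips) : formatlist("%s:2379", var.etcd_ips)
#
# After: the conditional only covers the piece that differs.
locals {
  etcd_endpoints = formatlist(var.use_ipv6 ? "[%s]:2379" : "%s:2379", var.etcd_ips)
}
```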
This patch mostly just moves the count to the beginning of a resource
as a style change, as requested during review.

While doing this, I noticed two IPv6 security group rules that were
not conditional on IPv6 support being turned on.
Make use of the existing etcd_ip_addresses instead.
As requested in review:

- When AWS IPv6 is on, force the OVNKubernetes network type
- Force IPv6 CIDRs where needed when AWS IPv6 is on
- Re-add IPv4 CIDR enforcement everywhere else
- Make sure the platform is AWS if the AWS IPv6 env var is set
openshift-ci-robot added the do-not-merge/work-in-progress, size/XL, and approved labels on Dec 12, 2019
@openshift-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please assign smarterclayton.
You can assign the PR to them by writing /assign @smarterclayton in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

openshift-ci-robot added the size/XXL label and removed the approved and size/XL labels on Dec 12, 2019
go-cidr was fixed in the last year to correct an int64 overflow issue
with CIDRs.
This works around three issues in the terraform azurerm provider.

The first is that private_ip_address_version is not supported on
load balancers in the 2018-12-01 Azure API, but is in 2019-04-01.
Specifically override that one call until we bump.

The second is that the ip_version check for static allocation on the
load balancer is out of date for Azure - LBs can have both types of
allocation in the Standard SKU since Azure added a preview of the
feature.  Remove the check like we would for an upstream patch.

The third is to add support for subnets to have multiple address
prefixes, which is required for dual stack support.  This is an
upstream change.
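With the patched provider in place, the dual-stack pieces would look roughly like this (a sketch with illustrative names; address_prefixes is the patched multi-prefix field, and the IPv6 frontend needs the 2019-04-01 API):

```
# Dual-stack subnet: one IPv4 prefix and one IPv6 prefix.
resource "azurerm_subnet" "master" {
  name                 = "master-subnet"
  resource_group_name  = azurerm_resource_group.cluster.name
  virtual_network_name = azurerm_virtual_network.cluster.name
  address_prefixes     = ["10.0.0.0/24", "fd00::/64"]
}

# Standard SKU load balancer with an IPv6 frontend.
resource "azurerm_lb" "internal" {
  name                = "internal-lb"
  location            = azurerm_resource_group.cluster.location
  resource_group_name = azurerm_resource_group.cluster.name
  sku                 = "Standard"

  frontend_ip_configuration {
    name                          = "internal-lb-ip-v6"
    subnet_id                     = azurerm_subnet.master.id
    private_ip_address_version    = "IPv6"
    private_ip_address_allocation = "Dynamic"
  }
}
```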
IPv4/IPv6 dual stack requires AAAA records for several subcomponents.
Until we bump our Azure provider version, we can add these to the
existing private DNS local module.
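Roughly the shape of the records being added, shown with the azurerm_private_dns_aaaa_record resource that newer provider versions expose; the local module provides the equivalent until we bump (names and the address variable are illustrative):

```
# AAAA record for the internal API alongside the existing A record.
resource "azurerm_private_dns_aaaa_record" "api_internal" {
  name                = "api-int"
  zone_name           = azurerm_private_dns_zone.cluster.name
  resource_group_name = azurerm_resource_group.cluster.name
  ttl                 = 300
  records             = [var.internal_lb_ipv6_address]
}
```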
@smarterclayton
Contributor Author

See the commit messages; this should be ready for testing IPv4-primary with IPv6-secondary for dual stack.

The installer can now provision a dual-stack IPv4/IPv6 (IPv4-primary)
cluster if OPENSHIFT_INSTALL_AZURE_USE_IPV6=true is passed on
install. Unlike the AWS configuration, this is currently configured
to make IPv6 transparent. The machine CIDR for IPv6 is not
configurable and is currently hardcoded to fd00::/48.

A full dual stack configuration would in theory look like the
following (barring the missing machineCIDR configurability).

```
networking:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  - cidr: fd01::/48
    hostPrefix: 64
  machineCIDR: 10.0.0.0/16
  networkType: OVNKubernetes
  serviceNetwork:
  - 10.1.0.0/16
  - fd02::/112
```

This commit will be updated to include the IPv6-primary mode (if
all input addresses are IPv6) and to not require the use of the
environment variable.
@openshift-ci-robot
Contributor

openshift-ci-robot commented Dec 16, 2019

@smarterclayton: The following tests failed, say /retest to rerun them all:

| Test name | Commit | Details | Rerun command |
| --- | --- | --- | --- |
| ci/prow/verify-vendor | 377e920 | link | /test verify-vendor |
| ci/prow/govet | 377e920 | link | /test govet |
| ci/prow/unit | 377e920 | link | /test unit |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@smarterclayton
Contributor Author

Closing in favor of the newer test

@openshift-ci-robot
Contributor

@smarterclayton: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

openshift-ci-robot added the needs-rebase label on Jan 11, 2020
@ironcladlou
Contributor

Looks like this is replaced by #2847
