Allow to control which subnets and IPs get used for the API loadbalancer #10741
Conversation
Move checks for valid subnet operations into CheckChanges. This also fixes a bug where changes would cause immutable field errors while it is actually perfectly fine to add new subnets (only detaching is forbidden). This commit also changes the actualSubnets and expectedSubnets lists to maps of *string, in preparation for the next commit, which relies on them being maps.
Hi @codablock. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test. Once the patch is verified, the new status will be reflected by the ok-to-test label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
FYI, I've tested this on the 1.19 branch and would be really happy if this could be backported/cherry-picked into 1.19 :)
SubnetMappings allow explicitly setting the private IPv4 address that must be used for the NLB. SubnetMappings and Subnets in the AWS API are compatible as long as the address settings are not changed, making this commit backwards compatible.
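As a sketch of what this commit enables, the cluster spec can pin the NLB to a specific private IPv4 address per subnet. The subnet name and address below are made up, and the exact field names should be checked against the kops cluster spec documentation for your version:

```yaml
spec:
  api:
    loadBalancer:
      class: Network
      type: Internal
      subnets:
        # Hypothetical subnet name and address, shown for illustration only.
        - name: private-us-east-1a
          privateIPv4Address: 172.20.1.10
```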
It is only allowed to add more subnets and forbidden to remove existing ones. This is due to limitations on AWS ELBs and NLBs.
Is there the possibility to remove existing ones if you accept that the consequence is destroying the load balancer and creating a new one? e.g. --force-recreate-loadbalancer
I'd consider losing access to the load balancer acceptable during a maintenance window.
I'd assume that this would be possible, but it is not a new problem and thus should be part of another PR. Currently, if you change your set of shared subnets, you'd already run into the same issue, as the automatically chosen subnets would then differ.
My question was primarily around the validation side of things. If I wanted to change those values and accepted that I'd have to (manually) remove the loadbalancer, would your changes to kops reject my edits?
It would accept the edits and only fail when running the update. The failure is a bit unfortunate, as it keeps retrying but then of course never succeeds.
A workaround for the above behavior is to manually delete the load balancer. A `kops update cluster` should then report that it will (re)create the load balancer, which should succeed with `--yes`. You'll run into certificate issues with gossip clusters, though, since the load balancer's DNS name is used in various certs.
Ah, I see you mentioned this a few lines below. I believe you'll also need to run `kops rolling-update cluster --instance-group-roles=Master --cloudonly` after `kops update cluster --yes` recreates the NLB, in order to pick up the new certificates.
@rifelpet I did not observe the need for rolling-update while doing my testing, which I did on a semi-live (not prod, but a lot of stuff deployed) cluster.
It may only be gossip clusters that need the rolling update (cluster names ending in `.k8s.local`); was yours gossip or using route53? Now that I think about it more, I think the nodes themselves might need to be rolled too to pick up the new load balancer DNS name, which is pretty invasive. We could say: "It is not recommended to make this change on gossip clusters due to the requirement of performing a --cloudonly rolling update on the entire cluster, which results in temporary downtime of all workloads. It is recommended to create a new cluster instead."
@rifelpet Thanks for your review. I've gone through it and fixed things where necessary. You can review the changes per-commit.
@rifelpet I pushed a few more commits to handle your latest comments.
This looks good to me, thanks for implementing this! I'll have someone else take a pass at it too. /ok-to-test
@rifelpet Can you help me find the logs of the test failures that appeared now?
Yeah, you can click the Details button in the list of checks at the bottom of the PR. Here are the two failures: https://prow.k8s.io/view/gs/kubernetes-jenkins/pr-logs/pull/kops/10741/pull-kops-verify-staticcheck/1359527983353696256 is a bit strange; I think the line numbers are off, but it's essentially complaining that the allocation ID and private IPv4 address fields in the CloudFormation JSON file are null. We should probably not include them at all.
@rifelpet Thanks, that helped me :) Should be fixed now.
/lgtm I'll give others a chance for input, otherwise we can get this merged soon. I'll bring it up at kOps office hours on Friday too. Excited to get this in :)
@rifelpet What do you think about the chances of this being cherry-picked into 1.19?
1.19.x releases will contain bug fixes only. Since this is a new feature, it likely won't get backported to 1.19, though we are aiming for an accelerated 1.20.0 release, so the wait shouldn't be too long.
Ok, having this in 1.20.0 would already be very cool.
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: codablock, rifelpet. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
/retest Review the full test history for this PR. Silence the bot with an /lgtm cancel comment for consistent failures.
Just saw that this got merged, thanks :) Any steps required from my side to get this cherry-picked into 1.20?
@codablock I took care of it in #10829 |
…41-origin-release-1.20 Automated cherry pick of #10741: Refactor and fix NLB subnet change checks
This PR implements the new field `subnets` for the api loadbalancer in the cluster spec. It allows controlling which subnets are used by the API load balancer. This is useful when VPCs and subnets are managed externally and a more granular separation between subnets is desired, e.g. when you have subnets that offer VPN access to on-prem networks, subnets which offer public IPs, and subnets which are completely private.
In such a case, the subnets must be carefully chosen to not expose the API in the wrong place.
Review can be done on a per-commit basis as I've tried to separate the changes.
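A minimal sketch of what this could look like in the cluster spec, selecting specific subnets for the API load balancer. The subnet names here are hypothetical, and the exact spec layout should be verified against the kops documentation for your release:

```yaml
spec:
  api:
    loadBalancer:
      class: Network
      type: Public
      subnets:
        # Hypothetical subnet names; only these subnets would be
        # attached to the API load balancer, leaving other (e.g.
        # VPN-connected or fully private) subnets untouched.
        - name: utility-us-east-1a
        - name: utility-us-east-1b
```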