Move Control Plane Ingress VIP management to MetalK8s Operator instead of MetalLB #4000

Merged
bert-e merged 15 commits into development/125.0 from feature/move-cp-vips-handling-to-operator on Mar 1, 2023

TeddyAndrieux
Collaborator

  • Fix some flaky tests
  • Rework the sub-reconciler handling in the MetalK8s Operator
  • Rework VirtualIPPool to be less tied to the Workload Plane (WP) Ingress
  • Move Control Plane Ingress IP handling to the MetalK8s Operator
  • Properly handle upgrade and migration from MetalLB
  • Remove everything linked to MetalLB

@TeddyAndrieux TeddyAndrieux requested a review from a team as a code owner February 24, 2023 17:18
@bert-e
Contributor

bert-e commented Feb 24, 2023

Hello teddyandrieux,

My role is to assist you with the merge of this
pull request. Please type @bert-e help to get information
on this process, or consult the user documentation.

Status report is not available.

@bert-e
Contributor

bert-e commented Feb 24, 2023

Waiting for approval

The following approvals are needed before I can proceed with the merge:

  • the author

  • one peer

Peer approvals must include at least 1 approval from the following list:

Contributor

@gdemonet gdemonet left a comment

(first batch, only read the first 3 commits)

tests/post/steps/test_logging.py
operator/pkg/controller/clusterconfig/controller.go Outdated
operator/pkg/controller/clusterconfig/controller.go Outdated
operator/pkg/controller/clusterconfig/controller.go Outdated
operator/pkg/controller/utils/reconcile_test.go Outdated
@bert-e
Contributor

bert-e commented Feb 28, 2023

Waiting for approval

The following approvals are needed before I can proceed with the merge:

  • the author

  • one peer

Peer approvals must include at least 1 approval from the following list:

Contributor

@gdemonet gdemonet left a comment

Last few comments, but LGTM overall 👌

operator/api/v1alpha1/clusterconfig_types.go

for link in netifaces.ifaddresses(interface).get(netifaces.AF_INET) or []:
    if link.get("addr") == ip:
        return interface

# NOTE: We do not have any easy way (without an external lib)
Contributor

We included said external lib, should we still keep the previous logic? What about the bug with loopback you mentioned?

Collaborator Author

This is the logic that handles the "loopback" bug: we check whether an interface already holds the VIP.

AFAIK, this lib cannot be used to get the route.

The logic is: if the VIP already sits on an interface (it usually shouldn't happen, but it can if, for whatever reason, keepalived gets killed), we keep that interface; otherwise, we "search" for it based on the routes.
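A minimal Python sketch of that two-step selection, assuming the per-interface IPv4 addresses are passed in with the same shape `netifaces.ifaddresses(iface).get(netifaces.AF_INET)` returns (a list of dicts with `addr` and `netmask` keys, or `None`). The function name `pick_vip_interface` is hypothetical, and step 2 swaps the route-based search for a simpler subnet-containment check, so it is an illustration rather than the actual implementation.

```python
import ipaddress
from typing import Dict, List, Optional

# Hypothetical input shape, mirroring netifaces.ifaddresses(iface).get(AF_INET):
# interface name -> list of {"addr": ..., "netmask": ...} dicts (or None).
AddrMap = Dict[str, Optional[List[Dict[str, str]]]]


def pick_vip_interface(vip: str, addrs: AddrMap) -> Optional[str]:
    """Return the interface that should carry `vip`, or None if none fits."""
    # Step 1: the VIP already sits on an interface (e.g. keepalived was
    # killed before releasing it), so keep that interface as-is.
    for iface, links in addrs.items():
        for link in links or []:
            if link.get("addr") == vip:
                return iface

    # Step 2: otherwise "search" for it. Simplified stand-in for a route
    # lookup: pick the first non-loopback interface whose subnet holds the VIP.
    target = ipaddress.ip_address(vip)
    for iface, links in addrs.items():
        for link in links or []:
            netmask = link.get("netmask")
            if not netmask:
                continue
            network = ipaddress.ip_network(f"{link['addr']}/{netmask}", strict=False)
            if not network.is_loopback and target in network:
                return iface
    return None
```

Checking step 1 first is what prevents the loopback fallback: a VIP left behind by a killed keepalived is reported on its current interface instead of being re-resolved.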

buildchain/buildchain/salt_tree.py Outdated
tests/post/steps/test_ingress.py Outdated
.github/actions/enable-cp-ingress-managed-vip/action.yaml Outdated
@TeddyAndrieux TeddyAndrieux force-pushed the feature/move-cp-vips-handling-to-operator branch from 4132813 to a9199fa on February 28, 2023 18:36
Contributor

@gdemonet gdemonet left a comment

Still unsure about our risk of script injection :/ That said, we don't really document how to use it, nor expose this functionality to users, so I guess it's mostly safe. It could be reworked in a follow-up ticket, TBH.
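One common way to reduce that risk is to validate the interpolated value and pass the command as an argument list instead of a shell string. A Python sketch, assuming versions look like `125.0.0` or `125.0.0-dev` and container IDs are lowercase hex as printed by `crictl ps -q` (both assumptions); `orchestrate_argv` is a hypothetical helper, not part of the action.

```python
import re
from typing import List

# Assumed formats: versions like "125.0.0" or "125.0.0-dev", and hex
# container IDs as printed by `crictl ps -q`.
_VERSION_RE = re.compile(r"\d+\.\d+\.\d+(?:-[0-9A-Za-z.]+)?", re.ASCII)
_CONTAINER_ID_RE = re.compile(r"[0-9a-f]+", re.ASCII)


def orchestrate_argv(container_id: str, version: str) -> List[str]:
    """Build the crictl command as an argv list, rejecting unexpected input."""
    if not _VERSION_RE.fullmatch(version):
        raise ValueError(f"unexpected version string: {version!r}")
    if not _CONTAINER_ID_RE.fullmatch(container_id):
        raise ValueError(f"unexpected container ID: {container_id!r}")
    # Passed to subprocess.run() without shell=True, each element reaches the
    # program verbatim, so nothing here is ever parsed by a shell.
    return [
        "sudo", "crictl", "exec", container_id,
        "salt-run", "state.sls",
        "metalk8s.orchestrate.update-control-plane-ingress-ip",
        f"saltenv=metalk8s-{version}",
    ]
```

With `subprocess.run(argv, check=True)` a crafted version string is rejected up front instead of being spliced into a script. In the workflow itself, GitHub's hardening guidance is to pass `inputs.*` through an `env:` variable rather than expanding `${{ }}` directly inside `run:`.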

operator/pkg/controller/utils/reconcile_test.go Outdated
operator/pkg/controller/virtualippool/controller.go Outdated
@TeddyAndrieux TeddyAndrieux force-pushed the feature/move-cp-vips-handling-to-operator branch 2 times, most recently from be84bb1 to bd160fe on March 1, 2023 08:08
gdemonet previously approved these changes Mar 1, 2023
Contributor

@gdemonet gdemonet left a comment

LGTM! :shipit:

Comment on lines +41 to +42
SALT_MASTER=\$(sudo crictl ps --label="io.kubernetes.container.name=salt-master" -q)
sudo crictl exec \$SALT_MASTER salt-run state.sls metalk8s.orchestrate.update-control-plane-ingress-ip saltenv=metalk8s-${{ inputs.metalk8s-version }}
Contributor

Nit: Just for my own information, is there a specific reason why we perform this through crictl instead of a simple kubectl exec?

Contributor

@gdemonet gdemonet Mar 1, 2023

The orchestrate may restart kube-apiserver 😉

Collaborator Author

@TeddyAndrieux TeddyAndrieux Mar 1, 2023

The initial reason was that this orchestrate may restart the APIServer (which would break the kubectl command), but since we now handle the "local" APIServer first, we shouldn't have any issue.

And rework the sub-reconciler logic so that each one runs in a dedicated
goroutine, then wait for all of them to complete
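For illustration only, the same fan-out/wait pattern expressed in Python: each sub-reconciler runs in its own worker thread (standing in for a goroutine) and the caller blocks until every one has finished, collecting errors per sub-reconciler. The names, the no-argument signature, and the error-collecting behavior are assumptions; the operator itself is written in Go, where this would typically be goroutines plus a `sync.WaitGroup`.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict

# Hypothetical signature: a sub-reconciler takes no argument and raises on failure.
SubReconciler = Callable[[], None]


def run_sub_reconcilers(subs: Dict[str, SubReconciler]) -> Dict[str, BaseException]:
    """Run all sub-reconcilers concurrently, wait for every one, return errors by name."""
    errors: Dict[str, BaseException] = {}
    # One worker per sub-reconciler, so none waits for another to free a slot.
    with ThreadPoolExecutor(max_workers=max(1, len(subs))) as pool:
        futures = {name: pool.submit(fn) for name, fn in subs.items()}
        for name, future in futures.items():
            exc = future.exception()  # blocks until this sub-reconciler is done
            if exc is not None:
                errors[name] = exc
    # Leaving the `with` block also waits for every worker to shut down.
    return errors
```

Collecting every error instead of stopping at the first one means a failing sub-reconciler does not prevent the others from converging in the same pass.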
DynamicClient creation is occasionally flaky, especially since some tests
restart the Kubernetes APIServer. To avoid this flakiness, retry the
k8s_client creation.
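A minimal retry helper along those lines, sketched in Python; the attempt count, the delay, and which exceptions count as transient are assumptions, not the values used by the test suite.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry(
    func: Callable[[], T],
    attempts: int = 5,
    delay: float = 2.0,
    exceptions: Tuple[Type[BaseException], ...] = (Exception,),
) -> T:
    """Call `func` until it succeeds, trying at most `attempts` times in total.

    Only exceptions listed in `exceptions` trigger a retry; the last one is
    re-raised once all attempts are exhausted.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except exceptions:
            if attempt == attempts:
                raise
            # Give e.g. a restarting kube-apiserver time to come back.
            time.sleep(delay)
    raise AssertionError("unreachable")
```

Usage would look like `client = retry(lambda: build_dynamic_client(kubeconfig))`, where `build_dynamic_client` is a hypothetical factory standing in for the real client creation.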
If, for whatever reason, the keepalived Pod crashes and cannot release
the VIP, the VIP stays assigned on the host. When the Pod starts again,
it picks the loopback interface instead of the actual interface, which
makes keepalived crash.

To avoid this issue, check whether the VIP is already assigned to an interface
@TeddyAndrieux TeddyAndrieux force-pushed the feature/move-cp-vips-handling-to-operator branch from bd160fe to e3db87b on March 1, 2023 10:25
@TeddyAndrieux
Collaborator Author

/approve

@bert-e
Contributor

bert-e commented Mar 1, 2023

Build failed

The build for commit did not succeed in branch feature/move-cp-vips-handling-to-operator.

The following options are set: approve

@bert-e
Contributor

bert-e commented Mar 1, 2023

In the queue

The changeset has received all authorizations and has been added to the
relevant queue(s). The queue(s) will be merged in the target development
branch(es) as soon as builds have passed.

The changeset will be merged in:

  • ✔️ development/125.0

The following branches will NOT be impacted:

  • development/123.0
  • development/124.0
  • development/124.1
  • development/2.0
  • development/2.1
  • development/2.10
  • development/2.11
  • development/2.2
  • development/2.3
  • development/2.4
  • development/2.5
  • development/2.6
  • development/2.7
  • development/2.8
  • development/2.9

There is no action required on your side. You will be notified here once
the changeset has been merged. In the unlikely event that the changeset
fails permanently on the queue, a member of the admin team will
contact you to help resolve the matter.

IMPORTANT

Please do not attempt to modify this pull request.

  • Any commit you add on the source branch will trigger a new cycle after the
    current queue is merged.
  • Any commit you add on one of the integration branches will be lost.

If you need this pull request to be removed from the queue, please contact a
member of the admin team now.

The following options are set: approve

@bert-e
Contributor

bert-e commented Mar 1, 2023

I have successfully merged the changeset of this pull request
into targeted development branches:

  • ✔️ development/125.0

The following branches have NOT changed:

  • development/123.0
  • development/124.0
  • development/124.1
  • development/2.0
  • development/2.1
  • development/2.10
  • development/2.11
  • development/2.2
  • development/2.3
  • development/2.4
  • development/2.5
  • development/2.6
  • development/2.7
  • development/2.8
  • development/2.9

Please check the status of the associated issue None.

Goodbye teddyandrieux.

@bert-e bert-e merged commit e3db87b into development/125.0 Mar 1, 2023
@bert-e bert-e deleted the feature/move-cp-vips-handling-to-operator branch March 1, 2023 16:36

4 participants