Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Deterministic ordering of lifecycle hooks #847

Merged
merged 2 commits into from
Aug 9, 2024

Conversation

jimmidyson
Copy link
Member

This commit introduces wrapper handlers to allow ordering
of lifecyce hooks to be called via a single registered hook.
This ensures that hooks are called in a defined order, fixing
issues that have arisen that require CNI hook to be applied
before metallb hook.

This commit introduces wrapper handlers to allow ordering
of lifecyce hooks to be called via a single registered hook.
This ensures that hooks are called in a defined order, fixing
issues that have arisen that require CNI hook to be applied
before metallb hook.
@dlipovetsky
Copy link
Contributor

With one handler for a specific hook type running "sub-handlers" sequentially, we'll need to be careful to make sure the handler responds within the CAPI deadline of 30s.

We talked about running "sub-handlers" in parallel, and we can do that in follow-up PR.

@dkoshkin
Copy link
Contributor

dkoshkin commented Aug 8, 2024

On main I see that timeout is 10seconds, am I looking at the correct place?

apiVersion: runtime.cluster.x-k8s.io/v1alpha1
kind: ExtensionConfig
metadata:
  annotations:
    runtime.cluster.x-k8s.io/inject-ca-from-secret: caren-system/cluster-api-runtime-extensions-nutanix-runtimehooks-tls
  creationTimestamp: "2024-08-08T22:21:32Z"
  generation: 2
  labels:
    cluster.x-k8s.io/provider: runtime-extension-caren
    clusterctl.cluster.x-k8s.io: ""
  name: cluster-api-runtime-extensions-nutanix
  resourceVersion: "4431"
  uid: 28e66cd5-b136-4f4b-b818-4c0037e6930f
status:
  - failurePolicy: Fail
    name: serviceloadbalancerhandler-acpi.cluster-api-runtime-extensions-nutanix
    requestHook:
      apiVersion: hooks.runtime.cluster.x-k8s.io/v1alpha1
      hook: AfterControlPlaneInitialized
    timeoutSeconds: 10
...

@dlipovetsky
Copy link
Contributor

On main I see that timeout is 10seconds, am I looking at the correct place?

Good catch. Those 10s timeouts make sense for individual handlers, but we should increase it, perhaps to 30s, for the one aggregate handler.

@dlipovetsky
Copy link
Contributor

On second thought, if the sub-handler fails, no other sub-handlers will run, so shortening the timeout isn't useful. But it will be necessary if/when we execute sub-handlers in parallel.

@dlipovetsky
Copy link
Contributor

I'll reduce the MetalLB timeout to 10s, since the handler is already limited to 10s by CAPI's default. We'll want to raise the handler timeout to 30s, but in a way that fits with the library, and that'll need some thought.

CAPI expects handlers to respond within 10s by default; 30s is only a
configurable maximum.
Copy link
Contributor

@dkoshkin dkoshkin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Verified in the logs that the handlers are called in the expected order and the HCPs are also created in the same order:
(Notice the metallb is last)

$ kubectl get hcp -w 
NAME                                             READY   REASON
cilium-cni-installation-dkoshkin-hook-ordering           
nutanix-ccm-dkoshkin-hook-ordering                       
node-feature-discovery-dkoshkin-hook-ordering                              
cluster-autoscaler-dkoshkin-hook-ordering                
nutanix-csi-dkoshkin-hook-ordering                       
snapshot-controller-dkoshkin-hook-ordering               
metallb-dkoshkin-hook-ordering                           

@jimmidyson jimmidyson enabled auto-merge (squash) August 9, 2024 16:27
@dkoshkin dkoshkin force-pushed the jimmi/fix-lifecycle-hook-ordering branch from 0bd6852 to 115f16b Compare August 9, 2024 16:33
@jimmidyson jimmidyson merged commit c75b759 into main Aug 9, 2024
30 checks passed
@jimmidyson jimmidyson deleted the jimmi/fix-lifecycle-hook-ordering branch August 9, 2024 17:04
@faiq faiq mentioned this pull request Aug 13, 2024
jimmidyson pushed a commit that referenced this pull request Aug 13, 2024
🤖 I have created a release *beep* *boop*
---


## 0.13.7 (2024-08-13)

<!-- Release notes generated using configuration in .github/release.yaml
at main -->

## What's Changed
### Fixes 🔧
* fix: Deterministic ordering of lifecycle hooks by @jimmidyson in
#847
* fix: reorder lifecycle handlers with serviceloadbalancer being last by
@dkoshkin in
#848
* fix: update mindthegap by @faiq in
#852
* fix: Handle long cluster names by @jimmidyson in
#845


**Full Changelog**:
v0.13.6...v0.13.7

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants