
Kamaji infinity loop for certificate update. #679

Open · kvaps opened this issue Jan 29, 2025 · 6 comments

@kvaps (Contributor) commented Jan 29, 2025

Hey, I use Kamaji built from the edge-24.12.1 tag, and I can see that my tenant clusters are continuously updating their certificates:

-        Serial Number: 7728722313002611879 (0x6b41effec227f4a7)
+        Serial Number: 4233493379433580981 (0x3ac063cc701b49b5)
         Signature Algorithm: sha256WithRSAEncryption
         Issuer: CN=kubernetes
         Validity
-            Not Before: Jan 29 07:07:58 2025 GMT
-            Not After : Jan 29 07:12:58 2026 GMT
+            Not Before: Jan 29 08:36:06 2025 GMT
+            Not After : Jan 29 08:41:06 2026 GMT
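
The serial number and validity window above can be read with a plain `openssl x509` invocation. A minimal local sketch, using a throwaway self-signed certificate (the tenant's Secret name and its data key vary per setup, so those are not shown here):

```shell
# Sketch only: generate a throwaway self-signed cert and inspect it the same
# way one would inspect the tenant's apiserver certificate extracted from its Secret.
openssl req -x509 -newkey rsa:2048 -nodes -keyout /tmp/demo.key -out /tmp/demo.crt \
  -subj "/CN=kubernetes" -days 365 2>/dev/null
openssl x509 -in /tmp/demo.crt -noout -serial -dates
```

Against a live tenant, the certificate would instead come from the corresponding Secret (e.g. `kubectl get secret ... -o jsonpath` piped through `base64 -d`; the exact key name depends on the setup).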

Control-plane pods are also continuously restarting every n seconds:

-        component.kamaji.clastix.io/api-server-certificate: cdee4ee8fed7a96db92ce1509eefe8ff
+        component.kamaji.clastix.io/api-server-certificate: 2445771c621cbe146dea9cf0b2b0c69a
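
These annotations look like content checksums: when the certificate Secret is rewritten, the hash in the pod template changes and the Deployment rolls out a new ReplicaSet. A minimal sketch of that mechanism (the hashing scheme here is an assumption for illustration, not Kamaji's actual code):

```shell
# Hypothetical sketch: a hash of the certificate bytes is stored as a pod-template
# annotation, so any change to the bytes forces a new ReplicaSet.
old_sum=$(printf 'cert-bytes-v1' | md5sum | cut -d' ' -f1)
new_sum=$(printf 'cert-bytes-v2' | md5sum | cut -d' ' -f1)
if [ "$old_sum" != "$new_sum" ]; then
  echo "checksum changed: rollout triggered"
fi
```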
kvaps added a commit to aenix-io/cozystack that referenced this issue Jan 29, 2025
Due to upstream issue: clastix/kamaji#679

@prometherion (Member) commented:

Thanks for the report, Andrei.

I need help replicating this, since I'm unable to:

apiVersion: kamaji.clastix.io/v1alpha1
kind: TenantControlPlane
metadata:
  annotations:
  creationTimestamp: "2025-01-29T10:11:57Z"
  finalizers:
  - finalizer.kamaji.clastix.io
  - finalizer.kamaji.clastix.io/soot
  generation: 2
  labels:
    tenant.clastix.io: k8s-130
  name: k8s-130
  namespace: default
  resourceVersion: "3768131"
  uid: e5b6cfae-9414-436c-ab6e-631ecf0677f7
spec:
  addons:
    coreDNS: {}
    konnectivity:
      agent:
        image: registry.k8s.io/kas-network-proxy/proxy-agent
        tolerations:
        - key: CriticalAddonsOnly
          operator: Exists
        version: v0.28.6
      server:
        image: registry.k8s.io/kas-network-proxy/proxy-server
        port: 8132
        version: v0.28.6
    kubeProxy: {}
  controlPlane:
    deployment:
      additionalMetadata: {}
      podAdditionalMetadata: {}
      registrySettings:
        apiServerImage: kube-apiserver
        controllerManagerImage: kube-controller-manager
        registry: registry.k8s.io
        schedulerImage: kube-scheduler
      replicas: 2
      serviceAccountName: default
      strategy:
        rollingUpdate:
          maxSurge: 100%
          maxUnavailable: 0
        type: RollingUpdate
    service:
      additionalMetadata: {}
      serviceType: LoadBalancer
  dataStore: default
  dataStoreSchema: default_k8s_130
  kubernetes:
    admissionControllers:
    - CertificateApproval
    - CertificateSigning
    - CertificateSubjectRestriction
    - DefaultIngressClass
    - DefaultStorageClass
    - DefaultTolerationSeconds
    - LimitRanger
    - MutatingAdmissionWebhook
    - NamespaceLifecycle
    - PersistentVolumeClaimResize
    - Priority
    - ResourceQuota
    - RuntimeClass
    - ServiceAccount
    - StorageObjectInUseProtection
    - TaintNodesByCondition
    - ValidatingAdmissionWebhook
    kubelet:
      cgroupfs: systemd
      preferredAddressTypes:
      - Hostname
      - InternalIP
      - ExternalIP
    version: v1.30.0
  networkProfile:
    clusterDomain: cluster.local
    dnsServiceIPs:
    - 10.96.0.10
    podCidr: 10.244.0.0/16
    port: 6443
    serviceCidr: 10.96.0.0/16

The Secret objects:

$: kubectl get secret -w
NAME                                            TYPE                 DATA   AGE
k8s-130-admin-kubeconfig                        Opaque               4      6s
k8s-130-api-server-certificate                  Opaque               2      9s
k8s-130-api-server-kubelet-client-certificate   Opaque               2      9s
k8s-130-ca                                      Opaque               4      12s
k8s-130-controller-manager-kubeconfig           Opaque               1      3s
k8s-130-datastore-certificate                   Opaque               3      0s
k8s-130-datastore-config                        Opaque               4      1s
k8s-130-front-proxy-ca-certificate              Opaque               2      11s
k8s-130-front-proxy-client-certificate          Opaque               2      7s
k8s-130-sa-certificate                          Opaque               2      10s
k8s-130-scheduler-kubeconfig                    Opaque               1      2s
root-ca                                         Opaque               1      74d
sh.helm.release.v1.my-release.v1                helm.sh/release.v1   1      55d
k8s-130-konnectivity-certificate                kubernetes.io/tls    2      0s
k8s-130-konnectivity-kubeconfig                 Opaque               1      0s

@prometherion (Member) commented:

> Control-plane pods are also continuously restarting every n seconds

May I ask you to share the exact value of n? Could it be related to the underlying NTP service?

prometherion self-assigned this Jan 29, 2025
@kvaps (Contributor, Author) commented Jan 29, 2025

From the ReplicaSets' creation timestamps, I can say it was happening around every 2-3 minutes.
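
One way to put numbers on "every 2-3 minutes" is to diff consecutive ReplicaSet creation timestamps. A local sketch with illustrative values (on a cluster, the timestamps would come from `kubectl get rs -o jsonpath=...`; GNU `date` is assumed):

```shell
# Illustrative timestamps, taken from the status fields shown below.
t1=$(date -u -d '2025-01-29T08:52:37Z' +%s)
t2=$(date -u -d '2025-01-29T08:55:12Z' +%s)
echo "interval: $(( t2 - t1 ))s"
```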

Here is an example of my TenantControlPlane (TCP):

apiVersion: kamaji.clastix.io/v1alpha1
kind: TenantControlPlane
metadata:
  annotations:
    kamaji.clastix.io/kubeconfig-secret-key: super-admin.svc
  creationTimestamp: "2025-01-15T13:17:26Z"
  finalizers:
  - finalizer.kamaji.clastix.io
  - finalizer.kamaji.clastix.io/soot
  generation: 1
  name: kubernetes-test2
  namespace: tenant-test
  ownerReferences:
  - apiVersion: controlplane.cluster.x-k8s.io/v1alpha1
    blockOwnerDeletion: true
    controller: true
    kind: KamajiControlPlane
    name: kubernetes-test2
    uid: b86e47c1-a4f8-4fa1-8b95-a891f0b18310
  resourceVersion: "141721429"
  uid: f0dafc90-4d0e-424d-b3ae-c7c4bee980b6
spec:
  addons:
    coreDNS: {}
    konnectivity:
      agent:
        image: registry.k8s.io/kas-network-proxy/proxy-agent
        tolerations:
        - key: CriticalAddonsOnly
          operator: Exists
        version: v0.0.32
      server:
        image: registry.k8s.io/kas-network-proxy/proxy-server
        port: 8132
        version: v0.0.32
  controlPlane:
    deployment:
      additionalMetadata: {}
      additionalVolumeMounts: {}
      extraArgs: {}
      podAdditionalMetadata:
        labels:
          policy.cozystack.io/allow-to-etcd: "true"
      registrySettings:
        apiServerImage: kube-apiserver
        controllerManagerImage: kube-controller-manager
        registry: registry.k8s.io
        schedulerImage: kube-scheduler
      replicas: 2
      resources:
        apiServer: {}
        controllerManager: {}
        kine: {}
        scheduler: {}
      serviceAccountName: default
      strategy: {}
    ingress:
      additionalMetadata:
        annotations:
          nginx.ingress.kubernetes.io/ssl-passthrough: "true"
      hostname: kube2.test.example.dev:443
      ingressClassName: tenant-test
    service:
      additionalMetadata: {}
      serviceType: ClusterIP
  dataStore: tenant-test
  kubernetes:
    admissionControllers:
    - CertificateApproval
    - CertificateSigning
    - CertificateSubjectRestriction
    - DefaultIngressClass
    - DefaultStorageClass
    - DefaultTolerationSeconds
    - LimitRanger
    - MutatingAdmissionWebhook
    - NamespaceLifecycle
    - PersistentVolumeClaimResize
    - Priority
    - ResourceQuota
    - RuntimeClass
    - ServiceAccount
    - StorageObjectInUseProtection
    - TaintNodesByCondition
    - ValidatingAdmissionWebhook
    kubelet:
      cgroupfs: systemd
      preferredAddressTypes:
      - InternalIP
      - ExternalIP
    version: v1.30.1
  networkProfile:
    certSANs:
    - kube2.test.example.dev:443
    clusterDomain: cluster.local
    dnsServiceIPs:
    - 10.95.0.10
    podCidr: 10.243.0.0/16
    port: 6443
    serviceCidr: 10.95.0.0/16
status:
  addons:
    coreDNS:
      enabled: true
      lastUpdate: "2025-01-15T13:18:10Z"
    konnectivity:
      agent:
        lastUpdate: "2025-01-15T13:18:10Z"
        name: konnectivity-agent
        namespace: kube-system
      certificate:
        checksum: e728fefa2dfceefd8c773a6a4241506c
        lastUpdate: "2025-01-15T13:17:44Z"
        secretName: kubernetes-test2-konnectivity-certificate
      clusterrolebinding:
        name: system:konnectivity-server
      configMap:
        checksum: e8b426b27a2452d0137b14cfbbd10a0d
        name: kubernetes-test2-konnectivity-egress-selector-configuration
      enabled: true
      kubeconfig:
        checksum: 4f3ed6de207cbf463476667d32581927
        lastUpdate: "2025-01-15T13:17:45Z"
        secretName: kubernetes-test2-konnectivity-kubeconfig
      sa:
        name: konnectivity-agent
        namespace: kube-system
      service:
        loadBalancer: {}
        name: kubernetes-test2
        namespace: tenant-test
        port: 8132
    kubeProxy:
      enabled: false
  certificates:
    apiServer:
      checksum: 5e1a23517fe96af2721608e3c5165548
      lastUpdate: "2025-01-29T08:52:37Z"
      secretName: kubernetes-test2-api-server-certificate
    apiServerKubeletClient:
      checksum: ac60c441b5269b178353987f7984d932
      lastUpdate: "2025-01-15T13:17:32Z"
      secretName: kubernetes-test2-api-server-kubelet-client-certificate
    ca:
      checksum: 92e5c3341c10e5459ff3d1300187e105
      lastUpdate: "2025-01-15T13:17:27Z"
      secretName: kubernetes-test2-ca
    frontProxyCA:
      checksum: 5efc9e11c310b886db8214df467842e5
      lastUpdate: "2025-01-15T13:17:29Z"
      secretName: kubernetes-test2-front-proxy-ca-certificate
    frontProxyClient:
      checksum: 12ddeaa1fc5d77e4679deb1a4bc5002a
      lastUpdate: "2025-01-15T13:17:36Z"
      secretName: kubernetes-test2-front-proxy-client-certificate
    sa:
      checksum: f85d1d302f06d01527516798b2247b3d
      lastUpdate: "2025-01-15T13:17:30Z"
      secretName: kubernetes-test2-sa-certificate
  controlPlaneEndpoint: 10.96.3.124:6443
  kubeadmPhase:
    bootstrapToken:
      checksum: d0983869e781807f38e22ae253a19348
      lastUpdate: "2025-01-29T08:53:46Z"
  kubeadmconfig:
    checksum: 516a16c9573821a1a53685c0b454dd72
    configmapName: kubernetes-test2-kubeadmconfig
    lastUpdate: "2025-01-15T13:17:27Z"
  kubeconfig:
    admin:
      checksum: 53bfbbc425609ebd7d9ba858639b802b
      lastUpdate: "2025-01-29T08:53:42Z"
      secretName: kubernetes-test2-admin-kubeconfig
    controllerManager:
      checksum: 7b510493ecda32340a9041dc95f1e732
      lastUpdate: "2025-01-29T08:53:49Z"
      secretName: kubernetes-test2-controller-manager-kubeconfig
    scheduler:
      checksum: 7b510493ecda32340a9041dc95f1e732
      lastUpdate: "2025-01-29T08:55:12Z"
      secretName: kubernetes-test2-scheduler-kubeconfig
  kubernetesResources:
    deployment:
      availableReplicas: 2
      conditions:
      - lastTransitionTime: "2025-01-22T11:01:54Z"
        lastUpdateTime: "2025-01-22T11:01:54Z"
        message: Deployment has minimum availability.
        reason: MinimumReplicasAvailable
        status: "True"
        type: Available
      - lastTransitionTime: "2025-01-15T13:17:45Z"
        lastUpdateTime: "2025-01-29T08:55:23Z"
        message: ReplicaSet "kubernetes-test2-b55fcf86c" has successfully progressed.
        reason: NewReplicaSetAvailable
        status: "True"
        type: Progressing
      lastUpdate: "2025-01-29T08:55:25Z"
      name: kubernetes-test2
      namespace: tenant-test
      observedGeneration: 669
      readyReplicas: 2
      replicas: 2
      selector: kamaji.clastix.io/name=kubernetes-test2
      updatedReplicas: 2
    ingress:
      loadBalancer:
        ingress:
        - ip: 11.222.333.44
      name: kubernetes-test2
      namespace: tenant-test
    service:
      loadBalancer: {}
      name: kubernetes-test2
      namespace: tenant-test
      port: 6443
    version:
      status: Ready
      version: v1.30.1
  storage:
    certificate:
      checksum: c2ef3e48690b4d76029c40e06fb75853
      lastUpdate: "2025-01-28T12:00:01Z"
      secretName: kubernetes-test2-datastore-certificate
    config:
      checksum: 326d187d6b4164f179d1c377c7fea321
      secretName: kubernetes-test2-datastore-config
    dataStoreName: tenant-test
    driver: etcd
    setup:
      checksum: 326d187d6b4164f179d1c377c7fea321
      lastUpdate: "2025-01-15T13:17:42Z"
      schema: tenant_test_kubernetes_test2
      user: tenant_test_kubernetes_test2

@prometherion (Member) commented:

It's something I'm not able to replicate:

$: kubectl get pods -w
NAME                       READY   STATUS    RESTARTS   AGE
k8s-130-867485f6d7-8md4c   4/4     Running   0          15m
k8s-130-867485f6d7-gmzd6   4/4     Running   0          15m
$: kubectl get pods -w
NAME                       READY   STATUS    RESTARTS   AGE
k8s-130-867485f6d7-8md4c   4/4     Running   0          55m
k8s-130-867485f6d7-gmzd6   4/4     Running   0          55m

@prometherion (Member) commented:

@kvaps out of the blue, could it be related to the Cluster API Control Plane provider, or any other CAPI component?

@kvaps (Contributor, Author) commented Jan 30, 2025

Not sure. I'll try to reproduce this next week; please keep this open for now.

Labels: none yet
Projects: none yet
Development: no branches or pull requests
2 participants