Changing multiple fields of cluster while update is not working #667

Open
cpinjani opened this issue Sep 16, 2024 · 3 comments
Labels
kind/bug Something isn't working kind/regression team/infracloud Issues for infracloud team
Milestone

Comments

@cpinjani
Contributor

Rancher version:
Rancher - v2.9-f1b43d2568d7c53c3adf45d9ffd74a04ea65fc22-head
aks-operator:v1.9.2-rc.2

Cluster Type: Downstream AKS cluster

Describe the bug:

  • Provision a cluster with multiple node pools and wait for it to become Active.
  • Edit the cluster: delete any node pool, enable HTTP Application Routing, and click Save.

The spec is updated with both changes (see the aksConfig below), but only the routing change is applied to the cluster.
The deleted node pool remains intact after the update.

aksConfig:
    authBaseUrl: https://login.microsoftonline.com/
    authorizedIpRanges: null
    azureCredentialSecret: cattle-global-data:cc-j7sg7
    baseUrl: https://management.azure.com/
    clusterName: cpinjani-aks
    dnsPrefix: cpinjani-aks
    dnsServiceIp: 10.0.0.10
    dockerBridgeCidr: null
    httpApplicationRouting: true
    imported: false
    kubernetesVersion: 1.29.0
    linuxAdminUsername: azureuser
    loadBalancerSku: Standard
    logAnalyticsWorkspaceGroup: null
    logAnalyticsWorkspaceName: null
    managedIdentity: null
    monitoring: null
    networkPlugin: kubenet
    networkPolicy: null
    nodePools:
      - availabilityZones:
          - '1'
          - '2'
          - '3'
        count: 1
        maxPods: 110
        maxSurge: '1'
        mode: System
        name: np1
        orchestratorVersion: 1.29.0
        osDiskSizeGB: 128
        osDiskType: Managed
        osType: Linux
        vmSize: Standard_DS2_v2
    outboundType: loadBalancer
    podCidr: 10.244.0.0/16
    privateCluster: false
    privateDnsZone: null
    resourceGroup: cpinjani-aks
    resourceLocation: eastus
    serviceCidr: 10.0.0.0/16
    subnet: null
    tags:
      Account Type: group
    userAssignedIdentity: null
    virtualNetwork: null
    virtualNetworkResourceGroup: null
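
To make the mismatch concrete, the desired spec can be compared against what the cluster actually has. The following is a minimal sketch, not aks-operator code: `pending_changes` is a hypothetical helper, and the `upstream` dictionary (including the leftover pool name `np2`) is made up for illustration.

```python
def pending_changes(desired, upstream):
    """Return human-readable differences between a desired spec and upstream state."""
    changes = []
    # Compare a couple of simple top-level fields.
    for key in ("httpApplicationRouting", "kubernetesVersion"):
        if desired.get(key) != upstream.get(key):
            changes.append(f"set {key} to {desired.get(key)!r}")
    # Compare node pools by name only.
    desired_pools = {p["name"] for p in desired.get("nodePools", [])}
    upstream_pools = {p["name"] for p in upstream.get("nodePools", [])}
    for name in sorted(desired_pools - upstream_pools):
        changes.append(f"add node pool {name}")
    for name in sorted(upstream_pools - desired_pools):
        changes.append(f"remove node pool {name}")
    return changes


# Hypothetical data: the spec keeps only np1, but the cluster still has np2.
desired = {
    "httpApplicationRouting": True,
    "kubernetesVersion": "1.29.0",
    "nodePools": [{"name": "np1"}],
}
upstream = {
    "httpApplicationRouting": True,
    "kubernetesVersion": "1.29.0",
    "nodePools": [{"name": "np1"}, {"name": "np2"}],
}
print(pending_changes(desired, upstream))
```

A non-empty result after the operator logs "Configuration ... was verified" would correspond to the state described in this report.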

Logs:

2.9:

time="2024-09-16T11:03:50Z" level=info msg="Checking configuration for cluster [cpinjani-aks (id: c-jm2vn)]"
time="2024-09-16T11:03:50Z" level=info msg="Updating HTTP application routing to true for cluster [cpinjani-aks (id: c-jm2vn)]"
time="2024-09-16T11:06:27Z" level=error msg="Error recording akscc [ (id: )] failure message: resource name may not be empty"
time="2024-09-16T11:06:28Z" level=info msg="Checking configuration for cluster [cpinjani-aks (id: c-jm2vn)]"
time="2024-09-16T11:06:28Z" level=info msg="Configuration for cluster [cpinjani-aks (id: c-jm2vn)] was verified"
time="2024-09-16T11:06:29Z" level=info msg="Checking configuration for cluster [cpinjani-aks (id: c-jm2vn)]"
time="2024-09-16T11:06:30Z" level=info msg="Configuration for cluster [cpinjani-aks (id: c-jm2vn)] was verified"

2.8:

time="2024-09-16T10:42:39Z" level=info msg="Checking configuration for cluster [cpinjani-aks28]"
time="2024-09-16T10:42:40Z" level=info msg="Updating HTTP application routing for cluster [cpinjani-aks28]"
time="2024-09-16T10:42:45Z" level=info msg="Waiting for cluster [c-zw6hc] to finish updating"
time="2024-09-16T10:43:15Z" level=info msg="Waiting for cluster [c-zw6hc] to finish updating"
time="2024-09-16T10:43:46Z" level=info msg="Waiting for cluster [c-zw6hc] to finish updating"
time="2024-09-16T10:44:17Z" level=info msg="Waiting for cluster [c-zw6hc] to finish updating"
time="2024-09-16T10:44:48Z" level=info msg="Checking configuration for cluster [cpinjani-aks28]"
time="2024-09-16T10:44:49Z" level=info msg="Removing node pool [pool3] from cluster [cpinjani-aks28]"
time="2024-09-16T10:44:51Z" level=info msg="Waiting for cluster [c-zw6hc] to delete node pool [pool3]"
time="2024-09-16T10:45:22Z" level=info msg="Waiting for cluster [c-zw6hc] to delete node pool [pool3]"
time="2024-09-16T10:45:52Z" level=info msg="Waiting for cluster [c-zw6hc] to delete node pool [pool3]"
time="2024-09-16T10:46:23Z" level=info msg="Checking configuration for cluster [cpinjani-aks28]"
time="2024-09-16T10:46:24Z" level=info msg="Cluster [c-zw6hc] finished updating"
time="2024-09-16T10:46:25Z" level=info msg="Checking configuration for cluster [cpinjani-aks28]"
time="2024-09-16T10:46:25Z" level=info msg="Configuration for cluster [cpinjani-aks28] was verified"
time="2024-09-16T10:46:40Z" level=info msg="Checking configuration for cluster [cpinjani-aks28]"
time="2024-09-16T10:46:41Z" level=info msg="Configuration for cluster [cpinjani-aks28] was verified"
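
The difference between the two runs is easier to see when only the action messages are kept. This is a small sketch for that purpose; the regular expression assumes the `time="..." level=... msg="..."` layout seen in the logs above, and the `actions` helper and its prefix list are hypothetical.

```python
import re

# Matches the key=value text layout seen in the operator logs.
LOG_LINE = re.compile(r'time="(?P<time>[^"]+)" level=(?P<level>\w+) msg="(?P<msg>.*)"$')
# Message prefixes treated as "the operator changed something" (an assumption).
ACTION_PREFIXES = ("Updating ", "Removing ", "Adding ")


def actions(log_text):
    """Return the messages of log lines that describe a change being applied."""
    found = []
    for line in log_text.splitlines():
        match = LOG_LINE.match(line.strip())
        if match and match.group("msg").startswith(ACTION_PREFIXES):
            found.append(match.group("msg"))
    return found


# Excerpts copied from the 2.9 and 2.8 logs in this report.
log_29 = '''
time="2024-09-16T11:03:50Z" level=info msg="Checking configuration for cluster [cpinjani-aks (id: c-jm2vn)]"
time="2024-09-16T11:03:50Z" level=info msg="Updating HTTP application routing to true for cluster [cpinjani-aks (id: c-jm2vn)]"
time="2024-09-16T11:06:27Z" level=error msg="Error recording akscc [ (id: )] failure message: resource name may not be empty"
'''
log_28 = '''
time="2024-09-16T10:42:40Z" level=info msg="Updating HTTP application routing for cluster [cpinjani-aks28]"
time="2024-09-16T10:42:45Z" level=info msg="Waiting for cluster [c-zw6hc] to finish updating"
time="2024-09-16T10:44:49Z" level=info msg="Removing node pool [pool3] from cluster [cpinjani-aks28]"
'''

print(len(actions(log_29)), "action(s) in 2.9")
print(len(actions(log_28)), "action(s) in 2.8")
```

On these excerpts the 2.9 run shows only the routing update, while the 2.8 run shows the routing update followed by the node pool removal.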
@cpinjani cpinjani added kind/bug Something isn't working kind/regression labels Sep 16, 2024
@cpinjani cpinjani added this to the v2.9-Next1 milestone Sep 16, 2024
@cpinjani cpinjani moved this to Backlog in CAPI / Turtles Sep 16, 2024
@vatsalparekh vatsalparekh self-assigned this Sep 19, 2024
@valaparthvi
Contributor

valaparthvi commented Sep 23, 2024

Along the same lines, it is not possible to add more than one node pool to the cluster via the API. One of the test cases updates a cluster while it is still provisioning.
I tested with 2 updates:

  1. K8s upgrade and node pool addition - it only upgraded k8s version
  2. Adding more than one node pool - it only added one nodepool

When I tested the same things via UI, it seemed to be working as expected.

AKSConfig is updated when the request is first sent, but then it seems to revert.
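
The 2.8 logs in the original report suggest a pattern where the operator applies one change, waits, re-checks the configuration, and continues until nothing differs. Below is a minimal sketch of that pattern. The names `reconcile_once` and `reconcile_until_settled` are hypothetical, plain dictionaries stand in for the real spec and Azure state, and this is not the aks-operator implementation.

```python
def reconcile_once(desired, upstream):
    """Apply at most one outstanding change to upstream; return a description or None."""
    if desired["kubernetesVersion"] != upstream["kubernetesVersion"]:
        upstream["kubernetesVersion"] = desired["kubernetesVersion"]
        return f"upgraded to {desired['kubernetesVersion']}"
    for name in desired["nodePools"]:
        if name not in upstream["nodePools"]:
            upstream["nodePools"].append(name)
            return f"added node pool {name}"
    for name in list(upstream["nodePools"]):
        if name not in desired["nodePools"]:
            upstream["nodePools"].remove(name)
            return f"removed node pool {name}"
    return None


def reconcile_until_settled(desired, upstream, max_passes=10):
    """Keep reconciling, one change per pass, until no difference remains."""
    applied = []
    for _ in range(max_passes):
        change = reconcile_once(desired, upstream)
        if change is None:
            break
        applied.append(change)
    return applied


# Hypothetical request: a version upgrade plus two new node pools.
desired = {"kubernetesVersion": "1.30.3", "nodePools": ["np1", "np2", "np3"]}
upstream = {"kubernetesVersion": "1.29.0", "nodePools": ["np1"]}
print(reconcile_until_settled(desired, upstream))
```

The symptoms described here resemble what would happen if only the first pass ran, though the actual cause is not established in this thread.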

Logs

time="2024-09-23T13:52:39Z" level=info msg="Checking if cluster [auto-aks-pvala-hp-ci-cqsni (id: c-p98qf)] exists"
time="2024-09-23T13:52:40Z" level=info msg="Checking if resource group [auto-aks-pvala-hp-ci-cqsni] exists"
time="2024-09-23T13:52:40Z" level=info msg="Creating resource group [auto-aks-pvala-hp-ci-cqsni] for cluster [auto-aks-pvala-hp-ci-cqsni (id: c-p98qf)]"
time="2024-09-23T13:52:41Z" level=info msg="Resource group [auto-aks-pvala-hp-ci-cqsni] created successfully"
time="2024-09-23T13:52:41Z" level=info msg="Creating AKS cluster [auto-aks-pvala-hp-ci-cqsni (id: c-p98qf)]"
time="2024-09-23T13:52:49Z" level=info msg="Waiting for cluster [auto-aks-pvala-hp-ci-cqsni (id: c-p98qf)] to finish creating, cluster state: Creating"
time="2024-09-23T13:53:13Z" level=info msg="Waiting for cluster [auto-aks-pvala-hp-ci-cqsni (id: c-p98qf)] to finish creating, cluster state: Creating"
time="2024-09-23T13:53:19Z" level=info msg="Waiting for cluster [auto-aks-pvala-hp-ci-cqsni (id: c-p98qf)] to finish creating, cluster state: Creating"
time="2024-09-23T13:53:20Z" level=info msg="Waiting for cluster [auto-aks-pvala-hp-ci-cqsni (id: c-p98qf)] to finish creating, cluster state: Creating"
time="2024-09-23T13:53:50Z" level=info msg="Waiting for cluster [auto-aks-pvala-hp-ci-cqsni (id: c-p98qf)] to finish creating, cluster state: Creating"
time="2024-09-23T13:54:22Z" level=info msg="Waiting for cluster [auto-aks-pvala-hp-ci-cqsni (id: c-p98qf)] to finish creating, cluster state: Creating"
time="2024-09-23T13:54:54Z" level=info msg="Waiting for cluster [auto-aks-pvala-hp-ci-cqsni (id: c-p98qf)] to finish creating, cluster state: Creating"
time="2024-09-23T13:55:25Z" level=info msg="Waiting for cluster [auto-aks-pvala-hp-ci-cqsni (id: c-p98qf)] to finish creating, cluster state: Creating"
time="2024-09-23T13:55:56Z" level=info msg="Waiting for cluster [auto-aks-pvala-hp-ci-cqsni (id: c-p98qf)] to finish creating, cluster state: Creating"
time="2024-09-23T13:56:27Z" level=info msg="Cluster [auto-aks-pvala-hp-ci-cqsni (id: c-p98qf)] created successfully"
time="2024-09-23T13:56:30Z" level=info msg="Checking configuration for cluster [auto-aks-pvala-hp-ci-cqsni (id: c-p98qf)]"
time="2024-09-23T13:56:31Z" level=info msg="Updating tags for cluster [auto-aks-pvala-hp-ci-cqsni (id: c-p98qf)]"
time="2024-09-23T13:56:33Z" level=info msg="Tags were not updated for cluster [auto-aks-pvala-hp-ci-cqsni (id: c-p98qf)], config map[janitor-ignore:true owner:hosted-providers-qa-ci-pvala testfilenumber:line125_p1_provisioning_test], upstream map[Account Owner:[email protected] Account Type:group Cost Center:211799999 Department:ecm Environment:development Finance Business Partner:[email protected] General Ledger Code:200000119 Stakeholder:[email protected] Team:container-es janitor-ignore:true owner:hosted-providers-qa-ci-pvala testfilenumber:line125_p1_provisioning_test], moving on"
time="2024-09-23T13:56:33Z" level=info msg="Updating kubernetes version to 1.30.3 for cluster [auto-aks-pvala-hp-ci-cqsni (id: c-p98qf)]"
time="2024-09-23T13:59:11Z" level=error msg="Error recording akscc [ (id: )] failure message: resource name may not be empty"
time="2024-09-23T13:59:12Z" level=info msg="Checking configuration for cluster [auto-aks-pvala-hp-ci-cqsni (id: c-p98qf)]"
time="2024-09-23T13:59:13Z" level=info msg="Configuration for cluster [auto-aks-pvala-hp-ci-cqsni (id: c-p98qf)] was verified"
time="2024-09-23T13:59:15Z" level=info msg="Checking configuration for cluster [auto-aks-pvala-hp-ci-cqsni (id: c-p98qf)]"
time="2024-09-23T13:59:16Z" level=info msg="Configuration for cluster [auto-aks-pvala-hp-ci-cqsni (id: c-p98qf)] was verified"
time="2024-09-23T13:59:24Z" level=info msg="Removing cluster [auto-aks-pvala-hp-ci-cqsni (id: c-p98qf)]"
time="2024-09-23T14:01:18Z" level=info msg="Creating cluster [auto-aks-hp-ci-vgcla (id: c-gjz7d)]"
time="2024-09-23T14:01:18Z" level=info msg="Checking if cluster [auto-aks-hp-ci-vgcla (id: c-gjz7d)] exists"
time="2024-09-23T14:01:19Z" level=info msg="Checking if resource group [auto-aks-hp-ci-vgcla] exists"
time="2024-09-23T14:01:19Z" level=info msg="Creating resource group [auto-aks-hp-ci-vgcla] for cluster [auto-aks-hp-ci-vgcla (id: c-gjz7d)]"
time="2024-09-23T14:01:20Z" level=info msg="Resource group [auto-aks-hp-ci-vgcla] created successfully"
time="2024-09-23T14:01:20Z" level=info msg="Creating AKS cluster [auto-aks-hp-ci-vgcla (id: c-gjz7d)]"
time="2024-09-23T14:01:32Z" level=info msg="Waiting for cluster [auto-aks-hp-ci-vgcla (id: c-gjz7d)] to finish creating, cluster state: Creating"
time="2024-09-23T14:01:44Z" level=info msg="Waiting for cluster [auto-aks-hp-ci-vgcla (id: c-gjz7d)] to finish creating, cluster state: Creating"
time="2024-09-23T14:02:03Z" level=info msg="Waiting for cluster [auto-aks-hp-ci-vgcla (id: c-gjz7d)] to finish creating, cluster state: Creating"
time="2024-09-23T14:02:34Z" level=info msg="Waiting for cluster [auto-aks-hp-ci-vgcla (id: c-gjz7d)] to finish creating, cluster state: Creating"
time="2024-09-23T14:02:57Z" level=info msg="Cluster auto-aks-pvala-hp-ci-cqsni removed successfully"
time="2024-09-23T14:02:57Z" level=info msg="Cluster [auto-aks-pvala-hp-ci-cqsni (id: c-p98qf)] was removed successfully"
time="2024-09-23T14:02:57Z" level=info msg="Resource group [auto-aks-pvala-hp-ci-cqsni] for cluster [auto-aks-pvala-hp-ci-cqsni (id: c-p98qf)] still exists, please remove it if needed"
time="2024-09-23T14:03:06Z" level=info msg="Waiting for cluster [auto-aks-hp-ci-vgcla (id: c-gjz7d)] to finish creating, cluster state: Creating"
time="2024-09-23T14:03:37Z" level=info msg="Waiting for cluster [auto-aks-hp-ci-vgcla (id: c-gjz7d)] to finish creating, cluster state: Creating"
time="2024-09-23T14:04:08Z" level=info msg="Waiting for cluster [auto-aks-hp-ci-vgcla (id: c-gjz7d)] to finish creating, cluster state: Creating"
time="2024-09-23T14:04:40Z" level=info msg="Cluster [auto-aks-hp-ci-vgcla (id: c-gjz7d)] created successfully"
time="2024-09-23T14:04:43Z" level=info msg="Checking configuration for cluster [auto-aks-hp-ci-vgcla (id: c-gjz7d)]"
time="2024-09-23T14:04:44Z" level=info msg="Updating tags for cluster [auto-aks-hp-ci-vgcla (id: c-gjz7d)]"
time="2024-09-23T14:04:47Z" level=info msg="Tags were not updated for cluster [auto-aks-hp-ci-vgcla (id: c-gjz7d)], config map[owner:hosted-providers-qa-ci-pvala testfilenumber:line125_p1_provisioning_test], upstream map[Account Owner:[email protected] Account Type:group Cost Center:211799999 Department:ecm Environment:development Finance Business Partner:[email protected] General Ledger Code:200000119 Stakeholder:[email protected] Team:container-es owner:hosted-providers-qa-ci-pvala testfilenumber:line125_p1_provisioning_test], moving on"
time="2024-09-23T14:04:47Z" level=info msg="Adding node pool [ggygl] for cluster [auto-aks-hp-ci-vgcla (id: c-gjz7d)]"
time="2024-09-23T14:04:52Z" level=info msg="Waiting for cluster [auto-aks-hp-ci-vgcla (id: c-gjz7d)] to update node pool [ggygl]"
time="2024-09-23T14:05:23Z" level=info msg="Waiting for cluster [auto-aks-hp-ci-vgcla (id: c-gjz7d)] to update node pool [ggygl]"
time="2024-09-23T14:05:55Z" level=info msg="Waiting for cluster [auto-aks-hp-ci-vgcla (id: c-gjz7d)] to update node pool [ggygl]"
time="2024-09-23T14:06:26Z" level=info msg="Waiting for cluster [auto-aks-hp-ci-vgcla (id: c-gjz7d)] to update node pool [ggygl]"
time="2024-09-23T14:06:39Z" level=info msg="Waiting for cluster [auto-aks-hp-ci-vgcla (id: c-gjz7d)] to update node pool [ggygl]"
time="2024-09-23T14:06:57Z" level=info msg="Checking configuration for cluster [auto-aks-hp-ci-vgcla (id: c-gjz7d)]"
time="2024-09-23T14:06:59Z" level=info msg="Cluster [auto-aks-hp-ci-vgcla (id: c-gjz7d)] finished updating"
time="2024-09-23T14:07:00Z" level=info msg="Checking configuration for cluster [auto-aks-hp-ci-vgcla (id: c-gjz7d)]"
time="2024-09-23T14:07:01Z" level=info msg="Configuration for cluster [auto-aks-hp-ci-vgcla (id: c-gjz7d)] was verified"
time="2024-09-23T14:07:50Z" level=info msg="Removing cluster [auto-aks-hp-ci-vgcla (id: c-gjz7d)]"

@valaparthvi
Contributor

valaparthvi commented Sep 24, 2024

This was also seen when deleting a node pool and adding a new one in the same update. The new node pool remained, but the deleted node pool was re-added after a few minutes.
AKSConfig reflects the desired state until the deleted node pool is restored.
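
A test could catch this kind of delayed revert by sampling the node pool names over time and flagging the first sample that departs from the desired set after having matched it. This is a rough sketch; `first_revert` and the sample data are hypothetical, and how the samples are collected (for example by polling the cluster object) is left out.

```python
def first_revert(snapshots, desired_pools):
    """Return the index of the first snapshot that leaves the desired set
    after an earlier snapshot matched it, or None if that never happens."""
    desired = set(desired_pools)
    matched = False
    for index, pools in enumerate(snapshots):
        if set(pools) == desired:
            matched = True
        elif matched:
            return index
    return None


# Hypothetical samples: np2 is deleted and np3 added, then np2 reappears.
snapshots = [
    ["np1", "np2"],
    ["np1", "np3"],
    ["np1", "np3"],
    ["np1", "np2", "np3"],
]
print(first_revert(snapshots, ["np1", "np3"]))
```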

@mjura mjura modified the milestones: v2.9.3, v2.9.4 Oct 17, 2024
@mjura mjura self-assigned this Oct 18, 2024
@mjura mjura removed their assignment Oct 25, 2024
@kkaempf kkaempf modified the milestones: v2.9.4, v2.10.1 Nov 5, 2024
@kkaempf kkaempf modified the milestones: v2.10.1, v2.10.2 Dec 10, 2024
@mjura mjura moved this from PR to be reviewed to Backlog in CAPI / Turtles Dec 24, 2024
@kkaempf kkaempf modified the milestones: v2.10.2, v2.10.3 Jan 14, 2025
@mitulshah-suse mitulshah-suse added the team/infracloud Issues for infracloud team label Jan 17, 2025
@krunalhinguu
Contributor

krunalhinguu commented Jan 28, 2025

@cpinjani After attempting to reproduce the reported problem in the AKS cluster management workflow (v2.9), the scenario executed as expected, with cluster updates functioning correctly. Coordination with @valaparthvi confirmed no inconsistencies in the observed behavior.

The recorded error: time="2024-09-16T11:06:27Z" level=error msg="Error recording akscc [ (id: )] failure message: resource name may not be empty" was traced to a separate code path unrelated to the cluster update mechanism. Root cause analysis identified an edge case in resource metadata validation logic, where empty resource identifiers were improperly handled. A fix is in development and will ship in an upcoming patch.
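
As an illustration of the kind of guard that avoids this error, the sketch below skips recording a failure message when the resource has no name. It assumes the error arises when a status update is attempted on an object whose name is empty, which the log line suggests but this thread does not confirm. `record_failure` and the `update` callback are hypothetical and do not reflect the actual fix.

```python
def record_failure(resource, message, update):
    """Record a failure message via update(name, message).

    Returns False without calling update() when the resource is missing
    or has an empty name, since an update without a name cannot succeed.
    """
    name = (resource or {}).get("name", "")
    if not name:
        return False
    update(name, message)
    return True


# Collect calls in a list instead of talking to a real API.
calls = []
recorder = lambda name, message: calls.append((name, message))

print(record_failure({}, "update failed", recorder))
print(record_failure({"name": "c-jm2vn"}, "update failed", recorder))
```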

@kkaempf kkaempf removed this from CAPI / Turtles Jan 28, 2025

7 participants