
Error 400 #2222

Open
richiefrich opened this issue Dec 21, 2024 · 6 comments
Labels
bug Something isn't working

Comments

@richiefrich

TL;DR

On the initial run, building the GKE cluster works. If I then rerun the module, it tries to update the node pools and fails with "Error 400: At least one of ...".

Expected behavior

I expect it to simply update the node pool.

Observed behavior

module.primary-cluster.google_container_cluster.primary: Modifications complete after 33m56s [id=projects/ss-pp-fleet-test01-p/locations/us-central1/clusters/services-01]
module.primary-cluster.google_container_node_pool.pools["services-node-pool"]: Modifying... [id=projects/ss-pp-fleet-test01-p/locations/us-central1/clusters/services-01/nodePools/services-node-pool-73f9]

Error: googleapi: Error 400: At least one of ['node_version', 'image_type', 'updated_node_pool', 'locations', 'workload_metadata_config', 'upgrade_settings', 'kubelet_config', 'linux_node_config', 'tags', 'taints', 'labels', 'node_network_config', 'gcfs_config', 'gvnic', 'confidential_nodes', 'logging_config', 'fast_socket', 'resource_labels', 'accelerators', 'windows_node_config', 'machine_type', 'disk_type', 'disk_size_gb', 'storage_pools', 'containerd_config', 'resource_manager_tags', 'performance_monitoring_unit', 'queued_provisioning', 'max_run_duration'] must be specified.
Details:
[
{
"@type": "type.googleapis.com/google.rpc.RequestInfo",
"requestId": "0x1d7a40eb9326f795"
}
]
, badRequest

Terraform Configuration

I'm using Terragrunt, but here is my node pool config:

  node_pools = [
    {
      name                             = "services-node-pool"
      autoscaling                      = true
      auto_repair                      = true
      auto_upgrade                     = true
      min_count                        = 1
      max_count                        = 4
      node_count                       = 1
      image_type                       = "COS_CONTAINERD"
      machine_type                     = "n2-standard-8"
      enable_secure_boot               = true
      service_account                  = "services-gke-sa@${local.project_id}.iam.gserviceaccount.com"
      node_pools_create_before_destroy = true
    },
  ]

Terraform Version

$ terraform -version
Terraform v1.5.7
on linux_amd64

This used to work; I know it's an older version of Terraform.

Additional information

No response

@richiefrich richiefrich added the bug Something isn't working label Dec 21, 2024
@hellolin324

hellolin324 commented Dec 25, 2024

I am having the same issue: it tries to delete the kubelet_config under the node_config block every time I run terraform apply, and throws this error:
googleapi: Error 400: At least one of ['node_version', 'image_type', 'updated_node_pool', 'locations', 'workload_metadata_config', 'upgrade_settings', 'kubelet_config', 'linux_node_config', 'tags', 'taints', 'labels', 'node_network_config', 'gcfs_config', 'gvnic', 'confidential_nodes', 'logging_config', 'fast_socket', 'resource_labels', 'accelerators', 'windows_node_config', 'machine_type', 'disk_type', 'disk_size_gb', 'storage_pools', 'containerd_config', 'resource_manager_tags', 'performance_monitoring_unit', 'queued_provisioning', 'max_run_duration'] must be specified.
│ Details:
│ [
│ {
│ "@type": "type.googleapis.com/google.rpc.RequestInfo",
│ "requestId": "0x1d114922d7dd5a2f"
│ }
│ ]
│ , badRequest

It does not affect any functionality, as the GKE cluster still gets created, but I can't make the error go away.

My config is below:

  node_pools = [
    {
      name                                   = "custom-node-pool"
      machine_type                           = "e2-medium"
      min_count                              = 1
      max_count                              = 3
      disk_size_gb                           = 30
      auto_repair                            = true
      auto_upgrade                           = true
      cpu_manager_policy                     = "static"
      cpu_cfs_quota                          = false
      cpu_cfs_quota_period                   = null
      insecure_kubelet_readonly_port_enabled = null
      pod_pids_limit                         = 0
    }
  ]

Hi there, I just resolved the issue above. Make sure you are using the latest google provider and remote modules. After I explicitly pinned my VPC module's version to

  source  = "terraform-google-modules/network/google"
  version = "~> 10.0.0"

and my providers to

terraform {
  required_providers {
    google = {
      source  = "hashicorp/google"
      version = ">= 6.11.0, < 7.0.0"
    }
    google-beta = {
      source  = "hashicorp/google-beta"
      version = ">= 6.11.0, < 7.0.0"
    }
    kubernetes = {
      source  = "hashicorp/kubernetes"
      version = "~> 2.10"
    }
    random = {
      source  = "hashicorp/random"
      version = ">= 2.1.0"
    }
  }
}

the issue was resolved. It turns out I was using out-of-date providers and remote modules (a few years behind); now when I run terraform plan and apply, the 400 Bad Request error no longer appears and everything works.

@richiefrich
Author

I would still like to know what changed, since even the older versions worked until maybe two weeks ago. Has Google updated the behavior of the older resource versions?

@hellolin324

I have no idea what changed, but previously it threw the same error on a different tag that belonged in another map, and I was not able to fit it into a list of objects due to Terraform's type constraints. Updating the Google provider worked for me, as it allowed me to use the latest GKE module here, and that fixed it. I tested with different flags and they all worked fine; reverting to the old provider brought the error back.
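For reference, the module pin ends up looking roughly like this (a sketch; the version shown is my assumption — pick whichever module release matches your provider constraints):

  module "gke" {
    source  = "terraform-google-modules/kubernetes-engine/google"
    version = "~> 34.0" # assumption: a recent release that supports the google provider 6.x series

    # ... project_id, name, region, network, subnetwork, node_pools, etc.
  }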

@rgddos

rgddos commented Dec 30, 2024

I would still like to know what changed, since even the older versions worked until maybe two weeks ago. Has Google updated the behavior of the older resource versions?

We encountered the same issue and our CI pipelines were failing continuously. This started about two weeks ago, although everything had been working fine before that. We were on an older provider (3.x.x), which made upgrading the provider challenging.

To address this, we implemented the following hardcoded configuration in our module:

kubelet_config {
  cpu_manager_policy   = "static"
  cpu_cfs_quota        = false
}
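For clarity on placement, this block nests under node_config in the node pool resource. A minimal standalone sketch (resource and pool names are placeholders); hardcoding these fields appears to give the update request at least one explicit value, avoiding the empty-update 400:

  resource "google_container_node_pool" "pool" {
    name    = "custom-node-pool" # placeholder
    cluster = google_container_cluster.primary.id

    node_config {
      machine_type = "e2-medium" # placeholder

      # Hardcoded workaround: explicitly set the fields the provider
      # would otherwise try to "remove".
      kubelet_config {
        cpu_manager_policy = "static"
        cpu_cfs_quota      = false
      }
    }
  }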

@hellolin324

kubelet_config {
  cpu_manager_policy   = "static"
  cpu_cfs_quota        = false
}

This is what I tried, but it didn't work for me, so instead I forked this repo and added an additional line to the lifecycle {} block of the node_pool resource in cluster.tf, which also worked with the older provider.
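Roughly, that kind of change looks like this — a sketch, assuming the spurious diff shows up on node_config[0].kubelet_config (the exact attribute path depends on what your plan reports as changing):

  resource "google_container_node_pool" "pools" {
    # ... existing module configuration ...

    lifecycle {
      # Added line: ignore the server-set kubelet_config so Terraform
      # stops constructing the empty (invalid) update request.
      ignore_changes = [
        node_config[0].kubelet_config,
      ]
    }
  }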

Can you share your entire block of tf code for the config please?

@KatrinaHoffert

A drive-by comment: this is probably what GoogleCloudPlatform/magic-modules#11971 fixed; version 6.7.0 looks like the first provider release containing the fix. In short, new server-side computed values are now set, and since an older TF provider doesn't know about them, it thinks it should remove them and constructs an invalid request to do so (the request ends up looking as if it were updating nothing, hence the error).

I didn't test it, but I suspect ignore_changes can be used as a temporary workaround when updating the provider isn't an option: it tells TF to ignore that the field has "changed" and thus prevents it from making the invalid update request.
