
[Bug]: aws_rds_cluster does not wait properly for modifications #28339

Closed
jcarlson opened this issue Dec 14, 2022 · 16 comments · Fixed by #38437
Labels
bug Addresses a defect in current functionality. service/rds Issues and PRs that pertain to the rds service.

Comments

@jcarlson

jcarlson commented Dec 14, 2022

Terraform Core Version

1.3.6

AWS Provider Version

4.42.0

Affected Resource(s)

  • aws_rds_cluster

Expected Behavior

Terraform should wait for pending changes that AWS applies asynchronously to complete.

Actual Behavior

The AWS provider only monitors the <Status> attribute returned by the DescribeDBClusters operation.

AWS applies some changes, such as a database engine major version upgrade, asynchronously. When Terraform submits such a change and then begins polling for the cluster status to become "available", it finds the status is already "available", because AWS has not yet begun applying the change.

While the cluster is still in the "available" state, the pending changes are reported in another attribute, PendingModifiedValues. Terraform should wait until both Status is "available" and PendingModifiedValues is empty.
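The proposed readiness check can be sketched with jq against a simulated DescribeDBClusters response (the JSON below is illustrative, mirroring the real API shape):

```shell
# Simulated DescribeDBClusters output: the cluster reports Status "available"
# even though an engine-version change is still pending.
response='{"DBClusters":[{"Status":"available","PendingModifiedValues":{"EngineVersion":"14.4"}}]}'

# Proposed predicate: ready only when available AND nothing is pending.
filter='.DBClusters[0] | .Status == "available" and ((.PendingModifiedValues // {}) | length == 0)'

if printf '%s' "$response" | jq -e "$filter" > /dev/null; then
  state="ready"
else
  state="busy"
fi

echo "cluster is $state"   # prints: cluster is busy
```

With an empty or absent PendingModifiedValues object, the same predicate would report the cluster as ready.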

Relevant Error/Panic Output Snippet

No response

Terraform Configuration Files

terraform {
  required_version = ">= 1.3"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "4.42.0"
    }
  }
}

variable "name" {
  type    = string
  default = "issue-28339"
}

variable "postgres_version" {
  type    = string
}

variable "subnet_ids" {
  type = list(string)
}

variable "tags" {
  type = map(string)
  default = {
    repository = "https://github.com/hashicorp/terraform-provider-aws"
    issue      = "https://github.com/hashicorp/terraform-provider-aws/issues/28339"
  }
}

variable "vpc_id" {
  type = string
}

locals {
  engine         = "aurora-postgresql"
  engine_version = var.postgres_version
  family         = "aurora-postgresql${local.major_version}"
  major_version  = split(".", local.engine_version)[0]
}

resource "random_pet" "username" {
  separator = "_"
}

resource "random_password" "password" {
  length  = 24
  special = false
}

resource "random_pet" "database" {
  separator = "_"
}

resource "aws_db_subnet_group" "main" {
  name       = var.name
  subnet_ids = var.subnet_ids
  tags       = var.tags

  lifecycle {
    create_before_destroy = true
    ignore_changes        = [name]
  }
}

resource "aws_security_group" "main" {
  description = "Database security group"
  name_prefix = "${var.name}-"
  tags        = var.tags
  vpc_id      = var.vpc_id

  lifecycle {
    create_before_destroy = true
    ignore_changes        = [description, name_prefix]
  }
}

resource "aws_db_parameter_group" "main" {
  name_prefix = "database-parameter-group-"
  family      = local.family
  description = "Database parameter group for ${var.name}"
  tags        = var.tags

  lifecycle {
    create_before_destroy = true
    ignore_changes        = [name_prefix, description]
  }
}

resource "aws_rds_cluster_parameter_group" "main" {
  name_prefix = "database-cluster-parameter-group-"
  family      = local.family
  description = "Database cluster parameter group for ${var.name}"
  tags        = var.tags

  lifecycle {
    create_before_destroy = true
    ignore_changes        = [name_prefix, description]
  }
}

resource "aws_rds_cluster" "main" {
  allow_major_version_upgrade      = true
  apply_immediately                = true
  backup_retention_period          = 1
  cluster_identifier               = "${var.name}-cluster"
  copy_tags_to_snapshot            = true
  database_name                    = random_pet.database.id
  db_cluster_parameter_group_name  = aws_rds_cluster_parameter_group.main.name
  db_instance_parameter_group_name = aws_db_parameter_group.main.name
  db_subnet_group_name             = aws_db_subnet_group.main.name
  deletion_protection              = false
  engine                           = local.engine
  engine_version                   = local.engine_version
  master_password                  = random_password.password.result
  master_username                  = random_pet.username.id
  skip_final_snapshot              = true
  tags                             = var.tags
  vpc_security_group_ids           = [aws_security_group.main.id]
}

resource "aws_rds_cluster_instance" "cluster_instances" {
  count = 2

  apply_immediately          = true
  auto_minor_version_upgrade = false
  cluster_identifier         = aws_rds_cluster.main.id
  db_parameter_group_name    = aws_db_parameter_group.main.name
  db_subnet_group_name       = aws_db_subnet_group.main.name
  engine                     = aws_rds_cluster.main.engine
  engine_version             = aws_rds_cluster.main.engine_version
  identifier                 = "${var.name}-${count.index}"
  instance_class             = "db.t3.medium"
  publicly_accessible        = false
  tags                       = var.tags

  lifecycle {
    ignore_changes = [engine_version]
  }
}
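As an aside, the `family` local in the configuration above derives the parameter group family from the major component of the engine version; the same derivation as a shell sketch (values illustrative):

```shell
# Same derivation as the Terraform locals: take everything before the first
# dot of the engine version and append it to the engine family prefix.
engine_version="14.4"
major_version="${engine_version%%.*}"        # -> "14"
family="aurora-postgresql${major_version}"   # -> "aurora-postgresql14"
echo "$family"
```

This is why a major version change also forces new parameter groups, which is where the failure below occurs.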

Steps to Reproduce

  1. Using the provided configuration, create a Postgres database cluster
  2. Modify the postgres_version variable to a newer major version, such as 14.4, and run terraform apply
  3. Repeat until the upgrade fails because the database cluster does not immediately enter an "upgrading" status

Debug Output

See attached
terraform-apply.1671040201.log

Panic Output

No response

Important Factoids

The function waitDBClusterUpdated may want to consider checking for more than just the cluster Status.

Immediately after initiating a major version upgrade via a call to ModifyDBCluster, the cluster continues to return a status of "available", but also indicates pending modified values:

<PendingModifiedValues>
  <EngineVersion>14.4</EngineVersion>
</PendingModifiedValues>

It would appear that the call to ModifyDBCluster returns immediately while AWS applies the changes asynchronously, which misleads the AWS provider into thinking modifications are complete when they are not.

I have seen this behavior intermittently; sometimes the major version upgrade succeeds using Terraform, and sometimes I run into an error.

Because Terraform incorrectly thinks that modifications on the cluster have completed, it proceeds to ModifyDBInstance to set a new parameter group, and that fails with the following error:

2022-12-13T14:35:32.310-0500 [DEBUG] plugin.terraform-provider-aws_v4.42.0_x5: [DEBUG] [aws-sdk-go] DEBUG: Validate Response rds/ModifyDBInstance failed, attempt 0/25, error InvalidParameterCombination: The parameter group my-database-parameter-group with DBParameterGroupFamily aurora-postgresql14 can't be used for this instance. Use a parameter group with DBParameterGroupFamily aurora-postgresql10.
2022-12-13T14:35:32.310-0500 [DEBUG] plugin.terraform-provider-aws_v4.42.0_x5: 	status code: 400, request id: db7ad778-cae2-4316-89e8-8c8137574b9a
2022-12-13T14:35:32.311-0500 [ERROR] plugin.terraform-provider-aws_v4.42.0_x5: Response contains error diagnostic: @module=sdk.proto tf_req_id=2c900edf-87bf-10ee-0d07-cc3a89ad77a1 tf_rpc=ApplyResourceChange tf_proto_version=5.3 tf_provider_addr=registry.terraform.io/hashicorp/aws tf_resource_type=aws_rds_cluster_instance @caller=github.com/hashicorp/[email protected]/tfprotov5/internal/diag/diagnostics.go:55 diagnostic_detail= diagnostic_severity=ERROR diagnostic_summary="updating RDS Cluster Instance (my-instance): InvalidParameterCombination: The parameter group my-database-parameter-group with DBParameterGroupFamily aurora-postgresql14 can't be used for this instance. Use a parameter group with DBParameterGroupFamily aurora-postgresql10.

References

No response

Would you like to implement a fix?

None

@jcarlson jcarlson added bug Addresses a defect in current functionality. needs-triage Waiting for first response or review from a maintainer. labels Dec 14, 2022
@github-actions

Community Note

Voting for Prioritization

  • Please vote on this issue by adding a 👍 reaction to the original post to help the community and maintainers prioritize this request.
  • Please see our prioritization guide for information on how we prioritize.
  • Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request.

Volunteering to Work on This Issue

  • If you are interested in working on this issue, please leave a comment.
  • If this would be your first contribution, please review the contribution guide.

@github-actions github-actions bot added the service/rds Issues and PRs that pertain to the rds service. label Dec 14, 2022
@jcarlson
Author

AWS confirmed in a support chat that the major version upgrade occurs asynchronously.
(screenshot of the support chat attached)

@ewbankkit ewbankkit removed the needs-triage Waiting for first response or review from a maintainer. label Dec 14, 2022
@jcarlson
Author

I have updated the issue with a sample configuration. I am working on getting a debug output, but because the issue is intermittent, this is taking more time.

@jcarlson
Author

I was able to reproduce this issue and have attached a debug log output.

One thing I have noticed, anecdotally, is that this issue appears to be more readily reproducible when the database cluster requires an intermediate upgrade from its current version before upgrading to the target version. For example, Aurora PostgreSQL 10.20 cannot be upgraded directly to 14.4; to complete this upgrade, you must first upgrade to 10.21, 11.16, or 13.6. Since 10.21 is a minor upgrade, that is the route I took.

So to replicate this scenario, I created an Aurora Postgres 10.20 cluster, then upgraded it to 10.21, then upgraded it to 14.4.

@jcarlson
Author

In the attached log file, skip to line 2803, where the AWS provider begins polling for the status of the database cluster to become "available". You can see in the response that the database cluster is already "available" because the upgrade has not yet started on the backend, and so the provider assumes it is "ready" and moves on to modify the database instances, which is where the failure occurs.

@jcarlson
Author

I attempted to work around this issue by adding a null_resource between the aws_rds_cluster and the aws_rds_cluster_instance resources that would use a local-exec provisioner to query the RDS API and determine the true status of the cluster during an upgrade:

resource "aws_rds_cluster" "main" {
  allow_major_version_upgrade      = true
  apply_immediately                = true
  backup_retention_period          = 1
  cluster_identifier               = "${var.name}-cluster"
  copy_tags_to_snapshot            = true
  database_name                    = random_pet.database.id
  db_cluster_parameter_group_name  = aws_rds_cluster_parameter_group.main.name
  db_instance_parameter_group_name = aws_db_parameter_group.main.name
  db_subnet_group_name             = aws_db_subnet_group.main.name
  deletion_protection              = false
  engine                           = local.engine
  engine_version                   = local.engine_version
  master_password                  = random_password.password.result
  master_username                  = random_pet.username.id
  skip_final_snapshot              = true
  tags                             = var.tags
  vpc_security_group_ids           = [aws_security_group.main.id]
}

# https://github.com/hashicorp/terraform-provider-aws/issues/28339
resource "null_resource" "aws_provider_bug" {
  triggers = {
    db_parameter_group_name = aws_db_parameter_group.main.name
    engine_version          = aws_rds_cluster.main.engine_version
  }

  provisioner "local-exec" {
    command = "${path.module}/wait-for-db-cluster.sh"

    environment = {
      CLUSTER_IDENTIFIER = aws_rds_cluster.main.cluster_identifier
    }
  }
}

resource "aws_rds_cluster_instance" "cluster_instances" {
  count      = 2
  depends_on = [null_resource.aws_provider_bug]

  apply_immediately          = true
  auto_minor_version_upgrade = false
  cluster_identifier         = aws_rds_cluster.main.id
  db_parameter_group_name    = aws_db_parameter_group.main.name
  db_subnet_group_name       = aws_db_subnet_group.main.name
  engine                     = aws_rds_cluster.main.engine
  engine_version             = aws_rds_cluster.main.engine_version
  identifier                 = "${var.name}-${count.index}"
  instance_class             = "db.t3.medium"
  publicly_accessible        = false
  tags                       = var.tags

  lifecycle {
    ignore_changes = [engine_version]
  }
}
wait-for-db-cluster.sh:

#!/usr/bin/env bash

set -eo pipefail

function isDbClusterAvailableWithNoPendingModifiedValues() {
  local dbCluster

  dbCluster="$(
    aws rds describe-db-clusters \
      --db-cluster-identifier "$CLUSTER_IDENTIFIER"
  )"

  jq -e '.DBClusters[0] | .Status == "available" and .PendingModifiedValues == null' > /dev/null <<< "$dbCluster"
}

printf 'Waiting for database cluster status to be "available" with no pending modified values\n'

while true; do
  isDbClusterAvailableWithNoPendingModifiedValues && break
  printf 'Database cluster is not ready yet\n'
  sleep 10
done
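One hardening worth noting: the loop above polls forever if the cluster never settles. A generic bounded-wait helper could cap it (a sketch; `wait_until` is a hypothetical name, not part of the AWS CLI or the provider):

```shell
# Retry a predicate command until it succeeds or a deadline passes.
wait_until() {
  local timeout_secs=$1 interval_secs=$2
  shift 2
  local deadline=$((SECONDS + timeout_secs))
  until "$@"; do
    if [ "$SECONDS" -ge "$deadline" ]; then
      echo "timed out waiting for condition" >&2
      return 1
    fi
    sleep "$interval_secs"
  done
}

# Demonstration with a trivially true predicate; in the script above it would
# be: wait_until 1800 10 isDbClusterAvailableWithNoPendingModifiedValues
wait_until 5 1 true && result="condition met"
echo "$result"   # prints: condition met
```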

But this too failed occasionally with the following error:

│ Error: Provider produced inconsistent final plan
│
│ When expanding the plan for null_resource.aws_provider_bug to include new values learned so far during apply, provider "registry.terraform.io/hashicorp/null" produced an invalid new value
│ for .triggers["engine_version"]: was cty.StringVal("14.4"), but now cty.StringVal("10.21").
│
│ This is a bug in the provider, which should be reported in the provider's own issue tracker.
╵

Attached is a log from the operation.
terraform-apply.1671226841.log

@nfantone

Faced the same issue today using provider hashicorp/aws v5.21.0 and Terraform v1.6.1. Couldn't start a major version upgrade on an Aurora PostgreSQL RDS cluster (11.18 -> 14.6).

│ Error: updating RDS Cluster Instance (xxx-db-instance-0): InvalidParameterCombination: The parameter group default.aurora-postgresql14 with DBParameterGroupFamily aurora-postgresql14 can't be used for this instance. Use a parameter group with DBParameterGroupFamily aurora-postgresql11.

@ewbankkit Have there been any movements here? Any known workarounds?

@niels1voo

Same here with v5.11.0.
Related: #30247

@niels1voo

> @ewbankkit Have there been any movements here? Any known workarounds?

In my case, I had to reboot the cluster instances and run apply again, after which the upgrade succeeded.

@nfantone

@niels1voo Interesting. When exactly did you try rebooting? Doesn't seem to have any effect for me.

@ktham
Contributor

ktham commented Dec 14, 2023

I'm seeing the same issue as @nfantone , using v5.30.0 on Terraform 1.5.7.

When attempting a major version upgrade from engine version 14.3 to 15.3, I see this error:

Error: updating RDS Cluster Instance (tf-20231213171635810400000006): InvalidParameterCombination: The parameter group test-aurora-serverless-db1-7378-pg15 with DBParameterGroupFamily aurora-postgresql15 can't be used for this instance. Use a parameter group with DBParameterGroupFamily aurora-postgresql14.

I am consistently seeing this error every time in my test. I didn't seem to have this issue on version 4.67.0 of the provider (I'll double check on that)

(Edit: Actually, sorry, I think the problem was that my RDS DB cluster had failed to upgrade to 15.3 and stayed at version 14.3 due to insufficient memory, so the failure I saw was unrelated to this issue. Please disregard.)

@PablitoDPG

Ran into this issue with Terraform v1.5.1, and AWS provider 5.19.0
This bug makes it impossible to upgrade your cluster with Terraform!

@ewbankkit ewbankkit self-assigned this Jul 18, 2024
@terraform-aws-provider terraform-aws-provider bot added the prioritized Part of the maintainer teams immediate focus. To be addressed within the current quarter. label Jul 18, 2024

Warning

This issue has been closed, meaning that any additional comments are hard for our team to see. Please assume that the maintainers will not see them.

Ongoing conversations amongst community members are welcome, however, the issue will be locked after 30 days. Moving conversations to another venue, such as the AWS Provider forum, is recommended. If you have additional concerns, please open a new issue, referencing this one where needed.

@github-actions github-actions bot added this to the v5.60.0 milestone Jul 22, 2024
@github-actions github-actions bot removed the prioritized Part of the maintainer teams immediate focus. To be addressed within the current quarter. label Jul 25, 2024

This functionality has been released in v5.60.0 of the Terraform AWS Provider. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading.

For further feature requests or bug reports with this functionality, please create a new GitHub issue following the template. Thank you!


I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 26, 2024