resource/aws_route_table: route table is removed from state when it fails to be available (in 2m) on creation #21829

Closed
ialidzhikov opened this issue Nov 18, 2021 · 4 comments
Labels
bug Addresses a defect in current functionality. service/ec2 Issues and PRs that pertain to the ec2 service.

Comments

@ialidzhikov
Contributor

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Terraform CLI and Terraform AWS Provider Version

terraform version - 0.12.31
provider-aws version - 3.63.0

Affected Resource(s)

  • aws_route_table

Terraform Configuration Files

Please include all Terraform configurations required to reproduce the bug. Bug reports without a functional reproduction may be closed without investigation.

provider "aws" {
  access_key = var.ACCESS_KEY_ID
  secret_key = var.SECRET_ACCESS_KEY
  region     = "eu-west-1"
}

resource "aws_vpc" "vpc" {
  cidr_block           = "10.222.0.0/16"
  enable_dns_support   = true
  enable_dns_hostnames = true
}

resource "aws_route_table" "routetable_main" {
  vpc_id = aws_vpc.vpc.id
}

Debug Output

Panic Output

Expected Behavior

The aws_route_table resource should be resilient to AWS EC2 eventual-consistency issues and should not remove from the state a route table that will become present in the AWS API after some time.

Actual Behavior

With the partial configuration above, we notice that an aws_route_table leaks when it fails to become available within 2m of creation.

Here is the output from the first terraform apply run:

aws_route_table.routetable_main: Creating...
aws_route_table.routetable_main: Still creating... [10s elapsed]
aws_route_table.routetable_main: Still creating... [20s elapsed]
aws_route_table.routetable_main: Still creating... [30s elapsed]
aws_route_table.routetable_main: Still creating... [40s elapsed]
aws_route_table.routetable_main: Still creating... [50s elapsed]
aws_route_table.routetable_main: Still creating... [1m0s elapsed]
aws_route_table.routetable_main: Still creating... [1m10s elapsed]
aws_route_table.routetable_main: Still creating... [1m20s elapsed]
aws_route_table.routetable_main: Still creating... [1m30s elapsed]
aws_route_table.routetable_main: Still creating... [1m40s elapsed]
aws_route_table.routetable_main: Still creating... [1m50s elapsed]
aws_route_table.routetable_main: Still creating... [2m0s elapsed]

Error: error waiting for Route Table (rtb-1234) to become available: timeout while waiting for state to become 'ready' (timeout: 2m0s)

  on tf/main.tf line 52, in resource "aws_route_table" "routetable_main":
  52: resource "aws_route_table" "routetable_main" {

Running terraform apply once again yields the following output:

aws_route_table.routetable_main: Refreshing state... [id=rtb-1234]

# omitted

aws_route_table.routetable_main: Creating...
aws_route_table.routetable_main: Still creating... [10s elapsed]
aws_route_table.routetable_main: Still creating... [20s elapsed]
aws_route_table.routetable_main: Still creating... [30s elapsed]
aws_route_table.routetable_main: Still creating... [40s elapsed]
aws_route_table.routetable_main: Still creating... [50s elapsed]
aws_route_table.routetable_main: Still creating... [1m0s elapsed]
aws_route_table.routetable_main: Still creating... [1m10s elapsed]
aws_route_table.routetable_main: Still creating... [1m20s elapsed]
aws_route_table.routetable_main: Still creating... [1m30s elapsed]
aws_route_table.routetable_main: Still creating... [1m40s elapsed]
aws_route_table.routetable_main: Still creating... [1m50s elapsed]
aws_route_table.routetable_main: Still creating... [2m0s elapsed]

Error: error waiting for Route Table (rtb-5678) to become available: timeout while waiting for state to become 'ready' (timeout: 2m0s)

  on tf/main.tf line 52, in resource "aws_route_table" "routetable_main":
  52: resource "aws_route_table" "routetable_main" {

In the second terraform apply run, Terraform creates a new route table (rtb-5678), so rtb-1234 was most probably removed from the state.

Here is the corresponding terraform-provider-aws code, which removes the route table from the state if it is not found:

if !d.IsNewResource() && tfresource.NotFound(err) {
	log.Printf("[WARN] Route Table (%s) not found, removing from state", d.Id())
	d.SetId("")
	return nil
}

After some time, both rtb-1234 and rtb-5678 are actually present in the AWS API, but they are no longer in the Terraform state, most probably because they were removed on refresh.
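To make the effect of that guard concrete, here is a simplified, hypothetical model in Go. `resourceState`, `readRouteTable`, and `errNotFound` are illustrative stand-ins for `schema.ResourceData`, the provider's read function, and `tfresource.NotFound`; they are not the provider's actual types. The sketch assumes the EC2 Describe call keeps returning "not found" because the new table has not propagated yet.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound is a hypothetical stand-in for the provider's "not found"
// classification (tfresource.NotFound in the quoted code).
var errNotFound = errors.New("route table not found")

// resourceState is a hypothetical, minimal stand-in for schema.ResourceData:
// only the resource ID and whether the resource is being created.
type resourceState struct {
	id    string
	isNew bool
}

// readRouteTable mirrors the quoted guard: a "not found" on an existing
// (non-new) resource clears the ID, which drops it from state without error.
func readRouteTable(d *resourceState, describe func(id string) error) error {
	err := describe(d.id)
	if !d.isNew && errors.Is(err, errNotFound) {
		fmt.Printf("[WARN] Route Table (%s) not found, removing from state\n", d.id)
		d.id = ""
		return nil
	}
	if err != nil {
		return fmt.Errorf("error reading Route Table (%s): %w", d.id, err)
	}
	return nil
}

func main() {
	// Simulate EC2 not yet returning the freshly created table.
	notYetVisible := func(string) error { return errNotFound }

	// During a refresh the resource is not new, so it is silently dropped.
	refreshed := &resourceState{id: "rtb-1234"}
	fmt.Println(readRouteTable(refreshed, notYetVisible), refreshed.id == "")

	// During creation the same lookup failure surfaces as an error instead.
	created := &resourceState{id: "rtb-1234", isNew: true}
	fmt.Println(readRouteTable(created, notYetVisible) != nil, created.id)
}
```

Under these assumptions, a refresh of an existing resource clears its ID and reports success, while the same "not found" during creation is returned as an error and the ID is kept.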

Steps to Reproduce

See above

Important Factoids

References

  • #0000
@github-actions github-actions bot added needs-triage Waiting for first response or review from a maintainer. service/ec2 Issues and PRs that pertain to the ec2 service. labels Nov 18, 2021
@justinretzolk justinretzolk added bug Addresses a defect in current functionality. and removed needs-triage Waiting for first response or review from a maintainer. labels Nov 18, 2021
@ewbankkit
Contributor

@ialidzhikov Thanks for raising this issue.

I think that Terraform is working as-designed here:
The CreateRouteTable API succeeds and the route table ID (rtb-1234) is recorded in state, but due to AWS EC2 API eventual consistency, we cannot successfully read it back within 2 minutes:

output, err := conn.CreateRouteTable(input)
if err != nil {
	return fmt.Errorf("error creating Route Table: %w", err)
}

d.SetId(aws.StringValue(output.RouteTable.RouteTableId))

if _, err := WaitRouteTableReady(conn, d.Id(), d.Timeout(schema.TimeoutCreate)); err != nil {
	return fmt.Errorf("error waiting for Route Table (%s) to become available: %w", d.Id(), err)
}

The Terraform resource creation operation has now failed, but because an ID was recorded, that resource (rtb-1234) is marked as tainted in state and will be destroyed the next time a Terraform operation is performed on it.
Quickly running terraform apply (or similar) again, which I assume you did, makes Terraform refresh state. That involves attempting to read the tainted Route Table (rtb-1234) so that Terraform knows whether to run the resource destroy operation on it. Again, the EC2 API returns that the Route Table is not found, so Terraform concludes that the resource was deleted manually and does not need to be destroyed, and then creates a new Route Table (rtb-5678).
So both rtb-1234 and rtb-5678 exist in AWS (and eventually we should be able to query rtb-1234 successfully via the EC2 API), but only rtb-5678 is recorded in state.

I think the real issue here is the 2-minute timeout. The documentation says:

Confirm the state of the resource before you run a command to modify it. Run the appropriate Describe command using an exponential backoff algorithm to ensure that you allow enough time for the previous command to propagate through the system. To do this, run the Describe command repeatedly, starting with a couple of seconds of wait time, and increasing gradually up to five minutes of wait time.

So, strictly speaking, that 2-minute timeout should be 5 minutes.

@ialidzhikov
Contributor Author

Thank you for your analysis, @ewbankkit! Last week I created #21847. Let me know if you have comments on it.

@ialidzhikov
Contributor Author


Thank you very much, @ewbankkit and all terraform-provider-aws maintainers, for having a look into the above PR (sarcasm). It looks like the aws_route create timeout was increased from 2m to 5m with #21531. I hope this helps with this issue.

@github-actions

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators May 24, 2022
3 participants