ThrottlingException failure permanently blocks route53resolver/RuleAssociation creation #1010

Closed
AliAllomani opened this issue Dec 6, 2023 · 2 comments · Fixed by #1024
Labels: bug (Something isn't working), is:triaged (Indicates that an issue has been reviewed.)

@AliAllomani

What happened?

We have noticed that when creating a large number of route53resolver/RuleAssociation resources for a VPC, some of them enter a permanent creation loop with the following error:

async create failed: failed to create the resource: [{0 creating Route53 Resolver Rule Association: InvalidRequestException: [RSLVR-00802] Cannot associate rules with same domain name with same VPC. Conflict with resolver rule \"XXXX\"  []

Looking into the provider's debug logs, I see the same pattern for every failed resource: the first creation attempt fails with a ThrottlingException, and every subsequent attempt fails with "Cannot associate rules with same domain name with same VPC".

Example logs from provider-aws:

2023-12-01T13:15:26Z	DEBUG	provider-aws	Async create starting...	{"trackerUID": "62a8f98f-846b-4c18-815b-f920f5aca081", "resourceName": "testing2-ec1-rassoc-rslvr-rr-2aa08b5f7b844eaa8", "tfID": ""}
2023-12-01T13:16:50Z	DEBUG	provider-aws	Async create ended.	{"trackerUID": "62a8f98f-846b-4c18-815b-f920f5aca081", "resourceName": "testing2-ec1-rassoc-rslvr-rr-2aa08b5f7b844eaa8", "error": "async create failed: failed to create the resource: [{0 waiting for Route53 Resolver Rule Association (rslvr-rrassoc-b20d114fc55943eda) create: ThrottlingException: Throttling - Maximum request rate exceeded.  []}]", "tfID": ""}

2023-12-01T13:16:50Z	DEBUG	provider-aws	Async create starting...	{"trackerUID": "62a8f98f-846b-4c18-815b-f920f5aca081", "resourceName": "testing2-ec1-rassoc-rslvr-rr-2aa08b5f7b844eaa8", "tfID": ""}
2023-12-01T13:16:51Z	DEBUG	provider-aws	Async create ended.	{"trackerUID": "62a8f98f-846b-4c18-815b-f920f5aca081", "resourceName": "testing2-ec1-rassoc-rslvr-rr-2aa08b5f7b844eaa8", "error": "async create failed: failed to create the resource: [{0 creating Route53 Resolver Rule Association: InvalidRequestException: [RSLVR-00802] Cannot associate rules with same domain name with same VPC. Conflict with resolver rule \"rslvr-rr-2aa08b5f7b844eaa8\"  []}]", "tfID": ""}

2023-12-01T13:17:51Z	DEBUG	provider-aws	Async create starting...	{"trackerUID": "62a8f98f-846b-4c18-815b-f920f5aca081", "resourceName": "testing2-ec1-rassoc-rslvr-rr-2aa08b5f7b844eaa8", "tfID": ""}
2023-12-01T13:17:51Z	DEBUG	provider-aws	Async create ended.	{"trackerUID": "62a8f98f-846b-4c18-815b-f920f5aca081", "resourceName": "testing2-ec1-rassoc-rslvr-rr-2aa08b5f7b844eaa8", "error": "async create failed: failed to create the resource: [{0 creating Route53 Resolver Rule Association: InvalidRequestException: [RSLVR-00802] Cannot associate rules with same domain name with same VPC. Conflict with resolver rule \"rslvr-rr-2aa08b5f7b844eaa8\"  []}]", "tfID": ""}

2023-12-01T13:18:52Z	DEBUG	provider-aws	Async create starting...	{"trackerUID": "62a8f98f-846b-4c18-815b-f920f5aca081", "resourceName": "testing2-ec1-rassoc-rslvr-rr-2aa08b5f7b844eaa8", "tfID": ""}
2023-12-01T13:18:52Z	DEBUG	provider-aws	Async create ended.	{"trackerUID": "62a8f98f-846b-4c18-815b-f920f5aca081", "resourceName": "testing2-ec1-rassoc-rslvr-rr-2aa08b5f7b844eaa8", "error": "async create failed: failed to create the resource: [{0 creating Route53 Resolver Rule Association: InvalidRequestException: [RSLVR-00802] Cannot associate rules with same domain name with same VPC. Conflict with resolver rule \"rslvr-rr-2aa08b5f7b844eaa8\"  []}]", "tfID": ""}

However, the CloudTrail events show that the first attempt actually succeeded on the AWS side: the association was created and its ID (rslvr-rrassoc-b20d114fc55943eda) was returned:

      "eventTime": "2023-12-01T13:15:29Z",
      "eventSource": "route53resolver.amazonaws.com",
      "eventName": "AssociateResolverRule",
      "awsRegion": "eu-central-1",
      "sourceIPAddress": "XXX",
      "userAgent": "APN/1.0 HashiCorp/1.0 Terraform/1.5.5 (+https://www.terraform.io) terraform-provider-aws/dev (+https://registry.terraform.io/providers/hashicorp/aws) aws-sdk-go/1.44.261 (go1.20.10; linux; arm64) crossplane-provider-aws/v0.44.0 upbound-provider-aws/v0.44.0",
      "requestParameters": {
        "resolverRuleId": "rslvr-rr-2aa08b5f7b844eaa8",
        "vPCId": "vpc-0a87fc0d958a9b1c2"
      },
      "responseElements": {
        "resolverRuleAssociation": {
          "id": "rslvr-rrassoc-b20d114fc55943eda",
          "resolverRuleId": "rslvr-rr-2aa08b5f7b844eaa8",
          "vPCId": "vpc-0a87fc0d958a9b1c2",
          "status": "CREATING",
          "statusMessage": "[Trace id: 1-6569dc6f-0239de5a2947747e1ffab67d] Creating the association."
        }
      },
      "requestID": "3909c771-0c82-42e9-8fc7-5b34205e63bd",
      "eventID": "6c80c3c9-aab2-4977-9006-9af2598261ab",

...

      "eventTime": "2023-12-01T13:16:51Z",
      "eventSource": "route53resolver.amazonaws.com",
      "eventName": "AssociateResolverRule",
      "awsRegion": "eu-central-1",
      "sourceIPAddress": "XXX",
      "userAgent": "APN/1.0 HashiCorp/1.0 Terraform/1.5.5 (+https://www.terraform.io) terraform-provider-aws/dev (+https://registry.terraform.io/providers/hashicorp/aws) aws-sdk-go/1.44.261 (go1.20.10; linux; arm64) crossplane-provider-aws/v0.44.0 upbound-provider-aws/v0.44.0",
      "errorCode": "InvalidRequestException",
      "errorMessage": "[RSLVR-00802] Cannot associate rules with same domain name with same VPC. Conflict with resolver rule \"rslvr-rr-2aa08b5f7b844eaa8\"",
      "requestParameters": {
        "resolverRuleId": "rslvr-rr-2aa08b5f7b844eaa8",
        "vPCId": "vpc-0a87fc0d958a9b1c2"
      },
      "responseElements": null,
      "requestID": "12605741-e308-47d0-b88e-6c6d8ff7b543",
      "eventID": "14fa9822-5bf2-4cd3-8edf-bd16af25ff30",
      "readOnly": false,
      "eventType": "AwsApiCall",

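This explains why the additional attempts fail with "Cannot associate rules with same domain name with same VPC": the first attempt succeeded, but that success was not handled correctly. According to the provider logs, the ThrottlingException was returned while waiting for the already created association (rslvr-rrassoc-b20d114fc55943eda), so every retry tries to create an association that already exists.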
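One way a provider could avoid this loop is to make creation idempotent: when the create call reports the RSLVR-00802 conflict, look up an existing association for the same rule and VPC and adopt it instead of failing. The Go sketch below assumes a hypothetical resolverAPI interface (it is not the aws-sdk-go client) and models the conflict as a sentinel error; it is not how provider-aws or upjet actually implement the fix.

```go
package main

import (
	"errors"
	"fmt"
)

// errConflict is a hypothetical stand-in for InvalidRequestException [RSLVR-00802].
var errConflict = errors.New("[RSLVR-00802] Cannot associate rules with same domain name with same VPC")

// association is a minimal hypothetical view of a resolver rule association.
type association struct {
	ID, RuleID, VPCID string
}

// resolverAPI is a hypothetical interface, not a real SDK client.
type resolverAPI interface {
	Associate(ruleID, vpcID string) (association, error)
	List() ([]association, error)
}

// ensureAssociation creates the association, or adopts an existing one for the
// same rule and VPC when the API reports a conflict.
func ensureAssociation(api resolverAPI, ruleID, vpcID string) (association, error) {
	a, err := api.Associate(ruleID, vpcID)
	if err == nil {
		return a, nil
	}
	if !errors.Is(err, errConflict) {
		return association{}, err
	}
	existing, lerr := api.List()
	if lerr != nil {
		return association{}, lerr
	}
	for _, e := range existing {
		if e.RuleID == ruleID && e.VPCID == vpcID {
			return e, nil // adopt the association an earlier attempt created
		}
	}
	// The conflict came from a different rule with the same domain name.
	return association{}, err
}

// fakeAPI remembers associations, as AWS did after the "failed" first attempt.
// Simplification: it conflicts on rule ID + VPC, while the real API conflicts
// on the rule's domain name + VPC.
type fakeAPI struct{ assocs []association }

func (f *fakeAPI) Associate(ruleID, vpcID string) (association, error) {
	for _, a := range f.assocs {
		if a.RuleID == ruleID && a.VPCID == vpcID {
			return association{}, errConflict
		}
	}
	a := association{ID: fmt.Sprintf("rslvr-rrassoc-%d", len(f.assocs)+1), RuleID: ruleID, VPCID: vpcID}
	f.assocs = append(f.assocs, a)
	return a, nil
}

func (f *fakeAPI) List() ([]association, error) { return f.assocs, nil }

func main() {
	api := &fakeAPI{}
	// First call succeeds on the "AWS" side, but assume its result was lost.
	first, _ := api.Associate("rslvr-rr-2aa08b5f7b844eaa8", "vpc-0a87fc0d958a9b1c2")
	again, err := ensureAssociation(api, "rslvr-rr-2aa08b5f7b844eaa8", "vpc-0a87fc0d958a9b1c2")
	fmt.Println(first.ID == again.ID, err) // true <nil>
}
```

Matching on both rule ID and VPC ID matters: a conflict caused by a different rule for the same domain must still surface as an error rather than be silently adopted.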

I cannot see detailed AWS SDK call traces at the Crossplane provider level, so I'm not sure whether the issue is caused by an invalid response from the AWS API, or whether the response was correct and the Crossplane provider handles it incorrectly.

How can we reproduce it?

The issue can be reproduced by creating a large number (100+) of RuleAssociation resources for a VPC.

Example composition

apiVersion: apiextensions.crossplane.io/v1
kind: Composition
metadata:
...
spec:
  writeConnectionSecretsToNamespace: upbound-system
  compositeTypeRef:
    apiVersion: aws.platform.upbound.io/v1alpha1
    kind: XVPC
  mode: Pipeline
  pipeline:
    - step: render-templates
      functionRef:
        name: function-go-templating
      input:
        apiVersion: gotemplating.fn.crossplane.io/v1beta1
        kind: GoTemplate
        source: Inline
        inline:
          template: |
...

            {{- range $index, $ruleId := .observed.composite.resource.status.share.r53ResolverRulesData.id.resolver_rule_ids }}
            ---
            apiVersion: route53resolver.aws.upbound.io/v1beta1
            kind: RuleAssociation
            metadata:
              annotations:
                gotemplating.fn.crossplane.io/composition-resource-name: {{ $networkName }}-{{ $regionShort }}-rassoc-{{ $index }}
              name: {{ $networkName }}-{{ $regionShort }}-rassoc-{{ $ruleId }}
            spec:
              forProvider:
                region: {{ $region }}
                resolverRuleId: {{ $ruleId }}
                vpcIdSelector:
                  matchLabels:
                    vpc: {{ $networkName }}-{{ $regionShort }}
            {{- end }}

    - step: patch-and-transform
      functionRef:
        name: function-patch-and-transform
      input:
        apiVersion: pt.fn.crossplane.io/v1beta1
        kind: Resources
        resources:
          - name: r53-resolver-rules
            base:
              apiVersion: tf.upbound.io/v1beta1
              kind: Workspace
              metadata:
                name: data-source-route53-resolver-rules
              spec:
                forProvider:
                  source: Inline
                  module: |
                    data "aws_route53_resolver_rules" "resolver_rules" {
                      rule_type    = "FORWARD"
                      share_status = "SHARED_WITH_ME"
                      owner_id     = var.shared_services_acc_id
                    }
                    output "aws_route53_resolver_rules_data" {
                      description = "Imported Route53 resolver rules"
                      value       = {
                        "id" = try(data.aws_route53_resolver_rules.resolver_rules, "")
                      }
                    }
                    variable "shared_services_acc_id" {
                      description = "Shared services account ID for DNS"
                      type        = string
                    }
                  vars:
                    - key: shared_services_acc_id
            patches:
              - fromFieldPath: spec.sharedServicesAccId
                toFieldPath: spec.forProvider.vars[0].value
              - type: ToCompositeFieldPath
                fromFieldPath: status.atProvider.outputs.aws_route53_resolver_rules_data
                toFieldPath: status.share.r53ResolverRulesData
                policy:
                  fromFieldPath: Optional

What environment did it happen in?

  • Crossplane Version:
  • Provider Version: v0.44.0
  • Kubernetes Version:
  • Kubernetes Distribution:

@erhancagirici (Collaborator)

@AliAllomani we were unable to reproduce the issue by creating a single VPC with 1000 rules and 1000 rule associations to that VPC. All rules and rule associations were created correctly, and we did not hit any throttling.
During the investigation, we found a case where this could potentially happen, and we think crossplane/upjet#313 will resolve it, but we could not validate it against your particular issue because we could not reproduce it.

If you are able to consistently reproduce this, would you be willing to test these dev images and see if the problem is resolved?

index.docker.io/ulucinar/provider-aws-route53resolver:v0.46.1-5d27a872558d9cb6f14817905719b0c569c88397
index.docker.io/ulucinar/provider-aws-ec2:v0.46.1-5d27a872558d9cb6f14817905719b0c569c88397

@ulucinar (Collaborator)

This issue was closed when we merged #1024. Please reopen it if the proposed fix does not work as expected for you. The fix is available in the patch release v0.46.1.
