Cached regions break IAM role lookups during AWS auth in GovCloud #9935

Closed
bluekeyes opened this issue Sep 11, 2020 · 1 comment · Fixed by #9947
Labels
auth/aws, bug

Comments

@bluekeyes
Contributor

bluekeyes commented Sep 11, 2020

Describe the bug
We run a Vault instance in the us-gov-west-1 region of GovCloud (aws-us-gov partition) and use it to authenticate other EC2 instances in the same region using IAM auth. Authentication sometimes stops working after Vault restarts, or after a leadership change in which the follower node becomes active. While authentication is failing, restarting or stepping down Vault enough times eventually fixes the problem, and auth then works until the next restart or step-down event.

When authentication fails, Vault returns this error:

$ vault login -method=aws role=iam-test
Error authenticating: Error making API request.

URL: PUT https://vault.domain/v1/auth/aws/login
Code: 400. Errors:

* error looking up full ARN of entity &{aws-us-gov <account-id> assumed-role  <iam-role-name> <instance-id>}: error fetching role "iam-role-name": SignatureDoesNotMatch: Credential should be scoped to a valid region, not 'us-gov-east-1'.
	status code: 403, request id: b2b3ea37-0aea-49ea-bd60-c37677475dd1

I believe this is because the fullARN function looks up a region for the partition in a map that, at startup, associates an arbitrarily chosen region with each partition. If this map caches us-gov-east-1 as the region for aws-us-gov, all authentication attempts fail. Restarting Vault enough times eventually caches us-gov-west-1 by chance, and authentication starts working again.
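To illustrate the suspected cause, here is a minimal Go sketch. It is not the Vault source: regionsByPartition and anyRegionFor are hypothetical names, and the region set is hard-coded for the example. It relies only on the fact that Go does not specify map iteration order, so "take the first region the map yields" can give a different answer from one call (or one process start) to the next.

```go
package main

import "fmt"

// regionsByPartition is a hypothetical stand-in for partition metadata.
// It is hard-coded here and is NOT taken from Vault or the AWS SDK.
var regionsByPartition = map[string]map[string]struct{}{
	"aws-us-gov": {
		"us-gov-west-1": {},
		"us-gov-east-1": {},
	},
}

// anyRegionFor returns whichever region the map iteration yields first.
// Go leaves map iteration order unspecified, so the result is not stable.
// It returns "" for a partition that is not in the table.
func anyRegionFor(partition string) string {
	for region := range regionsByPartition[partition] {
		return region
	}
	return ""
}

func main() {
	// Count how often each region comes back first over many lookups.
	seen := map[string]int{}
	for i := 0; i < 1000; i++ {
		seen[anyRegionFor("aws-us-gov")]++
	}
	fmt.Println(seen)
}
```

If a value chosen this way is cached once at startup, every later lookup in that process uses the same region, which would match the "works until the next restart" behavior described above.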

To Reproduce
Steps to reproduce the behavior:

  1. Configure Vault for IAM auth in us-gov-west-1. The Vault role should include a wildcard pattern in the bound_iam_principal_arn property (e.g. match all IAM principals in a specific account).
  2. Run vault login -method=aws role=<role-name> region=us-gov-west-1
  3. The login may fail with the error above. If it succeeds, restart the Vault server process and try again.

Expected behavior
The success or failure of authentication should not randomly change when restarting Vault.

Environment:

  • Vault Server Version (retrieve with vault status): 1.4.2
  • Vault CLI Version (retrieve with vault version): tested both 1.0.2 and 1.4.2
  • Server Operating System/Architecture: Ubuntu 16.04

Vault server configuration file(s):

ui           = "true"
api_addr     = "https://vault.domain:443"
cluster_addr = "https://<ip>:8201"

ha_storage "dynamodb" {
  ha_enabled = "true"
  region     = "us-gov-west-1"
  table      = "vault"

  read_capacity  = 4
  write_capacity = 4
}

storage "postgresql" {
  connection_url = "<url>"
}

telemetry {
  dogstatsd_addr = "127.0.0.1:8125"
}

cluster_name = "vault"

listener "tcp" {
  address         = "0.0.0.0:8200"
  cluster_address = "0.0.0.0:8201"
  tls_cert_file = "/etc/ssl/certs/self-signed.crt"
  tls_key_file  = "/etc/ssl/certs/self-signed.key"
}

seal "awskms" {
  kms_key_id = "<kms-id>"
}

AWS backend configuration:

$ vault read auth/aws/config/client
Key                           Value
---                           -----
access_key                    n/a
endpoint                      https://ec2.us-gov-west-1.amazonaws.com
iam_endpoint                  https://iam.us-gov.amazonaws.com
iam_server_id_header_value    n/a
max_retries                   -1
sts_endpoint                  https://sts.us-gov-west-1.amazonaws.com
sts_region                    us-gov-west-1

The server uses an instance profile to authenticate with AWS.

$ vault read auth/aws/role/iam-test
Key                               Value
---                               -----
allow_instance_migration          false
auth_type                         iam
bound_account_id                  []
bound_ami_id                      []
bound_ec2_instance_id             <nil>
bound_iam_instance_profile_arn    []
bound_iam_principal_arn           [arn:aws-us-gov:iam::<account-id>:*]
bound_iam_principal_id            []
bound_iam_role_arn                []
bound_region                      []
bound_subnet_id                   []
bound_vpc_id                      []
disallow_reauthentication         false
inferred_aws_region               us-gov-west-1
inferred_entity_type              ec2_instance
max_ttl                           168h
policies                          [iam-test]
resolve_aws_unique_ids            true
role_id                           bf6db476-34d9-f2c8-11b3-61c56d2a3f22
role_tag                          n/a
token_bound_cidrs                 []
token_explicit_max_ttl            0s
token_max_ttl                     168h
token_no_default_policy           false
token_num_uses                    0
token_period                      0s
token_policies                    [iam-test]
token_ttl                         72h6m
token_type                        default
ttl                               72h6m

Note that sts_endpoint and sts_region are set to the right region, as is inferred_aws_region.

Additional context

When testing login with the Vault 1.0.2 CLI, I made sure both AWS_REGION and AWS_DEFAULT_REGION were set to us-gov-west-1, since that version does not accept region as a login parameter.

I'm not sure if us-gov-east-1 is ever valid as a region for IAM operations. It may be that the aws-us-gov partition should always select us-gov-west-1 as the region, similar to how the aws partition always selects us-east-1.
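The idea in the previous paragraph could be sketched in Go as a fixed table consulted before any arbitrary choice is made. This is only an illustration under assumptions: pinnedRegion and regionForPartition are hypothetical names, the table holds just the two partitions mentioned in this issue, and it is not necessarily what #9947 does.

```go
package main

import "fmt"

// pinnedRegion is a hypothetical table that fixes one region per partition,
// following the suggestion that aws-us-gov should behave like aws.
var pinnedRegion = map[string]string{
	"aws":        "us-east-1",
	"aws-us-gov": "us-gov-west-1",
}

// regionForPartition returns the pinned region for a partition. The boolean
// is false when the partition has no pinned region, so the caller can decide
// how to fall back.
func regionForPartition(partition string) (string, bool) {
	region, ok := pinnedRegion[partition]
	return region, ok
}

func main() {
	// Unlike iterating over a map, a keyed lookup gives the same answer
	// on every call and in every process.
	for i := 0; i < 3; i++ {
		region, ok := regionForPartition("aws-us-gov")
		fmt.Println(region, ok)
	}
}
```

With a lookup like this, the region used for IAM calls no longer depends on which process happens to be active.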

@bluekeyes
Contributor Author

I proposed what seemed like the most obvious fix in #9947, but I don't know if this is actually the correct solution.

raskchanky added the auth/aws, bug, and version/1.4.x labels Sep 14, 2020