Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

#667 iam_role parsing before evaluation of other HCL blocks #1807

Merged
merged 21 commits into from
Oct 1, 2021

Conversation

denis256
Copy link
Member

@denis256 denis256 commented Sep 14, 2021

Updated HCL parsing first to extract iam_role and use obtained value in the evaluation of HCL blocks

Example:

$ aws sts get-caller-identity
{
    "UserId": "AAA",
    "Account": "BBB",
    "Arn": "arn:aws:iam::YYYY:/app0"
}

$ cat terragrunt.hcl
iam_role = "arn:aws:iam::ZZZZ:user/app666"

locals {
  var = run_cmd("echo", "get_aws_account_id", "${get_aws_account_id()}")
}

Before:

$ terragrunt init
....
get_aws_account_id YYYY
....

After this PR:

$ terragrunt init
...
get_aws_account_id ZZZZ
...

Fix for issue: #667

@denis256 denis256 marked this pull request as ready for review September 14, 2021 15:09
Copy link
Contributor

@yorinasub17 yorinasub17 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a good start! I think the general idea of parsing iam_role early in the process before others makes sense, but there are a few gotchas/edge cases that this implementation doesn't handle. See the comments below for more details.

Comment on lines 109 to 111
if flags.IamRole != nil {
terragruntOptions.IamRole = *flags.IamRole
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should check if terragruntOptions.IamRole is already set or not, since we should prefer to use the CLI passed in option over the local configuration.

Comment on lines 103 to 108
contextExtensions := EvalContextExtensions{}
flags := terragruntFlags{}
err := decodeHcl(hclFile, filename, &flags, terragruntOptions, contextExtensions)
if err != nil {
return nil, nil, TrackInclude{}, err
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doing this parsing here means that we won't honor iam_role set on parent configurations that are included. You'll need to implement a special version of PartialParseConfigFile that bypasses locals and only parses include + iam_role to handle this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sense, will look in this dirrection

@denis256
Copy link
Member Author

denis256 commented Sep 15, 2021

Behaviour with current changes:

$ cat terragrunt.hcl
remote_state {
  backend = "local"
  generate = {
    path      = "${get_aws_account_id()}.txt"
    if_exists = "overwrite"
  }
  config = {
    path = "terraform.tfstate"
  }
}

include "issue667" {
  path = "../issue-667-include/vars.hcl"
}
$ cat ../issue-667-include/vars.hcl
iam_role = "arn:aws:iam::AAAA:role/terragrunttest"

$ aws sts get-caller-identity
{
    "UserId": "X",
    "Account": "Y",
    "Arn": "arn:aws:iam::BBBB:user/terragrunttest2"
}

$ terragrunt init
....

$ ls AAAA.txt 
AAAA.txt

So get_aws_account_id() resolved with value from iam_role included from other file, without this change will be created BBBB.txt

@denis256 denis256 marked this pull request as draft September 15, 2021 09:14
config/config.go Outdated Show resolved Hide resolved
config/config.go Outdated
@@ -545,6 +545,17 @@ func ParseConfigFile(filename string, terragruntOptions *options.TerragruntOptio
return nil, err
}

// If IamRole wasn't provided, and if it is config file, try first to decode IamRole
if include == nil && terragruntOptions.IamRole == "" {
iamConfig, err := PartialParseConfigString(configString, terragruntOptions, include, filename, []PartialDecodeSectionType{TerragruntFlags})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if the iam_role param depends on a local var? Or a dependency block? Or anything else that hasn't been parsed yet? In other words, do we need to update the docs on parse order to call out any new limitations here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PartialParseConfigString handles local and include, but not dependency, so yes indeed we need to update the parsing order docs to indicate this.

stderr := bytes.Buffer{}

// Invoke terragrunt and verify used IAM role
err = runTerragruntCommand(t, fmt.Sprintf("terragrunt init --terragrunt-working-dir %s", TEST_FIXTURE_READ_IAM_ROLE), &stdout, &stderr)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this actually try to assume the fake IAM role and lead to a permissions error?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, and it will fail to create state file with IAM role ID

@denis256
Copy link
Member Author

Moved to draft since this implementation caused more integration tests to fail

@denis256
Copy link
Member Author

My local issue, the same number of tests don't pass with or without this change

@denis256 denis256 marked this pull request as ready for review September 16, 2021 06:30
@denis256
Copy link
Member Author

denis256 commented Sep 16, 2021

In the current implementation, it adds a +1 evaluation of HCL code at the beginning, fetching from result TerragruntFlags / IamRole and doing another revaluation.

It will work in cases of remote state definition but may generate some unexpected output in cases like:

iam_role = "arn:aws:iam::${local.id}:user/terragrunt"
locals {
  id = "6666"
  var = run_cmd("echo", "get_aws_account_id", "${get_aws_account_id()}")
}

It will print:

get_aws_account_id AAAA
get_aws_account_id 6666

AAAA from the first evaluation when iam_role was empty and was loaded default value, and 6666 from the second evaluation when iam_role was defined

@brikis98
Copy link
Member

@yorinasub17 If you're still digging into Terragrunt issues, could you review this one?

Copy link
Contributor

@yorinasub17 yorinasub17 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updates LGTM! I think we are very close. Just a few more nits to address and I think we can get this going.


The side effect of additional parsing of locals is a potential cause for concern, but I think it would actually work out given that I think most configurations would limit the amount of side effects in locals. What is more of a concern is the backward incompatible nature of this feature.

Currently, before this PR, get_aws_account_id will always return the value based on the original credentials. Now, this will change to the value after iam_role is assumed. In most use cases, this is probably not a huge issue and could be worked around.

I think where this gets super tricky is if users had a config like the following:

iam_role = "arn:aws:iam::${get_aws_account_id()}:role/terragrunttest"

Currently as implemented, what would happen is that in the initial parse for iam_role, iam_role would resolve get_account_id based on the original credentials, but when it goes through to do the full parse, get_account_id would now be based on the assumed iam_role. That means that potentially, the iam_role could change depending on which stage of parsing it is in.

This probably isn't a big deal for functions like get_account_id, where the value doesn't change even after assuming the role, but what about the following config?

# Where 333333333 is another account
iam_role = "arn:aws:iam::333333333:role/from-${get_aws_account_id()}"

In this case, get_aws_account_id() will return 333333333 in the full parse, which means it will assume a different IAM role when it runs terraform.

That said, I think this is fine for now because I feel like the above case will be rare in practice. But it makes me wonder if this is a fundamentally flawed feature, and whether we should consider separating configuration of cloud authentication out of the main terragrunt.hcl so that it can be parsed without side effects.

@brikis98 any thoughts on the above analysis of the implications here?

config/config.go Outdated
@@ -545,6 +545,17 @@ func ParseConfigFile(filename string, terragruntOptions *options.TerragruntOptio
return nil, err
}

// If IamRole wasn't provided, and if it is config file, try first to decode IamRole
if include == nil && terragruntOptions.IamRole == "" {
iamConfig, err := PartialParseConfigString(configString, terragruntOptions, include, filename, []PartialDecodeSectionType{TerragruntFlags})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PartialParseConfigString handles local and include, but not dependency, so yes indeed we need to update the parsing order docs to indicate this.

config/config.go Outdated
Comment on lines 548 to 558
// Initial evaluation of configuration to load flags like IamRole which will be used for final parsing
// https://github.com/gruntwork-io/terragrunt/issues/667
if include == nil && terragruntOptions.IamRole == "" {
iamConfig, err := PartialParseConfigString(configString, terragruntOptions, include, filename, []PartialDecodeSectionType{TerragruntFlags})
if err != nil {
return nil, err
}
if iamConfig.IamRole != "" {
terragruntOptions.IamRole = iamConfig.IamRole
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably want this logic in ParseConfigString, since there are more entrypoints to that function than ParseConfigFile.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, put this in a separate function to make it easier to extend in the future.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh do we also need to parse and set iam_assume_role_duration here?

config/config.go Outdated
@@ -545,6 +545,18 @@ func ParseConfigFile(filename string, terragruntOptions *options.TerragruntOptio
return nil, err
}

// Initial evaluation of configuration to load flags like IamRole which will be used for final parsing
// https://github.com/gruntwork-io/terragrunt/issues/667
if include == nil && terragruntOptions.IamRole == "" {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you need to omit the include == nil check here, as this means it won't handle a config where the iam_role is defined on the child config that has an include defined.

Copy link
Contributor

@yorinasub17 yorinasub17 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updates LGTM! Just had one nit on the function name. I'll kick off a regression build.

Note that before merge, I'd like to make sure @brikis98 has a chance to sanity check the backward incompatible implications.

config/config.go Outdated
Comment on lines 656 to 657
// extractIamRole - extract IAM role details from Terragrunt flags block
func extractIamRole(configString string, terragruntOptions *options.TerragruntOptions, filename string) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT: Rename to setIAMRole, to indicate that this function has side effects (modifying attribute on terragruntOptions.

yorinasub17
yorinasub17 previously approved these changes Sep 29, 2021
Copy link
Contributor

@yorinasub17 yorinasub17 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Build passed and last update LGTM, so only thing remaining is sanity check from Jim!

Comment on lines +608 to +609
* Output contains multiple times `uuid1` and `uuid2` because during HCL evaluation each `run_cmd` in `locals` is evaluated multiple times and random argument generated from `uuid()` save cached value under different key each time
* Output contains multiple times `uuid3`, +1 more output comparing to `uuid1` and `uuid2` - because `uuid3` is declared in locals and inputs which add one more evaluation
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT: are these changes related to this PR?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. It's the result of the change in the parsing order.

@brikis98
Copy link
Member

The side effect of additional parsing of locals is a potential cause for concern, but I think it would actually work out given that I think most configurations would limit the amount of side effects in locals. What is more of a concern is the backward incompatible nature of this feature.

Currently, before this PR, get_aws_account_id will always return the value based on the original credentials. Now, this will change to the value after iam_role is assumed. In most use cases, this is probably not a huge issue and could be worked around.

I think where this gets super tricky is if users had a config like the following:

iam_role = "arn:aws:iam::${get_aws_account_id()}:role/terragrunttest"

Currently as implemented, what would happen is that in the initial parse for iam_role, iam_role would resolve get_account_id based on the original credentials, but when it goes through to do the full parse, get_account_id would now be based on the assumed iam_role. That means that potentially, the iam_role could change depending on which stage of parsing it is in.

This probably isn't a big deal for functions like get_account_id, where the value doesn't change even after assuming the role, but what about the following config?

# Where 333333333 is another account
iam_role = "arn:aws:iam::333333333:role/from-${get_aws_account_id()}"

In this case, get_aws_account_id() will return 333333333 in the full parse, which means it will assume a different IAM role when it runs terraform.

Oof, you're right, this does feel messy.

But it makes me wonder if this is a fundamentally flawed feature, and whether we should consider separating configuration of cloud authentication out of the main terragrunt.hcl so that it can be parsed without side effects.

Yea, I've been having lots of second thoughts about:

  • iam_role
  • iam_assume_role_duration
  • get_aws_account_id()
  • get_aws_caller_identity_arn()
  • get_aws_caller_identity_user_id()

And for that matter, the way we've implemented Auto Init for S3 buckets, DynamoDB tables, etc.

They all feel like partial implementations that don't fully solve the problem; they also feel too cloud-specific and are a pain to maintain.

I wonder if what we really need is:

  1. Either remove all auth from Terragrunt and require that it's handled externally. Or, alternatively, have first-class auth functionality, but handled separately than the rest of the config. This would need a design.
  2. Either remove all "data fetching" from Terragrunt and require that it's handled externally. Or, alternatively, have first-class data source support: conceptually, I imagine a separate Terraform module with a bunch of data sources and output variables, we run apply on that first, and then expose it to your config similar to a dependency block; you can then use all that fetched data in your config, and we run apply on the module you have in source.
  3. Either remove all "Auto Init" for remote state for Terragrunt and require that it's handled externally. Or, alternatively, have first-class remote state support: conceptually, I imagine a separate Terraform module that creates your remote state resources (e.g., S3 bucket, DynamoDB table) and provides info about them via output variables, we run apply on that first, and then expose it to your config similar to a dependency block; you can then use that data in your config, and we run apply on the module you have in source.

But these are all larger changes, prob not super relevant to this PR.

For this PR, the key question, I suppose, is how many users combine the iam_role config with the various data functions:

  • get_aws_account_id()
  • get_aws_caller_identity_arn()
  • get_aws_caller_identity_user_id()

My guess is that 99% of the time, they are used separately, and the intention is (a) assume the specified IAM role and (b) fetch the requested data while authenticated as that IAM role. In the <1% of use cases where the two are combined together, the result will likely be unpredictable. Not sure if our partial parse can detect that and show a warning? If not, then perhaps we document it, merge this PR, and file issues to do the larger clean up mentioned above?

@yorinasub17
Copy link
Contributor

yorinasub17 commented Sep 30, 2021

That makes sense.

If not, then perhaps we document it, merge this PR, and file issues to do the larger clean up mentioned above?

Let's move forward with documenting this. @denis256 can you update the docs for the following functions and mention the caveat that the value will change depending on the parsing stage?

  • get_aws_account_id
  • get_aws_caller_identity_arn
  • get_aws_caller_identity_user_id

The source is here: https://github.com/gruntwork-io/terragrunt/blob/master/docs/_docs/04_reference/built-in-functions.md

Separately, I'll go ahead and file an issue capturing the above point: #1840

Copy link
Contributor

@yorinasub17 yorinasub17 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Docs updates LGTM! I'm going to merge this in now given our discussion, and release as backward incompatible so we can better highlight the behavior change.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants