feat: Add --retry-once-with-cleanup to terraform_validate #441

Merged
merged 43 commits into from
Nov 26, 2022


@baolsen (Contributor) commented Oct 21, 2022

Put an x into the boxes that apply:

  • This PR introduces breaking change.
  • This PR fixes a bug.
  • This PR adds new functionality.
  • This PR enhances existing functionality.

Description of your changes

Adds an argument to the terraform_validate hook that cleans up the .terraform directory and retries once if validation fails.

- id: terraform_validate
  args:
    - --hook-config=--retry-once-with-cleanup=true     # Boolean. true or false

This is a workaround for known issue number 3 with terraform_validate, described at https://github.com/antonbabenko/pre-commit-terraform#terraform_validate

Related links below:

Related #224
Related #301

I try to address the performance concern raised in #301 by clearing .terraform only when validation fails, and at most once per directory.
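A minimal bash sketch of the retry-once flow described above. The function name validate_with_retry is hypothetical and the real hook's structure may differ; this version retries on any failure, i.e. before the error-message matching added later in this PR. It assumes terraform is on PATH, and using terraform init -backend=false as the re-init step is an assumption about how the hook initialises.

```shell
#!/usr/bin/env bash

# Hypothetical sketch: validate one directory; if validation fails, wipe
# .terraform, re-run `terraform init`, and validate exactly once more.
validate_with_retry() {
  local dir_path="$1"
  (
    cd "$dir_path" || exit 1

    # First attempt: on success nothing is deleted and no init is needed.
    if terraform validate > /dev/null 2>&1; then
      exit 0
    fi

    echo "Validation failed. Re-initialising: $dir_path" >&2
    # -f because .terraform may hold write-protected files (e.g. cloned modules).
    rm -rf .terraform
    # Assumption: backend configuration is not needed just to validate.
    terraform init -backend=false > /dev/null 2>&1 || exit 1

    # Second and final attempt; its exit code becomes the function's result.
    terraform validate
  )
}
```

Running the body in a subshell keeps the cd local, so a loop over several directories never has to cd back.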

How can we test changes

Tested with:

  - repo: https://github.com/antonbabenko/pre-commit-terraform
    rev: '850bb967e2d38a89665f0640467a3d5d2d6f634e'
    hooks:
      - id: terraform_fmt
      - id: terraform_validate
        args:
          - --hook-config=--retry-once-with-cleanup=true

Steps:
1. Run the hook on a known-good repo.
2. Enter the .terraform folder and corrupt it by deleting some providers.
3. Run the hook without the flag above. Validation fails.
4. Run the hook with the flag above. Validation runs for a while (doing init behind the scenes) and then passes.
5. Run the hook with the flag above again. Validation runs quickly and then passes.
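Steps 2, 4 and 5 can be scripted roughly as follows. corrupt_and_validate is a hypothetical helper, not part of this PR; it assumes it runs from the repo root, that .pre-commit-config.yaml already enables --retry-once-with-cleanup=true, and that its argument is a Terraform directory initialised in step 1. pre-commit run <hook-id> --all-files is pre-commit's standard way to run a single hook over the whole repo.

```shell
#!/usr/bin/env bash

# Hypothetical helper scripting steps 2, 4 and 5 of the manual test.
# "$1" is a Terraform directory that has already been initialised (step 1).
corrupt_and_validate() {
  local module_dir="$1"

  # Step 2: corrupt the local cache by deleting the downloaded providers.
  rm -rf "$module_dir/.terraform/providers"

  # Step 4: expected to be slow, because the hook re-initialises behind the scenes.
  pre-commit run terraform_validate --all-files || return 1

  # Step 5: expected to be fast, because .terraform is healthy again.
  pre-commit run terraform_validate --all-files
}
```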

@baolsen force-pushed the retry_once_with_cleanup branch from 850bb96 to a7e3d90 October 21, 2022 09:01

@yermulnik (Collaborator) left a comment

@baolsen Thanks for the contribution.
Overall the PR looks good to me, apart from the points below:

  1. Do I understand correctly that, with this change in place, the terraform_validate hook will unconditionally remove the .terraform dir from the current working dir and run again even if there are no issues with the content of the .terraform dir? E.g. when validation fails because of an error in the TF code rather than because of broken content in the .terraform dir. This doesn't seem right. I'm not able to verify terraform validate exit codes and output at the moment, so could you please check whether TF returns a different exit code for broken .terraform dir content? If not, it may be worth capturing its output (this is already in place) and looking for a set of error messages common to this use case, to decide whether or not to re-init TF by removing the .terraform dir.
  2. This doesn't look good to unconditionally/inadvertently remove user content, hence could you please update this PR to prompt for user approval to remove .terraform dir before proceeding?
  3. Since the .terraform dir may contain files that belong to a Git repo (e.g. cloned TF modules), and since these files may be write-protected (no writable bit set), it would be better to force removal via the -f option to rm.

Thanks.

Comment on lines 89 to 90
# Will only be displayed if validation fails again.
common::colorify "yellow" "Validation failed. Re-initialising: $dir_path"

@MaxymVlasov (Collaborator) commented Oct 23, 2022

I'd prefer to display that message in all cases. Please add "verbose: true" to the README example, with a related code comment explaining that it makes this message visible.

@MaxymVlasov (Collaborator)

  2. This doesn't look good to unconditionally/inadvertently remove user content, hence could you please update this PR to prompt for user approval to remove .terraform dir before proceeding?

Not sure that pre-commit is able to add so much interaction. + Then we need a flag that will skip interaction, for jobs run on CI.

Also, I think the functionality requested in 2. is too protective and not needed at all.

@MaxymVlasov added the feature, estimate/2h, hook/terraform_validate and hacktoberfest-accepted labels Oct 23, 2022
@yermulnik (Collaborator)

Not sure that pre-commit is able to add so much interaction.

Mmm, I didn't think of it indeed... Wouldn't read -p "Nuke .terraform/ dir? (Y/n)" work?

Then we need a flag that will skip interaction, for jobs run on CI.

Yep, right! Users should be able to explicitly give their consent via a hook parameter. Thanks for the pointer.

Also, I think the functionality requested in 2. is too protective and not needed at all.

It's only safe when we know for sure that pre-commit-terraform created the .terraform/ dir, and not a user who wanted to store some important data there (e.g. a TF module or TF provider that the user had to download and extract into that dir manually, or even modified manually for testing, debugging, development, whatever). This is why I think we should not remove user data unconditionally or implicitly, but let the user explicitly approve such an action.
Maybe you're right, and we could go with an optional parameter for this hook: when it is set, the hook deletes the .terraform/ directory; otherwise, the hook prints a message telling the user to set this parameter explicitly to enable the re-init.
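To illustrate the trade-off discussed here, a sketch of a prompt that never blocks non-interactive runs. This is not the PR's code (the thread later settled on treating --retry-once-with-cleanup=true itself as consent); confirm_cleanup and its auto_approve argument are hypothetical. Refusing when stdin is not a terminal keeps CI jobs from hanging, which addresses the concern about pre-commit and interaction.

```shell
#!/usr/bin/env bash

# Hypothetical: decide whether .terraform may be deleted.
# $1 is an explicit consent flag (e.g. coming from a hook parameter).
confirm_cleanup() {
  local auto_approve="$1"
  local answer

  # Consent given up front, e.g. for CI jobs: no prompt at all.
  if [[ $auto_approve == true ]]; then
    return 0
  fi

  # No terminal on stdin: never block waiting for input, just refuse.
  if [[ ! -t 0 ]]; then
    echo "Not deleting .terraform/: set the consent parameter to allow it." >&2
    return 1
  fi

  read -r -p "Remove .terraform/ and re-initialise? (y/N) " answer
  [[ $answer == [yY] || $answer == [yY][eE][sS] ]]
}
```

Unlike the (Y/n) suggested above, this sketch defaults to "no", so an accidental Enter keeps the user's data.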

@baolsen (Contributor, Author) commented Oct 28, 2022

Thanks for the great comments :)

Do I understand correctly that, with this change in place, the terraform_validate hook will unconditionally remove the .terraform dir from the current working dir and run again even if there are no issues with the content of the .terraform dir?

Yes, good catch. At the moment it will clean up and re-validate whether the failure is a validation error or a problem with the .terraform directory.

terraform validate exit codes

Unfortunately, it seems to return exit code 1 both for validation errors and for errors caused by an invalid .terraform config. Thanks for the tip about parsing the error message, I'll definitely look into that now.

Users should be able to explicitly give their consent via a hook parameter. Thanks for the pointer.

I was not aware that some users would be heavily customising .terraform contents.

However, it seems a bit strange to need an extra hook parameter to give consent to the intended behaviour of the first one. If the user has specified the current parameter --retry-once-with-cleanup, is that enough consent? Or would we prefer something like --retry-with-prompt-for-cleanup and --retry-auto-approve? The second reviewer seemed to think this is overkill and that we only need the current parameter.

Happy to take guidance here.

I'll do some fixups for the other minor comments.
Thanks!

@baolsen (Contributor, Author) commented Oct 28, 2022

PR updated with the ability to parse terraform validate output for specific error summaries, and to delete .terraform and retry only on those.

The list can be built up over time from other use cases.
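A sketch of how such a list of error summaries might be matched. needs_cleanup and CLEANUP_ERROR_PATTERNS are hypothetical names, and the list is illustrative rather than the hook's actual one: the first pattern is the summary quoted later in this thread, and the other two are messages Terraform prints when required modules or providers are not installed. Matching is literal and case-insensitive, on the assumption that capitalisation of these summaries may differ between Terraform versions.

```shell
#!/usr/bin/env bash

# Hypothetical list of `terraform validate` error summaries that point at a
# stale or broken .terraform directory rather than at the Terraform code.
# Illustrative only; extend it as new cases are reported.
CLEANUP_ERROR_PATTERNS=(
  "missing or corrupted provider plugins"
  "Module not installed"
  "Missing required provider"
)

# Returns 0 if the captured validate output contains any known pattern.
# grep -F treats patterns literally; -i ignores case; -q prints nothing.
needs_cleanup() {
  local output="$1"
  local -a grep_args=()
  local pattern

  for pattern in "${CLEANUP_ERROR_PATTERNS[@]}"; do
    grep_args+=(-e "$pattern")
  done

  grep -qiF "${grep_args[@]}" <<< "$output"
}
```

In the retry flow, the cleanup and re-init would only run when needs_cleanup "$validate_output" succeeds; any other failure is reported as a normal validation error.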

@baolsen re-requested review from yermulnik and MaxymVlasov October 28, 2022 09:30
@baolsen force-pushed the retry_once_with_cleanup branch from ca50529 to c103542 October 28, 2022 10:10
@yermulnik (Collaborator) commented Oct 29, 2022

I was not aware that some users would be heavily customising .terraform contents.

Yep, that's a rather rare use case for regular users, though those who develop TF providers might not be happy about implicit removal of a .terraform dir containing their development code. We have to take such use cases into account, so that the new behaviour does as little harm as possible, especially given that previously this required manual steps, which could be regarded as explicit consent.

If the user has specified the current parameter --retry-once-with-cleanup is that enough consent?

Yeah, given Max's opinion I have to agree. The hook's behavior should then be described in detail in the README, though, including a warning about the implicit/unconditional removal of the .terraform dir, so that users are aware of it and agree to it.

PR updated

Thanks. I'll give it a look in a min.

hooks/terraform_validate.sh: 7 review threads (outdated, resolved)
@yermulnik (Collaborator)

Other than the comments above, I think the PR looks good to me.

@baolsen (Contributor, Author) commented Nov 4, 2022

Hey @yermulnik thanks for the further comments.
I think I've addressed them in the latest push, please take a look.

@baolsen (Contributor, Author) commented Nov 25, 2022

Can be tested with:

repos:
  - repo: https://github.com/antonbabenko/pre-commit-terraform
    rev: '29086b84ea45545aca7a5627ee59d7f63a9f6f59'
    hooks:
      - id: terraform_validate
        args:
          - --hook-config=--retry-once-with-cleanup=true  # Set to true, false, FAL, SOMETHING.

Not sure if we should raise an error when --retry-once-with-cleanup is not exactly true or false, but at the moment only true will do a retry and other values are ignored.
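One possible answer to the true/false question, sketched in bash. retry_enabled is a hypothetical helper that receives already-split hook-config entries as arguments; how the real hook splits --hook-config is not shown here. It keeps the current behaviour (only true retries) but warns on values such as FAL or SOMETHING instead of ignoring them silently.

```shell
#!/usr/bin/env bash

# Hypothetical parser for hook-config entries such as
# "--retry-once-with-cleanup=true". Only "true" enables the retry; "false" or a
# missing entry disables it; any other value disables it with a warning.
retry_enabled() {
  local config key value

  for config in "$@"; do
    key="${config%%=*}"
    value="${config#*=}"
    [[ $key == "--retry-once-with-cleanup" ]] || continue

    case "$value" in
      true) return 0 ;;
      false) return 1 ;;
      *)
        echo "Unknown value '$value' for $key, treating it as false" >&2
        return 1
        ;;
    esac
  done

  return 1
}
```

Raising a hard error instead of a warning would be a one-line change in the catch-all branch, at the cost of breaking existing configs that pass unexpected values.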

If it is difficult to reproduce the specific errors we are looking for, then it can still be partially tested:

Validation can be forced to fail and retry by writing some invalid TF code and then adding the resulting error message to the copy of terraform_validate.sh in the local pre-commit cache.
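To find the copy of terraform_validate.sh that pre-commit actually runs, a sketch like the following may help. The cache-location rule mirrors pre-commit's defaults (PRE_COMMIT_HOME, else XDG_CACHE_HOME/pre-commit, else ~/.cache/pre-commit); find_cached_validate_hooks is a hypothetical helper, and the directory layout inside the cache is a pre-commit internal that may change.

```shell
#!/usr/bin/env bash

# Where pre-commit keeps cloned hook repos: $PRE_COMMIT_HOME if set, otherwise
# $XDG_CACHE_HOME/pre-commit, otherwise ~/.cache/pre-commit.
pre_commit_cache_dir() {
  printf '%s\n' "${PRE_COMMIT_HOME:-${XDG_CACHE_HOME:-$HOME/.cache}/pre-commit}"
}

# Hypothetical helper: list cached copies of terraform_validate.sh so a test
# error message can be added to them temporarily.
find_cached_validate_hooks() {
  find "$(pre_commit_cache_dir)" -type f -path '*/hooks/terraform_validate.sh' 2> /dev/null
}
```

After experimenting, pre-commit clean deletes the whole cache, so the next run re-clones an unmodified copy of the hook.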

@baolsen (Contributor, Author) commented Nov 25, 2022

I also see a lot of unrelated changes in the README, not sure if those are intended.

@baolsen requested review from MaxymVlasov and removed the request for yermulnik November 25, 2022 09:02
@MaxymVlasov (Collaborator)

I also see a lot of unrelated changes in the README, not sure if those are intended.

Yeah, those are chore changes that mostly fix style and language. I had time to go over the README in this PR, and the changes are so tiny that there's no need to create a new PR and then possibly deal with conflicts.

@yermulnik (Collaborator) left a comment

LGTM, with a few tiny remarks left for your consideration.

README.md: 3 review threads (outdated, resolved)
hooks/terraform_validate.sh: 3 review threads (resolved; 2 outdated)
@MaxymVlasov added the estimate/2days label and removed the estimate/2h label Nov 25, 2022
@MaxymVlasov edited the title "feat: Add --retry-once-with-cleanup to terraform_validate" Nov 25, 2022
@antonbabenko antonbabenko merged commit 96fe3ef into antonbabenko:master Nov 26, 2022
antonbabenko pushed a commit that referenced this pull request Nov 26, 2022
# [1.77.0](v1.76.1...v1.77.0) (2022-11-26)

### Features

* Add `--retry-once-with-cleanup` to `terraform_validate` ([#441](#441)) ([96fe3ef](96fe3ef))
@antonbabenko (Owner)

This PR is included in version 1.77.0 🎉

@brettcurtis commented Nov 28, 2022

Here to say thanks, this PR is a big help!

Edit: in our case this happens as a result of Dependabot bumping a provider version. I suspect there are many other reasons why this could happen, but would an option to just run init with -upgrade make sense as well?

@MaxymVlasov (Collaborator)

Hi @brettcurtis.
terraform init -upgrade could behave differently than you expect, e.g. download a new major provider version. See details of how it works here

@brettcurtis
Thanks for the reply, @MaxymVlasov. Yeah, I suppose that could cause trouble for some folks. We always pull the current release, shift testing left, and fix any problems before we hit production, so terraform init -upgrade is already part of our flow on every PR.

@brettcurtis commented Nov 28, 2022

Ah, so I just tested this:

Validation failed: regional/infra
{
  "format_version": "1.0",
  "valid": false,
  "error_count": 1,
  "warning_count": 0,
  "diagnostics": [
    {
      "severity": "error",
      "summary": "missing or corrupted provider plugins:
  - registry.terraform.io/hashicorp/google: there is no package for registry.terraform.io/hashicorp/google 4.44.1 cached in .terraform/providers
  - registry.terraform.io/hashicorp/google-beta: there is no package for registry.terraform.io/hashicorp/google-beta 4.44.1 cached in .terraform/providers
  - registry.terraform.io/hashicorp/random: there is no package for registry.terraform.io/hashicorp/random 3.4.3 cached in .terraform/providers",
      "detail": ""
    }
  ]
}

I'm in this state because Dependabot manages our Terraform provider versions and merged a couple of PRs with provider updates since I last worked on this repo. So I guess this PR doesn't solve that problem? Another thing I noticed is that it runs init in directories that contain only .tfvars files.

@MaxymVlasov (Collaborator)

@brettcurtis it's better to create an issue describing the problems you face, how you get JSON output (the hook should always return non-JSON output to the end user), and the other details requested in the bug issue template.

@brettcurtis

Will do. I'm trying to come up with a simple test case, but in my tests I can't reproduce what I expected to be the problem. My assumption was that if my local .terraform had an older provider version cached and I pulled down a new .terraform.lock.hcl updated with a newer provider version, validation would fail. I'll report back once I figure out how to reproduce it.

Labels: estimate/2days, feature, hacktoberfest-accepted, hook/terraform_validate

5 participants