Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve partial parsing HCL strings by adding cache #2203

Closed
maunzCache opened this issue Jul 19, 2022 · 2 comments · Fixed by #2204
Closed

Improve partial parsing HCL strings by adding cache #2203

maunzCache opened this issue Jul 19, 2022 · 2 comments · Fixed by #2204
Labels
enhancement New feature or request

Comments

@maunzCache
Copy link
Contributor

maunzCache commented Jul 19, 2022

Motivation

As outlined in #1971 my customer still has issues with the parsing time of the later Terragrunt versions since 0.32. Still there is a requirement for my team to get a new version of Terraform and Terragrunt working ASAP. So @nilreml and I took some time to understand the Terragrunt code better and find a potential solution to our issue.
To get this straight: We are working on a big project containing ~150 AWS accounts, 6 environments and ~75 Terraform modules. The Terragrunt code is sometimes very complex containing multiple dependency layers. We already took the time to optimize and review all of the *.hcl files to reduce includes/read_terragrunt_config calls, calls to other custom functions and simplifying logic of loops. Still we only get minor improvements in our parsing times (time until all dependencies are resolved and Terraform is triggered the first time).

After the merge of #2010 and the 0.36.6 release another colleague of mine did some re-measuring of execution times in our environment:

tg version init 1 init 2 init 3 plan 1 avg factor to 0.29.2
0.29.2 parsing 0:01:11 0:00:50 0:01:14 0:01:02 64s 1.0
0.29.2 complete 0:04:11 0:04:32 0:04:13 0:04:14 257s 1.0
0.33.2 parsing 0:01:15 0:01:44 - - 127s 2.0
0.33.2 complete 0:05:48 0:07:34 - - 401s 1.6
0.34.3 parsing 0:05:54 0:05:21 - - 337s 5.3
0.34.3 complete 0:24:59 0:21:02 - - 1381s 5.4
0.35.20 parsing 0:04:50 0:04:34 - - 282s 4.4
0.35.20 complete 0:23:39 0:23:14 - - 1406s 5.4
0.36.3 parsing 0:04:23 0:04:44 - - 274s 4.3
0.36.3 complete 0:24:26 0:21:57 - - 1391s 5.4
0.36.6 parsing 0:01:16 0:02:09 0:01:52 0:02:11 112s 1.8
0.36.6 complete 0:07:26 0:11:29 0:07:31 0:08:54 530s 2.1

Unfortunately i don't know his testing method, but i can confirm that times are still high.

As you can see, there is a gap between 0.29 and 0.33 (0.32 to be precise) which the 0.36 version was unable to fix. So we did some code crawling on that version to see where the "evil code" was introduces and found changes between 0.31 and 0.32 to be the culprit.

To be super sure on this we created flamegraphs for the the versions in question:

Those flamegraphs have a hacked in pprof and were recompiled by me from their latest tag on git. Those profiles have been gathered during a terragrunt run-all init run.

Rationale

(Note: The sections below may mix up the functionality of include and dependency, however, every dependency will cause an implicit include due to required parsing of the dependencies.)

Due to introduction of multiple includes in 0.32 (presumably #1804) we found that the stack sizes between 0.31 and 0.32 drastically changed. This makes sense in general so Terragrunt would now parse includes multiple times and do another step of marshaling and unmarshaling HCL with JSON due to typing conflicts with cty in decode sections - at least from what we understood. Even though we cannot pin those changes to the PartialParseConfig calls, we saw at least increased calls to DecodeBaseBlocks which is called from PartialParseConfig. Also it could be related to changes in configFileHasDependencyBlock.

Our project makes use of a deep variable hierarchy using .hcl files, causing repeated parsing of .hcl files over and over for each module, in addition to the usual root terragrunth.hcl.
Also, our project has hundreds of module instances using terragrunt.hcl files via symbolic links.

Revisiting the changes of the setIAMRole function in config.go (#2010), we considered adding a cache/memoization to either DecodeBaseBlocks or PartialParseConfigString. We already knew that we can cache PartialParseConfigString just as setIamRole does, so this is our preferred option.

Comparing performance:

- Best case Average case Worst case
Regular implementation of partial parsing O(1) O(n) O(n^2)
Cached implementation O(1) O(n) O(n)

With n being number of iterations for parsing dependencies/includes.

As of now we are not aware of any Terragrunt features the cause side-effects on PartialParseConfigString except for the already discussed setIAMRole.

Doing

To prove our points we did an initial implementation reusing the code of cache.go which was implemented for the setIAMRole feature. A switch was added to run the compiled version with and without caching behavior.
Additionally, two fixtures and two benchmarks were written to have a reproducible use cases.

The PR will follow up on this issue.

Please let's discuss your thoughts on our observations and suggested mitigation.

This story was co-created by @nilreml.

@maunzCache
Copy link
Contributor Author

Just as i announced it on the PR, @nilreml will take over the topic as i leave the customer project. However, i believe the implementation is finished to if there is nothing to add from anyone you can give the PR a try and review it.

@denis256
Copy link
Member

HCL parsing cache was included in https://github.com/gruntwork-io/terragrunt/releases/tag/v0.38.9

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants