
chore: cache expensive lookups #553

Merged: 5 commits into main from epolon/performance-improvements on Jan 12, 2022
Conversation

@iliapolo (Contributor) commented Jan 11, 2022

Currently, ingesting packages with a large number of submodules and/or types can take a very long time.

For example, running the following:

```ts
import { Documentation, Language } from 'jsii-docgen';

const docs = await Documentation.forPackage('@cdktf/provider-aws@<version>');
console.time('render');
await docs.render({ language: Language.PYTHON, submodule: 'wafv2' });
console.timeEnd('render');
```

will result in:

```
Installing package @cdktf/provider-aws@<version>
render: 9:52.761 (m:ss.mmm)
```

With this PR, adding a couple of caches reduces this to:

```
Installing package @cdktf/provider-aws@<version>
render: 32.382s
```

Fixes cdklabs/construct-hub#664

```ts
// return the memoized result if we've already computed it for this module
const cached = this.parentModulesCache.get(moduleLike.fqn);
if (cached) return cached;

const types = moduleLike.types;
```
@iliapolo (Contributor, Author) commented Jan 11, 2022
This was the main culprit. Apparently, doing `Object.values` on a map with 8,000 entries takes roughly 3ms, and this used to happen twice for each type-reference transpilation.

For non-TypeScript languages specifically, this is exacerbated by parameter expansion.
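As a sketch of the fix this comment describes (a minimal memoization; `parentModulesCache` appears in the diff above, everything else here is illustrative, not jsii-docgen's actual internals): compute the `Object.values` projection once per module and reuse it by fqn.

```ts
// Illustrative sketch of the memoization pattern.
interface Type {
  fqn: string;
}

class TypeLookup {
  // cache keyed by the module's fully qualified name
  private readonly typesCache = new Map<string, Type[]>();

  public typesOf(moduleLike: { fqn: string; types: Record<string, Type> }): Type[] {
    const cached = this.typesCache.get(moduleLike.fqn);
    if (cached) {
      return cached;
    }
    // Object.values over ~8,000 entries costs ~3ms; memoizing means we pay it
    // once per module instead of twice per type-reference transpilation.
    const types = Object.values(moduleLike.types);
    this.typesCache.set(moduleLike.fqn, types);
    return types;
  }
}
```

Since the ingested assembly doesn't change while rendering, the cache never needs invalidation.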

@iliapolo commented Jan 11, 2022

Some more analysis details. Before this PR, here is what the flamegraph looked like:

[Flamegraph screenshot: before this PR]

We can see the render method takes the bulk of the time compared to loadAssembly. It also shows very noticeable hotspots.

After this PR:

[Flamegraph screenshot: after this PR]

We can see render is now a small fraction of the cost, with no substantial hotspots on the call stack. The hottest spot now is the validateSchema function of jsonschema.

@RomainMuller Do you think we can improve something there, or maybe run loadAssembly without validation? I don't think we need validation at this point, at least not where it's used in Construct Hub.
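For anyone reproducing this analysis, one way to capture a comparable CPU profile is Node's built-in inspector API (plain Node tooling, not something this PR touches); the resulting `.cpuprofile` file opens as a flame chart in Chrome DevTools:

```ts
// Requires Node 19+ for 'node:inspector/promises'; older versions expose the
// same API with callbacks under 'node:inspector'.
import { Session } from 'node:inspector/promises';
import { writeFileSync } from 'node:fs';

const session = new Session();
session.connect();
await session.post('Profiler.enable');
await session.post('Profiler.start');

// ... run the render benchmark from the PR description here ...

const { profile } = await session.post('Profiler.stop');
writeFileSync('render.cpuprofile', JSON.stringify(profile));
```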

```ts
// if the type is in a submodule, the submodule name is the first
// part of the namespace. we construct the full submodule fqn and search for it.
const submoduleFqn = `${type.assembly.name}.${type.namespace.split('.')[0]}`;
const submodules = type.assembly.submodules.filter(
  (s) => s.fqn === submoduleFqn,
);
```
@iliapolo (Contributor, Author) commented:
According to the flame graph, the `assembly.submodules` call is also pretty expensive. The submodules cache here shaved another 15 seconds off the render time.
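A sketch of such a cache (`submodulesCache` and `submodulesOf` are illustrative names, not the PR's actual identifiers): evaluate the expensive `assembly.submodules` getter once per assembly and reuse the resulting array afterwards.

```ts
// Illustrative sketch of caching an expensive getter.
interface Submodule {
  fqn: string;
}

interface AssemblyLike {
  name: string;
  submodules: Submodule[]; // expensive getter, per the flamegraph
}

const submodulesCache = new Map<string, Submodule[]>();

function submodulesOf(assembly: AssemblyLike): Submodule[] {
  const cached = submodulesCache.get(assembly.name);
  if (cached) {
    return cached;
  }
  const submodules = assembly.submodules; // pay the cost once per assembly
  submodulesCache.set(assembly.name, submodules);
  return submodules;
}
```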

@iliapolo requested a review from a team on January 11, 2022 20:11
@iliapolo marked this pull request as ready for review on January 11, 2022 20:11
@iliapolo changed the title from "chore: cache parent module lookup" to "chore: cache expensive lookups" on Jan 11, 2022
@MrArnoldPalmer added the pr/do-not-merge (This PR should not be merged at this time.) label on Jan 11, 2022
@MrArnoldPalmer (Contributor) left a comment:
Awesome! 🔥 🔥 🔥

@iliapolo removed the pr/do-not-merge (This PR should not be merged at this time.) label on Jan 12, 2022
mergify bot merged commit 2b35114 into main on Jan 12, 2022
mergify bot deleted the epolon/performance-improvements branch on January 12, 2022 09:15
mergify bot pushed a commit to cdklabs/construct-hub that referenced this pull request on Jan 12, 2022 (cdklabs/construct-hub#711):

During the reprocessing workflow, Step Functions tries to start a burst of 60,000 (the current number of package versions) ECS tasks. Since our account limit is only 1,000 parallel tasks, we need to apply a retry policy so the throttled tasks don't end up in the DLQ.

Currently, our retry policy allows for a total wait time of roughly 2.5 hours. Let's do some math to see if this is enough.

Since tasks also have boot time, we don't really run 1,000 in parallel. In practice, what we normally see is:

![Screen Shot 2022-01-12 at 4 12 24 PM](https://user-images.githubusercontent.com/1428812/149156438-9ba5e844-fa62-4294-9760-92887f6825f5.png)

So for simplicity's sake, let's assume 500 parallel tasks. If every task takes about 2 minutes (empirically, and somewhat based on `jsii-docgen` test timeouts), we are able to process 1,000 tasks every 4 minutes.

This means that in order to process 60,000 tasks, we need 4 hours. The current retry policy of 2.5 hours allows us to process only about 35,000 tasks. And indeed, the most recent execution of the workflow resulted in the remaining 25,000 tasks being sent to the DLQ.

The retry policy implemented in this PR gives us 7 hours.
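For reference, a hedged sketch of what such a retry policy can look like with the CDK Step Functions API (the error name and numbers below are illustrative, not necessarily the values this PR uses):

```ts
import { Duration } from 'aws-cdk-lib';
import * as sfn from 'aws-cdk-lib/aws-stepfunctions';

// With interval i, backoff rate b, and n attempts, the total wait is
// i * (b^n - 1) / (b - 1). For the illustrative values below:
// 10 min * (1.5^8 - 1) / 0.5 ~= 493 min, i.e. roughly 8 hours.
function addThrottleRetry(task: sfn.TaskStateBase): void {
  task.addRetry({
    errors: ['ECS.AmazonECSException'], // assumed error name for throttled ECS RunTask calls
    interval: Duration.minutes(10),
    backoffRate: 1.5,
    maxAttempts: 8,
  });
}
```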

## TODO

- [x] 5 hours might still be a bit too close. Run the reprocess workflow again to see if the numbers have changed following cdklabs/jsii-docgen#553. Follow-up: the `jsii-docgen` improvements did make things better, but not enough to put a significant dent in the total. I've updated the PR to give us 7 hours.

Fixes #708

----

*By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*
Successfully merging this pull request may close these issues.

@cdktf/provider-aws sometimes fails on timeouts