chore: cache expensive lookups #553
Conversation
```ts
const cached = this.parentModulesCache.get(moduleLike.fqn);
if (cached) return cached;

const types = moduleLike.types;
```
This was the main culprit. Apparently, doing `Object.values` on a map with 8,000 entries takes roughly 3ms, and this used to happen twice for each `typeReference` transpilation. Specifically for non-TypeScript languages this is exacerbated by parameter expansion.

Some more analysis details. Before this PR, here is what the flamegraph looked like: [flamegraph screenshot, before]. After this PR: [flamegraph screenshot, after]. @RomainMuller Do you think we can improve something there, or maybe run …
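A minimal sketch of the caching pattern described above (the `TypeLookup` class and `typesCache` name are illustrative, not jsii-docgen internals):

```ts
interface Type {
  readonly fqn: string;
}

interface ModuleLike {
  readonly fqn: string;
  // This getter is backed by Object.values() over a map that can hold
  // ~8,000 entries, so each access costs a few milliseconds.
  readonly types: readonly Type[];
}

class TypeLookup {
  private readonly typesCache = new Map<string, readonly Type[]>();

  public typesOf(moduleLike: ModuleLike): readonly Type[] {
    const cached = this.typesCache.get(moduleLike.fqn);
    if (cached) {
      return cached;
    }
    // Pay the expensive lookup once per module instead of twice per
    // type-reference transpilation.
    const types = moduleLike.types;
    this.typesCache.set(moduleLike.fqn, types);
    return types;
  }
}
```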
```ts
// if the type is in a submodule, the submodule name is the first
// part of the namespace. we construct the full submodule fqn and search for it.
const submoduleFqn = `${type.assembly.name}.${type.namespace.split('.')[0]}`;
const submodules = type.assembly.submodules.filter(
```
According to the flamegraph, the `assembly.submodules` call is also pretty expensive. The submodules cache here shaved another 15 seconds off the render time.
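A hedged sketch of what such a submodules cache can look like; `submodulesOf`, `findSubmodule`, and the interfaces below are illustrative stand-ins for the jsii-reflect model:

```ts
interface Submodule {
  readonly fqn: string;
}

interface Assembly {
  readonly name: string;
  // This getter showed up as expensive in the flamegraph, so we snapshot
  // its result once per assembly instead of on every type resolution.
  readonly submodules: readonly Submodule[];
}

const submodulesCache = new Map<string, readonly Submodule[]>();

function submodulesOf(assembly: Assembly): readonly Submodule[] {
  const cached = submodulesCache.get(assembly.name);
  if (cached) {
    return cached;
  }
  const submodules = assembly.submodules;
  submodulesCache.set(assembly.name, submodules);
  return submodules;
}

// Mirrors the quoted snippet: the submodule name is the first segment of
// the namespace, so we build the full fqn and search for it.
function findSubmodule(assembly: Assembly, namespace: string): Submodule | undefined {
  const submoduleFqn = `${assembly.name}.${namespace.split('.')[0]}`;
  return submodulesOf(assembly).find((s) => s.fqn === submoduleFqn);
}
```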
Awesome! 🔥 🔥 🔥
Referenced in #711:

During the reprocessing workflow, Step Functions tries to start a burst of 60,000 (the current number of package versions) ECS tasks. Since our account limit is only 1,000 parallel tasks, we need to apply a retry policy so the throttled tasks don't end up in the DLQ.

Currently, our retry policy allows for a total wait time of roughly 2.5 hours. Let's do some math to see if this is enough. Since tasks also have boot time, we don't really run 1,000 in parallel. In practice, what we normally see is:

![Screen Shot 2022-01-12 at 4 12 24 PM](https://user-images.githubusercontent.com/1428812/149156438-9ba5e844-fa62-4294-9760-92887f6825f5.png)

So for simplicity's sake, let's assume 500 parallel tasks. If every task takes about 2 minutes (empirically, and somewhat based on `jsii-docgen` test timeouts), we are able to process 1,000 tasks in 4 minutes. This means that in order to process 60,000 tasks, we need 4 hours. The current retry policy of 2.5 hours allows us to process only about 35,000 tasks, and indeed, the most recent execution of the workflow resulted in the remaining 25,000 tasks being sent to the DLQ. The retry policy implemented in this PR gives us 7 hours.

## TODO

- [x] 5 hours might still be a bit too close. Run the reprocess workflow again to see if the numbers have changed following cdklabs/jsii-docgen#553. Follow-up: the `jsii-docgen` improvements did make it better, but not enough to put a significant dent in the total. I've updated the PR to give us 7 hours.

Fixes #708

----

*By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*
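The throughput math in the #711 description above can be sanity-checked with a small script. The task counts come from that description; the retry parameters passed to `totalRetryWindowMinutes` are illustrative, not the actual Step Functions configuration:

```ts
// Total wait budget of a retry policy with exponential backoff:
// interval, interval*rate, interval*rate^2, ... for maxAttempts retries.
function totalRetryWindowMinutes(
  intervalMinutes: number,
  backoffRate: number,
  maxAttempts: number,
): number {
  let total = 0;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    total += intervalMinutes * Math.pow(backoffRate, attempt);
  }
  return total;
}

// ~500 effectively-parallel tasks at ~2 minutes each
// => 1,000 tasks every 4 minutes => 250 tasks/minute.
const tasksPerMinute = 1_000 / 4;
const minutesNeeded = 60_000 / tasksPerMinute; // 240 minutes = 4 hours
// A 2.5-hour (150-minute) budget covers 150 * 250 = 37,500 tasks,
// the "about 35,000" quoted above.
const tasksInOldBudget = 150 * tasksPerMinute;
// Illustrative parameters: 3-minute base interval, backoff rate 2,
// 7 retries => 381 minutes (~6.4 hours), in the ballpark of the
// 7-hour window the PR aims for.
console.log(minutesNeeded, tasksInOldBudget, totalRetryWindowMinutes(3, 2, 7));
```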
Currently, ingesting packages with a large number of submodules and/or types can take a very long time.
For example, running the following:
Will result in:
With this PR, by adding a couple of caches, this is reduced to:
Fixes cdklabs/construct-hub#664
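The general shape of the fix is to memoize expensive, pure lookups keyed by fqn. A minimal sketch of that pattern; the `memoize` helper below is illustrative, whereas the actual caches in this PR live on the documentation/transpilation classes themselves:

```ts
// Wrap a pure, expensive function so repeated calls with the same key
// hit a Map instead of recomputing.
function memoize<K, V>(fn: (key: K) => V): (key: K) => V {
  const cache = new Map<K, V>();
  return (key: K): V => {
    if (cache.has(key)) {
      return cache.get(key)!;
    }
    const value = fn(key);
    cache.set(key, value);
    return value;
  };
}

// Hypothetical usage: derive a type's parent module chain from its fqn
// once, then serve subsequent lookups from the cache.
const parentModulesOf = memoize((fqn: string): string[] => {
  // ...expensive traversal would go here; this stub just splits the fqn.
  return fqn.split('.').slice(0, -1);
});
```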