
Reprocessing can cause ingestion of new packages to be delayed by over an hour #715

Open
Chriscbr opened this issue Jan 14, 2022 · 1 comment
Labels: bug (Something isn't working), effort/epic (More than 1 week), p1

Comments

@Chriscbr
Contributor

If we are running a full reprocessing of all packages on Construct Hub, it's possible that so many Fargate tasks will be running that it becomes impossible for new packages to get processed. (Specifically, the "Orchestration" step function may have a task running to ingest a package, and that task may get very unlucky retrying the ECS SubmitTask operation in the exponential backoff loop we have defined.)
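For readers unfamiliar with this pattern, here is a minimal CDK sketch of a Step Functions ECS task with an exponential-backoff retry policy, illustrating the kind of loop a new package's ingestion can get stuck in while reprocessing holds all the Fargate capacity. The `scope`, `cluster`, `taskDefinition`, and the error name in `errors` are assumptions for the example, not the actual Construct Hub orchestration code:

```ts
import { Duration } from 'aws-cdk-lib';
import { Construct } from 'constructs';
import * as ecs from 'aws-cdk-lib/aws-ecs';
import * as sfn from 'aws-cdk-lib/aws-stepfunctions';
import * as tasks from 'aws-cdk-lib/aws-stepfunctions-tasks';

// Assumed to exist in the surrounding stack -- not the real Construct Hub wiring.
declare const scope: Construct;
declare const cluster: ecs.ICluster;
declare const taskDefinition: ecs.FargateTaskDefinition;

const ingestTask = new tasks.EcsRunTask(scope, 'IngestPackage', {
  cluster,
  taskDefinition,
  launchTarget: new tasks.EcsFargateLaunchTarget(),
  integrationPattern: sfn.IntegrationPattern.RUN_JOB,
});

// Retry capacity/throttling errors with exponential backoff. If account-level
// Fargate capacity is exhausted by a reprocessing run, a new package's ingestion
// can sit in this retry loop for a long time before it finally gets a task slot.
// The error name below is an assumption for illustration.
ingestTask.addRetry({
  errors: ['ECS.AmazonECSException'],
  interval: Duration.seconds(30),
  backoffRate: 2,
  maxAttempts: 10,
});
```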

Fixing this may require separating the compute for reprocessing from the compute for regular ingestion, adding some way to prioritize new-package tasks over reprocessing tasks, or something else entirely.
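One possible shape for the "prioritize new package tasks" option is sketched below: feed ingestion from two queues and always drain the new-package queue before the reprocessing backlog. The queue URLs, environment variables, and `ingest` helper are all hypothetical; a separate ECS cluster dedicated to reprocessing would be the other obvious alternative.

```ts
import { SQSClient, ReceiveMessageCommand, DeleteMessageCommand } from '@aws-sdk/client-sqs';

// Hypothetical queue URLs and ingestion helper -- not part of the real codebase.
const NEW_PACKAGE_QUEUE_URL = process.env.NEW_PACKAGE_QUEUE_URL!;
const REPROCESS_QUEUE_URL = process.env.REPROCESS_QUEUE_URL!;
declare function ingest(input: unknown): Promise<void>;

const sqs = new SQSClient({});

// One polling iteration: always check the new-package queue before touching the
// reprocessing backlog, so a full reprocessing run cannot starve fresh ingestion.
export async function pollOnce(): Promise<boolean> {
  for (const queueUrl of [NEW_PACKAGE_QUEUE_URL, REPROCESS_QUEUE_URL]) {
    const { Messages } = await sqs.send(new ReceiveMessageCommand({
      QueueUrl: queueUrl,
      MaxNumberOfMessages: 1,
    }));
    if (Messages && Messages.length > 0) {
      const [message] = Messages;
      await ingest(JSON.parse(message.Body!));
      await sqs.send(new DeleteMessageCommand({
        QueueUrl: queueUrl,
        ReceiptHandle: message.ReceiptHandle!,
      }));
      return true; // processed a message; the caller re-polls from the top
    }
  }
  return false; // both queues were empty
}
```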

Related: #708

@Chriscbr added the bug and p1 labels on Jan 14, 2022
@Chriscbr
Contributor Author

Chriscbr commented Jan 18, 2022

Currently, having the "reprocessing" workflow as a cron job is necessary for our public instance of Construct Hub, and likely desirable for private instances as well, since as I see it the job serves three purposes:

  1. re-ingesting packages, which updates per-package metadata based on the latest package tags the operator has configured (we use this to apply CDK type tags and publisher tags, and we update this configuration whenever new publishers want their packages to be specially tagged on Construct Hub)
  2. re-generating docs to account for the latest fixes in jsii, jsii-rosetta, and jsii-docgen
  3. re-generating docs so that links to types in external packages will link to the latest versions of the appropriate packages

(1) and (2) don't necessarily justify having this as a daily cron job, since we could just trigger the workflow manually whenever it's actually needed. But (3) is something I wasn't aware of until recently.

To make this concrete: if someone's construct library has a class, and the API reference of that class references a type from another library that they have a caret dependency on (e.g. constructs.Construct), then the version of the dependency that the link points to depends on what the latest version of that dependency was when the API reference was generated. So if we never ran reprocessing, you might click a link and get taken to the page for an older version of constructs even though a newer version has since been published. Since both versions of the dependency are compatible with the original library, both links are technically valid, but I think it makes for a subpar experience; APIs often get updated with better docs etc., so we should try to link to the latest available version. Generating docs in a way that doesn't hard-code package versions may be possible, but it would require pushing a lot of complexity onto the front-end to do semver calculations to determine which package version to link to.
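To make the "semver calculations on the front-end" point concrete, here is a sketch of the resolution step the front-end would have to perform if doc links carried a version range instead of a pinned version. The function, the URL shape, and the version list are illustrative assumptions, not Construct Hub's actual routing or catalog API:

```ts
import * as semver from 'semver';

// Resolve a semver range against the versions published in the catalog and
// build a link to the best match. `availableVersions` would have to be fetched
// from the package catalog at render time.
export function resolveDocLink(packageName: string, range: string, availableVersions: string[]): string {
  const version = semver.maxSatisfying(availableVersions, range);
  if (!version) {
    throw new Error(`no published version of ${packageName} satisfies ${range}`);
  }
  // Hypothetical URL shape for the example.
  return `/packages/${packageName}/v/${version}`;
}

// resolveDocLink('constructs', '^10.0.0', ['3.3.161', '10.0.5', '10.1.43'])
//   -> '/packages/constructs/v/10.1.43'
```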

So, TL;DR: I think reprocessing all docs on a daily basis is a pragmatic option for now, but I still think adding some kind of systematic fix for the problem in the issue title, using queues (or a separate cluster), is desirable long term.

As a short-term fix, I think it's OK to increase the service limits in our AWS accounts, since not all private Construct Hub instances will necessarily host as many libraries and versions as we do, and anyone can request the same limit increases on their own AWS accounts.

We are also making changes to improve jsii-docgen performance, which should mitigate the problem somewhat by reducing how long reprocessing takes (cdklabs/jsii-docgen#553 and cdklabs/jsii-docgen#559).

@ryparker added the effort/epic (More than 1 week) label on Dec 22, 2022