Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hydrate virtual packages #990

Closed
wants to merge 21 commits into from

Conversation

crubier
Copy link
Contributor

@crubier crubier commented Feb 24, 2020

What's the problem this PR addresses?

#973

How did you fix it?

Following #973 (comment)

@crubier
Copy link
Contributor Author

crubier commented Feb 24, 2020

Ok I think I got the virtual-state.yml file content right, it looks like this:

# This file is generated by running "yarn install" inside your project.
# Manual changes might be lost - proceed with caution!

__metadata:
  version: 4

"debug@virtual:81392a248a4eb1062284166bc5b126ea9dbc003b67ae1ded6b6a4bf4e58d3ad86c80934ef78b5b059394fa3051f64f2a5781fbc893f31c635c098e2c47b576c1#npm:^4.1.0":
  version: 4.1.1
  resolution: "debug@virtual:81392a248a4eb1062284166bc5b126ea9dbc003b67ae1ded6b6a4bf4e58d3ad86c80934ef78b5b059394fa3051f64f2a5781fbc893f31c635c098e2c47b576c1#npm:4.1.1"
  dependencies:
    ms: "npm:^2.1.1"
  peerDependencies:
    supports-color: "*"
  peerDependenciesMeta:
    supports-color:
      optional: true
  languageName: node
  linkType: hard

"pkg-1-1@virtual:748f6670d9c6c066fb88aa3380f1d90d0e995ab209cda0c2465335227bd22f406f698d521c4c284ecb1f407affa50215728d2ff86737a8095b0baabe0fe59409#workspace:packages/pkg-1-1":
  version: 0.0.0-use.local
  resolution: "pkg-1-1@virtual:748f6670d9c6c066fb88aa3380f1d90d0e995ab209cda0c2465335227bd22f406f698d521c4c284ecb1f407affa50215728d2ff86737a8095b0baabe0fe59409#workspace:packages/pkg-1-1"
  dependencies:
    pkg-dev-1: "workspace:packages/pkg-dev-1"
    pkg-dev-2: "workspace:packages/pkg-dev-2"
    pkg-dev-3: "workspace:packages/pkg-dev-3"
    pkg-dev-4: "workspace:packages/pkg-dev-4"
    pkg-dev-5: "workspace:packages/pkg-dev-5"
    pkg-dev-6: "workspace:packages/pkg-dev-6"
    react: "npm:^16.12.0"
    react-dom: "virtual:748f6670d9c6c066fb88aa3380f1d90d0e995ab209cda0c2465335227bd22f406f698d521c4c284ecb1f407affa50215728d2ff86737a8095b0baabe0fe59409#npm:^16.12.0"
  peerDependencies:
    react: "*"
    react-dom: "*"
  languageName: unknown
  linkType: soft

....

@crubier
Copy link
Contributor Author

crubier commented Feb 24, 2020

Now I need to know what to do of it aha

@arcanis
Copy link
Member

arcanis commented Feb 26, 2020

I looked into it since I was sure my instructions were missing a few pieces that could be a bit complex - I published my findings here: crubier#2

Basically, the main changes have been that:

  • We can't directly use the originalPackages registry, because it contains the raw data returned by the registry, prior any normalization (like the npm: prefix etc). As a result they aren't the "real" entries found in the storedResolutions array, hence why you were getting assertion errors.

    • The fix is to run a partial resolution before the hydratation. This resolution is special in that it doesn't run the full virtual dependency traversal - leaving that to the hydratation step that follows.
  • Since the virtual references aren't stored in the lockfile (we devirtualize them first), in order to add them back during hydratation, we needed to keep track of which package was having a virtual dependency on which other.

    • This is done through some extra fields in the virtual state file.

@crubier crubier force-pushed the hydrate-virtual-packages branch from 823ed37 to d656142 Compare February 26, 2020 11:05
Fixes the serialization / hydratation
@crubier
Copy link
Contributor Author

crubier commented Feb 26, 2020

Nice @arcanis ! It is looking good:

  • The results are correct: According to my tests, the state after hydrateVirtualPackages is the same as the state after resolveEverything
  • The duration of yarn install is faster than it was before this MR: I am running the stress test right now, I can see it is faster
  • The duration of yarn run is now super fast in all cases, which was the goal here👍 👍

However, about the yarn install time:

  • I still worry a bit about the duration of the install to be increasing exponentially. I will wait until the end of the stress test to give my final verdict on the performance. But to give an idea: in the stress test, the install duration looks like this:
    • 0.06s with 0 packages (in 0 layers) 💚
    • 0.06s with 10 packages (in 1 layer)💚
    • 0.53s with 60 packages (in 2 layers)💚
    • 1.45s with 110 packages (in 3 layers)💚
    • 20.56s with 130 packages (in 4 layers) 💛
    • 33.88s with 140 packages (in 4 layers) 💔
    • 56.23s with 150 packages (in 4 layers) 💔
    • ... and counting because now the stress test is becoming slow again... Each new package takes 1min to add and there are still 60 of them to go...

Basically the problem is that the current solution (with hydrateVirtualPackages) makes things faster but since there is still some resolution to do, the duration behaviour is still O(exp(n)). Making things faster means that we can go further along in the stress test, but the exponential behaviour means that we still crawl to a halt at some point in the stress test, just later.

Maybe the stress test is a bit too harsh though, I admit. But still, it is a monorepo of 200 packages where some have 100 dependencies. These numbers are high but not astronomical, but the resolution time takes a sharp turn up at some point. I guess we could argue that a 40s yarn install is not too bad in this case.

But the main point which is the big win here is that the time to run yarn run is now super fast in all cases! 👍 👍

@crubier
Copy link
Contributor Author

crubier commented Feb 26, 2020

Here is the curve of install time (vertical, in seconds) vs package number in the stress test (horizontal)

Screenshot 2020-02-26 at 13 03 47

We can see

  • The first 30 packages (20 devDeps + 10 normal deps) of the first layer, which have no dependencies.
  • Then after 30 we have the first packages with 1 layer of deps, the curve stays reasonable
  • Then after 80 we have the first packages with 2 layer of deps, the curve starts climbing
  • Then after 130 we have the first packages with 3 layer of deps, the curve goes wild ! Each new package adds several seconds to the install time...

There is a 5th layer in the stress test... Can't wait to see the curve 😱

Edit:

Same curve with logarithmic scale, to better see the first layers.

Screenshot 2020-02-26 at 13 08 26

@arcanis
Copy link
Member

arcanis commented Feb 26, 2020

To be clear, I didn't expect this PR to improve the install time. The install time explosion is caused by the late deduping inside applyVirtualResolutionMutations that I described in #973 (comment), and which I'm working to fix as part of a separate branch. The changes we've made here will only benefit yarn run until then 🙂

The duration of yarn install is faster than it was before this MR: I am running the stress test right now, I can see it is faster

That's interesting, I actually would expect it to be very slightly slower (the time needed to generate the virtual state file). Hm 🤔

@crubier
Copy link
Contributor Author

crubier commented Feb 26, 2020

You’re right, I think the impact on install time I perceived was just wishful thinking on my part as the early stage of the stress test are fast.

@crubier
Copy link
Contributor Author

crubier commented Feb 26, 2020

This indeed solves the run duration on the stress test repo, it now takes 0.26s instead of 25s 🎉

On our private repo, it took the time down from 200s to 7s. So this is much better and definitely worth it.

But this is actually still too slow as compared to yarn v1 ... yarn v1 was pretty much instant for this, but with v2 we have a 7s lag before each command. For example, when running a general build of our 200 packages, we now waste 1400 seconds just on the hydrate... Another example, the format on save of VSCode on our repo was instant and painless, but now saving a file takes like 8 seconds...

Our virtual-state.yml is 7.6Mb, our yarn.lock is 2.2Mb

I noted that in some commands, that resolution is done twice (like one for the yarn run and then another one for the underlying script resolutions...

I think we need even more optimisations like this... For example, does the .pnp.js need to be modified to take this new file into account? Or is there a way to just serialize the whole resolution state somewhere and just bulk load it on launch (=basically what I though .pnp.js was doing)?

@crubier
Copy link
Contributor Author

crubier commented Feb 26, 2020

Is there a way to easily identify the "culprit" packages that generate a lot of virtual packages?

As I said our virtual-state.yml is pretty big, at 7.6Mb, if there was a way to optimize it by changing a few packages we could probably reduce it.

@arcanis
Copy link
Member

arcanis commented Feb 26, 2020

Or is there a way to just serialize the whole resolution state somewhere and just bulk load it on launch

🤔 Technically if we can serialize / restore storedResolutions, storedDescriptors, and storedPackages (and maybe accessibleLocators + optionalBuilds) right after Project.find, it's enough to then start using the scriptUtils methods.

Maybe using the "canonical way" to restore the virtual links (going through the resolvers, then updating the dependencies as a postprocess pass) was a red herring and we should instead turn virtual-state.yml into install-state.yml (or even json if the intent is to be as fast as possible) and put all the raw data tables inside it? It will certainly be the fastest, but I'm worried it may lead to stale states more often.

(=basically what I though .pnp.js was doing)?

That's what it does, but in a lossy format tailored to the PnP implementation which doesn't contain all the information Yarn needs (for example Yarn needs to keep track of the package checksums, but the PnP loader doesn't care about that).

@crubier
Copy link
Contributor Author

crubier commented Feb 26, 2020

we should instead turn virtual-state.yml into install-state.yml and put all the raw data tables inside it

I will try this in a new branch. It will look similar except that the content and name of the file will be different

@crubier crubier mentioned this pull request Feb 26, 2020
@crubier
Copy link
Contributor Author

crubier commented Feb 27, 2020

I think we can close this MR in favor of #998

@arcanis let me know if you disagree.

@crubier crubier closed this Feb 27, 2020
@larixer
Copy link
Member

larixer commented Jun 24, 2020

@crubier Hi, could you put somewhere the big project from your experiments into GitHub? I'm trying to understand can we reduce size of install-state.gz and for that I need playground. Any hints would be welcome.

@crubier
Copy link
Contributor Author

crubier commented Jul 1, 2020

@larixer Sure, it is there on my colleagues @etiennedupont repo here : https://github.com/etiennedupont/yarn-v2-stress-test

@larixer
Copy link
Member

larixer commented Jul 1, 2020

@crubier Awesome, thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants