Hydrate virtual packages #990

crubier · 2020-02-24T12:33:18Z

What's the problem this PR addresses?

How did you fix it?

crubier · 2020-02-24T15:55:04Z

Ok I think I got the virtual-state.yml file content right, it looks like this:

# This file is generated by running "yarn install" inside your project.
# Manual changes might be lost - proceed with caution!

__metadata:
  version: 4

"debug@virtual:81392a248a4eb1062284166bc5b126ea9dbc003b67ae1ded6b6a4bf4e58d3ad86c80934ef78b5b059394fa3051f64f2a5781fbc893f31c635c098e2c47b576c1#npm:^4.1.0":
  version: 4.1.1
  resolution: "debug@virtual:81392a248a4eb1062284166bc5b126ea9dbc003b67ae1ded6b6a4bf4e58d3ad86c80934ef78b5b059394fa3051f64f2a5781fbc893f31c635c098e2c47b576c1#npm:4.1.1"
  dependencies:
    ms: "npm:^2.1.1"
  peerDependencies:
    supports-color: "*"
  peerDependenciesMeta:
    supports-color:
      optional: true
  languageName: node
  linkType: hard

"pkg-1-1@virtual:748f6670d9c6c066fb88aa3380f1d90d0e995ab209cda0c2465335227bd22f406f698d521c4c284ecb1f407affa50215728d2ff86737a8095b0baabe0fe59409#workspace:packages/pkg-1-1":
  version: 0.0.0-use.local
  resolution: "pkg-1-1@virtual:748f6670d9c6c066fb88aa3380f1d90d0e995ab209cda0c2465335227bd22f406f698d521c4c284ecb1f407affa50215728d2ff86737a8095b0baabe0fe59409#workspace:packages/pkg-1-1"
  dependencies:
    pkg-dev-1: "workspace:packages/pkg-dev-1"
    pkg-dev-2: "workspace:packages/pkg-dev-2"
    pkg-dev-3: "workspace:packages/pkg-dev-3"
    pkg-dev-4: "workspace:packages/pkg-dev-4"
    pkg-dev-5: "workspace:packages/pkg-dev-5"
    pkg-dev-6: "workspace:packages/pkg-dev-6"
    react: "npm:^16.12.0"
    react-dom: "virtual:748f6670d9c6c066fb88aa3380f1d90d0e995ab209cda0c2465335227bd22f406f698d521c4c284ecb1f407affa50215728d2ff86737a8095b0baabe0fe59409#npm:^16.12.0"
  peerDependencies:
    react: "*"
    react-dom: "*"
  languageName: unknown
  linkType: soft

....

crubier · 2020-02-24T15:55:16Z

Now I need to know what to do of it aha

arcanis · 2020-02-26T10:34:25Z

I looked into it since I was sure my instructions were missing a few pieces that could be a bit complex - I published my findings here: crubier#2

Basically, the main changes have been that:

We can't directly use the originalPackages registry, because it contains the raw data returned by the registry, prior any normalization (like the npm: prefix etc). As a result they aren't the "real" entries found in the storedResolutions array, hence why you were getting assertion errors.
- The fix is to run a partial resolution before the hydratation. This resolution is special in that it doesn't run the full virtual dependency traversal - leaving that to the hydratation step that follows.
Since the virtual references aren't stored in the lockfile (we devirtualize them first), in order to add them back during hydratation, we needed to keep track of which package was having a virtual dependency on which other.
- This is done through some extra fields in the virtual state file.

Fixes the serialization / hydratation

crubier · 2020-02-26T12:06:38Z

Nice @arcanis ! It is looking good:

The results are correct: According to my tests, the state after hydrateVirtualPackages is the same as the state after resolveEverything
The duration of yarn install is faster than it was before this MR: I am running the stress test right now, I can see it is faster
The duration of yarn run is now super fast in all cases, which was the goal here👍 👍

However, about the yarn install time:

I still worry a bit about the duration of the install to be increasing exponentially. I will wait until the end of the stress test to give my final verdict on the performance. But to give an idea: in the stress test, the install duration looks like this:
- 0.06s with 0 packages (in 0 layers) 💚
- 0.06s with 10 packages (in 1 layer)💚
- 0.53s with 60 packages (in 2 layers)💚
- 1.45s with 110 packages (in 3 layers)💚
- 20.56s with 130 packages (in 4 layers) 💛
- 33.88s with 140 packages (in 4 layers) 💔
- 56.23s with 150 packages (in 4 layers) 💔
- ... and counting because now the stress test is becoming slow again... Each new package takes 1min to add and there are still 60 of them to go...

Basically the problem is that the current solution (with hydrateVirtualPackages) makes things faster but since there is still some resolution to do, the duration behaviour is still O(exp(n)). Making things faster means that we can go further along in the stress test, but the exponential behaviour means that we still crawl to a halt at some point in the stress test, just later.

Maybe the stress test is a bit too harsh though, I admit. But still, it is a monorepo of 200 packages where some have 100 dependencies. These numbers are high but not astronomical, but the resolution time takes a sharp turn up at some point. I guess we could argue that a 40s yarn install is not too bad in this case.

But the main point which is the big win here is that the time to run yarn run is now super fast in all cases! 👍 👍

crubier · 2020-02-26T13:04:23Z

Here is the curve of install time (vertical, in seconds) vs package number in the stress test (horizontal)

We can see

The first 30 packages (20 devDeps + 10 normal deps) of the first layer, which have no dependencies.
Then after 30 we have the first packages with 1 layer of deps, the curve stays reasonable
Then after 80 we have the first packages with 2 layer of deps, the curve starts climbing
Then after 130 we have the first packages with 3 layer of deps, the curve goes wild ! Each new package adds several seconds to the install time...

There is a 5th layer in the stress test... Can't wait to see the curve 😱

Edit:

Same curve with logarithmic scale, to better see the first layers.

arcanis · 2020-02-26T13:10:59Z

To be clear, I didn't expect this PR to improve the install time. The install time explosion is caused by the late deduping inside applyVirtualResolutionMutations that I described in #973 (comment), and which I'm working to fix as part of a separate branch. The changes we've made here will only benefit yarn run until then 🙂

The duration of yarn install is faster than it was before this MR: I am running the stress test right now, I can see it is faster

That's interesting, I actually would expect it to be very slightly slower (the time needed to generate the virtual state file). Hm 🤔

crubier · 2020-02-26T13:33:23Z

You’re right, I think the impact on install time I perceived was just wishful thinking on my part as the early stage of the stress test are fast.

crubier · 2020-02-26T15:02:40Z

This indeed solves the run duration on the stress test repo, it now takes 0.26s instead of 25s 🎉

On our private repo, it took the time down from 200s to 7s. So this is much better and definitely worth it.

But this is actually still too slow as compared to yarn v1 ... yarn v1 was pretty much instant for this, but with v2 we have a 7s lag before each command. For example, when running a general build of our 200 packages, we now waste 1400 seconds just on the hydrate... Another example, the format on save of VSCode on our repo was instant and painless, but now saving a file takes like 8 seconds...

Our virtual-state.yml is 7.6Mb, our yarn.lock is 2.2Mb

I noted that in some commands, that resolution is done twice (like one for the yarn run and then another one for the underlying script resolutions...

I think we need even more optimisations like this... For example, does the .pnp.js need to be modified to take this new file into account? Or is there a way to just serialize the whole resolution state somewhere and just bulk load it on launch (=basically what I though .pnp.js was doing)?

crubier · 2020-02-26T15:13:26Z

Is there a way to easily identify the "culprit" packages that generate a lot of virtual packages?

As I said our virtual-state.yml is pretty big, at 7.6Mb, if there was a way to optimize it by changing a few packages we could probably reduce it.

arcanis · 2020-02-26T15:17:11Z

Or is there a way to just serialize the whole resolution state somewhere and just bulk load it on launch

🤔 Technically if we can serialize / restore storedResolutions, storedDescriptors, and storedPackages (and maybe accessibleLocators + optionalBuilds) right after Project.find, it's enough to then start using the scriptUtils methods.

Maybe using the "canonical way" to restore the virtual links (going through the resolvers, then updating the dependencies as a postprocess pass) was a red herring and we should instead turn virtual-state.yml into install-state.yml (or even json if the intent is to be as fast as possible) and put all the raw data tables inside it? It will certainly be the fastest, but I'm worried it may lead to stale states more often.

(=basically what I though .pnp.js was doing)?

That's what it does, but in a lossy format tailored to the PnP implementation which doesn't contain all the information Yarn needs (for example Yarn needs to keep track of the package checksums, but the PnP loader doesn't care about that).

crubier · 2020-02-26T15:26:40Z

we should instead turn virtual-state.yml into install-state.yml and put all the raw data tables inside it

I will try this in a new branch. It will look similar except that the content and name of the file will be different

crubier · 2020-02-27T10:21:29Z

I think we can close this MR in favor of #998

@arcanis let me know if you disagree.

larixer · 2020-06-24T12:03:36Z

@crubier Hi, could you put somewhere the big project from your experiments into GitHub? I'm trying to understand can we reduce size of install-state.gz and for that I need playground. Any hints would be welcome.

crubier · 2020-07-01T17:50:53Z

@larixer Sure, it is there on my colleagues @etiennedupont repo here : https://github.com/etiennedupont/yarn-v2-stress-test

larixer · 2020-07-01T18:08:31Z

@crubier Awesome, thank you!

crubier added 3 commits February 24, 2020 12:17

Add virtual state in options

f8029cf

Change some stuff

d85c40b

Update doc

d0771b3

crubier mentioned this pull request Feb 24, 2020

[Bug] Extremely slow yarn run xxxx commands on large monorepo #973

Closed

crubier added 6 commits February 24, 2020 14:51

Tests

45311ac

again

bb92de8

ok

8c53164

Come on

dbfd9dc

Sneaky Typescript

dca63ab

Made generateVirtualState look like generateLockFile

2f1d184

crubier and others added 10 commits February 24, 2020 16:20

Add tim

a66e6e8

time

ca7eb66

find timing

e6d309d

Fix

88d9bd9

Generate the right virtual state file

ca0c227

Fix

e639b46

Compare old vs new option

821647a

No battery left on laptop, think Im almost done

f969b66

Trying to fix

d656142

Fixes the serialization / hydratation

1c1e104

crubier force-pushed the hydrate-virtual-packages branch from 823ed37 to d656142 Compare February 26, 2020 11:05

Merge pull request #2 from yarnpkg/mael/crubier-work

6a4187e

Fixes the serialization / hydratation

Add timing

dd7861e

crubier mentioned this pull request Feb 26, 2020

Install state file #998

Merged

crubier closed this Feb 27, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Hydrate virtual packages #990

Hydrate virtual packages #990

crubier commented Feb 24, 2020

crubier commented Feb 24, 2020

crubier commented Feb 24, 2020

arcanis commented Feb 26, 2020

crubier commented Feb 26, 2020 •

edited

Loading

crubier commented Feb 26, 2020 •

edited

Loading

arcanis commented Feb 26, 2020

crubier commented Feb 26, 2020

crubier commented Feb 26, 2020 •

edited

Loading

crubier commented Feb 26, 2020

arcanis commented Feb 26, 2020 •

edited

Loading

crubier commented Feb 26, 2020

crubier commented Feb 27, 2020

larixer commented Jun 24, 2020

crubier commented Jul 1, 2020

larixer commented Jul 1, 2020

Hydrate virtual packages #990

Hydrate virtual packages #990

Conversation

crubier commented Feb 24, 2020

crubier commented Feb 24, 2020

crubier commented Feb 24, 2020

arcanis commented Feb 26, 2020

crubier commented Feb 26, 2020 • edited Loading

crubier commented Feb 26, 2020 • edited Loading

arcanis commented Feb 26, 2020

crubier commented Feb 26, 2020

crubier commented Feb 26, 2020 • edited Loading

crubier commented Feb 26, 2020

arcanis commented Feb 26, 2020 • edited Loading

crubier commented Feb 26, 2020

crubier commented Feb 27, 2020

larixer commented Jun 24, 2020

crubier commented Jul 1, 2020

larixer commented Jul 1, 2020

crubier commented Feb 26, 2020 •

edited

Loading

crubier commented Feb 26, 2020 •

edited

Loading

crubier commented Feb 26, 2020 •

edited

Loading

arcanis commented Feb 26, 2020 •

edited

Loading