
[Bug] Extremely slow yarn run xxxx commands on large monorepo #973

Closed

crubier opened this issue Feb 20, 2020 · 38 comments

Labels: bug (Something isn't working)

crubier (Contributor) commented Feb 20, 2020

The bug was explained by my colleague @etiennedupont in the issue linked below, but maybe it should be reported here as well, since it concerns v2?

https://github.com/yarnpkg/yarn/issues/7917

Bug description

When running scripts defined in package.json using yarn run xxx on a large monorepo, there is an extremely long delay before the command actually runs. The delay is the same on every run and seems to scale with the size of the monorepo.

Command

Here is a simple example:

In package.json

{
  "scripts": {
    "echo": "echo ok"
  }
}

Then in shell run:

> yarn echo

.... WAITS FOR 265s .....

ok

Using time to confirm the duration:

> time yarn echo
ok
yarn echo  264.68s user 1.33s system 99% cpu 4:26.01 total

What is the current behavior?

Yarn does something using 100% of a CPU core for 265s (on my repo and machine) before actually running the command.

What is the expected behavior?

Yarn runs the command instantly

Steps to Reproduce

  • Have a large monorepo with 176 packages in the workspace.
  • yarn install
  • Run any yarn run xxxx command in any of the packages' folders.

Environment

Node Version: 10.15.3
Yarn Version: 2.0.0-rc.29.git.20200218.926781b7
OS and version: macOS Catalina

crubier added the bug label on Feb 20, 2020
crubier changed the title to "[Bug] Extremely slow yarn run xxxx commands on large monorepo" on Feb 20, 2020
arcanis (Member) commented Feb 20, 2020

What does the workspaces key look like? Is it a glob pattern? Can you try replacing it with full paths?

crubier (Contributor, Author) commented Feb 20, 2020

The workspaces field already uses full paths:

"workspaces": {
    "packages": [
      "packages/foo/toto",
      "packages/foo/tata",
      "packages/foo/titi",
      "packages/foo/tutu",
      "packages/bar/tonton",
      "packages/bar/toutou",
      ... 150 others
    ]
}

RaynalHugo (Contributor) commented:

Hi @arcanis

I am working with @crubier and @etiennedupont to migrate our repo from yarn v1 + pnp to yarn v2.

Would you have any idea where we should start debugging? Running any yarn run ... command hangs without telling us what's currently happening.

Our guess is that we should look into the v2 source code using yarn set version from sources --no-minify and put some classic console.logs everywhere. If you have any hints to point us in the right direction, that would be greatly appreciated.

Thanks :)

arcanis (Member) commented Feb 20, 2020

Hey @RaynalHugo! I took a quick look at generating a huge monorepo with 200 packages, but no luck - my run time was mostly unaffected. I used the following script inside a project configured with a packages/* glob pattern:

for x in {1..200} ; do
    (mkdir -p packages/pkg-$x && cd packages/pkg-$x && yarn init)
done

I think the best would be to use --no-minify as you guessed, and put some timing statements around setupWorkspaces and resolveEverything. Depending on the results other places may be good profiling targets as well, such as initializePackageEnvironments.
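
For reference, here is a minimal sketch of what such timing statements could look like, assuming a from-sources (--no-minify) build where you can edit the command's code; the resolveEverything call shape is taken from snippets later in this thread:

// Hypothetical instrumentation dropped into the yarn run code path; the
// same pattern works around setupWorkspaces and initializePackageEnvironments.
console.time(`resolveEverything`);
await project.resolveEverything({
  lockfileOnly: true,
  report: new ThrowReport(),
});
console.timeEnd(`resolveEverything`);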

etiennedupont commented:

Hi @arcanis,

After going through some profiling, it looks like the function causing the delay is runBinary.

[Screenshot: CPU profile showing runBinary taking nearly all of the run time]

So we profiled the inner call made through child_process, which gives this:

[Screenshot: CPU profile of the child_process call, dominated by resolveEverything]

Indeed, most of the time is spent in resolveEverything.

etiennedupont commented:

There seem to be two phases in resolveEverything:

The first phase lasts 6 seconds (which is quite long already), with a profile like this:

[Screenshot: CPU profile of the first phase]

The second phase is even longer, lasting 25 seconds, with very shallow stacks:

[Screenshot: CPU profile of the second phase, with very shallow stacks]

etiennedupont commented:

Here is the bottom-up view of this profile:

[Screenshot: bottom-up view of the profile]

The AsyncFunctionAwaitResolveClosure in resolveEverything takes all the time.

arcanis (Member) commented Feb 20, 2020

I suspect there's a dependency cycle somewhere that takes a long time to resolve - by any chance, could you provide the workspaces' package.json files (stripped of anything that isn't needed for the resolution)?

Otherwise, I'd suggest adding console.group / console.groupEnd calls inside resolvePeerDependencies to see whether there's a particularly deep recursion. That said, to reach 260s without triggering a stack overflow, I really don't know what the tree might look like - hence why the files would be useful 🤔
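
As a sketch, the suggested tracing could look like this (the real function's signature in the Yarn sources differs; this only illustrates the group/groupEnd pairing):

// Hypothetical shape, for illustration only
type Locator = {name: string};

function resolvePeerDependencies(parentLocator: Locator /* ... */): void {
  // One group per call, so the console indentation mirrors the recursion depth
  console.group(`resolvePeerDependencies`);
  try {
    // ... original resolution logic, recursing into dependencies ...
  } finally {
    console.groupEnd(); // always unwind, even if resolution throws
  }
}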

etiennedupont commented:

We tried the console.group approach. The stacks don't seem that deep, but the stream is never-ending - there are millions of lines...

          resolvePeerDependencies
        resolvePeerDependencies
          resolvePeerDependencies
          resolvePeerDependencies
          resolvePeerDependencies
          resolvePeerDependencies
    resolvePeerDependencies
      resolvePeerDependencies
      resolvePeerDependencies
      resolvePeerDependencies
      resolvePeerDependencies
      resolvePeerDependencies
      resolvePeerDependencies
      resolvePeerDependencies
      resolvePeerDependencies
      resolvePeerDependencies
      resolvePeerDependencies
        resolvePeerDependencies
        resolvePeerDependencies
        resolvePeerDependencies
        resolvePeerDependencies
        resolvePeerDependencies
        resolvePeerDependencies
        resolvePeerDependencies
          resolvePeerDependencies
          resolvePeerDependencies
          resolvePeerDependencies
          resolvePeerDependencies
          resolvePeerDependencies
          resolvePeerDependencies
          resolvePeerDependencies
          resolvePeerDependencies
          resolvePeerDependencies
          resolvePeerDependencies
          resolvePeerDependencies
          resolvePeerDependencies
          resolvePeerDependencies
          resolvePeerDependencies
          resolvePeerDependencies
          resolvePeerDependencies
          resolvePeerDependencies
          resolvePeerDependencies
            resolvePeerDependencies
            resolvePeerDependencies
            resolvePeerDependencies
            resolvePeerDependencies
              resolvePeerDependencies
              resolvePeerDependencies
          resolvePeerDependencies
            resolvePeerDependencies
            resolvePeerDependencies
            resolvePeerDependencies
            resolvePeerDependencies
            resolvePeerDependencies
            resolvePeerDependencies
            resolvePeerDependencies
            resolvePeerDependencies
            resolvePeerDependencies
              resolvePeerDependencies
              resolvePeerDependencies
              resolvePeerDependencies
              resolvePeerDependencies
                resolvePeerDependencies
                resolvePeerDependencies
          resolvePeerDependencies
            resolvePeerDependencies
            resolvePeerDependencies
            resolvePeerDependencies
            resolvePeerDependencies
            resolvePeerDependencies
            resolvePeerDependencies
            resolvePeerDependencies
            resolvePeerDependencies
            resolvePeerDependencies
            resolvePeerDependencies
            resolvePeerDependencies
              resolvePeerDependencies
              resolvePeerDependencies
              resolvePeerDependencies
              resolvePeerDependencies
              resolvePeerDependencies
              resolvePeerDependencies
            resolvePeerDependencies
              resolvePeerDependencies
              resolvePeerDependencies
                resolvePeerDependencies
                resolvePeerDependencies
                resolvePeerDependencies
                resolvePeerDependencies
            resolvePeerDependencies
              resolvePeerDependencies
              resolvePeerDependencies
              resolvePeerDependencies
      resolvePeerDependencies
        resolvePeerDependencies
        resolvePeerDependencies
        resolvePeerDependencies
        resolvePeerDependencies
        resolvePeerDependencies
        resolvePeerDependencies
        resolvePeerDependencies
        resolvePeerDependencies
        resolvePeerDependencies
          resolvePeerDependencies
          resolvePeerDependencies
          resolvePeerDependencies
          resolvePeerDependencies
            resolvePeerDependencies
            resolvePeerDependencies
    resolvePeerDependencies
      resolvePeerDependencies
... continues for 10 minutes...

[Screenshot: terminal output of the nested console.group trace]

arcanis (Member) commented Feb 20, 2020

To give you some context: the way the resolvePeerDependencies function works, it traverses the tree to figure out, for each node N, which dependencies it needs to extract from its parent P to satisfy N's peer dependencies. Once that's done, it recurses into the newly created package N2 (which doesn't have peer dependencies anymore) so that its transitive dependencies can inherit their peer dependencies as well.
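
To illustrate with made-up data (these are simplified shapes for the example, not Yarn's internal representation):

// P depends on react and on N; N declares react as a peer dependency.
const P = {
  name: `my-app`,
  dependencies: {react: `16.12.0`, lib: `1.0.0`},
};
const N = {
  name: `lib`,
  version: `1.0.0`,
  peerDependencies: {react: `*`},
};

// Yarn derives N2, a virtual instance of N specific to the parent P: the
// peer dependency is replaced by the concrete version P provides, so N2
// has no peer dependencies left and can be recursed into safely.
const N2 = {
  name: `lib`,
  version: `1.0.0`,
  reference: `virtual:<hash>`, // one instance per distinct peer set
  dependencies: {react: `16.12.0`},
};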

I'm not entirely sure of the circumstances under which you'd end up with such a huge tree - there's likely a mix of peer dependencies and regular dependencies triggering a catastrophic expansion. My main guess is that something somewhere (possibly in multiple places) is listed as a regular dependency instead of a peer dependency, causing Yarn to dig into it and generate those huge graphs. If you print parentLocator.name in your console.group, you might get a clue by seeing which package is referenced the most.

etiennedupont commented Feb 20, 2020

We did that and ran:

node .yarn/releases/yarn-sources.js echo > parentLocators.txt
sort parentLocators.txt | uniq -c | sort

It gives these stats: the most frequent parentLocator is @babel/helper-plugin-utils, which appears... 39163 times... I guess this is a lot, even for a monorepo of 160 packages.

 360 parentLocator: @jest/[email protected]
 361 parentLocator: @null/[email protected]
 364 parentLocator: @babel/[email protected]
 364 parentLocator: @storybook/[email protected]
 365 parentLocator: @null/[email protected]
 366 parentLocator: @null/[email protected]
 368 parentLocator: @null/[email protected]
 376 parentLocator: @null/[email protected]
 381 parentLocator: @babel/[email protected]
 381 parentLocator: @babel/[email protected]
 381 parentLocator: @babel/[email protected]
 381 parentLocator: @babel/[email protected]
 381 parentLocator: @null/[email protected]
 382 parentLocator: @babel/[email protected]
 384 parentLocator: @null/[email protected]
 396 parentLocator: @null/[email protected]
 414 parentLocator: @null/[email protected]
 419 parentLocator: @null/[email protected]
 466 parentLocator: @babel/[email protected]
 490 parentLocator: @null/[email protected]
 514 parentLocator: @babel/[email protected]
 514 parentLocator: @babel/[email protected]
 542 parentLocator: @null/[email protected]
 544 parentLocator: @null/[email protected]
 544 parentLocator: @null/[email protected]
 544 parentLocator: @null/[email protected]
 544 parentLocator: @null/[email protected]
 544 parentLocator: @null/[email protected]
 544 parentLocator: @null/[email protected]
 544 parentLocator: @null/[email protected]
 544 parentLocator: @null/[email protected]
 544 parentLocator: @null/[email protected]
 544 parentLocator: @storybook/[email protected]
 544 parentLocator: @storybook/[email protected]
 544 parentLocator: @types/[email protected]
 545 parentLocator: @null/[email protected]
 545 parentLocator: @null/[email protected]
 545 parentLocator: @null/[email protected]
 545 parentLocator: @null/[email protected]
 545 parentLocator: @null/[email protected]
 546 parentLocator: @null/[email protected]
 547 parentLocator: @null/[email protected]
 547 parentLocator: @null/[email protected]
 549 parentLocator: @null/[email protected]
 554 parentLocator: @null/[email protected]
 555 parentLocator: @null/[email protected]
 555 parentLocator: @null/[email protected]
 558 parentLocator: @null/[email protected]
 562 parentLocator: @babel/[email protected]
 562 parentLocator: @null/[email protected]
 563 parentLocator: @babel/[email protected]
 565 parentLocator: @babel/[email protected]
 575 parentLocator: @null/[email protected]
 591 parentLocator: @null/[email protected]
 593 parentLocator: @null/[email protected]
 598 parentLocator: @null/[email protected]
 607 parentLocator: @null/[email protected]
 610 parentLocator: @babel/[email protected]
 610 parentLocator: @babel/[email protected]
 611 parentLocator: @babel/[email protected]
 613 parentLocator: @babel/[email protected]
 613 parentLocator: @babel/[email protected]
 613 parentLocator: @babel/[email protected]
 613 parentLocator: @babel/[email protected]
 613 parentLocator: @babel/[email protected]
 613 parentLocator: @babel/[email protected]
 613 parentLocator: @babel/[email protected]
 613 parentLocator: @babel/[email protected]
 613 parentLocator: @babel/[email protected]
 613 parentLocator: @babel/[email protected]
 613 parentLocator: @babel/[email protected]
 613 parentLocator: @babel/[email protected]
 613 parentLocator: @babel/[email protected]
 613 parentLocator: @babel/[email protected]
 613 parentLocator: @babel/[email protected]
 613 parentLocator: @babel/[email protected]
 613 parentLocator: @babel/[email protected]
 613 parentLocator: @babel/[email protected]
 613 parentLocator: @babel/[email protected]
 613 parentLocator: @babel/[email protected]
 613 parentLocator: @babel/[email protected]
 613 parentLocator: @babel/[email protected]
 613 parentLocator: @babel/[email protected]
 613 parentLocator: @babel/[email protected]
 613 parentLocator: @babel/[email protected]
 613 parentLocator: @babel/[email protected]
 613 parentLocator: @babel/[email protected]
 613 parentLocator: @babel/[email protected]
 613 parentLocator: @babel/[email protected]
 613 parentLocator: @babel/[email protected]
 613 parentLocator: @babel/[email protected]
 613 parentLocator: @null/[email protected]
 613 parentLocator: @null/[email protected]
 614 parentLocator: @babel/[email protected]
 614 parentLocator: @babel/[email protected]
 615 parentLocator: @babel/[email protected]
 616 parentLocator: @null/[email protected]
 619 parentLocator: @babel/[email protected]
 620 parentLocator: @babel/[email protected]
 623 parentLocator: @babel/[email protected]
 637 parentLocator: @types/[email protected]
 652 parentLocator: @babel/[email protected]
 657 parentLocator: @babel/[email protected]
 679 parentLocator: @null/[email protected]
 726 parentLocator: @null/[email protected]
 726 parentLocator: @null/[email protected]
 726 parentLocator: @storybook/[email protected]
 727 parentLocator: @null/[email protected]
 733 parentLocator: @null/[email protected]
 747 parentLocator: @null/[email protected]
 873 parentLocator: @null/[email protected]
 923 parentLocator: @null/[email protected]
 926 parentLocator: @null/[email protected]
1020 parentLocator: @null/[email protected]
1033 parentLocator: @babel/[email protected]
1034 parentLocator: @babel/[email protected]
1073 parentLocator: @babel/[email protected]
1101 parentLocator: @null/[email protected]
1143 parentLocator: @babel/[email protected]
1166 parentLocator: @null/[email protected]
1226 parentLocator: @babel/[email protected]
1227 parentLocator: @babel/[email protected]
1228 parentLocator: @babel/[email protected]
1230 parentLocator: @babel/[email protected]
1304 parentLocator: @babel/[email protected]
1359 parentLocator: @null/[email protected]
1494 parentLocator: @babel/[email protected]
1512 parentLocator: @babel/[email protected]
1633 parentLocator: @storybook/[email protected]
1648 parentLocator: @babel/[email protected]
1649 parentLocator: @babel/[email protected]
1844 parentLocator: @null/[email protected]
1909 parentLocator: @null/[email protected]
2070 parentLocator: @babel/[email protected]
2177 parentLocator: @storybook/[email protected]
2190 parentLocator: @null/[email protected]
2260 parentLocator: @babel/[email protected]
2263 parentLocator: @babel/[email protected]
2359 parentLocator: @null/[email protected]
2360 parentLocator: @null/[email protected]
2452 parentLocator: @babel/[email protected]
2452 parentLocator: @null/[email protected]
2461 parentLocator: @babel/[email protected]
2641 parentLocator: @null/[email protected]
2722 parentLocator: @null/[email protected]
2722 parentLocator: @storybook/[email protected]
2928 parentLocator: @null/[email protected]
3065 parentLocator: @babel/[email protected]
3538 parentLocator: @null/[email protected]
3538 parentLocator: @storybook/[email protected]
3624 parentLocator: @emotion/[email protected]
3626 parentLocator: @null/[email protected]
3709 parentLocator: @emotion/[email protected]
3720 parentLocator: @null/[email protected]
3721 parentLocator: @emotion/[email protected]
3829 parentLocator: @null/[email protected]
3905 parentLocator: @null/[email protected]
3998 parentLocator: @null/[email protected]
4000 parentLocator: @emotion/[email protected]
4002 parentLocator: @emotion/[email protected]
4193 parentLocator: @emotion/[email protected]
4194 parentLocator: @emotion/[email protected]
4445 parentLocator: @null/[email protected]
4628 parentLocator: @storybook/[email protected]
4899 parentLocator: @types/[email protected]
4901 parentLocator: @reach/[email protected]
5080 parentLocator: @null/[email protected]
5261 parentLocator: @storybook/[email protected]
5325 parentLocator: @null/[email protected]
5456 parentLocator: @null/[email protected]
5479 parentLocator: @null/[email protected]
5898 parentLocator: @storybook/[email protected]
6013 parentLocator: @null/[email protected]
6757 parentLocator: @null/[email protected]
7247 parentLocator: @emotion/[email protected]
7445 parentLocator: @null/[email protected]
7713 parentLocator: @emotion/[email protected]
7713 parentLocator: @emotion/[email protected]
8399 parentLocator: @null/[email protected]
8982 parentLocator: @storybook/[email protected]
9710 parentLocator: @null/[email protected]
12164 parentLocator: @null/[email protected]
12385 parentLocator: @null/[email protected]
14656 parentLocator: @babel/[email protected]
17378 parentLocator: @null/[email protected]
39163 parentLocator: @babel/[email protected]

etiennedupont commented:

No idea what is wrong with our package structure, to be honest... We will try to make a minimal repo to replicate it...

arcanis (Member) commented Feb 20, 2020

That would be perfect. Clearly Yarn should detect and surface the problem better, whatever it is, and a repro will certainly be very useful for that! Thanks for your help 🙏

etiennedupont commented Feb 20, 2020

We just created this repo to reproduce our package structure.

https://github.com/etiennedupont/yarn-v2-stress-test

This reproduces the bug.

Running time yarn echo in that repo (before all packages even finished installing) gives us:

time yarn echo
ok
yarn echo  23.54s user 1.40s system 134% cpu 18.508 total

The structure of the repo is similar; however, our packages do not depend on that many of our other packages (a package typically depends on at most 10 of our other packages, while here, for example, pkg-4-1 depends on 120 other packages). But I believe this difference is not that relevant. We will retry after adjusting this, but I think the conclusion will be similar.

crubier (Contributor, Author) commented Feb 21, 2020

I did several tests on the repo above:

  • 🔴 When packages all have peerDependencies=react react-dom and devDependencies=typescript, things are insanely slow: the installs take more and more time (first installs < 1s, last installs > 50s), and the run takes more than 60s.
  • ✅ When packages have empty peerDependencies and empty devDependencies, things are OK: the installs take at most 5s in the end, and the run is quite fast.
  • 🔴 When packages all have peerDependencies=react react-dom and empty devDependencies, things are insanely slow: the installs take more and more time (first installs < 1s, last installs > 50s), and the run takes more than 60s.
  • ✅ When packages have empty peerDependencies and devDependencies=typescript, things are OK: the installs take at most 5s in the end, and the run is quite fast.

crubier (Contributor, Author) commented Feb 21, 2020

Ok, so it seems that reproduction is simple:

  • A large-ish and deep-ish monorepo (like 4 layers of 50 packages, where each package of a layer depends on many packages of the previous layers)
  • Packages have a few common peerDependencies (like react react-dom in this example here)

Then there is a combinatorial explosion at work: .pnp.js contains an exponentially large number of virtual:xxx resolutions and the whole thing blows up epically. The time to install or run a script seems to be O(exp(n)).

When the peerDependencies are not there, .pnp.js does not contain a single virtual:xxx resolution and things are fine. The time to install or run a script seems to be O(n).
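
For intuition, here is a back-of-the-envelope model of the blow-up (my own simplification, not Yarn's actual algorithm): without deduplication, each package is re-expanded once per dependency path that reaches it, so a layered graph multiplies the work at every level.

// Naive traversal count for a monorepo with `layers` layers of `width`
// packages, where each package depends on every package of the previous
// layer and carries a peer dependency (so every path spawns its own
// virtual instance instead of being shared).
function naiveVisits(layers: number, width: number): number {
  let visits = 0;
  let paths = 1;
  for (let k = 0; k < layers; k++) {
    paths *= width;  // each package of this layer is reached via every path
    visits += paths; // and each reached copy is expanded again
  }
  return visits;
}

console.log(naiveVisits(4, 50)); // 6_377_550 visits: O(width^layers), not O(n)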

crubier (Contributor, Author) commented Feb 21, 2020

@arcanis here are our conclusions. Maybe you can:

  • Either give us guidance on how to change our packages to prevent this (maybe move react and react-dom to normal dependencies of our packages instead of peerDependencies? Would this cause other problems?)
  • Or maybe there is a way to fix Yarn itself to prevent this exponential behavior?

Thank you so much!

arcanis (Member) commented Feb 21, 2020

Thanks for the details! I think I know where we're missing an optimization (or rather, we optimize too late in the process, and the CPU goes haywire before reaching that point). The solution won't be easy to implement, but I think it should be doable on our side.

Then there is a combinatorial explosion at work: .pnp.js contains an exponentially large number of virtual:xxx resolutions

What's the actual size of the .pnp.js, btw? Given that the optimization does happen (just too late), I'd expect the size on disk to be fairly reasonable. For example, looking at the stress-test repository you shared, I see exactly one virtual instance per package, which is the expected result.

crubier (Contributor, Author) commented Feb 21, 2020

Here are a few numbers from my latest tries:

On our repo, after moving all our peerDeps to normal deps (basically making react a regular dependency instead of a peerDependency):

  • The time to launch a script went down from 400s to 90s
  • The time to yarn install went down from 160s to 50s
  • The size of our .pnp.js went down from 14.5 MB to 7.3 MB

The problem is that we depend on third-party packages which themselves have peerDependencies, so removing them on our side is not enough; the .pnp.js is still full of virtual resolutions.

grep -o -i virtual: .pnp.js | wc -l
39058

There are still 39058 instances of virtual resolutions in .pnp.js, which is not great, not terrible.

What is the optimization supposed to look like?

One thing is sure: .pnp.js does not grow as exponentially as the time to run the scripts does. But I still think it is too big. In particular, we paid attention to keeping the versions of our dependencies consistent, so we have "only" 5069 packages in our transitive dependencies (5069 items in .yarn/cache, with typically ~2 versions per package name).

Given our numbers, I would expect an actually optimal .pnp.js to have roughly the following size:

5069 packages × 15 deps/package × 1 line of code/dep × 40 bytes/line ≈ 3 MB

So not too far off, actually. The size of .pnp.js does not seem to be the main problem here.

arcanis (Member) commented Feb 21, 2020

What is the optimization supposed to look like? One thing is sure: .pnp.js does not grow as exponentially as the time to run the scripts does.

Conceptually, we traverse and expand the whole tree, resolving all peer dependencies as we go. Then, once we are done, we check which ones have been resolved to the same set of dependencies, and remove the duplicates.

The problem with this approach is that even though the duplicates are trimmed, they still contribute to the tree traversal, hence the catastrophic expansion problem. We should instead check for duplicates as we traverse the tree, and avoid recursing into nodes that have the same dependency sets as a virtual package we already traversed.

Of course it's not that simple - the way peer dependencies work is that children mutate their direct parents, so I still need to wrap my head around that to make sure the traversal-time optimization will yield sound results in every case - I'm ~70% sure there's a weird, non-obvious edge case hidden somewhere, so most of the work is finding it 😄
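
For illustration, a minimal sketch of that traversal-time deduplication, using toy types (this is not the actual implementation that later landed):

type Pkg = {name: string; deps: Pkg[]; peers: string[]};

// A virtual instance is identified by the package plus its resolved peer
// set; two instances with the same key resolve identically, so the subtree
// is expanded once instead of once per path.
const seen = new Set<string>();

function traverse(pkg: Pkg, peerResolutions: Map<string, string>): void {
  const key = `${pkg.name}#${[...peerResolutions].sort().join(`;`)}`;
  if (seen.has(key))
    return;
  seen.add(key);

  for (const dep of pkg.deps) {
    // Children inherit the resolutions their parent provides for their peers
    const childPeers = new Map<string, string>();
    for (const p of dep.peers)
      childPeers.set(p, peerResolutions.get(p) ?? `unresolved:${p}`);
    traverse(dep, childPeers);
  }
}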

crubier (Contributor, Author) commented Feb 21, 2020

I see. Indeed this sounds like the way to go.

Still, one question I have is: what is done differently in yarn v2 compared to yarn v1?

On our use case, yarn v1 PnP takes around 8s to launch a script, while it takes 200s with yarn v2, and 90s with yarn v2 after removing the peerDeps from our repo.

My question would be: maybe it's fine not to optimize the behaviour you are talking about, but couldn't there be a way to "cache" the result more aggressively, once and for all, during or after install, so that these computations don't have to be re-run every time we yarn run a script?

I wouldn't mind my install time being slightly longer if my yarn run were instant afterwards.

crubier (Contributor, Author) commented Feb 24, 2020

My message above may not be clear, so to rephrase it simply:

There are two problems here:

  1. The peerDependency resolution algorithm is not optimal, as @arcanis stated
  2. The peerDependency resolution result is not written directly to the .pnp.js file. Reading .pnp.js is not enough, and some resolutions still have to be done at runtime.

My point is that Problem 1 is not a big deal, but Problem 2 is the actual showstopper here. Solving Problem 2 would make Problem 1 irrelevant. Ideally, nothing other than reading the .pnp.js file should have to happen at runtime.

So my question would be: is there anything preventing us from solving Problem 2? Is it strictly necessary that some resolution happens at runtime rather than at install time? In other words, is it impossible to serialize the full dependency tree, including resolved peerDependencies, into a JS file?

arcanis (Member) commented Feb 24, 2020

Yep, I'd like to improve the run performance, as it clearly becomes the bottleneck once you get rid of most install work (even the parse time becomes quite significant!).

The reason we have the resolution pass is that otherwise we would need to store all the virtual packages not only inside the .pnp.js (that's actually already the case), but also inside the yarn.lock file. While doable, it used to make the file significantly harder to review (since each package was listed many times), so we stopped doing that.

Maybe we could try again with a better format, possibly something like this:

"foo@npm:1.0.0":
  dependencies:
    a: 1.0.0
  peerDependencies:
    b: "*"

"foo@virtual:abcdef":
  virtualOf: "foo@npm:1.0.0"
  peers:
    b: 1.0.0

Still, it's quite a lot of data. An alternative would be to store it inside a separate file (.yarn/virtual-state.yml?) which projects could choose to check in or not (and which would still be covered by the --immutable flag).

Still, one question I have is: what is done differently in yarn v2 compared to yarn v1?

If I remember correctly, the v1 implementation was far less correct than the v2 one - in particular, I think postinstall scripts executed inside the context of a virtual package lost access to their peer dependencies, because Yarn wasn't able to detect which variant was being built.

crubier (Contributor, Author) commented Feb 24, 2020

I didn't know humans were supposed to review yarn.lock files, haha - ours is 45,000 lines.

Indeed, I think yarn v2 is actually the first-ever fully correct package manager, and this is a big deal and much appreciated.

The .yarn/virtual-state.yml solution looks very good to me, and as I said, it would totally solve our problem. It might actually be the only way to fully solve it: I am not sure that optimizing the resolution pass alone would do the job. Right now it can take up to 400s; even making it 100 times faster would still be considered quite slow - 4s to run a script that sometimes completes in 2s.

So, in conclusion: a big 👍 to .yarn/virtual-state.yml. It seems to be the last showstopper for us to adopt yarn v2 in production, since everything else now works as expected.

crubier (Contributor, Author) commented Feb 24, 2020

@arcanis regarding the .yarn/virtual-state.yml idea, you got us excited! So here are a few additional questions:

  • Is it something you would work on soon? What would the timeline be without action on our part?
  • Otherwise, do you think this could be implemented as a yarn plugin?
  • What would be the complexity of this development? I have a feeling it could be done with a few well-placed lines of code that we could work on ourselves.

Best

arcanis (Member) commented Feb 24, 2020

I didn't know humans were supposed to review yarn.lock files, haha - ours is 45,000 lines.

For internal lockfiles it's not so much required, but for open-source ones it's important to make sure the resolutions point to the expected location (otherwise I could make a PR adding lodash, except that the lockfile would actually point to my-malicious-package).

Is it something you would work on soon? What would the timeline be without action on our part?

I think I need to work on the algorithm's performance first, because it's unlikely anyone else will be able to efficiently navigate this part of the code. Given my other WIPs, I'd say at least a few weeks before I can implement it - so any help would be much appreciated! 😃

Otherwise, do you think this could be implemented as a yarn plugin?

It would probably be better to implement this in the core - it's unlikely anyone would want a different implementation, so a hook just for that would be awkwardly specific. Plus, the experience won't be worse because of it, so better to ship it by default.

What would be the complexity of this development? I have a feeling it could be done with a few well-placed lines of code that we could work on ourselves.

I don't think it would be very complex - off the top of my head, here's basically what would need to be done:

  • The persistLockfile function would have to be extended to generate the virtual state file as well as the lockfile.

  • A new function (hydrateVirtualPackages?) would have to be added to the Project instance - it would load the data from the virtual state file (and potentially fall back on the regular resolution if the file isn't found? I'm not certain about this 🤔). The implementation would look like setupResolutions, except that we would reuse the existing originalPackages as much as possible (cf. the shortened format I mentioned).

  • The yarn run function (and potentially others, you can find them by grepping lockfileOnly: true) would need to be modified to reference hydrateVirtualPackages instead of calling the resolver manually.

  • The documentation would have to be updated to reference this in the section where we detail the gitignore patterns (the virtual state would have to be ignored when not using zero installs).

crubier (Contributor, Author) commented Feb 24, 2020

Ok, I'm giving it a shot in #990.

I just have one question: what should I persist in the virtual state file? The Project instance contains a lot of Maps (like storedResolutions, storedDescriptors, storedPackages, and so on). Which ones should I save and restore in this file?

crubier (Contributor, Author) commented Feb 24, 2020

When I think about it, I still don't understand why .pnp.js is not enough to run yarn run. Why does yarn run need anything that isn't in .pnp.js? My understanding was that .pnp.js contained everything needed at runtime. I guess I'm missing something here, and it's difficult to code without getting the big picture.

arcanis (Member) commented Feb 24, 2020

First, a bit of context: our infrastructure is built to support JavaScript packages, but not exclusively. In the future, we want to implement support for installing packages from other languages.

To do that, we've separated the core (the lockfile) from the linkers (PnP linker, node_modules linker). This separation means the core isn't allowed to talk directly to the .pnp.js file: it's a byproduct of one specific linker, so it isn't even guaranteed to exist (even in JS-only mode, the user may use the node_modules linker, which won't generate a .pnp.js file). That's why there's a kind of duplication between the data in the lockfile and the data in the PnP file: although similar, they have different consumers.
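
As a rough sketch of that separation (the member names below are abbreviated from the @yarnpkg/core Linker contract and should be treated as approximations, not the exact API):

import {Installer, LinkOptions, Locator, Package} from '@yarnpkg/core';
import {PortablePath} from '@yarnpkg/fslib';

// The core resolves and persists the dependency tree; a linker decides how
// that tree materializes on disk. .pnp.js is purely an artifact of the PnP
// linker, which is why the core can never rely on its presence.
interface Linker {
  supportsPackage(pkg: Package, opts: LinkOptions): boolean;
  findPackageLocation(locator: Locator, opts: LinkOptions): Promise<PortablePath>;
  makeInstaller(opts: LinkOptions): Installer;
}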

I just have one question: what should I persist in the virtual state file? The Project instance contains a lot of Maps (like storedResolutions, storedDescriptors, storedPackages, and so on). Which ones should I save and restore in this file?

A bit of all of them 🙂 Basically, storedDescriptors is the list of all descriptors in the project, storedPackages is the list of all locators, and storedResolutions connects the two: from a descriptor hash, you obtain the locator hash it resolves to. So, given a range, you would obtain the package it resolved to via:

// The descriptor is a "package name" + "range" combination
declare const sourceDescriptor: Descriptor;

// descriptor hash -> locator hash -> package; the intermediate lookup can
// return undefined, hence the non-null assertion in this sketch
const resolvedPackage = project.storedPackages.get(
  project.storedResolutions.get(
    sourceDescriptor.descriptorHash,
  )!,
);

As you can see, generateLockfile iterates over each entry in storedResolutions to figure out which descriptors are non-virtual (there's a comment explaining how to differentiate them) and serializes the locator instances stored in storedPackages. The generateVirtualStateFile function would need to do exactly the same, except iterating over the virtual descriptors instead of the non-virtual ones.
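
A sketch of what that could look like, mirroring the generateLockfile iteration just described (the structUtils helpers used here exist in @yarnpkg/core, but treat the overall shape as an assumption rather than the final implementation):

import {Project, structUtils} from '@yarnpkg/core';
import {stringifySyml} from '@yarnpkg/parsers';

function generateVirtualStateFile(project: Project): string {
  const entries: {[key: string]: object} = {};

  for (const [descriptorHash, locatorHash] of project.storedResolutions) {
    const descriptor = project.storedDescriptors.get(descriptorHash)!;

    // generateLockfile keeps the non-virtual descriptors; we keep the rest
    if (!structUtils.isVirtualDescriptor(descriptor))
      continue;

    const pkg = project.storedPackages.get(locatorHash)!;
    entries[structUtils.stringifyDescriptor(descriptor)] = {
      virtualOf: structUtils.stringifyLocator(structUtils.devirtualizeLocator(pkg)),
      // peerResolutions would be serialized from pkg.dependencies here
    };
  }

  return stringifySyml(entries);
}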

crubier (Contributor, Author) commented Feb 24, 2020

Ok, thank you, that was really helpful: I think I managed to generate the correct virtual-state.yml file. It looks like this: #990 (comment). Let me know if you think this is wrong.

I am now working on reading this file during yarn run. Am I right in understanding that I should replace this: https://github.com/yarnpkg/berry/blob/master/packages/plugin-essentials/sources/commands/run.ts#L70-L73

await project.resolveEverything({
  lockfileOnly: true,
  report: new ThrowReport(),
});

By something like:

await project.hydrateVirtualPackages();

And the hydrateVirtualPackages method would look like setupResolutions, but reading virtual-state.yml at some point to be faster? Any additional info that would help?

arcanis (Member) commented Feb 24, 2020

It looks like this: #990 (comment). Let me know if you think this is wrong.

Yep, that's a good first iteration. Later I think we'll want to optimize this format so that your example would end up being:

"debug@virtual:81392a248a4eb1062284166bc5b126ea9dbc003b67ae1ded6b6a4bf4e58d3ad86c80934ef78b5b059394fa3051f64f2a5781fbc893f31c635c098e2c47b576c1#npm:^4.1.0":
  virtualOf: "debug@npm:4.1.1"

"pkg-1-1@virtual:748f6670d9c6c066fb88aa3380f1d90d0e995ab209cda0c2465335227bd22f406f698d521c4c284ecb1f407affa50215728d2ff86737a8095b0baabe0fe59409#workspace:packages/pkg-1-1":
  virtualOf: "pkg-1-1@workspace:packages/pkg-1-1"
  peerResolutions:
    react: "npm:^16.12.0"
    react-dom: "virtual:748f6670d9c6c066fb88aa3380f1d90d0e995ab209cda0c2465335227bd22f406f698d521c4c284ecb1f407affa50215728d2ff86737a8095b0baabe0fe59409#npm:^16.12.0"

Am I right in understanding that I should replace this: https://github.com/yarnpkg/berry/blob/master/packages/plugin-essentials/sources/commands/run.ts#L70-L73

Yep!

And the hydrateVirtualPackages method would look like setupResolutions, but reading virtual-state.yml at some point to be faster? Any additional info that would help?

Pretty much. The key difference between setupResolutions and hydrateVirtualPackages is that where setupResolutions completely resets the project stores, hydrateVirtualPackages will expect them to be in a consistent state and will just add the new virtual packages (and only them).

The most complex part is that you can't just copy-paste setupResolutions if we want a file format similar to what I shared: you'll need to first retrieve the originalPackage instance, then extend it with the peer dependency overrides. It would look like this:

for (const [virtualEntryName, virtualEntry] of Object.entries(virtualStateFileData)) {
  const virtualLocator = structUtils.parseLocator(virtualEntryName);
  const virtualDescriptor = structUtils.convertLocatorToDescriptor(virtualLocator);

  const originalLocator = structUtils.parseLocator(virtualEntry.virtualOf);
  const originalPackage = this.originalPackages.get(originalLocator.locatorHash);

  // Clone the original package under its virtual name, then apply the peer
  // dependency overrides recorded in the virtual state file
  const virtualPackage = structUtils.renamePackage(originalPackage, virtualLocator);
  for (const [name, resolution] of Object.entries(virtualEntry.peerResolutions))
    virtualPackage.dependencies.set(name, structUtils.parseDescriptor(resolution));

  this.storedPackages.set(virtualPackage.locatorHash, virtualPackage);
  this.storedDescriptors.set(virtualDescriptor.descriptorHash, virtualDescriptor);

  this.storedResolutions.set(virtualDescriptor.descriptorHash, virtualPackage.locatorHash);
}

crubier (Contributor, Author) commented Feb 24, 2020

Ok, I updated #990; it's now generating the correct file (I think - not quite sure, to be checked - but it looks like your example, at least superficially).

Now trying the read part.

crubier (Contributor, Author) commented Feb 24, 2020

I made progress, but I think I'm blocked now. Running hydrateVirtualPackages as described above does not produce the same result as running resolveEverything; we are clearly still missing a large part of the packages.

Here is an example run of yarn run on the yarn repo itself.

I print the number of entries in project.storedPackages after hydrateVirtualPackages vs after resolveEverything:

hydrateVirtualPackages: 64.669ms
538
resolveEverything: 2333.212ms
3079

@Command.Path(`run`)
async execute() {
  const configuration = await Configuration.find(this.context.cwd, this.context.plugins);
  const {project, workspace, locator} = await Project.find(configuration, this.context.cwd);

  console.time(`hydrateVirtualPackages`);
  await project.hydrateVirtualPackages();
  console.timeEnd(`hydrateVirtualPackages`);
  console.log(project.storedPackages.size);

  console.time(`resolveEverything`);
  await project.resolveEverything({
    lockfileOnly: true,
    report: new ThrowReport(),
  });
  console.timeEnd(`resolveEverything`);
  console.log(project.storedPackages.size);
  ....

I guess I should also run part of what resolveEverything does inside hydrateVirtualPackages at this point?

crubier (Contributor, Author) commented Feb 25, 2020

Getting there... slowly. I added a first phase in hydrateVirtualPackages to load the originalPackages:

async hydrateVirtualPackages() {
    const virtualStatePath = ppath.join(this.cwd, this.configuration.get(`virtualStateFilename`));
    const content = await xfs.readFilePromise(virtualStatePath, `utf8`);
    const virtualStateFileData: {[key:string]:{virtualOf:string,peerResolutions:{[key:string]:string}}} = parseSyml(content);

    for (const [locatorHash, pkg] of this.originalPackages.entries()) {
      const desc = structUtils.convertLocatorToDescriptor(pkg);
      this.storedPackages.set(pkg.locatorHash, pkg);
      this.storedDescriptors.set(desc.descriptorHash, desc);
      this.storedResolutions.set(desc.descriptorHash, pkg.locatorHash);
    }


    for (const [virtualEntryName, virtualEntry] of Object.entries(virtualStateFileData)) {
      if (virtualEntryName === `__metadata`)
        continue;
      const virtualLocator = structUtils.parseLocator(virtualEntryName);
      const virtualDescriptor = structUtils.convertLocatorToDescriptor(virtualLocator);

      const originalLocator = structUtils.parseLocator(virtualEntry.virtualOf);
      const originalPackage = this.originalPackages.get(originalLocator.locatorHash);

      if (!originalPackage)
        throw new Error("Wowowowow could not find original package");

      const virtualPackage = structUtils.renamePackage(originalPackage, virtualLocator);
      for (const [name, resolution] of Object.entries(virtualEntry.peerResolutions))
        virtualPackage.dependencies.set(name as IdentHash, structUtils.parseDescriptor(resolution));

      this.storedPackages.set(virtualPackage.locatorHash, virtualPackage);
      this.storedDescriptors.set(virtualDescriptor.descriptorHash, virtualDescriptor);

      this.storedResolutions.set(virtualDescriptor.descriptorHash, virtualPackage.locatorHash);
    }
  }

I now have roughly the right number of packages after hydrateVirtualPackages (3087), but I still get an error later on when running yarn run xx on the yarn repo itself:

yarn run echo
setupResolutions: 638.869ms
setupWorkspaces: 87.389ms
hydrateVirtualPackages: 202.205ms
3087
Internal Error: Assertion failed: The resolution (@arcanis/sherlock@^1.0.38) should have been registered
    at getPackageAccessibleBinaries (/Users/vincent/Code/berry/packages/yarnpkg-core/sources/scriptUtils.ts:255:13)
    at getPackageAccessibleBinaries (/Users/vincent/Code/berry/packages/yarnpkg-core/sources/scriptUtils.ts:195:54)

crubier (Contributor, Author) commented Feb 25, 2020

@arcanis it looks like the result I get with hydrateVirtualPackages is quite different from what I'd get with resolveEverything. I am not quite sure how to solve this, since resolveEverything seems to do some complex work...

My question would be: why not run resolveEverything during install and save its result directly, instead of only the virtual packages' state? That would be closer to my original intent.

arcanis (Member) commented Feb 26, 2020

I've opened #997, which aims to fix the atrocious speed of virtual dependencies. Using your repro repository with parameters of 10-50-50-50-10, I got 1.18s of resolution time 🌟

Can you confirm you get the same results on your private repo?

crubier (Contributor, Author) commented Feb 26, 2020

@arcanis Yes indeed, I confirm that it is really effective!

On the repro repository I get around 2s of resolution time - much better!

On our own internal repo, I get around 5s of resolution time instead of 200s, so this is night and day!

But this 5s resolution time on our repo is still around 5x slower than yarn v1, which took less than 1s.

Hence I think we still need #998 (if it has a chance to work), as explained in #990 (comment).

Combining these two PRs will make yarn v2 actually super useful for people with large monorepos.

arcanis (Member) commented Feb 27, 2020

Closing with #997 and #998.
