Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reimplement shake (continued) #2060

Merged
merged 36 commits into from
Sep 25, 2021
Merged

Conversation

pepeiborra
Copy link
Collaborator

@pepeiborra pepeiborra commented Aug 1, 2021

Rebase of #1759 including bugfixes

  • pass all the tests
  • fix performance regressions (or mitigate via reverse deps)
  • implement reverse deps
  • test scalability
  • implement applicative parallelism (left for future work)

Summary of changes

  • hls-graph is now a full implementation of a reactive dependency graph (aka build system) and no longer wraps Shake.
  • "reactive" means that his-graph tracks reverse dependencies in order to minimize rebuilds when a list of external changes (e.g. which file has been edited) is provided. This in turn means that ghcide now scales much better - rebuilds are no longer linear in the number of transitive module imports (modulo the TH NeedsCompilation rule)
  • A new flag --conservative-change-tracking is added to ghcide to disable reactive change tracking. The flag is intended for debugging only and not for users. A new benchmark example "cabal-conservative" uses it to measure the performance difference.
  • Performance in the Cabal example is equal or better (up to 2X) than before
  • Build profiling is supported and extends the Shake profiling to include information about dirty nodes and visited nodes.
  • Fake Binary instances no longer needed and removed from the codebase

What's missing

  • Applicative parallelism - left for future work, hoping someone smarter than me can do it. Doesn't seem to have a major impact, probably will speed up startup times a bit?

@pepeiborra
Copy link
Collaborator Author

This seems working now. Performance is a mixed bag, I will look at mining the data later today or tomorrow.

Before merging I need to implement reverse dependency tracking and measure scalability.

@pepeiborra
Copy link
Collaborator Author

pepeiborra commented Aug 3, 2021

Performance

Without reverse dependency tracking

Total Time, allocations and max residency are both in the ballpark of Shake, with ups and downs. Using the Cabal example where HEAD is 3dfc695, we have:

  • edit total time Is 16.7% higher
  • hover total time is 8% lower
  • hover after edit total time is 8% higher
  • getDefinition total time is 33% lower
  • code actions total time is 115% higher (but allocations are slightly lower, weird)
  • code actions after edit total time is 18% lower but max residency is 60% higher
  • document symbols after edit total time is 500% higher (weird, would be worth repeating), allocations are 160% higher, and max residency is 77% higher
  • etc.
    Full results obtained from the Github CI run (not the most isolated environment):
version    name                             success   samples   startup              setup          userTime             delayedTime             totalTime            maxResidency   allocatedBytes
5
upstream   edit                             True      50        2.272495847          0.0            0.312656619          5.235386871999999       5.556636593          176MB          7443MB
6
HEAD       edit                             True      50        2.939898887          0.0            1.525507581000001    4.951573532000001       6.481253863          206MB          8662MB
7
upstream   hover                            True      50        2.2531108630000003   0.0            5.4577234090000015   3.1075063e-2            5.492708015000001    176MB          7121MB
8
HEAD       hover                            True      50        2.833231339          0.0            5.007434441000001    4.3085500000000006e-2   5.060099354          177MB          6940MB
9
upstream   hover after edit                 True      50        2.729492039          0.0            5.381235953999998    22.509599982            27.895088985         204MB          37031MB
10
HEAD       hover after edit                 True      50        2.567809608          0.0            5.418567334000003    24.762967906999993      30.185962892000003   332MB          40195MB
11
upstream   getDefinition                    True      50        2.307331066          0.0            5.701828237000002    1.2548546000000004e-2   5.7181945270000005   175MB          7120MB
12
HEAD       getDefinition                    True      50        2.995683892          0.0            3.585127652000001    0.18568288699999988     3.7749409870000004   173MB          6968MB
13
upstream   getDefinition after edit         True      50        2.14589879           0.0            6.631301436000003    20.42667203700001       27.062120324000002   205MB          36189MB
14
HEAD       getDefinition after edit         True      50        2.7035592150000003   0.0            5.951320493000001    22.8122467              28.768026244         338MB          38282MB
15
upstream   completions                      True      50        1.761511226          0.0            8.047872840000002    1.7012296e-2            8.073320538          203MB          12450MB
16
HEAD       completions                      True      50        2.615275548          0.0            7.111024207000004    1.6015886e-2            7.1318453470000005   198MB          12273MB
17
upstream   completions after edit           True      50        2.258389204          0.0            10.073498866000001   18.923962975000002      29.00345751          236MB          42098MB
18
HEAD       completions after edit           True      50        3.0192549090000003   0.0            10.222434903         20.109549577000003      30.340781481         366MB          44555MB
19
upstream   code actions                     True      50        5.333430205          1.4487374e-2   0.648637333          1.6679171369999999      2.320392221          168MB          7450MB
20
HEAD       code actions                     True      50        2.206230134          2.5243695e-2   1.0010000770000003   3.9947446900000005      5.015267196          163MB          6962MB
21
upstream   code actions after edit          True      50        1.916653975          0.0            20.624939044         1.4387978480000005      22.068674153         191MB          29346MB
22
HEAD       code actions after edit          True      50        2.801418785          0.0            14.555310371000004   3.566505903             18.127092237         303MB          25500MB
23
upstream   code actions after cradle edit   True      50        2.1320045110000003   0.0            13.655822500000003   5.446925311000002       19.107695167000003   209MB          26052MB
24
HEAD       code actions after cradle edit   True      50        2.198851487          0.0            14.194154865000002   5.844885103000003       20.044018823000002   335MB          27824MB
25
upstream   documentSymbols after edit       True      50        1.354437694          0.0            2.6904143140000003   3.9411810650000003      6.638646066000001    179MB          10999MB
26
HEAD       documentSymbols after edit       True      50        2.18587876           0.0            15.608887073999998   4.314848547999999       19.938731197000003   318MB          26012MB
27
upstream   hole fit suggestions             True      50        2.1463324790000002   0.0            47.15185008600001    1.8690116999999996e-2   47.177355986         218MB          60676MB
28
HEAD       hole fit suggestions             True      50        2.551580878          0.0            46.36738798099999    1.8521415e-2            46.392119067         327MB          60259MB

With reverse dependency tracking

Using the Cabal example where HEAD is ac99d9c, we see that total time is lower for most examples but max residency is considerably higher, presumably due to the additional state to track reverse dependencies (or a bug):

  • edit total time is 20% lower
  • hover after edit total time is 29% lower
  • get definition after edit total time is 33% lower but max residency is 180% higher
  • completions after edit total time is 30% lower but max residency is 160% higher
  • code actions after edit total time is 44% lower
  • code actions after cradle edit total time is 59% lower
  • document symbols after edit total time is 36.8% higher, allocations are 15% higher, and max residency is 222% higher
  • etc.
    Full results obtained from the Github CI run (not the most isolated environment):
version    name                             success   samples   startup              setup                   userTime             delayedTime             totalTime            maxResidency   allocatedBytes
5
upstream   edit                             True      50        2.0464692120000003   0.0                     3.771236898          6.215939686999999       9.993117938000001    181MB          11779MB
6
HEAD       edit                             True      50        2.5070064260000002   0.0                     3.010397536000001    4.968224615999996       7.983422218          252MB          9115MB
7
upstream   hover                            True      50        1.509634178          0.0                     4.79936772           0.10946838299999999     4.913372751000001    178MB          7116MB
8
HEAD       hover                            True      50        2.7730088910000004   0.0                     5.180883135999997    3.504994100000001e-2    5.230458685          178MB          6997MB
9
upstream   hover after edit                 True      50        2.487633504          0.0                     6.890062458999999    24.930129366            31.825360113000002   207MB          36933MB
10
HEAD       hover after edit                 True      50        2.418334638          0.0                     6.181287432          16.530389695000004      22.717214748         582MB          24948MB
11
upstream   getDefinition                    True      50        1.9895303180000001   0.0                     6.801371679000001    1.5523176000000007e-2   6.821346994000001    177MB          7089MB
12
HEAD       getDefinition                    True      50        3.1426977000000003   0.0                     5.140733387999998    2.8642663999999988e-2   5.174019203          179MB          6983MB
13
upstream   getDefinition after edit         True      50        2.2615483110000003   0.0                     7.9594969220000005   23.39991231600001       31.364496515000003   207MB          36138MB
14
HEAD       getDefinition after edit         True      50        2.871311716          0.0                     6.121026748000002    14.691798131            20.817694644000003   591MB          23630MB
15
upstream   completions                      True      50        1.9937127120000002   0.0                     8.869379315000002    2.0308336e-2            8.895150961          205MB          12447MB
16
HEAD       completions                      True      50        2.5907287070000002   0.0                     7.1039445489999995   2.3898504000000008e-2   7.133802204          205MB          12265MB
17
upstream   completions after edit           True      50        2.383978646          0.0                     10.788709890999996   21.624756953            32.422100895         236MB          42126MB
18
HEAD       completions after edit           True      50        2.780890566          0.0                     12.749612272000006   10.316970059999997      23.079330996000003   614MB          29711MB
19
upstream   code actions                     True      50        2.051170527          1.4181572000000002e-2   0.9363013359999999   5.652116202             6.593045581          173MB          7130MB
20
HEAD       code actions                     True      50        2.985416648          1.1376260000000001e-2   1.102338611          3.7641862959999997      4.886573626000001    170MB          6984MB
21
upstream   code actions after edit          True      50        2.073519519          0.0                     20.554091040000003   5.455980662             26.016013051         195MB          30579MB
22
HEAD       code actions after edit          True      50        3.0034777970000004   0.0                     10.574070786         4.022146447000003       14.603397271         595MB          16776MB
23
upstream   code actions after cradle edit   True      50        2.161535727          0.0                     21.136283263         7.025362857000003       28.167373854         224MB          32129MB
24
HEAD       code actions after cradle edit   True      50        3.1631840380000003   0.0                     8.233220917000002    3.441370713             11.680453183000001   524MB          13065MB
25
upstream   documentSymbols after edit       True      50        2.708412006          0.0                     3.2785558110000004   5.482434728             8.77287724           177MB          11079MB
26
HEAD       documentSymbols after edit       True      50        2.825938144          0.0                     7.695673330000002    4.3817235619999995      12.094445087         571MB          12748MB
27
upstream   hole fit suggestions             True      50        2.468067436          0.0                     53.709667893         2.5438521000000002e-2   53.74341694          216MB          60727MB
28
HEAD       hole fit suggestions             True      50        3.059986549          0.0                     43.864609649         2.6667305000000002e-2   43.898593311000006   603MB          45390MB

I will try to rerun the benchmarks in a more isolated environment again.

The next step will be to measure scalability

@pepeiborra
Copy link
Collaborator Author

pepeiborra commented Aug 3, 2021

Repeated the benchmarks in a more isolated environment (Xeon gold 6138 w/ 256GB RAM, 80 virtual cores) and noticed some failed runs due to "Index out of range" crashes in this branch. I guess there must be a parallelism bug somewhere that doesn't arise in the 1-core Github CI agents. Will need to investigate more:

ghcide: Ix{Int}.index: Index (4) out of range ((0,0))

@pepeiborra
Copy link
Collaborator Author

pepeiborra commented Aug 4, 2021

It reproduces reliably with ghcide -j64 but not with lower parallelism settings. The exception is being thrown by Data.Ix.index, which is not used anywhere in the reimplementation of Shake nor in its dependencies, so its either an existing bug that has been uncovered by the new dynamic semantics, or a Shake invariant violation in the new implementation.

Tracking it down is extremely difficult because +RTS -xc is not giving me any useful info, Data.Ix.index uses errorWithoutStackTrace, and I don't see how to use the ghci debugger to catch this, so I'm afraid the only resort is to edit Data.Ix.index and rebuild base (i.e. GHC).

@pepeiborra
Copy link
Collaborator Author

Tracked it down to the GHC thread manager by editing base to use error instead of errorWithoutStackTrace and add HasCallStack constraints until getting a fully defined call stack:

image

-- | Retrieve the system event manager for the capability on which the
-- calling thread is running.
--
-- This function always returns 'Just' the current thread's event manager
-- when using the threaded RTS and 'Nothing' otherwise.
getSystemEventManager :: IO (Maybe EventManager)
getSystemEventManager = do
  t <- myThreadId
  (cap, _) <- threadCapability t
  eventManagerArray <- readIORef eventManager
  mmgr <- readIOArray eventManagerArray cap
  return $ fmap snd mmgr

https://gitlab.haskell.org/ghc/ghc/-/blob/master/libraries/base/GHC/Event/Thread.hs#L182

It looks like eventManagerArray is still empty when the runtime tries to index it. This is likely a GHC bug, unless we are doing something wrong like dynamically changing the number of capabilities multiple times.

@pepeiborra
Copy link
Collaborator Author

The problem was that we were calling withNumCapabilities on every build, resetting the number of caps to 1 in between builds.

@pepeiborra
Copy link
Collaborator Author

pepeiborra commented Aug 8, 2021

I believe the performance regressions are due to the parallel evaluation of all the dependencies of a node. Shake is lazy staged eager (firstJustUnordered firstJustM) whereas hls-graph is full eager (mapConcurrently). This shows in benchmarks like "documentSymbols after edit" where we:

  1. Kick chunky build to typecheck all the FOIs after an edit.
  2. Enqueue cheap build action request to request the parse trees of FOIs

1 gets there first and spawns thousands of threads. The chunky build does little actual work: only a few nodes in the build graph need recomputing, all the other ones are just doing a lookup. When 2 arrives it spawns another thread which needs to join the queue at the back, and ends up waiting for a long time. With reverse dependency tracking, the chunky build in 1 becomes much less chunky and the regression should disappear, except in the worst case (startup) when all the project files are FOIs.

We could solve this by:
a) Minimising the chunky build via reverse deps (needs validation)
b) Avoid spawning threads to just do a lookup
c) writing our own scheduler, or reusing monad-par
d) using threadpools, or reusing async-pool
e) adding priorities to the native scheduler.

pepeiborra added a commit that referenced this pull request Aug 17, 2021
This is useful in the context of #2060 to compare performance with and
without reactive change tracking
pepeiborra added a commit to pepeiborra/ide that referenced this pull request Aug 18, 2021
This is useful in the context of haskell#2060 to compare performance with and
without reactive change tracking
@pepeiborra pepeiborra force-pushed the reimplement-shake branch 2 times, most recently from f8985d4 to 6d9b6cd Compare August 18, 2021 08:25
pepeiborra added a commit that referenced this pull request Aug 19, 2021
* [ghcide-bench] Support extra args in examples

This is useful in the context of #2060 to compare performance with and
without reactive change tracking

* Fix bench.yml CI script
@pepeiborra pepeiborra force-pushed the reimplement-shake branch 7 times, most recently from 665cafb to 42410bf Compare August 24, 2021 07:29
@pepeiborra pepeiborra marked this pull request as ready for review August 24, 2021 07:46
@pepeiborra
Copy link
Collaborator Author

Performance regressions have been mitigated via both a) reverse deps and b) avoiding new threads for trivial lookups. Will follow up with benchmark results soon.

@pepeiborra pepeiborra force-pushed the reimplement-shake branch 2 times, most recently from b095f40 to d652f11 Compare August 27, 2021 16:35
The main benefit of reverse dependency tracking is that we avoid unnecessary node lookups. However, these lookups were not shown in hls-graph profiles, hiding their real cost.

Concretely, if the number of lookups per build is proportional to the number of transitive dependencies of a node, which in turn is proportional to the number of transitive imports of a module, then we have an O(edges) complexity instead of an O(nodes) complexity, which is really bad for large projects.

This Diff extends the recorded data and the profiling UI to keep track
of visited nodes and to show them in a new column "Visited" of the "Rules" tab. The cost of doing this is storing an additional Int per node at runtime.

While I was editing the profiling UI, I took the chance to remove the command tabs, update the README and add some missing files
@mergify mergify bot merged commit 682386d into haskell:master Sep 25, 2021
@pepeiborra
Copy link
Collaborator Author

Finally!
😱😱😱😱😱

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
merge me Label to trigger pull request merge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants