Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Decrease contention in Progress reporting #2357

Merged
merged 5 commits into from
Dec 13, 2021
Merged

Decrease contention in Progress reporting #2357

merged 5 commits into from
Dec 13, 2021

Conversation

pepeiborra
Copy link
Collaborator

@pepeiborra pepeiborra commented Nov 16, 2021

Out of curiosity I put together a lock-less version of hls-graph using the sim-containers package as suggested by @wz1000 and others in IRC a few months ago.

So far I haven't seen any perf. improvement, in fact the amount of work done (allocations) is much higher. Perhaps with more tuning it could be competitive? Leaving it here in case anyone wants to have a play

EDIT: The original PR was broken in chunks, and now that there is only one major change remaining, it's worth reviewing and merging this PR to preserve the discussion instead of extracting it into yet another PR.

@pepeiborra pepeiborra force-pushed the lock-free branch 2 times, most recently from f053f11 to 6c94505 Compare November 16, 2021 00:17
@wz1000
Copy link
Collaborator

wz1000 commented Nov 16, 2021

Nice! I suppose an increase in allocations could be acceptable if we are gaining in throughput due to less contention across CPUs. Did you try to compare eventlogs in tracy or threadscope?

@pepeiborra
Copy link
Collaborator Author

I am actually seeing some interesting perf gains in Sigma.
Startup 1m20s -> 1m07s.

  • shake: 1m09s
  • hls-graph: 1m20s
  • lockless his-graph: 1m07s

@pepeiborra
Copy link
Collaborator Author

Looking at this in Threadscope with 40 capabilities is not very enlightening. The Activity tab is a bit less sparse in the lock-less version which confirms the improvement.

@wz1000
Copy link
Collaborator

wz1000 commented Nov 18, 2021

We also use a lot over other MVars as part of the IDE state, including the shake store itself, which seems like it would limit the possible upside of this, since all the threads will need to synchronize on those anyway.

@pepeiborra
Copy link
Collaborator Author

We also use a lot over other MVars as part of the IDE state, including the shake store itself, which seems like it would limit the possible upside of this, since all the threads will need to synchronize on those anyway.

True. I'll try with a lockless values collection when I find some time

@pepeiborra pepeiborra force-pushed the lock-free branch 4 times, most recently from 3ef1028 to d6ca1da Compare November 25, 2021 23:24
@pepeiborra pepeiborra changed the title Lock-less hls-graph prototype Lock-less hls-graph & ghcide Nov 26, 2021
@pepeiborra pepeiborra force-pushed the lock-free branch 7 times, most recently from b3a9b34 to 33eca72 Compare November 28, 2021 20:18
@pepeiborra
Copy link
Collaborator Author

pepeiborra commented Nov 28, 2021

I have now removed all or most of the contentious MVars in the ghcide codebase and used stm-stats to optimise the atomically sections. Allocations are no longer sky high and performance is very competitive.

For Sigma using the 'edit' experiment:

name       | success | samples | startup | setup | userTime | delayedTime | totalTime
---------- | ------- | ------- | ------- | ----- | -------- | ----------- | ---------
hls-graph  | True    | 10      | 1m13s   | 0.00s | 7.62s    | 0.00s       | 7.62s    
lockless   | True    | 10      | 51.54s  | 0.00s | 6.78s    | 0.00s       | 6.79s    

Using the cabal example from the benchmark suite in a 40core machine:

version    name                             success   samples   startup              setup          userTime              delayedTime             totalTime             buildRulesBuilt   buildRulesChanged   buildRulesVisited   buildRulesTotal   buildEdges   maxResidency   allocatedBytes
upstream   edit                             True      50        2.5236723640000003   0.0            17.309364283999997    2.2253431000000004e-2   17.340336119          16                12                  1399                5812              116648       248MB          29321MB
HEAD       edit                             True      50        1.70690691           0.0            12.809913039000003    2.2510854e-2            12.841288722000002    14                11                  1396                5812              116655       314MB          40589MB
upstream   hover                            True      50        2.677677766          0.0            2.0205504599999995    3.6659328000000005e-2   2.063656252           4239              4238                4239                5814              116654       178MB          6935MB
HEAD       hover                            True      50        2.6147549270000003   0.0            1.3290905930000012    4.2088196e-2            1.3774166090000002    3628              3627                3628                5814              116661       245MB          7929MB
upstream   hover after edit                 True      50        2.745466355          0.0            2.9332188379999997    9.182722730000002       12.121818079          20                16                  1403                5813              116654       208MB          26303MB
HEAD       hover after edit                 True      50        2.5198113330000003   0.0            1.405385447           9.027772011000001       10.438775243          20                16                  1403                5813              116661       305MB          34985MB
upstream   getDefinition                    True      50        2.4515219630000002   0.0            2.5775262789999998    1.8116561e-2            2.601291471           4627              4626                4627                5814              116650       178MB          6938MB
HEAD       getDefinition                    True      50        2.8140236950000004   0.0            1.6005932100000002    1.8597985000000015e-2   1.625352004           3820              3819                3820                5814              116657       242MB          7620MB
upstream   getDefinition after edit         True      50        2.668066242          0.0            3.276597416999999     8.186274018999999       11.469090961000001    18                14                  1403                5813              116650       217MB          23842MB
HEAD       getDefinition after edit         True      50        2.4156138190000003   0.0            2.040306114           7.65233714              9.699113597           18                14                  1403                5813              116657       304MB          32698MB
upstream   completions                      True      50        2.717525123          0.0            7.843920201000001     5.129462200000001e-2    7.902699020000001     4295              4294                4295                5818              116656       216MB          12711MB
HEAD       completions                      True      50        2.7952292190000003   0.0            6.657334089000001     6.3071604e-2            6.730849490000001     3921              3920                3921                5818              116663       283MB          13476MB
upstream   completions after edit           True      50        2.532163161          0.0            8.228262227999998     4.833522975             13.077500114000001    22                18                  1407                5818              116656       276MB          30539MB
HEAD       completions after edit           True      50        2.318876577          0.0            7.877940261999998     1.7576836900000004      9.646766283           22                18                  1407                5818              116663       349MB          38693MB
upstream   code actions                     True      50        2.3714188320000003   1.354400717    0.44511653299999987   1.772866e-2             0.46792517200000006   4071              4069                4071                5564              113425       175MB          7018MB
HEAD       code actions                     True      50        2.842151372          9.1338843e-2   2.8614480760000007    0.8135180320000005      3.6814263780000003    3029              3027                3030                5564              113425       266MB          8158MB
upstream   code actions after edit          True      50        2.959363642          0.0            9.894581280999999     1.8914173000000006e-2   9.920506101           104               102                 944                 5564              113425       189MB          16803MB
HEAD       code actions after edit          True      50        2.556364557          0.0            8.487716788           2.0234254e-2            8.515298172000001     100               99                  939                 5564              113425       274MB          26549MB
upstream   code actions after cradle edit   True      50        2.456606877          2.128288427    131.813322058         2.1729532999999995e-2   131.841659595         2225              1456                2722                5564              113425       491MB          290896MB
HEAD       code actions after cradle edit   True      50        2.6470058240000003   1.270049499    83.514126984          2.504553e-2             83.548057181          2225              1456                2722                5564              113425       533MB          332711MB
upstream   documentSymbols after edit       True      50        2.4986716930000004   0.0            5.705937710000002     2.7275558709999994      8.440632186           15                11                  1398                5810              116642       214MB          16961MB
HEAD       documentSymbols after edit       True      50        2.5817652460000002   0.0            5.854933317000001     1.1111080600000003      6.974245725           15                11                  1398                5809              116652       300MB          25761MB
upstream   hole fit suggestions             True      50        2.677635157          0.0            29.319077228          3.0695894000000012e-2   29.358762491          18                14                  1401                5812              116648       240MB          46963MB
HEAD       hole fit suggestions             True      50        2.9301082970000003   0.0            28.275447114000002    2.878592700000001e-2    28.313195967000002    18                14                  1401                5812              116655       336MB          56451MB

In the same machine and example, performance regresses for -j1 and -j2 but picks up for -j4. This is expected due to the O(logN) lookup in the HAMT used by stm-containers compared to the O(1) lookups afforded by the array used previously, as well as the extra allocations, and I think it is small and scoped enough to be acceptable.

example    version    name   success   samples   startup              setup   userTime             delayedTime             totalTime            buildRulesBuilt   buildRulesChanged   buildRulesVisited   buildRulesTotal   buildEdges   maxResidency   allocatedBytes
cabal-j1   upstream   edit   True      50        3.038941884          0.0     20.459804242999994   1.789122e-2             20.483839169000003   18                14                  1401                5812              116655       231MB          27609MB
cabal-j1   HEAD       edit   True      50        2.693946489          0.0     32.304275194999995   3.9534974010000012      36.264355492         13                10                  1395                5812              116655       317MB          39791MB
cabal-j2   upstream   edit   True      50        2.2717036480000004   0.0     14.545436931000003   3.504462105             18.057819968         18                14                  1401                5812              116655       231MB          28272MB
cabal-j2   HEAD       edit   True      50        1.27566215           0.0     18.730493020999997   2.100601463000003       20.838892971         13                11                  1396                5812              116655       313MB          39763MB
cabal-j4   upstream   edit   True      50        1.354578941          0.0     13.971250236999998   1.8676294000000003e-2   13.996763301000001   17                13                  1400                5812              116655       238MB          29232MB
cabal-j4   HEAD       edit   True      50        1.8347364540000002   0.0     13.273148532         1.7931536790000004      15.073573214000001   14                11                  1396                5812              116655       313MB          40006MB

I'm pretty happy with those results and plan to land it before the next release.
Since the PR has become huge, I will slice it in smaller PRs for easier review.

@pepeiborra pepeiborra mentioned this pull request Nov 28, 2021
@pepeiborra
Copy link
Collaborator Author

I am a bit concerned about the new dependencies after a brief look at the Github repos. For instance, this PR has been open and unactioned for >6 months:

nikita-volkov/stm-hamt#2

@pepeiborra
Copy link
Collaborator Author

@nikita-volkov are you still maintaining the stm-containers ecosystem?

@nikita-volkov
Copy link

@pepeiborra Thanks for the mention. Yeah, it is maintained. I somehow missed all the activity in the related repos. Fixes are applied and released now.

@pepeiborra
Copy link
Collaborator Author

Awesome! Thank you for confirming @nikita-volkov

@pepeiborra pepeiborra force-pushed the lock-free branch 2 times, most recently from baf6714 to 75428e0 Compare December 9, 2021 20:26
@pepeiborra pepeiborra mentioned this pull request Dec 9, 2021
@pepeiborra pepeiborra mentioned this pull request Dec 11, 2021
@pepeiborra pepeiborra force-pushed the lock-free branch 3 times, most recently from bde97a3 to b8d9081 Compare December 12, 2021 09:34
@pepeiborra pepeiborra changed the title Lock-less hls-graph & ghcide Decrease contention in Progress reporting Dec 12, 2021
@pepeiborra pepeiborra requested a review from michaelpj December 12, 2021 09:34
@pepeiborra
Copy link
Collaborator Author

pepeiborra commented Dec 12, 2021

The gist of the changes left in this PR after merging all the previous bits is an improvement to the progress reporting code in order to decrease contention. Some stats:

BEFORE

    STM transaction statistics (2021-12-12 09:30:40.138006 UTC):
    Transaction               Commits    Retries      Ratio
    _anonymous_                 15297        118       0.01
    action queue - pop              2          2       1.00
    actionQueue - done              2          0       0.00
    actionQueue - peek             29          0       0.00
    actionQueue - push              2          0       0.00
    builder                    282354        853       0.00
    compute                     16882         16       0.00
    debouncer                    6842        195       0.03
    define - dirtyKeys          16895          2       0.00
    define - read 1             10710         11       0.00
    define - read 2              6232          5       0.00
    define - write               6225          1       0.00
    diagnostics - hidden         6871          9       0.00
    diagnostics - publish        4073        188       0.05
    diagnostics - read           6886          4       0.00
    diagnostics - update         6871         23       0.00
    incDatabase                 10966          0       0.00
    lastValueIO 4                2200          0       0.00
    lastValueIO 5                2200          0       0.00
    recordProgress              31238      13856       0.44
    updateReverseDeps           64994        358       0.01

AFTER

    STM transaction statistics (2021-12-12 09:24:24.769304 UTC):
    Transaction               Commits    Retries      Ratio
    _anonymous_                 15199        134       0.01
    action queue - pop              2          2       1.00
    actionQueue - done              2          0       0.00
    actionQueue - peek             29          0       0.00
    actionQueue - push              2          0       0.00
    builder                    282244        744       0.00
    compute                     16882         26       0.00
    debouncer                    6847        220       0.03
    define - dirtyKeys          16908          1       0.00
    define - read 1             10710          8       0.00
    define - read 2              6244          2       0.00
    define - write               6236          1       0.00
    diagnostics - hidden         6876         18       0.00
    diagnostics - publish        3978        184       0.05
    diagnostics - read           6886          2       0.00
    diagnostics - update         6876         24       0.00
    incDatabase                 10966          0       0.00
    lastValueIO 4                2200          1       0.00
    lastValueIO 5                2200          0       0.00
    recordProgress              31252        403       0.01
    recordProgress2             31252        207       0.01
    updateReverseDeps           64994        430       0.01

@pepeiborra pepeiborra marked this pull request as ready for review December 12, 2021 09:39
BEFORE
======
```
STM transaction statistics (2021-12-12 09:30:40.138006 UTC):
Transaction               Commits    Retries      Ratio
_anonymous_                 15297        118       0.01
action queue - pop              2          2       1.00
actionQueue - done              2          0       0.00
actionQueue - peek             29          0       0.00
actionQueue - push              2          0       0.00
builder                    282354        853       0.00
compute                     16882         16       0.00
debouncer                    6842        195       0.03
define - dirtyKeys          16895          2       0.00
define - read 1             10710         11       0.00
define - read 2              6232          5       0.00
define - write               6225          1       0.00
diagnostics - hidden         6871          9       0.00
diagnostics - publish        4073        188       0.05
diagnostics - read           6886          4       0.00
diagnostics - update         6871         23       0.00
incDatabase                 10966          0       0.00
lastValueIO 4                2200          0       0.00
lastValueIO 5                2200          0       0.00
recordProgress              31238      13856       0.44
updateReverseDeps           64994        358       0.01
```
AFTER
=====
```
STM transaction statistics (2021-12-12 09:24:24.769304 UTC):
Transaction               Commits    Retries      Ratio
_anonymous_                 15199        134       0.01
action queue - pop              2          2       1.00
actionQueue - done              2          0       0.00
actionQueue - peek             29          0       0.00
actionQueue - push              2          0       0.00
builder                    282244        744       0.00
compute                     16882         26       0.00
debouncer                    6847        220       0.03
define - dirtyKeys          16908          1       0.00
define - read 1             10710          8       0.00
define - read 2              6244          2       0.00
define - write               6236          1       0.00
diagnostics - hidden         6876         18       0.00
diagnostics - publish        3978        184       0.05
diagnostics - read           6886          2       0.00
diagnostics - update         6876         24       0.00
incDatabase                 10966          0       0.00
lastValueIO 4                2200          1       0.00
lastValueIO 5                2200          0       0.00
recordProgress              31252        403       0.01
recordProgress2             31252        207       0.01
updateReverseDeps           64994        430       0.01
```
After
=====
```
STM transaction statistics (2021-12-12 22:11:20.016977 UTC):
Transaction               Commits    Retries      Ratio
_anonymous_                 15227        116       0.01
action queue - pop              2          2       1.00
actionQueue - done              2          0       0.00
actionQueue - peek             29          0       0.00
actionQueue - push              2          0       0.00
builder                    282373        771       0.00
compute                     16882         32       0.00
debouncer                    6864        215       0.03
define - dirtyKeys          16900          0       0.00
define - read 1             10710          3       0.00
define - read 2              6254          3       0.00
define - write               6248          1       0.00
diagnostics - hidden         6893         10       0.00
diagnostics - publish        4006        200       0.05
diagnostics - read           6901          1       0.00
diagnostics - update         6893         22       0.00
incDatabase                 10966          0       0.00
lastValueIO 4                2200          0       0.00
lastValueIO 5                2200          0       0.00
recordProgress              31238        387       0.01
recordProgress2             31238         79       0.00
updateReverseDeps           64994        387       0.01
```
@pepeiborra pepeiborra added the merge me Label to trigger pull request merge label Dec 12, 2021
@mergify mergify bot merged commit 3b581a1 into master Dec 13, 2021
@michaelpj
Copy link
Collaborator

Unrelated to contention, but is it suspicious that "action queue" appears to be used only twice? Maybe not, it just stood out to me.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
merge me Label to trigger pull request merge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants