-
Notifications
You must be signed in to change notification settings - Fork 701
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
perf: Group together packages by repo when verifying tarballs #10112
Conversation
1d55ca7
to
c70bac0
Compare
`Sec.catchChecked` (\(e :: Sec.InvalidPackageException) -> warnAndFail (show e)) | ||
`Sec.catchChecked` (\(e :: Sec.VerificationError) -> warnAndFail (show e)) | ||
_ -> pure True | ||
verifyFetchedTarballs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The diff is impossible to read because of the formatter. My apologies.
-- Group up the unvalidated packages by repo so we only read the remote | ||
-- index once per repo (see #10110). The packages are ungrouped here and then regrouped | ||
-- below, it would be better in future to refactor this whole code path so that we don't | ||
-- repeatedly group and ungroup. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be much better to refactor this function to work on a "per-repo" basis but that seemed quite fiddly so I didn't tackle that yet.
Excellent! I've built this patch locally to try out the performance improvements, and they are just amazing. When I started investigating the slowness in Cabal startup, we were spending close to 1.5 seconds in total on With this patch, and with haskell/tar#95, |
verifyFetchedTarball has the effect of deserialising the index tarball (see call to Sec.withIndex). verifyFetchedTarball is called individually for each package in the build plan (see ProjectPlanning.hs). Not once per repo. The hackage tarball is now 880mb so it takes a non significant amount of time to deserialise this (much better after haskell/tar#95). This code path is important as it can add 1s with these 38 calls on to the initial load of a project and scales linearly with the size of your build tree. Reproducer: Simple project with "lens" dependency deserialises the index tarball 38 times. Solution: Refactor verifyFetchedTarball to run once per repo rather than once per package. In future it would be much better to refactor this function so that the items are not immediately grouped and ungrouped but I didn't want to take that on immediately. Fixes haskell#10110
Is this getting backported? |
If it's easy to do, I would. It's a pretty good improvement. |
@mergify backport 3.12 |
✅ Backports have been created
|
verifyFetchedTarball has the effect of deserialising the index tarball (see call to Sec.withIndex).
verifyFetchedTarball is called individually for each package in the build plan (see ProjectPlanning.hs). Not once per repo.
The hackage tarball is now 880mb so it takes a non significant amount of time to deserialise this (much better after haskell/tar#95).
This code path is important as it can add 1s with these 38 calls on to the initial load of a project and scales linearly with the size of your build tree.
Reproducer: Simple project with "lens" dependency deserialises the index tarball 38 times.
Solution: Refactor verifyFetchedTarball to run once per repo rather than once per package.
In future it would be much better to refactor this function so that the items are not immediately grouped and ungrouped but I didn't want to take that on immediately.
Fixes #10110
Please read Github PR Conventions and then fill in one of these two templates.
Template Α: This PR modifies behaviour or interface
Include the following checklist in your PR:
Template B: This PR does not modify behaviour or interface
E.g. the PR only touches documentation or tests, does refactorings, etc.
Include the following checklist in your PR: