Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf: Group together packages by repo when verifying tarballs #10112

Merged
merged 1 commit into from
Jun 16, 2024

Conversation

mpickering
Copy link
Collaborator

verifyFetchedTarball has the effect of deserialising the index tarball (see call to Sec.withIndex).

verifyFetchedTarball is called individually for each package in the build plan (see ProjectPlanning.hs). Not once per repo.

The hackage tarball is now 880mb so it takes a non significant amount of time to deserialise this (much better after haskell/tar#95).

This code path is important as it can add 1s with these 38 calls on to the initial load of a project and scales linearly with the size of your build tree.

Reproducer: Simple project with "lens" dependency deserialises the index tarball 38 times.

Solution: Refactor verifyFetchedTarball to run once per repo rather than once per package.

In future it would be much better to refactor this function so that the items are not immediately grouped and ungrouped but I didn't want to take that on immediately.

Fixes #10110

Please read Github PR Conventions and then fill in one of these two templates.


Template Α: This PR modifies behaviour or interface

Include the following checklist in your PR:

  • Patches conform to the coding conventions.
  • Any changes that could be relevant to users have been recorded in the changelog.
  • The documentation has been updated, if necessary.
  • Manual QA notes have been included.
  • Tests have been added. (Ask for help if you don’t know how to write them! Ask for an exemption if tests are too complex for too little coverage!)

Template B: This PR does not modify behaviour or interface

E.g. the PR only touches documentation or tests, does refactorings, etc.

Include the following checklist in your PR:

  • Patches conform to the coding conventions.
  • Is this a PR that fixes CI? If so, it will need to be backported to older cabal release branches (ask maintainers for directions).

@mpickering mpickering force-pushed the wip/10110 branch 2 times, most recently from 1d55ca7 to c70bac0 Compare June 14, 2024 09:51
`Sec.catchChecked` (\(e :: Sec.InvalidPackageException) -> warnAndFail (show e))
`Sec.catchChecked` (\(e :: Sec.VerificationError) -> warnAndFail (show e))
_ -> pure True
verifyFetchedTarballs
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The diff is impossible to read because of the formatter. My apologies.

-- Group up the unvalidated packages by repo so we only read the remote
-- index once per repo (see #10110). The packages are ungrouped here and then regrouped
-- below, it would be better in future to refactor this whole code path so that we don't
-- repeatedly group and ungroup.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be much better to refactor this function to work on a "per-repo" basis but that seemed quite fiddly so I didn't tackle that yet.

@alt-romes
Copy link
Collaborator

alt-romes commented Jun 14, 2024

Excellent! I've built this patch locally to try out the performance improvements, and they are just amazing.

When I started investigating the slowness in Cabal startup, we were spending close to 1.5 seconds in total on deserialise, a function called via verifyFetchedTarballs.

With this patch, and with haskell/tar#95, deserialise takes now 2 milliseconds. What a win.

@alt-romes alt-romes added the merge me Tell Mergify Bot to merge label Jun 14, 2024
@mergify mergify bot added the merge delay passed Applied (usually by Mergify) when PR approved and received no updates for 2 days label Jun 16, 2024
verifyFetchedTarball has the effect of deserialising the index tarball
(see call to Sec.withIndex).

verifyFetchedTarball is called individually for each package in the
build plan (see ProjectPlanning.hs). Not once per repo.

The hackage tarball is now 880mb so it takes a non significant amount of
time to deserialise this (much better after haskell/tar#95).

This code path is important as it can add 1s with these 38 calls on to
the initial load of a project and scales linearly with the size of your
build tree.

Reproducer: Simple project with "lens" dependency deserialises the index tarball 38 times.

Solution: Refactor verifyFetchedTarball to run once per repo rather than once per package.

In future it would be much better to refactor this function so that the
items are not immediately grouped and ungrouped but I didn't want to
take that on immediately.

Fixes haskell#10110
@mergify mergify bot merged commit e1f73a4 into haskell:master Jun 16, 2024
52 checks passed
@geekosaur
Copy link
Collaborator

Is this getting backported?

@alt-romes
Copy link
Collaborator

Is this getting backported?

If it's easy to do, I would. It's a pretty good improvement.

@ulysses4ever
Copy link
Collaborator

@mergify backport 3.12

Copy link
Contributor

mergify bot commented Jun 18, 2024

backport 3.12

✅ Backports have been created

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
merge delay passed Applied (usually by Mergify) when PR approved and received no updates for 2 days merge me Tell Mergify Bot to merge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

perf: verifyFetchedTarball called once for each package from a remote repository
4 participants