[fast-deps] How should we isolate the networking code for late download in parallel? #8697
Some of these may be better split off into their own topics, but here's what I have:

Handling parallelization

There are several things along the path from the place in the code where we know we need to download something to actually getting it:
It would be good to know that those are all safe to use concurrently, however we're planning to do it (whether threading, multiprocessing, or async). It would also be good to know how we plan to mitigate (one candidate mitigation is sketched below):
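As one example of what a mitigation could look like, here is a minimal sketch, assuming a threading model: serialize access to any component we cannot show to be thread-safe. Keyring lookups are used purely as an illustration; the names are hypothetical, not pip internals.

```python
# A minimal sketch, assuming a threading model: guard any component we
# cannot show to be thread-safe with a lock. Keyring is illustrative.
import threading

_keyring_lock = threading.Lock()  # hypothetical guard, one per shared component

def get_credentials(lookup, index_url):
    # Only one worker thread talks to the guarded component at a time;
    # everything else in the download path can still run concurrently.
    with _keyring_lock:
        return lookup(index_url)
```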
Using existing files where possible

Depending on the command, the file may already exist:
In all three cases, I think the wheel cache piece may be handled earlier, during resolution, so we wouldn't inadvertently use a lazy wheel for it. It may also be the case that the file we want is already in the HTTP cache. I don't know whether range requests bypass the HTTP cache or whether it does what we want (which would be to return the requested bytes from the file on disk). Maybe it is worth checking whether the request is cached first, and just skipping the lazy wheel if so? (A range-request sketch follows after this section.)

Desired outcome of any download process

Given that fast-deps only applies to wheels, the following needs to be true:

If we also want to match what we currently do, then:
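For reference, here is a minimal sketch of what a lazy wheel's range request looks like at the HTTP level, using requests directly; the URL and tail size are made up. Whether a request like this goes through or around the HTTP cache is exactly the open question raised above.

```python
# A hedged sketch of the HTTP behaviour in question; the URL and the
# 8 KiB tail size are illustrative, not pip's actual values.
import requests

url = "https://files.example.org/somepkg-1.0-py3-none-any.whl"  # hypothetical

head = requests.head(url, allow_redirects=True)
if head.headers.get("Accept-Ranges") == "bytes":
    # Fetch only the last 8 KiB, roughly where a wheel's ZIP central
    # directory (and hence the location of its METADATA entry) lives.
    tail = requests.get(url, headers={"Range": "bytes=-8192"})
    if tail.status_code == 206:  # Partial Content: the server honoured the range
        data = tail.content
```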
Thank you for the write-up; I'll try to address each of these as time goes by. I just want to let you know that I have seen your comment.
One thing that could help in any situation is an integration test that reproduces the kind of situation we're trying to improve, for example installing some big requirements file and being able to tell how much of the time went to downloading, fetching metadata, or some other part of processing. Once we have something basic, we can decide what other effects to try to account for, or what would look most like a typical user's situation.

For the network parallelization piece, one approach that may sidestep a lot of the concerns mentioned above would be to:
I don't know how this will play with the HTTP cache, but in this way the "parallel" part of the code will not impact code out of our control (keyring plugins) and there are no possible edge cases for authentication prompting. Regarding performance, it may be reasonable to do, since a quick look shows
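One way the split described above could look in practice is sketched below, assuming a threading model: anything that may prompt or hit a keyring plugin runs serially first, and the worker threads only move bytes. The helper names (resolve_auth, download_all) are illustrative, not pip internals.

```python
# A hedged sketch: resolve credentials serially up front, then fan out
# only the plain byte transfers to worker threads.
from concurrent.futures import ThreadPoolExecutor
import requests

def resolve_auth(urls):
    # Serial step: any keyring plugin calls or interactive prompts
    # happen here, on the main thread, before workers start.
    return {url: None for url in urls}  # e.g. url -> (user, password)

def download(session, url, auth, dest):
    resp = session.get(url, auth=auth, stream=True)
    resp.raise_for_status()
    with open(dest, "wb") as f:
        for chunk in resp.iter_content(64 * 1024):
            f.write(chunk)
    return dest

def download_all(pairs):  # pairs: [(url, dest), ...]
    creds = resolve_auth([u for u, _ in pairs])
    # Note: sharing one requests.Session across threads is commonly done,
    # but its thread-safety is not formally guaranteed; that is one of
    # the things that would need verifying.
    session = requests.Session()
    with ThreadPoolExecutor(max_workers=8) as pool:
        futures = [pool.submit(download, session, u, creds[u], d) for u, d in pairs]
        return [f.result() for f in futures]
```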
Originally posted by @McSinyx in #8638 (comment)
Regarding my GSoC project on download parallelization, we now have fast-deps (thank you @chrahunt for the clean-up in Move fast-dep handling to RequirementPreparer #8685). In the current code base, when fast-deps is used, RequirementPreparer.prepare_linked_requirement may skip the actual preparation (including some checks, downloading the whole wheel, and moving it around) and save it for right before the new resolver returns the requirement set. Essentially the download preparation is what we want to execute in parallel; however, to obtain progress data for display, it is not a good idea to map_multithreading the method directly.

With that in mind, we may want to invoke the download before the final preparation calls (the following snippet):
pip/src/pip/_internal/resolution/resolvelib/resolver.py, lines 162 to 165 at ee4371c
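To illustrate the ordering being proposed, here is a minimal sketch; the helper names and the needs_more_preparation flag are hypothetical, not pip's actual API.

```python
# A hedged sketch of the proposed ordering: batch-download everything
# that fast-deps deferred, in parallel, right before the final (serial)
# preparation pass.
def finish_preparation(requirements, preparer, download_parallel):
    deferred = [r for r in requirements
                if getattr(r, "needs_more_preparation", False)]
    download_parallel(deferred)  # parallel step: fetch the actual wheels
    for req in deferred:
        # Serial step: with the file already on disk, this should reduce
        # to checks and bookkeeping rather than a network download.
        preparer.prepare_linked_requirement(req)
```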
@pradyunsg and I came to the conclusion that it might be a good idea to store the path to downloaded files in the InstallRequirement, to be able to cleanly skip the call to unpack_url.
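A minimal sketch of that suggestion follows, assuming a local_file_path attribute on InstallRequirement (an illustrative name, not confirmed pip API) and a simplified unpack_url signature.

```python
# A hedged sketch: record where a parallel download landed on the
# InstallRequirement, so preparation can skip unpack_url entirely.
def fetch_if_needed(req, unpack_url, download_dir):
    if getattr(req, "local_file_path", None):
        # The parallel downloader already fetched this file, so the
        # unpack_url call can be skipped cleanly.
        return req.local_file_path
    file = unpack_url(req.link, req.source_dir, download_dir)  # signature simplified
    req.local_file_path = file.path
    return req.local_file_path
```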
This issue is the outcome of my discussion with @pradyunsg after a few unsuccessful attempts of mine to solve the problem. He suggested filing an issue to gather opinions from other maintainers (as well as interested contributors) to construct the most complete view on the matter. Please feel free to share what's on your mind!