-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Speed up pip list --outdated
#7962
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest to be a bit lazier, e.g.
something like this
def iter_packages_latest_infos(self, packages, options):
with self._build_session(options) as session:
finder = self._build_package_finder(options, session)
def latest_infos(dist):
typ = 'unknown'
all_candidates = finder.find_all_candidates(dist.key)
if not options.pre:
# Remove prereleases
all_candidates = [candidate for candidate in all_candidates
if not candidate.version.is_prerelease]
evaluator = finder.make_candidate_evaluator(
project_name=dist.project_name,
)
best_candidate = evaluator.sort_best_candidate(all_candidates)
if best_candidate is None:
return None
remote_version = best_candidate.version
if best_candidate.link.is_wheel:
typ = 'wheel'
else:
typ = 'sdist'
# This is dirty but makes the rest of the code much cleaner
dist.latest_version = remote_version
dist.latest_filetype = typ
return dist
pool = Pool(DEFAULT_POOLSIZE)
for dist in pool.imap_unordered(latest_infos, packages):
if dist is not None:
yield dist
I don't think we need to close/join the pool though, since after the iterations all jobs are finished. |
Even though the jobs are done it's better to clean all resources. For example there are extra threads which may become zombie and unaccessible. Also there are not only worker threads but some extra utility threads. And I had one test failed without this closing code. |
I'm concerned about the complexity vs benefit tradeoff here. Adding multithreaded code adds a significant possibility for extra failures/issues, and you haven't really given any details as to what the time savings are like. Can you give some examples of before/after timings (preferably with examples that others could reproduce)? I have never personally found What do other @pypa/pip-committers think? |
@pfmoore Here is reproducible MINIMAL example. You need
Cleaning:
Here we have only about 67 packages installed with just a few popular Python packages and this fix already saves 2 seconds of real time. In Data Science problems I usually have at least 100 packages installed and I get about two times boost with this fix. Also I don't see any problem in multi-threading programs especially parts with requesting something from the Internet. |
Also I believe that all performance fixes should be documented right in the code via comment. So if it's going to be decided that performance is more important in this performance vs simplicity tradeoff then it's worth to sum up @CrafterKolyan benchmarks and write them into comment (or at least in commit message) |
@nikitabobko I've added comment in code and also created an issue with example which is referenced in pull request name and eventually will be in commit name. |
I can also confirm the speedup on my Debian testing system with 286 Python packages installed: On master
On this branch
@CrafterKolyan, could you please switch back to |
This will need a test to ensure the indentation thread safty thing is not accidentally broken in the future. |
@uranusjr added two tests. Actually they are more about working the same in created threads as in main thread of |
Hello! I am an automated bot and I have noticed that this pull request is not currently able to be merged. If you are able to either merge the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
With my 20.1 release manager hat on, this can be included in this month's release, if once the news fragment is updated.
I'll wait for a few days to let other @pypa/pip-committers chime in. :)
pip list --outdated
Co-Authored-By: Pradyun Gedam <[email protected]>
Thanks @CrafterKolyan for the PR and for the prompt responses to review comments. :) |
Co-Authored-By: Pradyun Gedam <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not totally confident that pip's codebase is thread safe but this seems simple enough 👍
Yea I'm OK because this seems to not be touching anything other than logging and the specific code at hand. |
Looks like at least our download progress indicator isn't thread safe. https://github.com/pradyunsg/pip/pull/5/checks?check_run_id=579142888#step:6:2757 |
But this code doesn't use any progress bars |
Understood. But you had to patch logging to fix thread safety. This PR could mean that future changes (for example, adding a progress bar to pip list!) need to do further thread safety patching in the same way. The fact is, that we don't guarantee pip's code is thread safe. Thread safety has always been an unnecessary maintenance burden to require, given that pip doesn't use threads itself, This PR changes that, meaning that we now do have to guarantee thread safety, at least for the parts of pip's code called by this command. As I said above, I'm concerned about the cost vs benefits of this change. It seems to me that we could be taking on a non-trivial maintenance cost to fix an issue that doesn't appear to be more than an inconvenience to a relatively small number of users. |
My changes are not about thread safety but about not throwing exception that |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@CrafterKolyan, I think once upon a time pip planned to be multithreading, but then other works got in the way and such speed enhancement is not yet implemented. BTW could you please remove _log_state.indentation = 0
since IMHO it is a bit misleading?
@pfmoore, with GH-825 under consideration, I think we should gradually prepare the codebase to be thread-safe, since Python 2 support AFAIK is not going to be dropped any time soon so asyncio
is not a immediate option. I applied for GSoC with parallel download as the goal and would try to make the display-relevant part thread-safe this summer, if you pip fellows accept the proposal. That being said, there are considerations over the cost of such change and I would love to be given it a try. I cannot be 100% sure that I will stick around forever to be responsible of the thread-safety, but if the project succeed, it's very likely that I will have a stronger bond with pip and be able to help for a long-term.
The above is just my opinion and I hope you can make a decision on this matter in the way that works out best for both the maintainers and the users.
@pfmoore's concerns are reasonable and appropriate. Thinking about this a bunch more, I personally still feel that the benefits here could be worth it... at least enough to make it worth giving this a shot. I think this will be a reasonable thing to include in pip 20.1-beta1 and to see if we get any user reports of problems due to this. That gives us the opportunity to communicate "hey, this is the first instance of pip actually utilizing parallelization within itself, so there may be rough edges". If we don't have any user reports of issues due to this, that's great. If we do, we'd treat it like any other regression in a new functionality -- if it's bad-enough, we'll revert the change and then decide if/how to reintroduce it. |
+1 on the plan to have it available in 20.1 beta, and see if we get any issues in the wild. |
Awesome. Looks like everyone is on board with the plan, so I'll go ahead and merge this in. :) |
Thanks for the PR @CrafterKolyan! ^>^ |
118: Update pip to 20.1 r=duckinator a=pyup-bot This PR updates [pip](https://pypi.org/project/pip) from **20.0.2** to **20.1**. <details> <summary>Changelog</summary> ### 20.1 ``` ================= Process ------- - Document that pip 21.0 will drop support for Python 2.7. Features -------- - Add ``pip cache dir`` to show the cache directory. (`7350 <https://github.com/pypa/pip/issues/7350>`_) Bug Fixes --------- - Abort pip cache commands early when cache is disabled. (`8124 <https://github.com/pypa/pip/issues/8124>`_) - Correctly set permissions on metadata files during wheel installation, to permit non-privileged users to read from system site-packages. (`8139 <https://github.com/pypa/pip/issues/8139>`_) ``` ### 20.1b1 ``` =================== Deprecations and Removals ------------------------- - Remove emails from AUTHORS.txt to prevent usage for spamming, and only populate names in AUTHORS.txt at time of release (`5979 <https://github.com/pypa/pip/issues/5979>`_) - Remove deprecated ``--skip-requirements-regex`` option. (`7297 <https://github.com/pypa/pip/issues/7297>`_) - Building of local directories is now done in place, instead of a temporary location containing a copy of the directory tree. (`7555 <https://github.com/pypa/pip/issues/7555>`_) - Remove unused ``tests/scripts/test_all_pip.py`` test script and the ``tests/scripts`` folder. (`7680 <https://github.com/pypa/pip/issues/7680>`_) Features -------- - pip now implements PEP 610, so ``pip freeze`` has better fidelity in presence of distributions installed from Direct URL requirements. (`609 <https://github.com/pypa/pip/issues/609>`_) - Add ``pip cache`` command for inspecting/managing pip's wheel cache. (`6391 <https://github.com/pypa/pip/issues/6391>`_) - Raise error if ``--user`` and ``--target`` are used together in ``pip install`` (`7249 <https://github.com/pypa/pip/issues/7249>`_) - Significantly improve performance when ``--find-links`` points to a very large HTML page. (`7729 <https://github.com/pypa/pip/issues/7729>`_) - Indicate when wheel building is skipped, due to lack of the ``wheel`` package. (`7768 <https://github.com/pypa/pip/issues/7768>`_) - Change default behaviour to always cache responses from trusted-host source. (`7847 <https://github.com/pypa/pip/issues/7847>`_) - An alpha version of a new resolver is available via ``--unstable-feature=resolver``. (`988 <https://github.com/pypa/pip/issues/988>`_) Bug Fixes --------- - Correctly freeze a VCS editable package when it is nested inside another VCS repository. (`3988 <https://github.com/pypa/pip/issues/3988>`_) - Correctly handle ``%2F`` in URL parameters to avoid accidentally unescape them into ``/``. (`6446 <https://github.com/pypa/pip/issues/6446>`_) - Reject VCS URLs with an empty revision. (`7402 <https://github.com/pypa/pip/issues/7402>`_) - Warn when an invalid URL is passed with ``--index-url`` (`7430 <https://github.com/pypa/pip/issues/7430>`_) - Use better mechanism for handling temporary files, when recording metadata about installed files (RECORD) and the installer (INSTALLER). (`7699 <https://github.com/pypa/pip/issues/7699>`_) - Correctly detect global site-packages availability of virtual environments created by PyPA’s virtualenv>=20.0. (`7718 <https://github.com/pypa/pip/issues/7718>`_) - Remove current directory from ``sys.path`` when invoked as ``python -m pip <command>`` (`7731 <https://github.com/pypa/pip/issues/7731>`_) - Stop failing uninstallation, when trying to remove non-existent files. (`7856 <https://github.com/pypa/pip/issues/7856>`_) - Prevent an infinite recursion with ``pip wheel`` when ``$TMPDIR`` is within the source directory. (`7872 <https://github.com/pypa/pip/issues/7872>`_) - Significantly speedup ``pip list --outdated`` by parallelizing index interaction. (`7962 <https://github.com/pypa/pip/issues/7962>`_) - Improve Windows compatibility when detecting writability in folder. (`8013 <https://github.com/pypa/pip/issues/8013>`_) Vendored Libraries ------------------ - Update semi-supported debundling script to reflect that appdirs is vendored. - Add ResolveLib as a vendored dependency. - Upgrade certifi to 2020.04.05.1 - Upgrade contextlib2 to 0.6.0.post1 - Upgrade distro to 1.5.0. - Upgrade idna to 2.9. - Upgrade msgpack to 1.0.0. - Upgrade packaging to 20.3. - Upgrade pep517 to 0.8.2. - Upgrade pyparsing to 2.4.7. - Remove pytoml as a vendored dependency. - Upgrade requests to 2.23.0. - Add toml as a vendored dependency. - Upgrade urllib3 to 1.25.8. Improved Documentation ---------------------- - Emphasize that VCS URLs using git, git+git and git+http are insecure due to lack of authentication and encryption (`1983 <https://github.com/pypa/pip/issues/1983>`_) - Clarify the usage of --no-binary command. (`3191 <https://github.com/pypa/pip/issues/3191>`_) - Clarify the usage of freeze command in the example of Using pip in your program (`7008 <https://github.com/pypa/pip/issues/7008>`_) - Add a "Copyright" page. (`7767 <https://github.com/pypa/pip/issues/7767>`_) - Added example of defining multiple values for options which support them (`7803 <https://github.com/pypa/pip/issues/7803>`_) ``` </details> <details> <summary>Links</summary> - PyPI: https://pypi.org/project/pip - Changelog: https://pyup.io/changelogs/pip/ - Homepage: https://pip.pypa.io/ </details> Co-authored-by: pyup-bot <[email protected]>
Fixes #7964
This thing is working way too slow now. The problem is that it fetches all versions in one thread.