Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

poetry install hugely slowed down by deepcopy operations #7459

Open
4 tasks done
ralbertazzi opened this issue Feb 2, 2023 · 7 comments
Open
4 tasks done

poetry install hugely slowed down by deepcopy operations #7459

ralbertazzi opened this issue Feb 2, 2023 · 7 comments
Labels
kind/bug Something isn't working as expected status/triage This issue needs to be triaged

Comments

@ralbertazzi
Copy link
Contributor

ralbertazzi commented Feb 2, 2023

  • Poetry version: 1.3.2
  • Python version: 3.9.11
  • OS version and name: MacOS 12.5.1
  • pyproject.toml: not shared
  • I am on the latest stable Poetry version, installed using a recommended method.
  • I have searched the issues of this repo and believe that this is not a duplicate.
  • I have consulted the FAQ and blog for any relevant entries or release notes.
  • If an exception occurs when executing a command, I executed it again in debug mode (-vvv option) and have included the output below.

Issue

poetry install takes considerable time (70 seconds) to run what should be a no-op. In fact:

  • All packages to be installed are already cached
  • All packages are already installed in the virtual environment. The command terminates with No dependencies to install or update

Although I cannot share the pyproject.toml and poetry.lock files as they contain private dependencies (so you wouldn't be able to reproduce it), I can share a profiling of poetry that obtained running cProfile (link here). I'll share a few links that obtained plotting with snakeviz the result of

/path/to/pypoetry/venv/bin/python -m cProfile -o test.profile -m poetry install

As you can see lots of time seems to be spent in deepcopy operations in poetry-core.

I'll try to create a fully reproducible use case. In the meantime I can also share that:

  • The project contains a great number of dependencies (around 200), some of them related to ML (PyTorch, TensorFlow, ..).
  • The same issue happens in our CI systems running on Linux
  • I experienced the same issue on both Poetry 1.2.2 ad Poetry 1.3.2
  • While "stuck", the poetry process runs at full 100% CPU utilization
  • We fetch all packages (including public ones) through Artifactory, a private packages repo. AFAIK Artifactory does not offer any JSON API like PyPI, which I know Poetry uses extensively

Thanks a lot for your support and for maintaining this amazing tool ❤️

Screenshot 2023-02-02 at 15 40 25

Screenshot 2023-02-02 at 15 40 41

Screenshot 2023-02-02 at 15 41 00

@ralbertazzi ralbertazzi added kind/bug Something isn't working as expected status/triage This issue needs to be triaged labels Feb 2, 2023
@neersighted
Copy link
Member

The deepcopy operations are part of the solver and not an extraneous/non-core bit of Poetry. Work to improve performance is of course welcome (and you might have noticed from release notes that we have made significant strides there recently), but please note that "remove these expensive deepcopies" is probably not something that is possible with my understanding of the current architecture.

Something like "evaluate how many copies are truly necessary vs. defensive in the solver" is much more tenable, but is also much more involved.

@ralbertazzi
Copy link
Contributor Author

I managed to create a reproducible project made of public dependencies, which you can find at this gist.

Steps to reproduce:

poetry install  # will install dependencies
poetry install  # everything should be cached, instead it still takes 20 seconds

The command is still deepcopy heavy, as you can see from the cProfile output

@neersighted
Copy link
Member

Poetry is running the solver when it loads the LockfileRepository to verify the lock file is valid (e.g. doesn't have conflicting packages/will not result in an obvious nonsense solution). Invalid lock files are something that can happen e.g. from those who remove the hash and git merge so we can't not perform a solver run during install.

See #496 for more context on this.

@ralbertazzi
Copy link
Contributor Author

@neersighted I thank you for your incredible contribution to the project and recent improvements (really looking forward 1.4 btw!). As I understand that touching those deepcopies mustn't be the easiest thing ever, I was still wondering if there was some low hanging fruit since the slowdown happens when everything has already been solved.

As I was writing this comment, I received your reply above :) I'm not really a fan of supporting incorrect usages of the poetry.lock file by slowing down those who don't touch it, but I understand you somehow need to make everyone happy. Is there a way (or can we implement a simple command line option) to disable lockfile verification on poetry install? This simple option would speed up my CI pipelines by far! While I would personally disable the verification by default, I'd also be happy to just have the option to disable it when needed. Thanks a lot for your quick reply!

@neersighted
Copy link
Member

I'll leave that to @radoering and @dimbleby; however, I don't think that's something we will accept upstream. The support burden of adding a knob that says "be less correct" when many users already subvert the protections we have in place (like the hash) would be massive, and it's not something any upstream developer would use or reason in terms of.

@dimbleby
Copy link
Contributor

dimbleby commented Feb 2, 2023

re-solving is baked in to how poetry install works, it's not a thing that can just be disabled.

In some sense it's best not to think of poetry.lock as a lockfile at all, it's more like a (greatly) cut-down copy of a repository.

poetry install can't just read the answer out of poetry.lock: because that file simply doesn't contain that answer. Rather poetry install must freshly calculate a solution that's suitable for the current environment, using only the packages described by poetry.lock.

Normally speaking this is fast because there are so few choices for each package in poetry.lock that there's approximately no branching at all.

Of course merge requests bringing performance improvements are welcome... While I don't think there's enormous amounts of low-hanging fruit, I would bet that a sufficiently motivated person could make meaningful gains - and I could absolutely believe that all the deepcopy is a candidate for that sort of work.

@Bobronium
Copy link

Bobronium commented Feb 2, 2023

This might be relevant: I'm writing deepcopy replacement which is currently 20-50 times faster than builtin deepcopy on subsequent copies of the same object.

If poetry workflow involves a scenario when the same object is copied multiple times, like so:

for i in range(100):
    mutate(deepcopy(data))

Replacing it with this would give you instant performance improvement:

data_copier = duper.deepdups(data)
for i in range(100):
    mutate(data_copier())

Source:
https://github.com/Bobronium/duper

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug Something isn't working as expected status/triage This issue needs to be triaged
Projects
None yet
Development

No branches or pull requests

4 participants