Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

numpy is imported unconditionally #5729

Closed
crusaderky opened this issue Jan 28, 2022 · 5 comments · Fixed by #5862
Closed

numpy is imported unconditionally #5729

crusaderky opened this issue Jan 28, 2022 · 5 comments · Fixed by #5862

Comments

@crusaderky
Copy link
Collaborator

crusaderky commented Jan 28, 2022

We have many unit tests that generate Nannies with spawn method. In theory, those tests should become a lot faster if numpy was imported only if needed.
This is not the case today, because of

  1. two places that import pandas; see blockers:
  2. two import blosc statements in distributed.protocol.compression

xfailed unit test is already in #5695.

Note that this means that fixing this issue means that, across all unit tests that do use numpy data on nannies, import numpy will be moved from @gen_cluster start to the first time where a numpy array or function is deserialised - which in turn may cause flakiness if the test time margins are too tight.

CC @gjoseph92

@gjoseph92
Copy link
Collaborator

Yes, I also just looked into the same thing. Maybe we should provide a helper in #5724 for checking if a package exists (via importlib.metadata/pkg_resources) without actually importing it, so we can use that on blosc?
Also xref #5269.

@crusaderky
Copy link
Collaborator Author

Maybe we should provide a helper in #5724 for checking if a package exists (via importlib.metadata/pkg_resources) without actually importing it, so we can use that on blosc?

Not that simple - distributed.protocol.compression fills up a dict of compress/decompress callables at module import time, so we'd need to be a bit more creative.

@gjoseph92
Copy link
Collaborator

Blosc/python-blosc#159 seems stalled, but we could probably suggest just not importing the tests in top-level __init__.py. I'm not sure this is worth worrying about too much right now beyond trying to get an upstream change?

@crusaderky
Copy link
Collaborator Author

WON'T FIX

#5750 introduces a new unconditional import of numpy, whose benefits definitely exceed those of this ticket.

@jakirkham
Copy link
Member

If you can think of better ways to handle that import, am open to other options 🙂

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants