
Use pathos for multiprocessing #7002

Closed
wants to merge 10 commits into from
Conversation

nonhermitian
Contributor

Summary

This fixes multiprocessing issues on macOS for Python 3.8 and higher. It should also allow parallel processing on Windows systems, but I cannot test this, so I have left that flag set to False.

Details and comments

@nonhermitian nonhermitian requested a review from a team as a code owner September 9, 2021 17:17
requirements.txt (review comment, outdated and resolved)
Co-authored-by: Matthew Treinish <[email protected]>
@mtreinish
Member

I'd like to do some testing on this locally (I used to be able to reproduce the Python 3.9 Linux hangs quite reliably), but if this can fix the issues we've had, I'm definitely a fan of this. From a quick scan of the docs, it also looks like it opens up an option for distributing work between multiple machines (something like an alternative approach to #3466), which would be cool to build on. But I wasn't familiar with pathos before now, so I need to do some reading :)

@nonhermitian
Contributor Author

Ahh, it looks like pathos recycles the process ID, and thus the unique names that we use in places need to be tweaked.

@nonhermitian
Contributor Author

Yeah, the blocker is the way Schedules define their unique names. They are no longer unique with this change, whereas circuit names still are. So I need to figure out how to port the circuit unique-name scheme over.
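A minimal sketch of the collision being described (hypothetical helper names, not Qiskit's actual code): a name derived from the worker's process ID repeats once the pool reuses a worker, while a per-process counter stays unique.

```python
import itertools
import os

def pid_based_name(prefix):
    # Breaks under pathos: a recycled worker process keeps the same PID,
    # so two objects built in that worker get the same "unique" name.
    return f"{prefix}-{os.getpid()}"

_counter = itertools.count()

def counter_based_name(prefix):
    # A counter-based scheme (like the one circuits use) keeps names
    # distinct even when the same worker process handles many jobs.
    return f"{prefix}-{next(_counter)}"
```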

@kdk
Member

kdk commented Sep 9, 2021

It should also allow parallel processing on Windows systems, but I cannot test this, so I have left that flag set to False.

Is the CI coverage we have for Windows sufficient to green-light this, assuming it passes CI? If not, maybe we can reach out to some Windows users for additional testing.

@nonhermitian
Contributor Author

The issue is not that it doesn't work in the usual sense, but rather that parallelism usually causes a big slowdown on Windows due to the lack of a fork command. I think we need an actual Windows user to test serial vs. parallel transpilation over several circuits to see how it pans out.
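A hypothetical benchmark a Windows user could run to make that comparison concrete (function and harness names are illustrative, and a small CPU-bound task stands in for transpiling a circuit). On Windows only the "spawn" start method is available, so each worker pays interpreter-startup and re-import cost that fork avoids:

```python
import multiprocessing as mp
import time

def busy(n):
    # CPU-bound stand-in for transpiling one circuit.
    return sum(i * i for i in range(n))

def compare(method, jobs, n=200_000):
    """Time serial vs. parallel execution of `jobs` tasks under the
    given multiprocessing start method ("spawn" on Windows)."""
    t0 = time.perf_counter()
    serial = [busy(n) for _ in range(jobs)]
    t_serial = time.perf_counter() - t0

    t0 = time.perf_counter()
    with mp.get_context(method).Pool() as pool:
        parallel = pool.map(busy, [n] * jobs)
    t_parallel = time.perf_counter() - t0

    assert serial == parallel
    return t_serial, t_parallel

if __name__ == "__main__":
    # Under "spawn", each worker starts a fresh interpreter and
    # re-imports this module, so parallel can lose for small workloads.
    print(compare("spawn", jobs=4))
```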

@mtreinish
Member

The issue is not that it doesn't work in the usual sense, but rather that parallelism usually causes a big slowdown on Windows due to the lack of a fork command. I think we need an actual Windows user to test serial vs. parallel transpilation over several circuits to see how it pans out.

I can test this in my Windows VM tomorrow. Last time I looked at it (#3547 (comment)) there was an improvement, but there were the normal spawn caveats (https://qiskit.org/documentation/release_notes.html#release-notes-0-17-0-known-issues) that made it potentially annoying to enable by default, which is why we documented the issue and let users opt in.

@nonhermitian
Contributor Author

So the issue with the unique names was the way we check whether the process is the main one. I simplified this logic to just compare process IDs, and everything works fine now.
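A sketch of the simplified check being described (assumed shape, not the exact Qiskit code): record the main process's ID at import time and compare against it, instead of relying on process names that pathos reuses. Note this relies on fork-style workers inheriting the parent's module state.

```python
import os

# PID of whichever process first imported this module; with fork-based
# workers (Linux/macOS) child processes inherit this value unchanged.
_MAIN_PID = os.getpid()

def is_main_process():
    # In the parent, the PIDs match; in a forked worker, os.getpid()
    # returns the child's own PID, so this is False there.
    return os.getpid() == _MAIN_PID
```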

@nonhermitian
Contributor Author

The one caveat here is that, when routines are defined in an interpreter (e.g. IPython or Jupyter notebooks), custom functions to be executed in parallel must contain their import statements. For scripts this is not the case:

This worker pool leverages the parallelpython (pp) module, and thus has many of the limitations associated with that module. The function f and the sequences in args must be serializable. The maps in this worker pool have full functionality when run from a script, but may be somewhat limited when used in the python interpreter. Both imported and interactively-defined functions in the interpreter session may fail due to the pool failing to find the source code for the target function.

https://pathos.readthedocs.io/en/latest/pathos.html#id3
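Following the caveat quoted above, a function meant to run in a pathos pool from an interactive session would carry its own imports (the function name is illustrative, and the pool call is shown only as a comment since it assumes pathos is installed):

```python
def squared_floor(x):
    # Import inside the body, per the pathos caveat: the worker process
    # re-executes this line instead of relying on the session's globals.
    import math
    return math.floor(x) ** 2

# With pathos installed, usage would look like:
#   from pathos.multiprocessing import ProcessPool
#   ProcessPool(nodes=4).map(squared_floor, [1.5, 2.5, 3.5])
print([squared_floor(v) for v in [1.5, 2.5, 3.5]])  # serial check: [1, 4, 9]
```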

@nonhermitian
Contributor Author

Hmm, this looks like it hangs in the Py38 tests in a similar spot to what @mtreinish pointed out on Py39. Perhaps this is not something that actually helps much. Not sure why visualizations would cause a crash in parallel processing, though.

@mtreinish
Member

Hmm, this looks like it hangs in the Py38 tests in a similar spot to what @mtreinish pointed out on Py39. Perhaps this is not something that actually helps much. Not sure why visualizations would cause a crash in parallel processing, though.

I actually don't think it's the visualization tests. When I looked into this before, for #6188, it was getting stuck on the deserialization of sympy expressions in parallel calls. I most commonly saw this in the algorithms tests. But since the process hangs, the other test workers continue to run until they finish their tests.

We really need to add a per-test-method timeout fixture to make debugging these easier. It won't work on Windows (since signals behave differently there), but that's OK as long as it doesn't interrupt normal test execution.
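A sketch of such a fixture on Unix (hypothetical class name; SIGALRM-based, which is why it would be skipped on Windows):

```python
import signal

class TestTimeout:
    """Context manager that turns a hung test into a TimeoutError.
    Relies on SIGALRM, which Windows does not provide."""

    def __init__(self, seconds):
        self.seconds = seconds

    def _handler(self, signum, frame):
        raise TimeoutError(f"test exceeded {self.seconds}s")

    def __enter__(self):
        self._old = signal.signal(signal.SIGALRM, self._handler)
        signal.alarm(self.seconds)
        return self

    def __exit__(self, exc_type, exc, tb):
        signal.alarm(0)                       # cancel the pending alarm
        signal.signal(signal.SIGALRM, self._old)
        return False                          # let TimeoutError propagate
```

A test body would then run as `with TestTimeout(60): ...`, failing loudly instead of silently hanging the worker.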

@nonhermitian
Contributor Author

Closing, as this probably will not work.

3 participants