Disable multiprocessing parallelism in unit tests

This commit disables the multiprocessing based parallelism when running unittest jobs in CI. We historically have defaulted the use of multiprocessing in environments only where the "fork" start method is available because this has the best performance and has no caveats around how it is used by users (you don't need an `if __name__ == "__main__"` guard). However, the use of the "fork" method isn't always 100% reliable (see https://bugs.python.org/issue40379), which we saw on Python 3.9 Qiskit#6188. In unittest CI (and tox) by default we use stestr which spawns (not using fork) parallel workers to run tests in parallel. With this PR this means in unittest we're now running multiple test runner subprocesses, which are executing parallel dispatched code using multiprocessing's fork start method, which is executing multithreaded rust code. This three layers of nesting is fairly reliably hanging as Python's fork doesn't seem to be able to handle this many layers of nested parallelism. There are 2 ways I've been able to fix this, the first is to change the start method used by `parallel_map()` to either "spawn" or "forkserver" either of these does not suffer from random hanging. However, doing this in the unittest context causes significant overhead and slows down test executing significantly. The other is to just disable the multiprocessing which fixes the hanging and doesn't impact runtime performance signifcantly (and might actually help in CI so we're not oversubscribing the limited resources. As I have not been able to reproduce `parallel_map()` hanging in a standalone context with multithreaded stochastic swap this commit opts for just disabling multiprocessing in CI and documenting the known issue in the release notes as this is the simpler solution. It's unlikely that users will nest parallel processes as it typically hurts performance (and parallel_map() actively guards against it), we only did it in testing previously because the tests which relied on it were a small portion of the test suite (roughly 65 tests) and typically did not have a significant impact on the total throughput of the test suite.
mtreinish · Feb 19, 2022 · 93ded7f · 93ded7f
1 parent 2588837
commit 93ded7f
Show file tree

Hide file tree

Showing 3 changed files with 23 additions and 0 deletions.
diff --git a/azure-pipelines.yml b/azure-pipelines.yml
@@ -195,6 +195,8 @@ stages:
             stestr run
             popd
           displayName: 'Run tests'
+          env:
+            QISKIT_PARALLEL: FALSE
         - task: CopyFiles@2
           condition: failed()
           displayName: 'Copy images'
@@ -386,6 +388,8 @@ stages:
             echo "PYTHONHASHSEED=$PYTHONHASHSEED"
             stestr run
           displayName: 'Run tests'
+          env:
+            QISKIT_PARALLEL: FALSE
         - task: CopyFiles@2
           condition: failed()
           displayName: 'Copy images'
@@ -632,6 +636,8 @@ stages:
             echo "PYTHONHASHSEED=$PYTHONHASHSEED"
             stestr run
           displayName: 'Run tests'
+          env:
+            QISKIT_PARALLEL=FALSE
         - task: CopyFiles@2
           condition: failed()
           displayName: 'Copy images'

diff --git a/releasenotes/notes/multithreaded-stochastic-swap-6c2f13d7bd566284.yaml b/releasenotes/notes/multithreaded-stochastic-swap-6c2f13d7bd566284.yaml
@@ -25,3 +25,19 @@ upgrade:
     :class:`.StochasticSwap` transpiler pass which greatly improves the runtime
     performance of the transpiler. The rust compiler can easily be installed
     using rustup, which can be found here: https://rustup.rs/
+issues:
+  - |
+    When running :func:`.parallel_map` (which is done internally by
+    performance sensitive functions such as :func:`.transpile` and
+    :func:`.assemble`) in a subprocess launched outside of
+    :func:`.parallel_map` it is possible that the parallel dispatch performed
+    inside :func:`.parallel_map` will hang and never return.
+    This is due to upstream issues in cpython (see:
+    https://bugs.python.org/issue40379 for more details) around the default
+    method to launch subprocesses on Linux and macOS (with Python 3.7). If you
+    encounter this you have two options you can either remove the nested
+    parallel processes as calling :func:`.parallel_map` from a main process
+    should work fine or you can manually call the cPython standard library
+    ``multiprocessing`` module to perform similar parallel dispatch from a
+    subprocess but use the ``"spawn"`` or ``"forkserver"`` launch methods to
+    avoid the potential to have things get stuck and never return.
diff --git a/tox.ini b/tox.ini
@@ -14,6 +14,7 @@ setenv =
   ARGS="-V"
   QISKIT_SUPRESS_PACKAGING_WARNINGS=Y
   QISKIT_TEST_CAPTURE_STREAMS=1
+  QISKIT_PARALLEL=FALSE
 deps = -r{toxinidir}/requirements.txt
        -r{toxinidir}/requirements-dev.txt
 commands =