
"Duplicate Signature" error when writing to stdout using concurrent.futures.ProcessPool executor #541

Closed
jbweston opened this issue Apr 6, 2020 · 13 comments · Fixed by #655


@jbweston
Member

jbweston commented Apr 6, 2020

Steps to reproduce

Create a single-celled notebook with the following contents:

from concurrent.futures import ProcessPoolExecutor

executor = ProcessPoolExecutor()

def just_print(_):
    print("Listen!")
    
executor.map(just_print, [None] * 1000)

Running this cell will result in "Listen!" being printed to the notebook output 1000 times.

Run the cell a few times (the error is non-deterministic) and keep an eye on the log from the NotebookApp. You should eventually see a traceback similar to the following:

[E 18:17:17.090 NotebookApp] Exception in callback functools.partial(<function ZMQStream._update_handler.<locals>.<lambda> at 0x7f72c6cf1280>)
    Traceback (most recent call last):
      File "/home/jbweston/.miniconda/envs/jupyter-bug/lib/python3.8/site-packages/tornado/ioloop.py", line 743, in _run_callback
        ret = callback()
      File "/home/jbweston/.miniconda/envs/jupyter-bug/lib/python3.8/site-packages/zmq/eventloop/zmqstream.py", line 542, in <lambda>
        self.io_loop.add_callback(lambda : self._handle_events(self.socket, 0))
      File "/home/jbweston/.miniconda/envs/jupyter-bug/lib/python3.8/site-packages/zmq/eventloop/zmqstream.py", line 456, in _handle_events
        self._handle_recv()
      File "/home/jbweston/.miniconda/envs/jupyter-bug/lib/python3.8/site-packages/zmq/eventloop/zmqstream.py", line 486, in _handle_recv
        self._run_callback(callback, msg)
      File "/home/jbweston/.miniconda/envs/jupyter-bug/lib/python3.8/site-packages/zmq/eventloop/zmqstream.py", line 438, in _run_callback
        callback(*args, **kwargs)
      File "/home/jbweston/.miniconda/envs/jupyter-bug/lib/python3.8/site-packages/notebook/services/kernels/kernelmanager.py", line 412, in record_activity
        msg = session.deserialize(fed_msg_list)
      File "/home/jbweston/.miniconda/envs/jupyter-bug/lib/python3.8/site-packages/jupyter_client/session.py", line 927, in deserialize
        raise ValueError("Duplicate Signature: %r" % signature)
    ValueError: Duplicate Signature: b'6876c7355d3b1f325c2a8e39c754c1946120438a9e4d81c113c0397edb65e84d'

Environment

OS: WSL2 running Ubuntu 18.04 (also seen on other flavours of Linux not under WSL2)
jupyter --version

jupyter core     : 4.6.1
jupyter-notebook : 6.0.3
qtconsole        : 4.6.0
ipython          : 7.12.0
ipykernel        : 5.1.4
jupyter client   : 5.3.4
jupyter lab      : not installed
nbconvert        : 5.6.1
ipywidgets       : 7.5.1
nbformat         : 5.0.4
traitlets        : 4.3.3

conda env export

  - pygments=2.5.2=py_0
  - pylint=2.4.4=py37_0
  - pyopenssl=19.0.0=py37_0
  - pyqt=5.9.2=py37h05f1152_2
  - pyrsistent=0.15.7=py37h7b6447c_0
  - pysocks=1.7.1=py37_0
  - python=3.7.4=h265db76_1
  - python-dateutil=2.8.1=py_0
  - python-jsonrpc-server=0.3.4=py_0
  - python-language-server=0.31.7=py37_0
  - pyzmq=18.1.1=py37he6710b0_0
  - qt=5.9.7=h5867ecd_1
  - qtconsole=4.6.0=py_1
  - readline=7.0=h7b6447c_5
  - requests=2.22.0=py37_0
  - rope=0.16.0=py_0
  - ruamel_yaml=0.15.46=py37h14c3975_0
  - send2trash=1.5.0=py37_0
  - setuptools=41.4.0=py37_0
  - sip=4.19.8=py37hf484d3e_0
  - six=1.12.0=py37_0
  - snowballstemmer=2.0.0=py_0
  - sqlite=3.30.0=h7b6447c_0
  - terminado=0.8.3=py37_0
  - testpath=0.4.4=py_0
  - tk=8.6.8=hbc83047_0
  - tornado=6.0.3=py37h7b6447c_3
  - tqdm=4.36.1=py_0
  - traitlets=4.3.3=py37_0
  - ujson=1.35=py37h14c3975_0
  - urllib3=1.24.2=py37_0
  - wcwidth=0.1.8=py_0
  - webencodings=0.5.1=py37_1
  - wheel=0.33.6=py37_0
  - widgetsnbextension=3.5.1=py37_0
  - wrapt=1.11.2=py37h7b6447c_0
  - xz=5.2.4=h14c3975_4
  - yaml=0.1.7=had09818_2
  - yapf=0.28.0=py_0
  - zeromq=4.3.1=he6710b0_3
  - zipp=2.2.0=py_0
  - zlib=1.2.11=h7b6447c_3

Further information

Possibly related to #498, as ProcessPoolExecutor uses multiprocessing under the hood.

I think the most likely explanation is that ProcessPoolExecutor uses the fork start method to create its worker processes, so the workers inherit the parent's session key, and their message signatures can therefore occasionally collide.
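
For context, the default start method can be checked directly; on Linux (where this was observed), multiprocessing, and hence ProcessPoolExecutor, defaults to fork:

import multiprocessing as mp

# Workers created by fork() inherit the parent's memory wholesale,
# including the jupyter_client session key used to sign messages.
print(mp.get_start_method())  # 'fork' on Linux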

@jbweston
Member Author

jbweston commented Apr 6, 2020

I can confirm that the error goes away when I make the following changes:

  1. Move the function to be run to a separate file external.py:

def just_print(_):
    print("Listen!")

  2. Change the notebook code to:
from concurrent.futures import ProcessPoolExecutor
import multiprocessing as mp

from external import just_print

executor = ProcessPoolExecutor(mp_context=mp.get_context("spawn"))

list(executor.map(just_print, [None] * 1000))

The cell no longer prints anything, and I cannot reproduce the error.

This makes sense: with spawn, the subprocesses start from a fresh interpreter and do not inherit the parent's memory (including its session key).

@davidbrochart
Member

Yes, I was about to suggest the spawn start method as well. If that's acceptable (you may have to do extra work to initialize your workers), then you have a fix.

@jbweston
Member Author

jbweston commented Apr 6, 2020

Indeed this is a fix, but it makes using ProcessPoolExecutor from the notebook a bit less ergonomic (I am forced to split my code between the notebook and an importable module).

@davidbrochart
Member

Yes, that's what I meant by extra work for initialization. On the other hand, you may end up with less memory consumption, if that was a concern.

@MSeal
Contributor

MSeal commented Apr 6, 2020

You can copy https://github.com/nteract/scrapbook/blob/master/scrapbook/utils.py#L46-L57 to determine whether you're in a kernel context, and set the argument just-in-time before you call ProcessPoolExecutor. That way you don't need to make separate functions and can detect the right settings for the context.
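
A sketch of that approach, modelled loosely on the linked scrapbook helper (the exact attribute checks here are an assumption, not scrapbook's verbatim code):

import sys
import multiprocessing as mp
from concurrent.futures import ProcessPoolExecutor

def is_kernel():
    """Best-effort check for running inside an IPython kernel (e.g. a notebook)."""
    if "IPython" not in sys.modules:
        return False  # IPython was never imported, so no kernel
    from IPython import get_ipython
    ip = get_ipython()
    # terminal IPython has no `kernel` attribute; notebook kernels do
    return ip is not None and getattr(ip, "kernel", None) is not None

# choose spawn inside a kernel so workers don't share the session key
mp_context = mp.get_context("spawn") if is_kernel() else None
executor = ProcessPoolExecutor(mp_context=mp_context)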

@MSeal
Contributor

MSeal commented Apr 6, 2020

Unfortunately, concurrency controls inside concurrent processes break some black-box boundaries (both in and out of Python). Async also has to take special steps based on the parent execution context, and threading mixed with multiprocessing can just fail outright with horrible low-level errors. I'm not sure jupyter_client itself can do anything better here to solve it :/

@jbweston
Member Author

jbweston commented Apr 7, 2020

@davidbrochart @MSeal thanks for the quick response!

I understand that using "spawn" or "forkserver" mode would be a proximal fix for the problem; however, it does not satisfy other constraints that I have.

For my use case, we rely on the fact that the subprocesses are forked, which allows us to do all the required setup directly within the notebook rather than doing everything in separate modules and importing them later. Specifically, this means it is possible to submit or map functions that are defined directly within the notebook, without having to worry about whether those functions are closures; it "just works".

Unfortunately one of the design constraints that I am working with is that this has to be possible, even if it can in principle lead to complete insanity (e.g. if the functions make use of mutable global state).

@jbweston
Member Author

jbweston commented Apr 7, 2020

I just learned about the register_at_fork function from the os module, and it made me think that a more distal fix could be to register a handler that resets the jupyter_client session IDs in the child processes (see the sketch after the questions below).

A couple of questions about this:

  1. Will it "just work", or am I missing something?
  2. Could such a thing in principle be contributed to jupyter_client directly, or does this fall in the realm of outrageous and brittle hack that I would have to put in my own code?
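
A minimal sketch of what such a handler might look like, assuming an ipykernel-based kernel whose Session is reachable through IPKernelApp. Whether refreshing session.session alone is sufficient depends on how Session builds its msg_ids, so treat this as illustrative rather than a verified fix:

import os
import uuid

def _reset_session_in_child():
    # Runs in the forked child: give the kernel's Session a fresh session ID
    # so its message signatures no longer collide with the parent's.
    from ipykernel.kernelapp import IPKernelApp
    if IPKernelApp.initialized():
        session = IPKernelApp.instance().session
        # assumption: refreshing the session ID is enough to change msg_ids
        session.session = str(uuid.uuid4())

os.register_at_fork(after_in_child=_reset_session_in_child)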

@alasla

alasla commented Dec 22, 2020

@jbweston did you ever come to a conclusion on how you handle this (even if it's something hacky)? I've a similar situation where switching to spawn would be difficult/expensive.

@ykchong45

Not sure if this helps, but joblib.Parallel can call functions defined in the notebook, so you don't have to create a separate file for the function.

You can call the function by

from joblib import Parallel, delayed

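# funcName and paramsList are placeholders for your own function
# and an iterable of its argument tuples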
res = Parallel(n_jobs=-1)(delayed(funcName)(i, j) for i, j in paramsList)

mtreinish added commits to mtreinish/qiskit-core and Qiskit/qiskit that referenced this issue Feb 17, 2021
We're hitting a relatively high tutorial job failure rate caused by
jupyter/jupyter_client#541 on certain notebooks we run in the tutorials
job. This appears to be caused by multiprocessing usage inside
qiskit that prints to stdout while running. To try to avoid this issue,
this commit disables parallelism in qiskit by setting the env var to do
that. The only concern is whether we have sufficient time budget to
execute the notebooks in CI.
@takluyver
Member

I think this was introduced by #493, which optimised msg_id to be a fixed ID plus a counter, rather than a totally random ID every time. Obviously that means that after a fork, processes will produce messages with the same IDs. I believe this is a bug (the messaging spec says that msg_id "must be unique per message"), although I'm not sure if the spec really anticipates that 'the kernel' could consist of several forked processes at all.

I think resetting the session ID (different from the session key, which has to stay the same), as @jbweston suggested, would work. But a simpler approach might be to include the process ID in the message ID. At least on my system, os.getpid() appears to be cheap to call (~650 ns).

The race condition is probably because there's a timestamp in the message header, so you only get the same signature if the same data is sent twice at the same time (microsecond precision, or as close to that as the system actually gives us).
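
A sketch of the suggested msg_id scheme (purely illustrative; this stand-in class is not jupyter_client's actual Session, and the real change landed in #655):

import itertools
import os
from uuid import uuid4

class Session:
    """Illustrative stand-in, not jupyter_client's real Session."""
    def __init__(self):
        self.session = str(uuid4())        # fixed per-session prefix (shared after fork)
        self._counter = itertools.count()  # monotonic counter (state copied by fork)

    def msg_id(self):
        # A forked child inherits both the prefix and the counter state,
        # so only the PID distinguishes its IDs from the parent's.
        return f"{self.session}_{os.getpid()}_{next(self._counter)}"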

@davidbrochart
Copy link
Member

+1 on including the process ID in the message ID.

takluyver added a commit to takluyver/jupyter_client that referenced this issue Jun 7, 2021
Ensures messages are unique after fork

Closes jupytergh-541
@takluyver
Member

I've had a go at that in #655.
