
"Duplicate Signature" error when writing to stdout using concurrent.futures.ProcessPool executor #541

Closed
jbweston opened this issue Apr 6, 2020 · 13 comments · Fixed by #655


@jbweston
Member

jbweston commented Apr 6, 2020

Steps to reproduce

Create a single-celled notebook with the following contents:

from concurrent.futures import ProcessPoolExecutor

executor = ProcessPoolExecutor()

def just_print(_):
    print("Listen!")
    
executor.map(just_print, [None] * 1000)

Running this cell will result in "Listen!" being printed to the notebook output 1000 times.

Run the cell a few times (the error is non-deterministic) and keep an eye on the log from the NotebookApp. You should eventually see a traceback similar to the following:

[E 18:17:17.090 NotebookApp] Exception in callback functools.partial(<function ZMQStream._update_handler.<locals>.<lambda> at 0x7f72c6cf1280>)
    Traceback (most recent call last):
      File "/home/jbweston/.miniconda/envs/jupyter-bug/lib/python3.8/site-packages/tornado/ioloop.py", line 743, in _run_callback
        ret = callback()
      File "/home/jbweston/.miniconda/envs/jupyter-bug/lib/python3.8/site-packages/zmq/eventloop/zmqstream.py", line 542, in <lambda>
        self.io_loop.add_callback(lambda : self._handle_events(self.socket, 0))
      File "/home/jbweston/.miniconda/envs/jupyter-bug/lib/python3.8/site-packages/zmq/eventloop/zmqstream.py", line 456, in _handle_events
        self._handle_recv()
      File "/home/jbweston/.miniconda/envs/jupyter-bug/lib/python3.8/site-packages/zmq/eventloop/zmqstream.py", line 486, in _handle_recv
        self._run_callback(callback, msg)
      File "/home/jbweston/.miniconda/envs/jupyter-bug/lib/python3.8/site-packages/zmq/eventloop/zmqstream.py", line 438, in _run_callback
        callback(*args, **kwargs)
      File "/home/jbweston/.miniconda/envs/jupyter-bug/lib/python3.8/site-packages/notebook/services/kernels/kernelmanager.py", line 412, in record_activity
        msg = session.deserialize(fed_msg_list)
      File "/home/jbweston/.miniconda/envs/jupyter-bug/lib/python3.8/site-packages/jupyter_client/session.py", line 927, in deserialize
        raise ValueError("Duplicate Signature: %r" % signature)
    ValueError: Duplicate Signature: b'6876c7355d3b1f325c2a8e39c754c1946120438a9e4d81c113c0397edb65e84d'

Environment

OS: WSL2 running Ubuntu 18.04 (also seen on other flavours of Linux not under WSL2)
jupyter --version

jupyter core     : 4.6.1
jupyter-notebook : 6.0.3
qtconsole        : 4.6.0
ipython          : 7.12.0
ipykernel        : 5.1.4
jupyter client   : 5.3.4
jupyter lab      : not installed
nbconvert        : 5.6.1
ipywidgets       : 7.5.1
nbformat         : 5.0.4
traitlets        : 4.3.3

conda env export

  - pygments=2.5.2=py_0
  - pylint=2.4.4=py37_0
  - pyopenssl=19.0.0=py37_0
  - pyqt=5.9.2=py37h05f1152_2
  - pyrsistent=0.15.7=py37h7b6447c_0
  - pysocks=1.7.1=py37_0
  - python=3.7.4=h265db76_1
  - python-dateutil=2.8.1=py_0
  - python-jsonrpc-server=0.3.4=py_0
  - python-language-server=0.31.7=py37_0
  - pyzmq=18.1.1=py37he6710b0_0
  - qt=5.9.7=h5867ecd_1
  - qtconsole=4.6.0=py_1
  - readline=7.0=h7b6447c_5
  - requests=2.22.0=py37_0
  - rope=0.16.0=py_0
  - ruamel_yaml=0.15.46=py37h14c3975_0
  - send2trash=1.5.0=py37_0
  - setuptools=41.4.0=py37_0
  - sip=4.19.8=py37hf484d3e_0
  - six=1.12.0=py37_0
  - snowballstemmer=2.0.0=py_0
  - sqlite=3.30.0=h7b6447c_0
  - terminado=0.8.3=py37_0
  - testpath=0.4.4=py_0
  - tk=8.6.8=hbc83047_0
  - tornado=6.0.3=py37h7b6447c_3
  - tqdm=4.36.1=py_0
  - traitlets=4.3.3=py37_0
  - ujson=1.35=py37h14c3975_0
  - urllib3=1.24.2=py37_0
  - wcwidth=0.1.8=py_0
  - webencodings=0.5.1=py37_1
  - wheel=0.33.6=py37_0
  - widgetsnbextension=3.5.1=py37_0
  - wrapt=1.11.2=py37h7b6447c_0
  - xz=5.2.4=h14c3975_4
  - yaml=0.1.7=had09818_2
  - yapf=0.28.0=py_0
  - zeromq=4.3.1=he6710b0_3
  - zipp=2.2.0=py_0
  - zlib=1.2.11=h7b6447c_3

Further information

Possibly related to #498, as ProcessPoolExecutor uses multiprocessing under the hood.

I think the most likely explanation is that ProcessPoolExecutor uses the fork start method to create its worker processes, so the workers inherit the parent's session key, and their message signatures can therefore occasionally collide.
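
For context, the default start method can be checked directly; on Linux (where this was observed), multiprocessing, and hence ProcessPoolExecutor, defaults to fork:

import multiprocessing as mp

# Workers created by fork() inherit the parent's memory wholesale,
# including the jupyter_client session key used to sign messages.
print(mp.get_start_method())  # 'fork' on Linux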

@jbweston
Member Author

jbweston commented Apr 6, 2020

I can confirm that the error goes away when I make the following changes:

  1. Move the function to be run to a separate file external.py:

def just_print(_):
    print("Listen!")

  2. Change the notebook code to:
from concurrent.futures import ProcessPoolExecutor
import multiprocessing as mp

from external import just_print

executor = ProcessPoolExecutor(mp_context=mp.get_context("spawn"))

list(executor.map(just_print, [None] * 1000))

The cell no longer prints anything, and I cannot reproduce the error.

This makes sense: with spawn, the subprocesses start from a fresh interpreter and do not inherit the parent's memory (including its session key).

@davidbrochart
Member

Yes, I was about to suggest the spawn start method as well. If that's acceptable (you may have to do extra work to initialize your workers), then you have a fix.

@jbweston
Member Author

jbweston commented Apr 6, 2020

Indeed this is a fix, but it makes using ProcessPoolExecutor from the notebook a bit less ergonomic (I am forced to split my code between the notebook and an importable module).

@davidbrochart
Member

Yes, that's what I meant by extra work for initialization. On the other hand, you may end up with less memory consumption, if that was a concern.

@MSeal
Contributor

MSeal commented Apr 6, 2020

You can copy https://github.com/nteract/scrapbook/blob/master/scrapbook/utils.py#L46-L57 to determine whether you're in a kernel context, and set the argument just-in-time before you call ProcessPoolExecutor. That way you don't need to make separate functions and can detect the right settings for the context.
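
A sketch of that approach, modelled loosely on the linked scrapbook helper (the exact attribute checks here are an assumption, not scrapbook's verbatim code):

import sys
import multiprocessing as mp
from concurrent.futures import ProcessPoolExecutor

def is_kernel():
    """Best-effort check for running inside an IPython kernel (e.g. a notebook)."""
    if "IPython" not in sys.modules:
        return False  # IPython was never imported, so no kernel
    from IPython import get_ipython
    ip = get_ipython()
    # terminal IPython has no `kernel` attribute; notebook kernels do
    return ip is not None and getattr(ip, "kernel", None) is not None

# choose spawn inside a kernel so workers don't share the session key
mp_context = mp.get_context("spawn") if is_kernel() else None
executor = ProcessPoolExecutor(mp_context=mp_context)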

@MSeal
Contributor

MSeal commented Apr 6, 2020

Unfortunately, concurrency controls inside concurrent processes break some black-box boundaries (both in and out of Python). Async also has to take special steps based on the parent execution context, and threading mixed with multiprocessing can just fail outright with horrible low-level errors. I'm not sure jupyter_client itself can do anything better here to solve it :/

@jbweston
Member Author

jbweston commented Apr 7, 2020

@davidbrochart @MSeal thanks for the quick response!

I understand that using "spawn" or "forkserver" mode would be a proximal fix for the problem; however, it does not satisfy other constraints that I have.

For my use case, we rely on the fact that the subprocesses are forked, which allows us to do all the required setup directly within the notebook rather than doing everything in separate modules and importing them later. Specifically, this means it is possible to submit or map functions that are defined directly within the notebook, without having to worry about whether those functions are closures; it "just works".

Unfortunately one of the design constraints that I am working with is that this has to be possible, even if it can in principle lead to complete insanity (e.g. if the functions make use of mutable global state).

@jbweston
Member Author

jbweston commented Apr 7, 2020

I just learned about the register_at_fork function from the os module, and it made me think that a more distal fix could be to register a handler that resets the jupyter_client session IDs in the child processes (see the sketch after the questions below).

A couple of questions about this:

  1. Will it "just work", or am I missing something?
  2. Could such a thing in principle be contributed to jupyter_client directly, or does this fall in the realm of outrageous and brittle hack that I would have to put in my own code?
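
A minimal sketch of what such a handler might look like, assuming an ipykernel-based kernel whose Session is reachable through IPKernelApp. Whether refreshing session.session alone is sufficient depends on how Session builds its msg_ids, so treat this as illustrative rather than a verified fix:

import os
import uuid

def _reset_session_in_child():
    # Runs in the forked child: give the kernel's Session a fresh session ID
    # so its message signatures no longer collide with the parent's.
    from ipykernel.kernelapp import IPKernelApp
    if IPKernelApp.initialized():
        session = IPKernelApp.instance().session
        # assumption: refreshing the session ID is enough to change msg_ids
        session.session = str(uuid.uuid4())

os.register_at_fork(after_in_child=_reset_session_in_child)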

@alasla

alasla commented Dec 22, 2020

@jbweston did you ever come to a conclusion on how you handle this (even if it's something hacky)? I've a similar situation where switching to spawn would be difficult/expensive.

@ykchong45

Not sure if this helps, but joblib.Parallel can call functions defined in the notebook, so you don't have to create a separate file for the function.

You can call the function by

from joblib import Parallel, delayed

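# funcName and paramsList are placeholders for your own function
# and an iterable of its argument tuples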
res = Parallel(n_jobs=-1)(delayed(funcName)(i, j) for i, j in paramsList)

mtreinish added commits to mtreinish/qiskit-core and Qiskit/qiskit that referenced this issue Feb 17, 2021
We're hitting a relatively high tutorial job failure rate caused by
jupyter/jupyter_client#541 on certain notebooks we run in the tutorials
job. This appears to be caused by multiprocessing usage inside
qiskit that prints to stdout while running. To try to avoid this issue,
this commit disables parallelism in qiskit by setting the env var to do
that. The only concern is whether we have sufficient time budget to
execute the notebooks in CI.
@takluyver
Member

I think this was introduced by #493, which optimised msg_id to be a fixed ID plus a counter, rather than a totally random ID every time. Obviously that means that after a fork, processes will produce messages with the same IDs. I believe this is a bug (the messaging spec says that msg_id "must be unique per message"), although I'm not sure if the spec really anticipates that 'the kernel' could consist of several forked processes at all.

I think resetting the session ID (different from the session key, which has to stay the same), as @jbweston suggested, would work. But a simpler approach might be to include the process ID in the message ID. At least on my system, os.getpid() appears to be cheap to call (~650 ns).

The race condition is probably because there's a timestamp in the message header, so you only get the same signature if the same data is sent twice at the same time (microsecond precision, or as close to that as the system actually gives us).
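
A sketch of the suggested msg_id scheme (purely illustrative; this stand-in class is not jupyter_client's actual Session, and the real change landed in #655):

import itertools
import os
from uuid import uuid4

class Session:
    """Illustrative stand-in, not jupyter_client's real Session."""
    def __init__(self):
        self.session = str(uuid4())        # fixed per-session prefix (shared after fork)
        self._counter = itertools.count()  # monotonic counter (state copied by fork)

    def msg_id(self):
        # A forked child inherits both the prefix and the counter state,
        # so only the PID distinguishes its IDs from the parent's.
        return f"{self.session}_{os.getpid()}_{next(self._counter)}"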

@davidbrochart
Copy link
Member

+1 on including the process ID in the message ID.

takluyver added a commit to takluyver/jupyter_client that referenced this issue Jun 7, 2021
Ensures messages are unique after fork

Closes jupytergh-541
@takluyver
Member

I've had a go at that in #655.
