Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Additional to the actual signal, send a message on the control port #294

Merged
merged 4 commits into from
Nov 13, 2017

Conversation

filmor
Copy link
Contributor

@filmor filmor commented Sep 22, 2017

In Erlang I have only on POSIX and only in the newest version some control over the normal runtime signals, in particular SIGINT. However, I can easily pick up and process the respective message on the control socket to interrupt execution.

Can this functionality be added? If yes, I'd continue and document the functionality and add tests. Maybe it would also be good to have the option of not sending an actual signal (via os.kill) at all.

@SpencerPark
Copy link
Contributor

I was just looking for something similar to this, specifically for interrupting the kernel. Sending something over the control channel makes complete sense for this. It is a high priority message like the shutdown request. I don't see the benefit of sending the signals over sending a control message.

Since we are at a higher level when working at the messaging level I think that the message doesn't need to be signal related anymore but rather simply an interrupt_request or similar. What do you think?

@rgbkrk
Copy link
Member

rgbkrk commented Oct 8, 2017

Cc @ivanov

@takluyver
Copy link
Member

The rationale for signals is that the kernel may block message processing while it is executing user code, so a new message - even on the control channel - will not necessarily interrupt it. This is the case in the reference Python kernel, in the R kernel, and probably in all kernels built on the Python 'wrapper kernel' machinery, e.g. the bash kernel.

We have also thought of adding messages for signals in a different context, to allow interrupting kernels running remotely. In that proposal, the frontend would send a signal message to a 'kernel nanny', a process running alongside the kernel, which would respond by sending a real Unix signal to the kernel process.

IIRC it also came up there that some kernels would rather get an interrupt message than a signal, so it sounds like this might be worth doing.

I'm not entirely convinced that sending both the signal and the message is the right thing to do. I don't yet have a concrete reason why not, but it feels a bit messy. Maybe kernels should have a way to declare whether they expect signals as real signals or as messages?

@SpencerPark
Copy link
Contributor

Fair enough, I think an opt-in for the interrupt message is a good compromise. This would be backwards compatible as well which is a big plus. A kernel.json flag seems like a good place for a kernel to make that request. I don't think it matters if the front-end acknowledges the request as currently kernels that don't handle the interrupt signal simply don't support interrupts. If a kernel requests an interrupt message as the method of communication and never receives one then the behavior is simply the same as it is currently.

My understanding of the control channel was that it should try to always be free to handle the high priority requests so when implementing the protocol I built with that in mind. It is on a separate thread and is therefore available to handle the request while the main shell is busy.

@takluyver
Copy link
Member

The way we implement the control channel is that it can pre-empt queued messages on the shell channel. So if you send 10 cells for execution and then immediately ask the kernel to shut down, it will finish executing the first cell and shutdown before starting the other 9. But only signals pre-empt something that's already running.

@filmor
Copy link
Contributor Author

filmor commented Oct 10, 2017

Is this behaviour of the control socket that a requirement, though? In Erlang I can easily interrupt and shut down execution while a cell is evaluated.

I think I agree with scoping this down to an interrupt_request instead of trying to handle arbitrary signals. However, there should still be a way to deactivate any kind of signal-use (apart from TERM). Erlang handles the situation kind-of gracefully, I think, but this is a property of the runtime that I can't necessarily influence from the kernel implementation.

Is there any scenario in which sending the interrupt_request as a message on the control socket unconditionally could break a kernel?

@takluyver
Copy link
Member

Is this behaviour of the control socket that a requirement, though?

Not particularly, or at least it's not really specified, as far as I know. I'm just explaining the background of why signals are needed. :-)

Is there any scenario in which sending the interrupt_request as a message on the control socket unconditionally could break a kernel?

I think it would be fine for a well-written kernel, but it could cause problems if a kernel author doesn't realise that they're getting both a signal and a message, and the kernel gets interrupted twice. I think it's common for kernels to print something like warning: unknown message type on unhandled messages, so if people start seeing unhandled interrupt_request messages, they're going to go and 'fix' that, even if interrupting already works with SIGINT.

Allowing the kernel to pick either messages or signals will hopefully avoid this kind of confusion.

@filmor
Copy link
Contributor Author

filmor commented Oct 18, 2017

Okay, I'll update the PR accordingly.

@filmor
Copy link
Contributor Author

filmor commented Oct 19, 2017

Updated, please have another look.

if sys.platform == 'win32':
from .win_interrupt import send_interrupt
send_interrupt(self.kernel.win32_interrupt_event)
self._send_signal_message(signal.SIGINT)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the _send_signal_message applies anymore. Might be left over from the initial commit?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, fixed :)

@takluyver
Copy link
Member

Thanks. Since this involves a change to the message protocol, I'm going to start a discussion on the mailing list.

@lresende
Copy link
Member

@kevin-bates I want to make sure you see this, as this might be similar to what you did for remote kernel interrupt in "Enterprise Gateway"

@kevin-bates
Copy link
Member

Thanks for the heads up @lresende. Yes, I had seen this a few days ago.

The kernel launchers we use in Enterprise Gateway appear to follow the kernel nanny model described above, although our launchers actually house the target kernel.

Btw, another advantage of a message-based interrupt is that the interrupt can span userid differences (while signals do not). This is more of an issue in a gateway environment since services like JupyterHub will launch the entire notebook server as the target user (retaining userid matches between kernel managers and kernel instances).

@ccordoba12
Copy link
Contributor

This is a really great addition!

I just have one question: Is it possible to add a restart message too? If that's the case, then we (Spyder) wouldn't need to wait for kernel nanny either.

@filmor
Copy link
Contributor Author

filmor commented Oct 30, 2017

@ccordoba12 What would you expect the kernel to do when it gets this message?

@takluyver There doesn't seem to be any activity on the ML regarding this PR, maybe it helps if you just cc all relevant people here ...

@minrk
Copy link
Member

minrk commented Oct 30, 2017

I think this is a fine addition. We just need to add this to the docs and bump the minor-revision of the spec.

@ccordoba12
Copy link
Contributor

@filmor, kernels can be restarted after a shutdown:

https://github.com/jupyter/jupyter_client/blob/master/jupyter_client/manager.py#L293

This would be useful for Spyder because people could interact with external kernels the same way as they would do with kernels started by Spyder itself.

@minrk, what do you think?

@SpencerPark
Copy link
Contributor

@ccordoba12 The shutdown_request looks like it already implements what you are asking about. The contents of this message has a restart flag to check if the shutdown precedes a restart.

What additional information are you looking for on top of that?

@ccordoba12
Copy link
Contributor

Sorry, it seems you're right. I'll take a look at it more carefully.

@takluyver
Copy link
Member

Thanks @filmor . I think the remaining thing to do is to bump the protocol version to 5.3. The places I'm aware of this are in the messaging doc near the top, and in _version.py.

@filmor
Copy link
Contributor Author

filmor commented Nov 7, 2017

I have no idea why this test fails ...

- interrupt_mode="signal" is the default and current behaviour
- With interrupt_mode="message", instead of a signal, a
  `interrupt_request` message on the control port will be sent
@filmor
Copy link
Contributor Author

filmor commented Nov 13, 2017

And I can not reproduce it locally. Any hints?

@takluyver
Copy link
Member

It might be related to a change in zeromq, perhaps. @minrk this is the failure:

    def test_tracking(self):
        """test tracking messages"""
        a,b = self.create_bound_pair(zmq.PAIR, zmq.PAIR)
        s = self.session
        s.copy_threshold = 1
        stream = ZMQStream(a)
        msg = s.send(a, 'hello', track=False)
        self.assertTrue(msg['tracker'] is ss.DONE)
        msg = s.send(a, 'hello', track=True)
        self.assertTrue(isinstance(msg['tracker'], zmq.MessageTracker))
        M = zmq.Message(b'hi there', track=True)
        msg = s.send(a, 'hello', buffers=[M], track=True)
        t = msg['tracker']
        self.assertTrue(isinstance(t, zmq.MessageTracker))
>       self.assertRaises(zmq.NotDone, t.wait, .1)
E       AssertionError: NotDone not raised by wait

@takluyver
Copy link
Member

I think Min's changes in #304, which was just merged, probably fixed this. Closing and reopening to test.

@takluyver takluyver closed this Nov 13, 2017
@takluyver takluyver reopened this Nov 13, 2017
@filmor filmor changed the title [RFC] Additional to the actual signal, send a message on the control port Additional to the actual signal, send a message on the control port Nov 13, 2017
@filmor
Copy link
Contributor Author

filmor commented Nov 13, 2017

Yep, this looks better. Is there anything else to do?

docs/kernels.rst Outdated
- **interrupt_mode** (optional): May be either ``signal`` or ``message`` and
specifies how a client is supposed to interrupt cell execution on this kernel,
either by sending an interrupt ``signal`` via the operating system's
signalling facilities (e.g. `SIGTERM` on POSIX systems), or by sending an
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SIGTERM -> SIGINT

@takluyver
Copy link
Member

Thanks!

@takluyver takluyver merged commit 0d7d00f into jupyter:master Nov 13, 2017
@takluyver takluyver added this to the 5.2 milestone Dec 15, 2017
@takluyver
Copy link
Member

@meeseeksdev backport

@lumberbot-app
Copy link

lumberbot-app bot commented Dec 15, 2017

There seem to be a conflict, please backport manually

takluyver added a commit that referenced this pull request Dec 15, 2017
…the control port

In Erlang I have only on POSIX and only in the newest version some control over the normal runtime signals, in particular SIGINT. However, I can easily pick up and process the respective message on the control socket to interrupt execution.

Can this functionality be added? If yes, I'd continue and document the functionality and add tests. Maybe it would also be good to have the option of not sending an actual signal (via `os.kill`) at all.

Signed-off-by: Thomas Kluyver <[email protected]>
@takluyver
Copy link
Member

Backported manually.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants