
Add support for async kernel management via subclassing #428

Merged (4 commits) Mar 4, 2020

Conversation

@kevin-bates (Member) commented Mar 11, 2019

After a couple of failed attempts due to API breakage and incompleteness, as documented in PR #425, this PR takes the subclassing approach that @minrk had proposed. As a result, I felt it best to start the discussion anew, and I will close #425, which will still be useful for background. With this approach, existing applications can continue using the synchronous classes, while applications that want async kernel management can opt in by setting the kernel_manager_class traitlet to the appropriate asynchronous form. The result is a fairly clean separation.

AsyncKernelManager derives from KernelManager and uses async (gen.coroutine) methods where applicable. Similarly, AsyncMultiKernelManager derives from MultiKernelManager. AsyncIOLoopKernelManager derives from AsyncKernelManager instead of IOLoopKernelManager, since otherwise it would not extend AsyncKernelManager. And AsyncIOLoopRestarter derives from IOLoopRestarter, since the base KernelRestarter is an independent class.
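For illustration only (not code from this PR), here is a minimal sketch of what opting in looks like for a caller, assuming Python 3.7+ and an installed python3 kernelspec; applications that configure a manager class rather than instantiating one directly would instead point their kernel_manager_class traitlet at the async variant:

    import asyncio

    from jupyter_client.manager import AsyncKernelManager  # class added by this PR

    async def main():
        km = AsyncKernelManager(kernel_name="python3")
        await km.start_kernel()          # no longer blocks the event loop
        try:
            print("kernel pid:", km.kernel.pid)
        finally:
            await km.shutdown_kernel(now=True)

    asyncio.run(main())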

I chose to continue using @gen.coroutine as opposed to async def for the following reasons (a brief contrast of the two styles follows the list)...

  1. I'm not sure how mixing the two styles works - which would need to happen once Notebook starts using the async classes.
  2. The notebook-based subsystems haven't switched to async/await, and I felt this conversion could be done at that time.
  3. I'm still (stubbornly) hoping that we can backport this to 5.x for python 2 users. 😄
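For reference, a minimal sketch (illustrative only) of the two coroutine styles being weighed here:

    from tornado import gen

    # Tornado-style coroutine (also works on Python 2 with tornado installed):
    @gen.coroutine
    def restart_kernel_tornado(km):
        yield km.restart_kernel()    # 'yield' waits on the returned future

    # Native coroutine (Python 3.5+), the form this PR was later converted to:
    async def restart_kernel_native(km):
        await km.restart_kernel()    # 'await' waits on the awaitable

For what it's worth, Tornado lets the two interoperate: @gen.coroutine functions return futures that can be awaited from async def code, and native coroutines can be yielded from @gen.coroutine code.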

I'm marking this as a work-in-progress for now until a POC with Notebook and Enterprise Gateway can be completed.

EDIT: Added class hierarchy diagram (darker classes are new).
[Image: jupyter_client_kernel_manager_hierarchy]

@Carreau (Member) commented Mar 14, 2019

Just subscribing to the issue; slowly catching up.

@kevin-bates changed the title from "[WIP] Add support for async kernel management via subclassing" to "Add support for async kernel management via subclassing" on Mar 16, 2019
@kevin-bates (Member Author)

The POC with Notebook and Enterprise Gateway has been completed, so I've removed [WIP]. The test failures don't appear related, and I can't quite figure out what's going on; the tests pass in my environment.

@kevin-bates (Member Author)

@Carreau @takluyver @minrk - I'm hoping at least one of you can make a pass across these changes (and hopefully notebook #4479) sometime in the near future. It would be greatly appreciated.

@Carreau (Member) commented Mar 26, 2019

As this just adds new classes, I'm +1. I'd still love others to review, though. @SylvainCorlay? Someone from your team?

I also want to get the tests to pass on Travis; they pass on my machine. If we can't figure that out, we can mark it as a known failure.

@kevin-bates (Member Author)

Thank you, Matthias. Yeah, the tests run locally for me as well. I'm not familiar enough with the test history to know whether this might be something odd with the Travis infrastructure, but I'm happy to look into things given some guidance.

@Carreau (Member) commented Mar 26, 2019

Yeah, this seems to happen only on slow systems. The teardown timeout in pyzmq seems to be 2 seconds, which is too short for Travis's slower machines; I get the same error locally if I drop the timeout to 0.01s.

See zeromq/pyzmq#1275 for a suggested fix. We should be able to work around this by patching the teardown method. I'll see if I can get a fix onto this branch at some point.

return content

self.addCleanup(kc.stop_channels)
self.addCleanup(km.shutdown_kernel)
Member

Haha! These two are coroutines. My guess is that addCleanup does not wait for them, and this test modifies global state, which makes session_test fail (there was also a timeout issue).

I'm looking at tornado's AsyncTestCase, but there does not seem to be an addCleanup for async functions. I'll see if we can do something smart, or just do an except with a re-raise.

Member

At least I think this is it.

Member Author

Thanks for the insight, @Carreau. I agree that this might be what's going on (although only km.shutdown_kernel() is a coroutine). Is there a reason these need to be called as part of the teardown and not just as part of the test? (That said, I ran into issues in the signal test calling shutdown_kernel.)

Member Author

@Carreau - You're right about the culprit being addCleanup(). I went ahead and removed its use for the async tests, replacing it with direct calls to shutdown_kernel() and stop_channels() within finally blocks.
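As an illustration only (not the PR's actual test code), the pattern being described looks roughly like this, assuming an installed python3 kernelspec:

    from tornado.testing import AsyncTestCase, gen_test
    from jupyter_client.manager import AsyncKernelManager

    class TestAsyncKernelLifecycle(AsyncTestCase):
        @gen_test(timeout=60)
        async def test_start_and_shutdown(self):
            km = AsyncKernelManager(kernel_name="python3")
            await km.start_kernel()
            kc = km.client()
            kc.start_channels()
            try:
                self.assertTrue(km.has_kernel)
            finally:
                # addCleanup() would not await these, so clean up inline.
                kc.stop_channels()                  # synchronous
                await km.shutdown_kernel(now=True)  # coroutine - must be awaited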

N = 5
for i in range(N):
    execute("start")
time.sleep(1)  # make sure subprocs stay up
Member

The sleep also seems to make the test break... not sure why. I tried replacing it with yield asyncio.sleep() to no avail.

@kevin-bates (Member Author)

@Carreau - ping.

@MSeal (Contributor) commented May 13, 2019

@Carreau I'm thinking of doing a patch release soon for jupyter_client to pick up the parallel zeromq context fix. This PR looks a bit meaty. Should we try to push to get it in and do a minor release, or ship the other change independently?

@rgbkrk (Member) commented May 28, 2019

Hey, I'll start checking this out too. I started to do this by hand for fun yesterday during the kernel workshop, then thought there might be some good bits in jupyter_client already. I'll spend time reviewing here instead of adding my own code.

@rgbkrk (Member) commented May 28, 2019

This leaves room open later for using zmq.asyncio's Context to set up async/await-able sockets, so it looks like my work was not for naught. 😄

Quite a bit happened while I was on leave; it was good to read the discussion in #425. Thanks for making these PRs, @kevin-bates, and thanks to @minrk and @Carreau for creating plans for each coming major release while beginning the async work. ❤️

@meeseeksmachine

This pull request has been mentioned on Jupyter Community Forum. There might be relevant details there:

https://discourse.jupyter.org/t/scalable-enterprise-gateway/2014/2

@SylvainCorlay (Member)

@maartenbreddels @jtpio

@MSeal (Contributor) commented Feb 22, 2020

@kevin-bates @davidbrochart How does this mesh with #506, which got merged? I'm about to do a 6.0 release this weekend with that code, which was merged to master with a Python-3-only constraint (we're doing that for every Jupyter project at the moment). I wasn't super involved in that PR or this one, but it seems like we could move forward with current master and improve the abstractions taken from here in a 6.1 release later if that makes sense.

@kevin-bates (Member Author)

I think there would be good synergy between the two. This PR purely targets the kernel's lifecycle management - addressing the issue of blocking kernel starts - while #506, if I understand correctly, is more about making communication with the kernel asynchronous via the kernel client. Is that your understanding as well, @davidbrochart?

I think using a minor release to introduce this change is probably the right move, from the standpoint of not introducing too many changes at once. What is good about this PR is that it simply exposes a sibling kernel manager class that has async support. This way applications can decide when to switch, and its merge won't (well, shouldn't) affect the status quo.

@SylvainCorlay (Member)

Thanks @kevin-bates. Please ping me when you update it. We are definitely interested in this.

Introduced async subclasses that derive from synchronous classes,
overriding appropriate methods with async support.
@kevin-bates (Member Author)

@SylvainCorlay @davidbrochart - I've made the following changes...

  1. Converted @gen.coroutine/yield to async/await (which wasn't as easy as it sounds 😄).
  2. Updated the async tests to use David's async client where applicable.
  3. I found I needed to update some of the async threading tests to create their own event loop; otherwise they side-effected some of the downstream tests (see the sketch below).
  4. There were duplicate test names of 'test_start_sequence_tcp_kernels'; the second was meant to be named 'test_start_sequence_ipc_kernels', so that change was made.
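Regarding item 3, the per-thread event-loop setup is roughly the following (an illustrative sketch, not the exact test code):

    import asyncio
    import threading

    def run_in_thread(coro_factory):
        """Run an async kernel operation in a thread that owns its own event loop."""
        def worker():
            # Give the thread its own loop; without this, implicit loop state
            # leaks between threads and can side-effect later tests.
            loop = asyncio.new_event_loop()
            asyncio.set_event_loop(loop)
            try:
                loop.run_until_complete(coro_factory())
            finally:
                loop.close()

        t = threading.Thread(target=worker)
        t.start()
        t.join()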

Thank you for your review.

Resolved review threads (outdated): jupyter_client/ioloop/manager.py, jupyter_client/ioloop/restarter.py (2), jupyter_client/manager.py

# Wait until the kernel terminates.
try:
    await asyncio.wait_for(gen.maybe_future(self.kernel.wait()), timeout=5.0)
@davidbrochart (Member) commented Mar 1, 2020:

self.kernel.wait() is a blocking function (it waits for the process to finish), so gen.maybe_future won't make it async; the async nature of this call stack is broken. I think we should have an async_wait function that polls the status of the process periodically, and this line would become e.g.:

await asyncio.wait_for(await self.kernel_async_wait(), timeout=5.0)

What do you think?

Member Author

Yes - good point. I think kernel_async_wait() would look something like this: https://github.com/takluyver/jupyter_kernel_mgmt/blob/master/jupyter_kernel_mgmt/subproc/manager.py#L56-L66

Does that look right?

Member Author

That sure does look a lot like finish_shutdown() so I'll look into refactoring these areas a bit.

Member

> Yes - good point. I think kernel_async_wait() would look something like this: https://github.com/takluyver/jupyter_kernel_mgmt/blob/master/jupyter_kernel_mgmt/subproc/manager.py#L56-L66
>
> Does that look right?

Indeed. They even use asyncio.subprocess.Process, which would be ideal, but I'm not sure we can have it in jupyter_client.

@kevin-bates (Member Author) commented Mar 2, 2020:

Right - the hierarchy of things gets in the way for that.

Btw, I'm finding I need to indicate that the _async_wait() function is done...

    async def _async_wait(self, pollinterval=0.1):
        # Use busy loop at 100ms intervals, polling until the process is
        # not alive.  If we find the process is no longer alive, complete
        # its cleanup via the blocking wait().  Callers are responsible for
        # issuing calls to wait() using a timeout (see _kill_kernel()).
        while self.is_alive():
            await asyncio.sleep(pollinterval)

        done = asyncio.Future()
        done.set_result(None)
        return done

otherwise I get failures stemming from its introduction since asyncio.wait_for() requires a future. This seems correct to me, but I honestly lack experience in this area. Is there a better way to do this?

Member

asyncio.wait_for accepts an awaitable - a coroutine, a Task or a Future. _async_wait doesn't have to return a result:

    async def _async_wait(self, pollinterval=0.1):
        while self.is_alive():
            await asyncio.sleep(pollinterval)

But you need to catch the timeout:

try:
    await asyncio.wait_for(self._async_wait(), timeout=5.0)
except asyncio.TimeoutError:
    # Wait timed out, just log warning but continue - not much more we can do.
    self.log.warning("Wait for final termination of kernel timed out - continuing...")
    pass

That is what your code was doing already. I realize my comment above was wrong, there is no await in the wait_for call.
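Putting the two pieces together, the kill path would look roughly like the following (a sketch under the assumptions above, not the merged implementation verbatim; the attributes used - has_kernel, kernel, is_alive, signal_kernel, log - belong to the real manager):

    import asyncio
    import signal

    class AsyncKernelManagerSketch:
        async def _async_wait(self, pollinterval=0.1):
            # Poll until the kernel process is no longer alive; the caller
            # bounds the total wait with asyncio.wait_for().
            while self.is_alive():
                await asyncio.sleep(pollinterval)

        async def _kill_kernel(self):
            if self.has_kernel:
                if hasattr(signal, 'SIGKILL'):
                    await self.signal_kernel(signal.SIGKILL)
                else:
                    self.kernel.kill()
                try:
                    await asyncio.wait_for(self._async_wait(), timeout=5.0)
                except asyncio.TimeoutError:
                    self.log.warning("Wait for final termination of kernel timed out - continuing...")
                self.kernel = None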

Resolved review thread (outdated): jupyter_client/manager.py
Also added the cache_ports logic that existed in the sync MultiKernelManager.
@@ -9,17 +9,20 @@

import warnings

from tornado import gen
Member

Unused.

Member Author

Thanks.

Comment on lines +517 to +523
async def _launch_kernel(self, kernel_cmd, **kw):
    """actually launch the kernel

    override in a subclass to launch kernel subprocesses differently
    """
    res = launch_kernel(kernel_cmd, **kw)
    return res
Member

There is nothing async about this function, it doesn't await anything.

Member Author

Thanks for the review, David, but I don't agree with this comment (and the others below) regarding marking a method async when it doesn't await. While it's true that this implementation doesn't await, subclasses - especially those managing remote kernels - certainly will, since starting, interrupting, signaling, etc. take an order of magnitude longer (perhaps more) to complete.

Because callers won't know they're talking to a kernel manager that is dealing with remote kernels (nor should they), I believe the async markers need to remain so that the caller is always performing await start_kernel(), for example.

Similarly, I think is_alive() needs to be added back into AsyncKernelManager, since poll() is an RPC in some remote kernels (and a REST call in others). And, by extension, I believe AsyncKernelClient needs an async is_alive() method, since it will call the kernel manager's is_alive() and we'll need to check the class to conditionally apply the await. I have such an incantation working right now, but I wanted to get your opinion before pushing it.
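For concreteness, a hypothetical remote-kernel subclass (the gateway URL, endpoint, and RemoteKernelManager class are made up for illustration) shows where the awaits become real:

    import json

    from tornado.httpclient import AsyncHTTPClient
    from jupyter_client.manager import AsyncKernelManager  # class added by this PR

    class RemoteKernelManager(AsyncKernelManager):
        """Illustrative subclass that starts kernels via a remote gateway."""

        async def _launch_kernel(self, kernel_cmd, **kw):
            client = AsyncHTTPClient()
            # Genuinely asynchronous: the request to a remote gateway can take
            # orders of magnitude longer than a local Popen().
            response = await client.fetch(
                "http://gateway.example.com:8888/api/kernels",
                method="POST",
                body=json.dumps({"name": self.kernel_name}),
            )
            # A real implementation would return a process-proxy object here.
            return json.loads(response.body)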

Member

You convinced me, @kevin-bates! Thanks for putting this work into perspective; I had not thought of the remote kernel use case. It makes more sense now, so please discard my previous comments.

Comment on lines +525 to +542
async def start_kernel(self, **kw):
    """Starts a kernel in a separate process in an asynchronous manner.

    If random ports (port=0) are being used, this method must be called
    before the channels are created.

    Parameters
    ----------
    `**kw` : optional
         keyword arguments that are passed down to build the kernel_cmd
         and launching the kernel (e.g. Popen kwargs).
    """
    kernel_cmd, kw = self.pre_start_kernel(**kw)

    # launch the kernel subprocess
    self.log.debug("Starting kernel (async): %s", kernel_cmd)
    self.kernel = await self._launch_kernel(kernel_cmd, **kw)
    self.post_start_kernel(**kw)
Member

Since _launch_kernel should not be a coroutine, neither should start_kernel.

Comment on lines +676 to +716
async def interrupt_kernel(self):
    """Interrupts the kernel by sending it a signal.

    Unlike ``signal_kernel``, this operation is well supported on all
    platforms.
    """
    if self.has_kernel:
        interrupt_mode = self.kernel_spec.interrupt_mode
        if interrupt_mode == 'signal':
            if sys.platform == 'win32':
                from .win_interrupt import send_interrupt
                send_interrupt(self.kernel.win32_interrupt_event)
            else:
                await self.signal_kernel(signal.SIGINT)

        elif interrupt_mode == 'message':
            msg = self.session.msg("interrupt_request", content={})
            self._connect_control_socket()
            self.session.send(self._control_socket, msg)
    else:
        raise RuntimeError("Cannot interrupt kernel. No kernel is running!")

async def signal_kernel(self, signum):
    """Sends a signal to the process group of the kernel (this
    usually includes the kernel and any subprocesses spawned by
    the kernel).

    Note that since only SIGTERM is supported on Windows, this function is
    only useful on Unix systems.
    """
    if self.has_kernel:
        if hasattr(os, "getpgid") and hasattr(os, "killpg"):
            try:
                pgid = os.getpgid(self.kernel.pid)
                os.killpg(pgid, signum)
                return
            except OSError:
                pass
        self.kernel.send_signal(signum)
    else:
        raise RuntimeError("Cannot signal kernel. No kernel is running!")
Member

interrupt_kernel and signal_kernel should not be coroutines, they are not doing anything blocking.

# TerminateProcess() on Win32).
try:
    if hasattr(signal, 'SIGKILL'):
        await self.signal_kernel(signal.SIGKILL)
Member

No need to await here, signal_kernel should not be a coroutine.

@davidbrochart (Member)

Just a general question: is there any reason we have an ioloop directory, or should it be under asynchronous?

@kevin-bates (Member Author)

I guess I don't understand your question. ioloop contains class definitions for both sync and async classes, just like manager.py and multikernelmanager.py. Are you suggesting we move all of these async classes into the asynchronous sub-directory?

@davidbrochart (Member)

You're right, it doesn't make a lot of sense.
I think this is ready to be merged. Thanks a lot for all the effort you put into this PR!
Comments anybody?

@kevin-bates (Member Author)

@davidbrochart - thank you for your review - it's much appreciated. I'm really stoked that this has been revived (since there are two other layers above this with dormant async management PRs 😄 ).

@davidbrochart (Member)

It may be out of this PR's scope, but how would it play with Jupyter server's kernel manager? Will we need to write a MappingAsyncKernelManager, or just rewrite https://github.com/jupyter/jupyter_server/blob/master/jupyter_server/services/kernels/kernelmanager.py so that MappingKernelManager inherits from AsyncMultiKernelManager?

@kevin-bates (Member Author)

The plan is for jupyter_server to adopt the new kernel management framework via this PR. This transition is targeting Server's 0.4 release. Since kernel management in JKM is only async and the library no longer provides a MultiKernelManager, Server's MappingKernelManager will be straight async. What also needs to happen is applying your client async changes to JKM (please!).

However, the dual-mode behavior is exactly what I did for Notebook, and I'm hoping that once jupyter_client has a carrier release for this PR, Notebook PR 4479 can be similarly revived. That will then unblock the Enterprise Gateway PR, which will set all the right classes to make the full stack async - something gateway users desperately need!
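A hypothetical sketch of the inheritance switch being discussed (the class name and details are illustrative, not code from any of those PRs):

    from jupyter_client.multikernelmanager import AsyncMultiKernelManager

    class AsyncMappingKernelManager(AsyncMultiKernelManager):
        """Server-side kernel manager whose lifecycle methods are awaited.

        A real implementation would also carry the server's bookkeeping
        (activity tracking, culling, etc.); only the inheritance is shown.
        """

        async def start_kernel(self, **kwargs):
            kernel_id = await super().start_kernel(**kwargs)
            # ...server-specific bookkeeping would go here...
            return kernel_id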

@davidbrochart (Member)

Thanks for the feedback - I have a lot to catch up on!

@kevin-bates (Member Author)

No worries. If you're interested in the server efforts, there's a weekly dev meeting on Thursdays: https://jupyter.readthedocs.io/en/latest/community/content-community.html#monthly-meetings

@davidbrochart merged commit 99bb826 into jupyter:master on Mar 4, 2020
@davidbrochart (Member)

@kevin-bates I'm not sure I did the right thing; the commit history shows me as the author of this commit, and I don't see your name. Is it because I squashed it?

@SylvainCorlay (Member)

> The plan is for jupyter_server to adopt the new kernel management framework via this PR. This transition is targeting Server's 0.4 release. Since kernel management in JKM is only async and the library no longer provides a MultiKernelManager, Server's MappingKernelManager will be straight async. What also needs to happen is applying your client async changes to JKM (please!).

@kevin-bates I think we need to have a plan for jupyter-server before the move to jupyter_kernel_management goes in. Moving jupyterlab and notebook to jupyter_server, and also dropping jupyter_client, all at once seems very disruptive to me and should probably be done in two stages.

We have no feedback yet on jupyter_kernel_management, and I would like us to be cautious.

@kevin-bates (Member Author)

@davidbrochart - Regarding your comment: #428 (comment). No worries - this just means you "touched it last"! 😄

Just kidding. Normally I wouldn't care too much, but this particular effort was significant and has been more than a year in the making, so I think this should be corrected. I don't know how this happened, but I would think the commit could be amended with something like this:

git commit --amend --author="Kevin Bates <[email protected]>"

cc: @MSeal, @SylvainCorlay for their git fu.

@SylvainCorlay (Member)

@kevin-bates no worries I will fix the git history first thing tomorrow to include your commits.

@MSeal (Contributor) commented Mar 5, 2020

Thanks to everyone for tackling this PR. It's nice to see folks helping get these larger changes merged.

If any additional git fu is needed let me know, I can probably help too.

@SylvainCorlay mentioned this pull request on Mar 5, 2020
@SylvainCorlay (Member)

@kevin-bates you should now be credited for your commits in master!
