Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevent two kernels to have the same ports #490

Merged
merged 1 commit into from
Oct 10, 2019
Merged

Prevent two kernels to have the same ports #490

merged 1 commit into from
Oct 10, 2019

Conversation

martinRenou
Copy link
Member

Prevent two kernels to have the same ports assigned in MultiKernelManager.

This is a workaround for #487. And it should prevent voila-dashboards/voila#408 most of the times.

This is only a workaround, and it will not prevent ZMQError 100% of the time. Because it's only an in-memory watchdog it cannot prevent another application to take those ports.

@martinRenou
Copy link
Member Author

@maartenbreddels
Copy link

Looks good!

PS: FYI, it does not solve the 'notebook server restart with open notebooks' issue, the kernels do not seem to restart property. Note that this PR was no attempt to solve this.

Copy link
Member

@minrk minrk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great! Only comment about not affecting the ipc case (which also should have no race condition issues), but otherwise +1.

jupyter_client/multikernelmanager.py Outdated Show resolved Hide resolved
jupyter_client/tests/test_multikernelmanager.py Outdated Show resolved Hide resolved
@martinRenou
Copy link
Member Author

Thanks for your review @minrk !

Prevent two kernels to have the same ports assigned in MultiKernelManager
@SylvainCorlay
Copy link
Member

This looks good to me!

@SylvainCorlay SylvainCorlay merged commit f6ceeaa into jupyter:master Oct 10, 2019
@martinRenou martinRenou deleted the workaround_port_issue branch October 10, 2019 13:02
@kevin-bates
Copy link
Member

Hold on! This PR was submitted 7 hours ago and merged after 6 hours!

At a minimum this needs to be configurable to not occur. In addition, kernel manager's don't necessarily know (nor should they) if the kernel they are managing is local or remote (where this "workaround" doesn't apply)! As a result, this configurability needs to be dynamic in the sense that the decision as to whether or not locate/register the ports should be performed on each instance. Ideally this code should move into the Kernel Manager or as the Kernel Manager whether it should be done at all. Same goes for the shutdown case.

@maartenbreddels
Copy link

Note that this PR does not implement the idea mentioned in #487, it only avoids the issue of using port numbers multiple times, this is morely a bugfix right?

@SylvainCorlay
Copy link
Member

This PR is merely a workaround and not implementing the new proposed handshake mechanism.

@kevin-bates
Copy link
Member

That's not the point. This breaks clients of jupyter_client and needs to provide a means for subclasses that don't want this behavior to not have it.

@martinRenou
Copy link
Member Author

This breaks clients of jupyter_client

How does it break it? I understand this does make sense in the case of a remote kernel, but it does not seem to me that the behavior changes much.

@kevin-bates
Copy link
Member

I am referring to remote kernels.

So if I'm reading this correctly, this will unconditionally create five local ports when the kernel manager instance is created and register those ports with the MultiKernelManager. Then, when the connection information is returned from the remote kernel, those port-based members will be set relative to the remote ports (performed directly or indirectly via the kernel manager instance).

On shutdown, none of the originally allocated ports will be removed (nor closed) because the port is relative to the remote kernel, and, again if I understand correctly, SO_LINGER doesn't timeout. In addition, if the remote port happens to coincide with some other kernel's actual port, that port will be removed from the managed set - although because it's still in use by the other kernel, probably not a big deal.

So I guess the only issue is that 5 local ports are leaked for every remote kernel instance until the number of remote kernels started exhausts the ports on the local server and/or the server hangs seemingly indefinitely because the infinite loop takes forever to locate "unused" ports.

I hope I have something wrong.

@martinRenou
Copy link
Member Author

Those 5 sockets are temporary, they are only used for finding an available port on the machine where jupyter_client is running. Those sockets are closed directly, see https://github.com/jupyter/jupyter_client/pull/490/files#diff-3251e04c5627ebe6b4cc54ddacb4bfa8R98.

This is the exact same behavior as before when jupyter_client creates 5 temporary sockets when creating the connection file: https://github.com/jupyter/jupyter_client/blob/master/jupyter_client/connect.py#L96-L106.

This behavior is definitely useless in the case when using a remote kernel. But it should not be a problem though. The only time it will be a problem is when the machine on which jupyter_client is running is out of available ports, but I guess it's unlikely to happen?

@kevin-bates
Copy link
Member

@martinRenou - thank you for the explanation. I guess I wasn't following the temporal nature of things.

I guess the issue would be that because the set is not reflective of actual in-use ports, the while loop may eventually not find available ports because of the false reporting that the port it did find is still listed in the set. Not sure how "random" the use of bind(..,0) is, but if it finds the same ports (because they are not physically in use) and those ports have been "leaked" in the set, then the loop will not terminate.

@martinRenou
Copy link
Member Author

martinRenou commented Oct 10, 2019

Not sure how "random" the use of bind(..,0) is

What I understand from the socket module documentation is that it is OS-dependent: If host or port are ‘’ or 0 respectively the OS default behavior will be used., see https://docs.python.org/3/library/socket.html#socket.create_connection. So I guess we cannot really make any conclusions here.

The probability to be stuck in an infinite loop is not 0, I agree. But I suppose it is unlikely to happen because it does not happen that much that the port you look for is already taken. It happens enough to be annoying (the kernel does not start), but not enough to get stuck infinitely in this loop.

@kevin-bates
Copy link
Member

Ok. Enterprise Gateway is long-running and multi-tenant - the likelyhood increases over the single-user server scenario.

If you could make this optional on a per-instance basis - that would be greatly appreciated.

@SylvainCorlay
Copy link
Member

I guess the issue would be that because the set is not reflective of actual in-use ports, the while loop may eventually not find available ports because of the false reporting that the port it did find is still listed in the set.

The clean solution to this is the handshake mechanism proposed by @JohanMabille in #487, which is also a common pattern for this problem. Since the clean solution would be a major change, we went with this workaroudn instead.

I am not worried about the infinite loop because this behavior was already in connect.py as mentioned by Martin.

Ok. Enterprise Gateway is long-running and multi-tenant - the likelyhood increases over the single-user server scenario.
If you could make this optional on a per-instance basis - that would be greatly appreciated.

In the contrary, if your usage scenario has more load and more likelihood of people stepping on each other toes, you probably want this fix even more. Note that the previous behavior causes ZMQ failures almost all the time when creating 5 kernels at once.

@kevin-bates
Copy link
Member

kevin-bates commented Oct 10, 2019

Thanks @SylvainCorlay. Because we already do what #487 is proposing we don't need this workaround. In fact, we have clients that run our "remote" model in loopback mode just to avoid the ZMQ failure issue.

@SylvainCorlay
Copy link
Member

@kevin-bates we will be adding a flag so that the new behavior can be disabled before the next release.

@kevin-bates
Copy link
Member

@SylvainCorlay - thank you.

It would be great if that flag is per-instance. For example, if the action to call self._find_available_port is a function of an attribute in the kernel manager (something other than transport). That way, overrides of KernelManager can influence this behavior depending on whether its a remote kernel or not, etc.

@SylvainCorlay
Copy link
Member

It would be great if that flag is per-instance

I presume a trait would be fine?

@kevin-bates
Copy link
Member

Yes - thank you.

@martinRenou
Copy link
Member Author

In terms of code design, it does not seem right to me to put a flag in the KernelManager class, because it's logic only relevant when using the MultiKernelManager.

@kevin-bates why do you want a per-instance flag? Why is one single flag in the MultiKernelManager not enough? Do you have use cases when you want to cache ports for some kernel managers but not for others?

@SylvainCorlay
Copy link
Member

In terms of code design, it does not seem right to me to put a flag in the KernelManager class, because it's logic only relevant when using the MultiKernelManager .

My understanding was that @kevin-bates wants the trait in MultiKernelManager, which makes sense.

@martinRenou
Copy link
Member Author

I opened a PR #492

@kevin-bates
Copy link
Member

I've submitted my responses via a review on #492 - thank you for including me on that.

@meeseeksmachine
Copy link

This pull request has been mentioned on Jupyter Community Forum. There might be relevant details there:

https://discourse.jupyter.org/t/jupyter-client-6-0-0/3414/1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants