Filter users on the server if possible #22

mriedem · 2021-03-15T20:25:24Z

JupyterHub 1.3.0 supports server-side filtering of users by the
state of their servers [1]. This should perform better when culling
a large number of users, since the filtering can happen in the database
rather than in the client-side script over all users.

The idle-culler is currently just calling GET /users and filtering
out users with pending servers on the client side.
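
For illustration, the client-side filtering described above might look roughly like the sketch below. It assumes each user in the `GET /users` response carries a `servers` mapping whose entries have `ready` and `pending` fields; the function name and sample data are hypothetical and not taken from the culler's code.

```python
def cullable_servers(users):
    """Return (user name, server name) pairs for servers that are ready
    and have no pending spawn/stop action. Hypothetical helper."""
    result = []
    for user in users:
        # Assumption: "servers" maps server name -> server model.
        for server_name, server in user.get("servers", {}).items():
            if server.get("pending"):
                # Skip servers that are still spawning or stopping.
                continue
            if server.get("ready"):
                result.append((user["name"], server_name))
    return result


# Made-up sample payload, shaped like the assumed user model.
sample_users = [
    {"name": "alice", "servers": {"": {"ready": True, "pending": None}}},
    {"name": "bob", "servers": {"": {"ready": False, "pending": "spawn"}}},
    {"name": "carol", "servers": {}},
]
pairs = cullable_servers(sample_users)
```

With this approach every user is transferred and inspected, including those with no servers at all, which is the cost the server-side filter avoids.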

This change checks whether the hub API is new enough for server-side
state filtering and, if so, filters the initial set of users
to only those who have any ready servers (running, not pending).
Furthermore, if --cull-users is used, we make a second call to
GET /users?state=inactive to get all users who have no active
servers, since they would have been filtered out of the initial
set (GET /users?state=ready).

If the JupyterHub API version is not new enough, the behavior is
unchanged and we filter client-side.
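
As a rough sketch of the request selection described above (not the code in this change), the choice of URLs could be expressed as follows. The helper names `parse_version` and `user_list_urls` and the example API URL are hypothetical; the hub version string is assumed to have been obtained already, and pre-release suffixes are ignored for simplicity.

```python
def parse_version(version):
    """Turn a string like '1.3.0' into (1, 3, 0).

    Simplified on purpose: only the first three dotted parts are used and
    any non-numeric suffix (for example 'rc1') is ignored.
    """
    parts = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits or 0))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def user_list_urls(api_url, hub_version, cull_users=False):
    """Return the user-list URLs to request for the given hub version."""
    base = api_url.rstrip("/") + "/users"
    if parse_version(hub_version) < (1, 3, 0):
        # Older hub: fetch everyone and filter client-side as before.
        return [base]
    urls = [base + "?state=ready"]
    if cull_users:
        # Users with no active servers are absent from state=ready.
        urls.append(base + "?state=inactive")
    return urls


old = user_list_urls("http://127.0.0.1:8081/hub/api", "1.2.2", cull_users=True)
new = user_list_urls("http://127.0.0.1:8081/hub/api", "1.3.0", cull_users=True)
```

Here `old` holds the single unfiltered URL, while `new` holds the `state=ready` and `state=inactive` URLs.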

Details on manual testing can be found in the issue comments.

[1] https://jupyterhub.readthedocs.io/en/stable/_static/rest-api/index.html#operation--users-get

Closes #21

welcome · 2021-03-15T20:25:26Z

Thanks for submitting your first pull request! You are awesome! 🤗

If you haven't done so already, check out Jupyter's Code of Conduct. Also, please make sure you followed the pull request template, as this will help us review your contribution more quickly.

You can meet the other Jovyans by joining our Discourse forum. There is also an intro thread there where you can stop by and say Hi! 👋

Welcome to the Jupyter community! 🎉

mriedem · 2021-03-15T20:26:16Z

@yuvipanda feel free to make changes on this as necessary. It's probably not the prettiest but works in my local testing against jupyterhub 1.3.0.

mriedem · 2021-03-15T20:29:26Z

@minrk FYI since you added the server side changes for filtering users by server state.

welcome · 2021-03-19T08:28:19Z

Congrats on your first merged pull request in this project! 🎉

Thank you for contributing, we are very proud of you! ❤️

minrk · 2021-03-19T08:28:20Z

Looks great!

mriedem · 2021-03-19T14:21:23Z

Here is an example of how this will help. In our testing cluster I've scaled up to 3K users, and most of them have stopped servers (the notebook idle culler shut down the pods themselves). After restarting the hub this morning, cull-idle hit a timeout when it ran (this is jupyterhub 1.2.2):

Mar 19 09:11:19 hub-6fb8bf4b44-pd2xm hub DEBUG DEBUG 2021-03-19T14:11:19.635Z [JupyterHub base:282] Recording first activity for <APIToken('58ff...', service='cull-idle')>
Mar 19 09:11:39 hub-6fb8bf4b44-pd2xm hub [E 210319 14:11:39 ioloop:761] Exception in callback functools.partial(<bound method IOLoop._discard_future_result of <tornado.platform.asyncio.AsyncIOMainLoop object at 0x7f996356b2b0>>, <Future finished exception=HTTP 599: Operation timed out after 20001 milliseconds with 0 bytes received>)
    Traceback (most recent call last):
      File "/usr/local/lib/python3.8/dist-packages/tornado/ioloop.py", line 741, in _run_callback
        ret = callback()
      File "/usr/local/lib/python3.8/dist-packages/tornado/ioloop.py", line 765, in _discard_future_result
        future.result()
      File "/usr/local/lib/python3.8/dist-packages/tornado/gen.py", line 769, in run
        yielded = self.gen.throw(*exc_info)  # type: ignore
      File "/usr/local/lib/python3.8/dist-packages/jupyterhub_idle_culler/__init__.py", line 120, in cull_idle
        resp = yield fetch(req)
      File "/usr/local/lib/python3.8/dist-packages/tornado/gen.py", line 762, in run
        value = future.result()
      File "/usr/local/lib/python3.8/dist-packages/tornado/gen.py", line 775, in run
        yielded = self.gen.send(value)
      File "/usr/local/lib/python3.8/dist-packages/jupyterhub_idle_culler/__init__.py", line 113, in fetch
        return (yield client.fetch(req))
      File "/usr/local/lib/python3.8/dist-packages/tornado/gen.py", line 762, in run
        value = future.result()
    tornado.curl_httpclient.CurlError: HTTP 599: Operation timed out after 20001 milliseconds with 0 bytes received
Mar 19 09:11:58 hub-6fb8bf4b44-pd2xm hub INFO INFO 2021-03-19T14:11:58.113Z [JupyterHub log:181] 200 GET /hub/api/users (cull-idle@::1) 38494.45ms

So once we get to jupyterhub 1.3.0 (via z2jh 0.11.x) and a newer jupyterhub-idle-culler, that should be less of a problem.

yuvipanda requested a review from minrk March 15, 2021 22:00

minrk merged commit c1dc47a into jupyterhub:master Mar 19, 2021

mriedem deleted the 21-state-filter branch March 19, 2021 13:49

consideRatio added the enhancement New feature or request label Sep 20, 2021
