Filter users on the server if possible #22

Merged
merged 1 commit into from
Mar 19, 2021

Conversation

mriedem
Contributor

@mriedem mriedem commented Mar 15, 2021

JupyterHub 1.3.0 supports server-side filtering of users by the state
of their servers [1]. This should perform better when culling
a large number of users, since the filtering can happen in the database
rather than in the client-side script across _all_ users.

The idle-culler currently just calls `GET /users` and filters
out users with pending servers on the client side.
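
For illustration, the client-side filtering described above might look roughly like the sketch below. This is not the culler's actual code: the function name `users_with_ready_servers` is hypothetical, and the `servers`, `ready`, and `pending` keys are assumed to follow the hub's user model.

```python
def users_with_ready_servers(users):
    """Keep only users with at least one ready (running, not pending) server.

    `users` is assumed to be the decoded JSON list returned by GET /users.
    """
    ready_users = []
    for user in users:
        # A user with no servers may carry an empty or missing "servers" dict.
        servers = user.get("servers") or {}
        if any(s.get("ready") and not s.get("pending") for s in servers.values()):
            ready_users.append(user)
    return ready_users


# Hypothetical sample data, not taken from a real hub.
users = [
    {"name": "alice", "servers": {"": {"ready": True, "pending": None}}},
    {"name": "bob", "servers": {"": {"ready": False, "pending": "spawn"}}},
    {"name": "carol", "servers": {}},
]
print([u["name"] for u in users_with_ready_servers(users)])
```

Every user model still has to be fetched and decoded before this loop runs, which is the cost the server-side filter avoids.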

This change checks whether the hub API is new enough for server-side
`state` filtering and, if so, restricts the initial set of users
to those who have at least one ready server (running, not pending).
Furthermore, if `--cull-users` is used, we make a second call to
`GET /users?state=inactive` to get all users who have _no_ active
servers, since they would have been filtered out of the initial
set (`GET /users?state=ready`).

If the JupyterHub API version is not new enough, the behavior is
unchanged and we filter client-side.
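
The request selection described above can be sketched as follows. This is a minimal illustration, not the merged implementation: `parse_version` and `user_list_urls` are hypothetical helpers, and the hub version is assumed to be already available as a string.

```python
def parse_version(version):
    """Turn '1.3.0' (or '1.3.0.dev') into a tuple of its leading integers."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break  # stop at the first non-numeric component, e.g. 'dev'
        parts.append(int(piece))
    return tuple(parts)


def user_list_urls(api_url, hub_version, cull_users=False):
    """Return the user-list URLs to request for one culling pass."""
    base = api_url.rstrip("/") + "/users"
    if parse_version(hub_version) < (1, 3):
        # Older hub: no server-side state filter, so fetch everyone
        # and filter on the client as before.
        return [base]
    urls = [base + "?state=ready"]
    if cull_users:
        # Users with no active servers are excluded by state=ready,
        # so they need a second request.
        urls.append(base + "?state=inactive")
    return urls


print(user_list_urls("http://127.0.0.1:8081/hub/api", "1.3.0", cull_users=True))
```

With an older hub version string such as `"1.2.2"`, the same call returns only the unfiltered `/users` URL.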

Details on manual testing can be found in the issue comments.

[1] https://jupyterhub.readthedocs.io/en/stable/_static/rest-api/index.html#operation--users-get

Closes #21


welcome bot commented Mar 15, 2021

Thanks for submitting your first pull request! You are awesome! 🤗

If you haven't done so already, check out Jupyter's Code of Conduct. Also, please make sure you followed the pull request template, as this will help us review your contribution more quickly.
You can meet the other Jovyans by joining our Discourse forum. There is also an intro thread there where you can stop by and say Hi! 👋

Welcome to the Jupyter community! 🎉

Contributor Author

mriedem commented Mar 15, 2021

@yuvipanda feel free to make changes on this as necessary. It's probably not the prettiest but works in my local testing against jupyterhub 1.3.0.

Contributor Author

mriedem commented Mar 15, 2021

@minrk FYI since you added the server side changes for filtering users by server state.

@yuvipanda yuvipanda requested a review from minrk March 15, 2021 22:00
@minrk minrk merged commit c1dc47a into jupyterhub:master Mar 19, 2021

welcome bot commented Mar 19, 2021

Congrats on your first merged pull request in this project! 🎉
Thank you for contributing, we are very proud of you! ❤️

Member

minrk commented Mar 19, 2021

Looks great!

@mriedem mriedem deleted the 21-state-filter branch March 19, 2021 13:49
Contributor Author

mriedem commented Mar 19, 2021

An example of how this will help: in our testing cluster I've scaled up to 3K users, and most of them have stopped servers (the notebook idle culler shut down the pods themselves). I restarted the hub this morning and hit a timeout when cull-idle ran (this is jupyterhub 1.2.2):

Mar 19 09:11:19 hub-6fb8bf4b44-pd2xm hub DEBUG DEBUG 2021-03-19T14:11:19.635Z [JupyterHub base:282] Recording first activity for <APIToken('58ff...', service='cull-idle')>
Mar 19 09:11:39 hub-6fb8bf4b44-pd2xm hub [E 210319 14:11:39 ioloop:761] Exception in callback functools.partial(<bound method IOLoop._discard_future_result of <tornado.platform.asyncio.AsyncIOMainLoop object at 0x7f996356b2b0>>, <Future finished exception=HTTP 599: Operation timed out after 20001 milliseconds with 0 bytes received>)
    Traceback (most recent call last):
      File "/usr/local/lib/python3.8/dist-packages/tornado/ioloop.py", line 741, in _run_callback
        ret = callback()
      File "/usr/local/lib/python3.8/dist-packages/tornado/ioloop.py", line 765, in _discard_future_result
        future.result()
      File "/usr/local/lib/python3.8/dist-packages/tornado/gen.py", line 769, in run
        yielded = self.gen.throw(*exc_info)  # type: ignore
      File "/usr/local/lib/python3.8/dist-packages/jupyterhub_idle_culler/__init__.py", line 120, in cull_idle
        resp = yield fetch(req)
      File "/usr/local/lib/python3.8/dist-packages/tornado/gen.py", line 762, in run
        value = future.result()
      File "/usr/local/lib/python3.8/dist-packages/tornado/gen.py", line 775, in run
        yielded = self.gen.send(value)
      File "/usr/local/lib/python3.8/dist-packages/jupyterhub_idle_culler/__init__.py", line 113, in fetch
        return (yield client.fetch(req))
      File "/usr/local/lib/python3.8/dist-packages/tornado/gen.py", line 762, in run
        value = future.result()
    tornado.curl_httpclient.CurlError: HTTP 599: Operation timed out after 20001 milliseconds with 0 bytes received
Mar 19 09:11:58 hub-6fb8bf4b44-pd2xm hub INFO INFO 2021-03-19T14:11:58.113Z [JupyterHub log:181] 200 GET /hub/api/users (cull-idle@::1) 38494.45ms 

So once we get to jupyterhub 1.3.0 (via z2jh 0.11.x) and a newer jupyterhub-idle-culler, that should be less of a problem.

@consideRatio consideRatio added the enhancement New feature or request label Sep 20, 2021
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Filter users in the server if possible
3 participants