
support JupyterHub 2.0 API pagination #30

Merged (3 commits, Sep 17, 2021)

Conversation

@minrk (Member) commented Aug 24, 2021

Pagination is coming in JupyterHub 2.0, implemented behind an Accept header in jupyterhub/jupyterhub#3535

  • adds a fetch_paginated async generator to yield items from a paginated endpoint (see the sketch at the end of this comment)

  • adopts async/await syntax, requiring Python 3.6 for async generator syntax (Python 3.6 was already the baseline in CI, but was not specified in the package metadata)

  • culls inactive users before culling ready servers. This should avoid an unlikely async race where both paths may try to delete the same user, one that I did trigger while testing with very small thresholds

Commits:

  • adopt async/await syntax, require Python 3.6 for async generator syntax

  • cull inactive users *before* culling ready servers
    avoids async race where both paths may try to delete the same user
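A minimal sketch of what such a paginated fetch can look like, assuming the JupyterHub 2.0 response shape with items and a _pagination.next link; the actual fetch_paginated in this PR may differ in its details:

```python
# Sketch only: assumes the JupyterHub 2.0 paginated response shape
# {"items": [...], "_pagination": {"next": {"url": ...} or null}} and
# falls back to a plain list for older Hubs.
import json

from tornado.httpclient import AsyncHTTPClient, HTTPRequest


async def fetch_paginated(url, token):
    """Yield items one at a time from a (possibly) paginated REST endpoint."""
    client = AsyncHTTPClient()
    headers = {
        "Authorization": f"token {token}",
        # opt in to pagination; pre-2.0 Hubs ignore this and return a list
        "Accept": "application/jupyterhub-pagination+json",
    }
    next_url = url
    while next_url:
        resp = await client.fetch(HTTPRequest(next_url, headers=headers))
        body = json.loads(resp.body.decode("utf8"))
        if isinstance(body, list):
            # unpaginated response from an older Hub
            for item in body:
                yield item
            return
        for item in body["items"]:
            yield item
        next_info = body["_pagination"]["next"]
        # the next HTTP request happens only after the last item of the
        # current page has been yielded and consumed by the caller
        next_url = next_info["url"] if next_info else None
```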

@consideRatio (Member) commented
This looks great, but I think I still lack a clear understanding of this code, because the async generators and their use make it quite complicated.

I wonder:

  1. Is there a rate-limit mechanism in jupyterhub-idle-culler that stops us from culling more than X users at a time? I recall there being one.
  2. If so, will jupyterhub-idle-culler paginate through every single user, even though rate limiting means it won't cull more users than appear in the first few responses?

I can try to follow up and answer these questions myself later; sorry for being lazy and not doing the investigation right now.

@minrk (Member, Author) commented Sep 17, 2021

> I think the async generators and their use make it quite complicated.

I'd like to work on this, because I thought it would make things simpler: just as we previously had one function that produced an iterable of users (a list), we now have one function that produces an iterable of users (a generator that happens to be async). So `async for user in list_users()` is functionally identical to `for user in await list_users()`; only the number of HTTP requests it takes to generate the list of users may differ, along with precisely when each request is made, neither of which should be significant.

The code to iterate through users is unchanged, and unaffected by the number of requests it takes to produce the list of users.
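A toy illustration of that equivalence, with hypothetical names standing in for the real fetch functions:

```python
import asyncio


async def users_as_list():
    # eager: the whole "list of users" exists before iteration starts
    return ["ada", "grace", "katherine"]


async def users_as_gen():
    # lazy: each item is produced between iterations; in the real code,
    # page-fetching HTTP requests happen here between yields
    for name in ["ada", "grace", "katherine"]:
        yield name


async def main():
    # the two consumption loops are functionally identical
    for user in await users_as_list():
        print("eager:", user)
    async for user in users_as_gen():
        print("lazy:", user)


asyncio.run(main())
```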

> Is there a rate-limit mechanism in jupyterhub-idle-culler that stops us from culling more than X users at a time?

Not exactly. There is a semaphore limiting the number of outstanding HTTP requests. This has the approximate effect of limiting concurrent shutdowns, but it is not quite the same: if shutdowns take a long time, they are left pending, so they can run concurrently with other shutdowns even while no HTTP request is outstanding, and requests other than shutdowns consume the same concurrency pool.
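Roughly, the cap works like this (a sketch with hypothetical names, not the culler's exact code):

```python
import asyncio

# bound the number of *outstanding HTTP requests*, of any kind
CONCURRENCY = 10
fetch_semaphore = asyncio.Semaphore(CONCURRENCY)


async def limited_fetch(client, request):
    # only CONCURRENCY requests may be in flight at once; a server
    # shutdown that keeps running after its DELETE request has returned
    # no longer holds a slot, so actual concurrent shutdowns can exceed
    # the cap, and non-shutdown requests share the same pool
    async with fetch_semaphore:
        return await client.fetch(request)
```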

> If so, will jupyterhub-idle-culler paginate through every single user, even though rate limiting means it won't cull more users than appear in the first few responses?

I don't quite understand what paginating a single user would mean. The only pagination occurs in the request to list users; this may take one request, or it may take 5 (or 50), but once we are processing users, the number of requests shouldn't matter.

The sequence should look like this (sketched in code after the list):

  1. request list of users (gets one page)
  2. for each user in the batch, prepare to process the user (delete requests, etc.)
  3. when reaching the end of the batch, request the next page of users (occurs inside the async generator, prior to the next yield)
  4. repeat until we have processed every user
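In consumer code, that interleaving looks roughly like the following, reusing the fetch_paginated sketch from above and a hypothetical per-user coroutine handle_user:

```python
import asyncio


async def cull_idle(url, token):
    futures = []
    # step 1: the generator fetches the first page before its first yield
    async for user in fetch_paginated(url + "/users", token):
        # step 2: prepare to process this user (delete requests, etc.);
        # handle_user is a hypothetical per-user coroutine
        futures.append(asyncio.ensure_future(handle_user(user)))
        # step 3 happens inside the generator: when this page is
        # exhausted, it requests the next page before yielding again
    # step 4: every user has been seen; wait for processing to finish
    await asyncio.gather(*futures)
```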

@consideRatio (Member) left a review comment


This LGTM!

I'd like to see a comment about how users being deleted/added during iteration influences the returned pagination results, mainly as an inline note in the code, and perhaps also in the README.md if relevant.
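For instance, the inline note could say something along these lines (wording hypothetical):

```python
# NOTE: offset-based pagination is not a consistent snapshot. If users
# are added or deleted between page requests, items can be skipped or
# returned twice across pages. That is acceptable here: a skipped user
# is simply picked up again on the next culling interval.
```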

@consideRatio merged commit 6a1d978 into jupyterhub:master Sep 17, 2021
welcome bot (@welcome) commented Sep 17, 2021

Congrats on your first merged pull request in this project! 🎉
Thank you for contributing, we are very proud of you! ❤️

@consideRatio added the enhancement (New feature or request) label Sep 20, 2021