
support JupyterHub 2.0 API pagination #30

Merged (3 commits, Sep 17, 2021)

Conversation

@minrk (Member) commented Aug 24, 2021

Pagination is coming in JupyterHub 2.0, implemented behind an Accept header in jupyterhub/jupyterhub#3535

  • adds a fetch_paginated async generator to yield items from a paginated endpoint (see the sketch at the end of this comment)

  • adopts async/await syntax, requiring Python 3.6 for async generator syntax (Python 3.6 was already the baseline in CI, but was not specified in the package metadata)

  • culls inactive users before culling ready servers. This should avoid an unlikely async race where both paths may try to delete the same user, one that I did trigger while testing with very small thresholds

Commits:

  • adopt async/await syntax, require Python 3.6 for async generator syntax

  • cull inactive users *before* culling ready servers
    avoids async race where both paths may try to delete the same user
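A minimal sketch of what such a paginated fetch can look like, assuming the JupyterHub 2.0 response shape with items and a _pagination.next link; the actual fetch_paginated in this PR may differ in its details:

```python
# Sketch only: assumes the JupyterHub 2.0 paginated response shape
# {"items": [...], "_pagination": {"next": {"url": ...} or null}} and
# falls back to a plain list for older Hubs.
import json

from tornado.httpclient import AsyncHTTPClient, HTTPRequest


async def fetch_paginated(url, token):
    """Yield items one at a time from a (possibly) paginated REST endpoint."""
    client = AsyncHTTPClient()
    headers = {
        "Authorization": f"token {token}",
        # opt in to pagination; pre-2.0 Hubs ignore this and return a list
        "Accept": "application/jupyterhub-pagination+json",
    }
    next_url = url
    while next_url:
        resp = await client.fetch(HTTPRequest(next_url, headers=headers))
        body = json.loads(resp.body.decode("utf8"))
        if isinstance(body, list):
            # unpaginated response from an older Hub
            for item in body:
                yield item
            return
        for item in body["items"]:
            yield item
        next_info = body["_pagination"]["next"]
        # the next HTTP request happens only after the last item of the
        # current page has been yielded and consumed by the caller
        next_url = next_info["url"] if next_info else None
```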

@consideRatio (Member) commented
This looks great, but I think I still lack a clear understanding of this code, because the async generators and their use make it quite complicated.

I wonder:

  1. Is there a rate-limit mechanism in jupyterhub-idle-culler that stops us from culling more than X users at a time? I recall there being one.
  2. If so, will jupyterhub-idle-culler paginate through every single user, even though rate limiting means it won't cull more users than appear in the first few responses?

I can try to follow up and answer these questions myself later; sorry for being lazy and not doing the investigation right now.

@minrk (Member, Author) commented Sep 17, 2021

> I think the async generators and their use make it quite complicated.

I'd like to work on this, because I thought it would make things simpler: just as we previously had one function that produced an iterable of users (a list), we now have one function that produces an iterable of users (a generator that happens to be async). So `async for user in list_users()` is functionally identical to `for user in await list_users()`; only the number of HTTP requests it takes to generate the list of users may differ, along with precisely when each request is made, neither of which should be significant.

The code to iterate through users is unchanged, and unaffected by the number of requests it takes to produce the list of users.
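A toy illustration of that equivalence, with hypothetical names standing in for the real fetch functions:

```python
import asyncio


async def users_as_list():
    # eager: the whole "list of users" exists before iteration starts
    return ["ada", "grace", "katherine"]


async def users_as_gen():
    # lazy: each item is produced between iterations; in the real code,
    # page-fetching HTTP requests happen here between yields
    for name in ["ada", "grace", "katherine"]:
        yield name


async def main():
    # the two consumption loops are functionally identical
    for user in await users_as_list():
        print("eager:", user)
    async for user in users_as_gen():
        print("lazy:", user)


asyncio.run(main())
```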

> Is there a rate-limit mechanism in jupyterhub-idle-culler that stops us from culling more than X users at a time?

Not exactly. There is a semaphore limiting the number of outstanding HTTP requests. This has the approximate effect of limiting concurrent shutdowns, but it is not quite the same: if shutdowns take a long time, they are left pending, so they can run concurrently with other shutdowns even while no HTTP request is outstanding, and requests other than shutdowns consume the same concurrency pool.
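Roughly, the cap works like this (a sketch with hypothetical names, not the culler's exact code):

```python
import asyncio

# bound the number of *outstanding HTTP requests*, of any kind
CONCURRENCY = 10
fetch_semaphore = asyncio.Semaphore(CONCURRENCY)


async def limited_fetch(client, request):
    # only CONCURRENCY requests may be in flight at once; a server
    # shutdown that keeps running after its DELETE request has returned
    # no longer holds a slot, so actual concurrent shutdowns can exceed
    # the cap, and non-shutdown requests share the same pool
    async with fetch_semaphore:
        return await client.fetch(request)
```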

> If so, will jupyterhub-idle-culler paginate through every single user, even though rate limiting means it won't cull more users than appear in the first few responses?

I don't quite understand what paginating a single user would mean. The only pagination occurs in the request to list users; this may take one request, or it may take 5 (or 50), but once we are processing users, the number of requests shouldn't matter.

The sequence should look like this (sketched in code after the list):

  1. request list of users (gets one page)
  2. for each user in the batch, prepare to process the user (delete requests, etc.)
  3. when reaching the end of the batch, request the next page of users (occurs inside the async generator, prior to the next yield)
  4. repeat until we have processed every user
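In consumer code, that interleaving looks roughly like the following, reusing the fetch_paginated sketch from above and a hypothetical per-user coroutine handle_user:

```python
import asyncio


async def cull_idle(url, token):
    futures = []
    # step 1: the generator fetches the first page before its first yield
    async for user in fetch_paginated(url + "/users", token):
        # step 2: prepare to process this user (delete requests, etc.);
        # handle_user is a hypothetical per-user coroutine
        futures.append(asyncio.ensure_future(handle_user(user)))
        # step 3 happens inside the generator: when this page is
        # exhausted, it requests the next page before yielding again
    # step 4: every user has been seen; wait for processing to finish
    await asyncio.gather(*futures)
```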

@consideRatio (Member) left a review comment


This LGTM!

I'd like to see a comment about how users being deleted/added during iteration influences the returned pagination results, mainly as an inline note in the code, and perhaps also in the README.md if relevant.
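For instance, the inline note could say something along these lines (wording hypothetical):

```python
# NOTE: offset-based pagination is not a consistent snapshot. If users
# are added or deleted between page requests, items can be skipped or
# returned twice across pages. That is acceptable here: a skipped user
# is simply picked up again on the next culling interval.
```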

@consideRatio merged commit 6a1d978 into jupyterhub:master Sep 17, 2021
welcome bot (@welcome) commented Sep 17, 2021

Congrats on your first merged pull request in this project! 🎉
Thank you for contributing, we are very proud of you! ❤️

@consideRatio added the enhancement (New feature or request) label Sep 20, 2021