Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(core) implement client request rate limiting #683

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

ceache
Copy link
Contributor

@ceache ceache commented Nov 12, 2022

Fixes #664,

Why is this needed?

This provides a top level control to limit the load a given client can generate on an ensemble in the form of number of requests.
In situation with large number of watches, the number of request emitted by the client is otherwise unbounded.

Proposed Changes

  • Introduce a "semaphore_impl" attribute on the various handlers.
  • Allow a new max_async_requests argument to the client constructor.
  • Change the client to bound the number of outstanding async requests with a semaphore limited to max_async_requests.

Does this PR introduce any breaking change?

No, it is backward compatible with a default of no limit.

@ceache
Copy link
Contributor Author

ceache commented Nov 12, 2022

@python-zk/maintainers This is a first draft. Let me know if you like the direction and I'll add tests, etc.

Full disclosure, we have had a version of this patch in production for a while (with both gevent and threading handlers). I have not tested the eventlet handler directly.

@codecov-commenter
Copy link

codecov-commenter commented Nov 12, 2022

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (55f27b2) 96.76% compared to head (df7c611) 96.75%.
Report is 4 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #683      +/-   ##
==========================================
- Coverage   96.76%   96.75%   -0.02%     
==========================================
  Files          27       27              
  Lines        3557     3570      +13     
==========================================
+ Hits         3442     3454      +12     
- Misses        115      116       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@StephenSorriaux
Copy link
Member

I think it is a great idea. Just wondering: from a user experience perspective, will there be any way to know the rate limit has been hit?

@ceache
Copy link
Contributor Author

ceache commented Nov 13, 2022 via email

@StephenSorriaux
Copy link
Member

Yes, that makes a lot of sense, thank you.

I agree with you, rate limit is a feature, but sometimes it can be hard to "understand"/be aware of it if there is no logs telling you "hey, rate limit is currently being triggered" or "hey, rate limit has been triggered N times in the past M seconds" (even if, I agree, there is already a info log at client startup). You're right this should not be intrusive, I know some projects display a "rate limit has been triggered" message once in a while (every 30s or so for instance), but I believe a debug message can work too if you think it is better/useful.

Writing this message, it makes me realize that this lib currently does not provide any metrics (if it would, a number of times rate limit has been triggered would be useful). Maybe it is something we could add to the backlog, but that is not the point of your current proposition that, again, I think is great.

@ceache
Copy link
Contributor Author

ceache commented Nov 16, 2022 via email

kazoo/client.py Outdated Show resolved Hide resolved
kazoo/client.py Outdated Show resolved Hide resolved
kazoo/client.py Show resolved Hide resolved
kazoo/client.py Outdated Show resolved Hide resolved
@StephenSorriaux
Copy link
Member

I'll add a log message for now, and move to add some tests. About metrics, i was talking to opentelemetry folks at KubeCon about exactly that. it looks like it would be possible to use such a framework. I mean, they instrument SQLite3 and requests modules for example IMHO, of we can pull this off, this would be an ideal way to extract this kind of information, as well as request counts, error counts, connection length and a million other things i have always wanted to know about my zookeeper client but never knew how to ask stuck_out_tongue_winking_eye On a more serious note, if you like the idea, i can create an issue for this and try to gather more Intel?

Sorry for my late reply @ceache, busy weeks :(

To be honest, I really like the idea and it would be great if you can get some intel about it, especially on how we should make those metrics available to our users: should we provide some interface so that they can plug whatever client they want? Should we actually provide some integration ourselves (like prometheus, etc.)? I totally feel you on this subject, I also love to get tons of metrics about what's in production!

@ceache ceache force-pushed the feature/rate_control branch from 0ee9788 to 33573da Compare February 12, 2023 22:03
kazoo/client.py Outdated Show resolved Hide resolved
@Buffer0x7cd
Copy link

Hey Folks, any update on the progress of this PR ? Let me know if there is any help needed to complete this feature.

Add a "semaphore_impl" attribute on the various handlers.
Allow a new, optional, `concurrent_request_limit` argument to the client
constructor.
Change the client to bound the number of outstanding async requests with
a semaphore limited to `concurrent_request_limit`.

Fixes python-zk#664
@ceache ceache force-pushed the feature/rate_control branch from 33573da to df7c611 Compare February 6, 2024 06:28
@ceache ceache marked this pull request as ready for review February 6, 2024 06:36
@ceache
Copy link
Contributor Author

ceache commented Feb 8, 2024

@python-zk/maintainers I think I have implemented all what we discussed. All the more "future looking" ideas discussed here (prometheus, OTL,) would be addressed in future PRs

@ceache ceache requested a review from a-ungurianu February 8, 2024 23:00
@@ -635,6 +648,16 @@ def _call(self, request, async_object):
async_object.set_exception(SessionExpiredError())
return False

if self.rate_limiting_sem:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can a test be added to check that we get blocked and then get released?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

TreeCache recipe creating heavy load on ZK while reconnecting.
6 participants