Add execution rate metering to the command engine #9356
Comments
How about MicroProfile Metrics?
(This was opened as a follow-up to a meeting with @siacus and @scolapasta, and as part of the "rate limiting logic" spike.)
There's some interesting information in that ActionLogRecord table.
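If we went the MicroProfile Metrics route, the engine could bump a counter per command class wherever commands get submitted. A rough sketch only, not existing Dataverse code (the registry injection, metric names and the jakarta namespace are all assumptions here):

```java
import jakarta.enterprise.context.ApplicationScoped;
import jakarta.inject.Inject;
import org.eclipse.microprofile.metrics.MetricRegistry;

@ApplicationScoped
public class CommandMetrics {

    @Inject
    MetricRegistry metricRegistry; // provided by the app server (e.g. Payara)

    // Called by the command engine around each command execution.
    public void countExecution(String commandName, String userIdentifier) {
        // One counter per command class, e.g. "commands.UpdateDatasetVersionCommand"
        metricRegistry.counter("commands." + commandName).inc();
        // Optionally a per-user counter as well (careful: unbounded metric name space)
        metricRegistry.counter("commands.byUser." + userIdentifier).inc();
    }
}
```

The counters would then be scrapeable from the standard `/metrics` endpoint the runtime already exposes, which is the main appeal of this option.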
Some quick numbers extracted from the ActionLogRecords for the month of January 2023.
These numbers appear to be somewhat predictable, aside maybe from the "get private url" one (it may be executed somewhat unnecessarily on every load of the dataset page), although it does not appear to be super time-consuming. When counting the execution time, almost all of it was spent executing these top 7:
Again, I'd say these numbers are fairly predictable/expected, specifically the part about the Update Dataset Command being the top hog. The two Delete commands that follow are somewhat expected too; an interesting data point for sure.
An obvious quick check was to separately count the commands executed by the guest (anonymous) user.
These, however, constitute only a small fraction of the total execution time. Again, a perfectly expected result, seeing how it's the update and delete commands that are the highest consumers. The total number of seconds spent executing anonymous commands was around 52K, i.e., under 15 hours total; full list below:
Note that the bulk of the above is spent running the CreateGuestbookResponse command; and that almost all of our downloads are in fact anonymous (but we already knew that). (You may be wondering what's up with that lone DestroyDatasetCommand executed by the guest user - there's only one, yes - at the cost of 7 milliseconds. Most likely, an unsuccessful attempt by an admin - possibly me - to destroy a dataset via the API, without properly supplying the API key on the first attempt.)
Once again, let's be careful about interpreting these numbers. They do NOT by any means indicate that it doesn't cost us much to serve read-only content to anonymous users. Again, these are only the commands. There's plenty of evidence that the search API/Solr becomes the performance bottleneck during aggressive robot crawls, and none of that heavy lifting goes through the command engine and gets recorded there - and that's only one example.
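For anyone wanting to reproduce these numbers, something along these lines should work against the database. This is a sketch only; it assumes the actionlogrecord columns actionsubtype, actiontype, starttime and endtime, that actiontype is stored as the string 'Command', and PostgreSQL's EXTRACT(EPOCH ...):

```java
import jakarta.persistence.EntityManager;
import jakarta.persistence.PersistenceContext;
import java.util.List;

public class ActionLogStats {

    @PersistenceContext
    EntityManager em;

    // Counts executions and total execution time per command class for January 2023,
    // by scanning the actionlogrecord table directly.
    @SuppressWarnings("unchecked")
    public List<Object[]> commandCountsForJanuary2023() {
        return em.createNativeQuery(
            "SELECT actionsubtype, COUNT(*) AS executions, "
          + "       SUM(EXTRACT(EPOCH FROM (endtime - starttime))) AS total_seconds "
          + "  FROM actionlogrecord "
          + " WHERE actiontype = 'Command' "
          + "   AND starttime >= '2023-01-01' AND starttime < '2023-02-01' "
          + " GROUP BY actionsubtype "
          + " ORDER BY executions DESC")
          .getResultList();
    }
}
```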
Unclear what the objective is. Work on design. 80. Some actual coding can certainly happen. We can add an outline.
2023/09/13: Note, this issue is paired with another: #9409.
If I understand this correctly, there are two asks here. The second ask is for rate limiting. This feature does not need the metrics/aggregated data from the first ask. It would only need to cache the command API calls, possibly using bucket4j or setting up a global Redis cache. It would be nice to know if other caching or rate limiting will be needed in the future, to help decide which caching system to use.
2024/01/03: Resized during kickoff.
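For reference, a minimal single-node sketch of what the bucket4j approach could look like, with one token bucket per user/command pair. All class names and the 30-per-minute limit here are hypothetical, and a multi-node installation would need one of bucket4j's distributed backends or a shared Redis cache instead of this in-process map:

```java
import io.github.bucket4j.Bandwidth;
import io.github.bucket4j.Bucket;
import io.github.bucket4j.Refill;

import java.time.Duration;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class CommandRateLimiter {

    // One token bucket per (user, command) pair, kept in process memory.
    private final ConcurrentMap<String, Bucket> buckets = new ConcurrentHashMap<>();

    // e.g. 30 executions per minute; in practice this would come from configuration.
    private Bucket newBucket() {
        return Bucket.builder()
                .addLimit(Bandwidth.classic(30, Refill.greedy(30, Duration.ofMinutes(1))))
                .build();
    }

    /** Returns true if the command may run, false if the caller should get "try again later". */
    public boolean tryAcquire(String userIdentifier, String commandName) {
        String key = userIdentifier + "|" + commandName;
        return buckets.computeIfAbsent(key, k -> newBucket()).tryConsume(1);
    }
}
```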
The proposed idea is to modify the command engine to make it possible to keep accurate counts of executed commands per specific time intervals. We would want to count the totals for all commands, for specific commands (and, possibly, classes of commands?), as well as the counts and rates for individual users.
Note that this can be achieved without adding any new data structures (the values can be obtained by counting the entries in the ActionLogRecord table). However, that table is quite unwieldy on any busy installation, so we almost certainly want to add some efficient way of caching and updating the counts in the database.
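One possible shape for that caching layer, purely as a sketch (none of these class or method names exist in the code base): the engine increments in-memory counters keyed per command, user, and time bucket, and a scheduled job periodically drains them into a compact summary table, instead of running COUNT(*) queries against actionlogrecord on every check.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

public class CommandExecutionCounter {

    // Key: "<command>|<user>|<minute bucket>", value: executions in that minute.
    private final ConcurrentHashMap<String, LongAdder> counts = new ConcurrentHashMap<>();

    public void record(String commandName, String userIdentifier) {
        Instant minute = Instant.now().truncatedTo(ChronoUnit.MINUTES);
        String key = commandName + "|" + userIdentifier + "|" + minute;
        counts.computeIfAbsent(key, k -> new LongAdder()).increment();
    }

    public long countThisMinute(String commandName, String userIdentifier) {
        Instant minute = Instant.now().truncatedTo(ChronoUnit.MINUTES);
        LongAdder adder = counts.get(commandName + "|" + userIdentifier + "|" + minute);
        return adder == null ? 0L : adder.sum();
    }

    // A scheduled job could periodically flush this snapshot to a small summary table,
    // so the counts survive restarts without scanning the full actionlogrecord table.
    public Map<String, Long> drain() {
        Map<String, Long> snapshot = new HashMap<>();
        counts.forEach((k, v) -> snapshot.put(k, v.sum()));
        counts.clear();
        return snapshot;
    }
}
```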
Then we want to add a system of configurable rate limits for specific commands, in a way that would allow setting different limits for different users/groups of users, etc.
In addition to limits by the number of executed commands - for example, "a certain number of UpdateDatasetVersionCommand executions per minute allowed for an otherwise unprivileged logged-in user" - the Action Log records how much time it took to execute each command as well. So we may want to consider limiting use by that measure too (?). For example, if a specific user keeps making edits, but for whatever reason the UpdateDatasetVersionCommand on their dataset is especially CPU- and time-consuming, maybe that could trigger some red flags too.
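A hedged sketch of what such a time-based budget could look like, alongside the count-based limits. The 60-seconds-per-hour figure and all names here are made up for illustration only:

```java
import java.time.Duration;
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

public class ExecutionTimeBudget {

    // Hypothetical budget: at most 60 seconds of cumulative command execution time
    // per user per hour, on top of any per-command count limits.
    private static final Duration HOURLY_BUDGET = Duration.ofSeconds(60);

    private final ConcurrentHashMap<String, LongAdder> millisUsed = new ConcurrentHashMap<>();

    private String key(String userIdentifier) {
        return userIdentifier + "|" + Instant.now().truncatedTo(ChronoUnit.HOURS);
    }

    /** Called by the engine after a command finishes, with its measured duration. */
    public void recordExecution(String userIdentifier, Duration elapsed) {
        millisUsed.computeIfAbsent(key(userIdentifier), k -> new LongAdder())
                  .add(elapsed.toMillis());
    }

    /** True if this user has already exceeded their hourly time budget. */
    public boolean overBudget(String userIdentifier) {
        LongAdder used = millisUsed.get(key(userIdentifier));
        return used != null && used.sum() > HOURLY_BUDGET.toMillis();
    }
}
```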
The simplest use of this functionality would be to set a rate limit on, say, Get*Version commands for anonymous (guest) users, and that would address a somewhat common case of a scripted crawler plowing through our holdings; past a certain number of calls in too short a period of time, the API starts giving them "too busy, try again later".
More care will need to be taken to make our pages communicate this "try again later" message to the UI users. We don't want anyone to lose a page's worth of edited metadata they are trying to save, etc.
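On the API side, the conventional way to say "too busy, try again later" is HTTP 429 with a Retry-After header. A rough sketch, assuming a hypothetical RateLimitExceededException thrown by the engine and the standard JAX-RS exception mapper mechanism (the JSON error shape below is only an example):

```java
import jakarta.ws.rs.core.Response;
import jakarta.ws.rs.ext.ExceptionMapper;
import jakarta.ws.rs.ext.Provider;

// Hypothetical exception the command engine would throw when a limit is hit.
class RateLimitExceededException extends RuntimeException {
    RateLimitExceededException(String message) {
        super(message);
    }
}

// Maps that exception to HTTP 429 so API clients get a standard "try again later" signal.
@Provider
public class RateLimitExceptionMapper implements ExceptionMapper<RateLimitExceededException> {
    @Override
    public Response toResponse(RateLimitExceededException ex) {
        return Response.status(429) // 429 Too Many Requests
                .header("Retry-After", "60") // seconds; would come from the limiter
                .entity("{\"status\":\"ERROR\",\"message\":\"Too busy, try again later.\"}")
                .build();
    }
}
```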