Excessive calling of the DD API by KEDA #5521
Comments
Hello @Adityashar, thanks for reporting the issue. We are currently working on another solution that integrates the DD Agent as a source for querying the metrics: #5355. This will improve the behavior and leaves rate-limit management to the agent. I'm not totally sure about adding a retry/delay system, because it will probably be released at the same time as the DD Agent support, which will handle the situation much better, but I'm not against it.
Thanks for this information @JorTurFer, looking forward to this feature!
I agree with @JorTurFer
@JorTurFer @zroubalik I was taking a look at @arapulido's draft code and saw this line: keda/pkg/scalers/datadog_scaler.go Line 151 in 76a58a9
Does this mean that we would need Datadog's APIService to use this feature? Also, IIRC, there can only be one APIService in a cluster for external.metrics.k8s.io, i.e., either KEDA or Datadog.
I don't think so. That path is the one exposed by the server, so the idea is that you install the DD Agent without registering the APIService. Then you set the DD service endpoint in KEDA, and KEDA queries the DD Agent directly, without the APIService being registered.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.
This issue has been automatically closed due to inactivity.
Report
Hi Team,
We are observing a very high number of requests from keda's DD Scaler to the DD API. These can go as high as 1000 queries a minute while there are only 64 ScaledObjects deployed in our platform currently.
Since there are other applications in our platform that also use the DD API /api/v1/query, we often get the error message below from keda for all of our ScaledObjects, and this disrupts the functionality of those other applications as well.
"your Datadog account reached the 1600 queries per 60 seconds rate limit, next limit reset will happen in X seconds"
I have gone through the doc of the Datadog scaler and its rate limiting (https://keda.sh/docs/2.11/scalers/datadog/#polling-intervals-and-datadog-rate-limiting); however, I feel we could also improve some of the keda code to reduce these calls.
There are two things that I observed in keda's codebase:
For both of the above points, we could avoid invalidating the cache for a ScaledObject, or hitting the DD API multiple times, when we receive either of the following:
- a 429 response code from DD (our API is getting rate-limited)
- the "no Datadog metrics returned for the given time window" error message (as computed in the datadog_scaler.go code). There are many metrics, such as Kafka lag, which remain null(?) for most durations unless there is actual lag.
I see a few options that we could implement going forward:
- a sleep or retryAfter config which could be invoked in case we are getting 429s from DD, AWS, etc.
I would really appreciate everyone's suggestions on these.
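To make the sleep / retryAfter idea concrete, here is a minimal sketch in Python (not KEDA's Go code). It assumes the scaler can learn how many seconds remain until the rate limit resets, for example from the "next limit reset will happen in X seconds" message. The class RateLimitGate and its methods are hypothetical names used only for illustration; nothing here is an existing KEDA or Datadog API.

```python
import time


class RateLimitGate:
    """Hypothetical helper: pauses outbound queries after a 429 response."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock          # injectable clock, so the gate is testable
        self._blocked_until = 0.0    # queries are allowed once the clock reaches this

    def allow(self):
        """Return True if a query may be sent now."""
        return self._clock() >= self._blocked_until

    def record_response(self, status_code, retry_after_seconds=None,
                        default_delay=30.0):
        """Open a back-off window when the API answers 429.

        retry_after_seconds is the reset delay reported by the API, if known;
        default_delay is an assumed fallback, not a documented value.
        """
        if status_code == 429:
            delay = (retry_after_seconds
                     if retry_after_seconds is not None else default_delay)
            # Never shorten a back-off window that is already in place.
            self._blocked_until = max(self._blocked_until,
                                      self._clock() + delay)
```

A caller would check allow() before each query and skip the call (serving the last known value instead) while the gate is closed, so repeated HPA and reconcile requests do not keep hitting a rate-limited API.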
Expected Behavior
The number of calls made by keda to DD is low even during errors such as 429s and "no Datadog metrics returned for the given time window".
useCachedMetrics is a helpful feature for serving the incoming metric requests from the HPA. However, once an error is received (especially one of the above two), the cache gets deleted, which could have been avoided. This may lead to at most 10 calls to DD in a minute for a single scaler: 2 * 4 (https://github.com/kedacore/keda/blob/v2.11.2/pkg/scaling/scale_handler.go#L409) due to the HPA and 2 * 1 (https://github.com/kedacore/keda/blob/v2.11.2/pkg/scaling/scale_handler.go#L520) due to the reconcile loop.
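The cache behavior asked for here could look roughly like the sketch below, written in Python for brevity and not taken from KEDA's scale_handler.go. MetricCache and TransientMetricError are hypothetical names. The assumption is that a 429 and the "no Datadog metrics returned" error can both be treated as transient, in which case the last good value is kept and served for another interval instead of being dropped.

```python
import time


class TransientMetricError(Exception):
    """Hypothetical error type for 429s and empty-metric responses."""


class MetricCache:
    """Hypothetical cache that keeps the last good value on transient errors."""

    def __init__(self, fetch, ttl_seconds, clock=time.monotonic):
        self._fetch = fetch          # callable that queries the metrics API
        self._ttl = ttl_seconds      # how long a value is served without refetching
        self._clock = clock
        self._has_value = False
        self._value = None
        self._stored_at = 0.0

    def get(self):
        now = self._clock()
        if self._has_value and now - self._stored_at < self._ttl:
            return self._value       # fresh enough: no API call at all
        try:
            value = self._fetch()
        except TransientMetricError:
            if not self._has_value:
                raise                # nothing cached yet, surface the error
            # Keep the stale value and restart its TTL, so the API is not
            # queried again until the next interval.
            self._stored_at = now
            return self._value
        self._value, self._stored_at, self._has_value = value, now, True
        return value
```

With this shape, a failing query costs one API call per interval for a given metric, rather than one call per incoming HPA or reconcile request.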
Actual Behavior
Excessive calling to DD by keda
Steps to Reproduce the Problem
Logs from KEDA operator
No response
KEDA Version
2.11.2
Kubernetes Version
1.25
Platform
Amazon Web Services
Scaler Details
Datadog
Anything else?
No response