Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Corrects collection loop period #323

Merged
merged 3 commits into from
Sep 9, 2020
Merged

Corrects collection loop period #323

merged 3 commits into from
Sep 9, 2020

Conversation

prognant
Copy link
Contributor

@prognant prognant commented Sep 3, 2020

It seems that we computed a sleepPeriod that was never used.

If the collection took less that the loopPeriod I agree with the calculation. However if the collection exceeds the loopPeriodon one hand I find it confusing to still wait theloopPeriod` on the other hand I'm not sure that not doing a pause is a good idea.

@olivielpeau
Copy link
Member

The logic is a bit weird I agree, but overall I don't think sleeping for loopPeriod really makes that much sense after a long collection. Let's update the logic so we don't sleep in that case, WDYT?

@prognant
Copy link
Contributor Author

prognant commented Sep 4, 2020

I agree, it sounds logic not to sleep when the collection exceed the loopPeriod. I updated the PR to reflect that.

@truthbk
Copy link
Member

truthbk commented Sep 8, 2020

First of all, thank you @prognant for finding that unused piece of logic; my bad. Regarding the sleep or not question after a long collection, there are two side to this question:

  • if we do sleep we'll get empty collection intervals - yuck!
  • if we don't sleep we're going to be constantly polling and working (which isn't necessarily a bad thing, but it means consistent stress for the target JMX application, and CPU resources constantly chewed up by the JMXFetch process).

So I guess we should settle on the lesser of the two evils.

@olivielpeau
Copy link
Member

I agree that stress on the JMX application can be an issue, but when that's the case, the fact that the collection interval is configurable (and that the config option is reflected on the Agent side, see https://github.com/DataDog/datadog-agent/blob/0fba11a70ed2e08b9bb850f59f4af6dd40629b45/pkg/config/config_template.yaml#L1174-L1177) can be an immediate mitigation.
The logic in this PR is definitely more predictable and makes more sense, so let's go ahead and merge it 👍

@prognant
Copy link
Contributor Author

prognant commented Sep 9, 2020

I think having a predictable thing is always better, on the other hand having the collection cycle lasting longer than the check period should be considered as an abnormal behaviour. Thus I think using logging severity info or warning instead of debug when the collection cycle is too long would be better, WDYT ?

@olivielpeau
Copy link
Member

I wouldn't consider it that abnormal: when lots of beans are collected, collections can be slow and take more than 15sec, and when that happens increasing the logging level may flood the logs for a lot of existing users.

That said, it would be nice to report the collection time of each instance to the status page of the Agent though + send it as telemetry if telemetry is enabled. We have a card in backlog about this (named "Report internal metrics"), outside the immediate scope of this PR.

@prognant
Copy link
Contributor Author

prognant commented Sep 9, 2020

It makes sense, I will go ahead and merge this. I like the idea of having the collection time appearing in the agent status.

@prognant prognant merged commit d27e516 into master Sep 9, 2020
@prognant prognant deleted the prognant/loop-period branch September 9, 2020 14:11
vickenty added a commit that referenced this pull request May 18, 2021
Too short iteration interval is problematic for rate calculation on slow moving
counters: small time delta pushes calculated rate way above the average rate
for the couner.

This PR revises logic introduced in #323 and limits minimum interval between
iterations to a half of the configured check period to minimize negative effect
on the rate calculations.
vickenty added a commit that referenced this pull request May 18, 2021
Too short iteration interval is problematic for rate calculation on slow moving
counters: small time delta pushes calculated rate way above the average rate
for the couner.

This PR revises logic introduced in #323 and limits minimum interval between
iterations to a half of the configured check period to minimize negative effect
on the rate calculations.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants