Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix rate metric reporting. Throw away any value that is less than a previously submitted value. #146

Closed
wants to merge 1 commit into from

Conversation

arrawatia
Copy link
Contributor

@arrawatia arrawatia commented Jul 13, 2017

This PR fixes reporting behavior for rate metric type.

Based on description of rate type Metric Type, the reporter should Throw away any value that is less than a previously submitted value.

This is helpful to get sensible metrics when the JVM is rebooted. We see a lot of huge negative values when the JVM is restarted (because the JMX metrics are reset to 0) which trigger a flurry of alerts and charts also become confusing for users.

image

@travisjeffery
Copy link

@hush-hush @truthbk

@irabinovitch irabinovitch requested a review from jeremy-lq July 18, 2017 03:07
@truthbk truthbk requested review from hush-hush and truthbk and removed request for hush-hush July 18, 2017 11:18
@truthbk
Copy link
Member

truthbk commented Jul 24, 2017

Any value < prev_value value would generally mean we should probably be using a gauge instead of a rate. That's a fact. I think that is at the core of that rationale and would make this PR absolutely correct. I guess what we now have to decide is if we should "hard" impose this behavior on the metric or not. My main concern here is "breaking" some metrics where we may have used a rate on a gauge-type metric (or perhaps some esoteric bean where a negative rate could have some sensible meaning...).

This looks good, but let me take a deeper look :)

@truthbk
Copy link
Member

truthbk commented Aug 23, 2017

We're going to have to put this behind a feature flag because there's a risk of existing metrics having negative rates. There's naturally a fear of breaking existing alert or dashboards with the change in behavior. By hiding this behind a feature flag we can guarantee any customer can opt into the saner behavior here provided. We will make those changes and merge into this PR.

Thank you for your patience.

@truthbk
Copy link
Member

truthbk commented Aug 24, 2017

This PR was superseded by #154 where we add a feature flag to enable the desired behavior. Your commit is included in that PR @arrawatia - closing this one. Thank you very much for writing it and bringing our attention to it.

@truthbk truthbk closed this Aug 24, 2017
@jeffwidman
Copy link
Contributor

jeffwidman commented Nov 3, 2017

@truthbk what is the policy for deprecating things?

There has got to be some point somewhere where the "correct" behavior becomes the default rather than hiding it behind a flag.

The problem with hiding behind a flag is that new users who don't read the docs thoroughly will continue to set things up incorrectly, further cementing the incorrect behavior and making it more painful to pull off the bandaid later.

For example, the fix of #154 is missing updates for other checks that rely on jmxfetch, for example the Kafka check. So as it stands, no one currently will realize this needs to be added to the Kafka check...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants