fix(outputs/stackdriver): cumulative interval start times #10097

n-oden · 2021-11-11T18:43:31Z

Required for all PRs:

- Updated associated README.md.
- Wrote appropriate unit tests.
- Pull request title or commits are in conventional commit format.

resolves #7758

Cumulative metrics, when posted to Google Cloud Monitoring (née
Stackdriver), require a TimeInterval whose start time corresponds to
the last time the counter/cumulative metric was reset to zero. The
interval cannot overlap previous intervals, and its start time cannot be
more than 25 hours in the past:

https://cloud.google.com/monitoring/api/ref_v3/rest/v3/TimeInterval

The stackdriver output plugin had been creating StartTimes of
Thu Jan 1 00:00:01 UTC 1970, leading to bizarre behavior. Stackdriver graphs
would show flatlined fractional values no matter what data Telegraf
was sending. Direct queries to the projects.timeSeries.list API
would initially return the last handful of posted values, but would
eventually return only {"unit": "not_a_unit"} for the same query,
presumably because Stackdriver had GC'ed the invalid intervals on the
backend.

So, herein:

- Create a cache of start times and last observed values for all counter metrics (keyed by name, tags, and field).
- Reset the start time of a cache entry if the newest value is less than the previously observed value (counter reset).
- Reset the start time of a cache entry if it is more than 24 hours old.
- While we're at it, use the Telegraf logger rather than the base Go logger.

n-oden · 2021-11-11T19:39:38Z

Note that, as mentioned in the README update, even with this change there is potential for undefined behavior if one of the underlying counters resets without Telegraf restarting in concert with it. Doing this "right" would probably involve caching a separate start time and last observed value for each counter metric, and resetting the start time any time the metric resets. That seems to cut somewhat against the grain of Telegraf's normal near-statelessness on the output side, but if there's interest I could attempt to implement it.

n-oden · 2021-11-12T17:16:25Z

Actually, having thought on it a bit, the per-metric cache really seems to be the only way to do this: setting the start time to the process start time means that if the input source starts exposing a new counter >25 hours after we start, that counter will never make it to stackdriver.

So, an update, complete with unit tests.

n-oden · 2021-11-28T22:40:06Z

@sspaink @Hipska @powersj Could someone take a look at this? (Apologies in advance if I've missed a step here.)

srebhan · 2021-12-21T08:20:56Z

I guess this would benefit from plugin state-persistence (PR #9476)?

srebhan

Thanks for approaching this @n-oden! While the overall approach looks good, I have some comments in the code. Please try to reduce unrelated changes (renaming of aliases, reordering of imports, etc.) to an absolute minimum and submit them as separate PRs to ease review. Furthermore, do you think it's possible to leave getStackdriverTimeInterval() as is and instead determine the start and end times in a separate function?

plugins/outputs/stackdriver/counter_cache.go

plugins/outputs/stackdriver/stackdriver.go

n-oden · 2021-12-21T16:18:13Z

hey @srebhan glad to finally have a review for this! Your comments seem reasonable; I'll try to get them all addressed today or tomorrow.

resolves: #7758

srebhan · 2021-12-21T17:20:14Z

@n-oden sorry for taking so long to come to this... :-(

n-oden · 2021-12-21T18:22:28Z

@srebhan should be good to go, I think!

srebhan

@n-oden thanks for the update. I have some more comments, but we are almost there I think.

plugins/outputs/stackdriver/stackdriver.go

n-oden · 2021-12-22T17:05:10Z

@srebhan all done I think?

srebhan

The linter found some more... :-)

plugins/outputs/stackdriver/stackdriver_test.go

telegraf-tiger · 2021-12-22T18:15:29Z

☺️ This pull request doesn't significantly change the Telegraf binary size (less than 1%)

📦 Looks like new artifacts were built from this PR.


srebhan

Looks good to me. Thanks for fixing this long-standing issue @n-oden!

…ta#10097)

- Cherry-pick pgpool/pgpool2_exporter#14 into our build of pgpool_exporter
- Build telegraf from source; circle-ci is now 404ing the artifact link from influxdata/telegraf#10097 :(
- Bump version to 1.0.5

(cherry picked from commit 697855c)

telegraf-tiger bot added the "fix" label (pr to fix corresponding bug) Nov 11, 2021

n-oden mentioned this pull request Nov 11, 2021

stackdriver output plugin sends CUMULATIVE metrics incorrect #7758

Closed

n-oden force-pushed the fix-stackdriver-cumulative-intervals branch 2 times, most recently from df3a332 to eca8a69 Compare November 12, 2021 17:58

n-oden force-pushed the fix-stackdriver-cumulative-intervals branch 2 times, most recently from 0014ce5 to 742a671 Compare November 28, 2021 22:35

n-oden force-pushed the fix-stackdriver-cumulative-intervals branch from 742a671 to 38129d2 Compare November 30, 2021 23:34

srebhan reviewed Dec 21, 2021

srebhan self-assigned this Dec 21, 2021

srebhan added the "plugin/output" label (1. Request for new output plugins 2. Issues/PRs that are related to out plugins) Dec 21, 2021

fix(outputs/stackdriver): cumulative interval start times

0653405

resolves: #7758

address comments

4f0b09b

n-oden force-pushed the fix-stackdriver-cumulative-intervals branch from 38129d2 to 4f0b09b Compare December 21, 2021 17:33

n-oden requested a review from srebhan December 21, 2021 18:06

defer unlock in cc.set() method

8802a58

srebhan reviewed Dec 22, 2021

address more comments

9dfc14b

n-oden requested a review from srebhan December 22, 2021 15:21

style edits

6abb64d

srebhan reviewed Dec 22, 2021

fix linter issues; use noerror/notnil

716481d

n-oden requested a review from srebhan December 22, 2021 18:32

srebhan approved these changes Dec 22, 2021

srebhan added the "ready for final review" label (This pull request has been reviewed and/or tested by multiple users and is ready for a final review.) Dec 22, 2021

powersj approved these changes Dec 22, 2021

powersj merged commit 697855c into influxdata:master Dec 22, 2021

n-oden deleted the fix-stackdriver-cumulative-intervals branch December 22, 2021 22:24

powersj pushed a commit to powersj/telegraf that referenced this pull request Jan 21, 2022

fix: cumulative interval start times for stackdriver output (influxda…

e4ab175

…ta#10097)

n-oden mentioned this pull request Jan 24, 2022

Cherry-pick pgpool_exporter bugfix odenio/pgpool-cloudsql#3

Merged

reimda pushed a commit that referenced this pull request Jan 27, 2022

fix: cumulative interval start times for stackdriver output (#10097)

44983e3

(cherry picked from commit 697855c)
