fix: Fix batching logic with write records, introduce concurrent requests #8947
Conversation
Thanks so much for the pull request!
🤝 ✒️ Just a reminder that the CLA has not yet been signed, and we'll need it before merging. Please sign the CLA when you get a chance, then post a comment here saying !signed-cla
Looks like the test file needs to be formatted
Fixed the gofmt issue with timestream_test.go.
The change looks thorough, but it's a significant one. I'm not sure about the parallelism.
if err := t.writeToTimestream(writeRecordsInput, true); err != nil {
	return err
}
go func(inp *timestreamwrite.WriteRecordsInput) {
What's the upper bound on the number of goroutines we're going to launch here? If I set my batch size to something ridiculously large, could we end up with hundreds or thousands of concurrent requests? Generally I'm also not a fan of parallel writes in the output here; you're typically not going to see much improvement in throughput.
You are right, currently it is unbounded.
I can add a semaphore that puts an upper bound on the number of concurrent goroutines:
https://github.com/golang/sync/blob/master/semaphore/semaphore.go
We are making this change because we observed that metrics were being dropped when the serial requests took too long. After making this change, throughput improved and the metric drops stopped.
Thoughts?
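A minimal sketch of what a semaphore-bounded write loop could look like using golang.org/x/sync/semaphore; the names here (writeAllBounded, batch, maxWriteRoutines) are illustrative stand-ins, not the plugin's actual identifiers:

```go
package main

import (
	"context"
	"fmt"

	"golang.org/x/sync/semaphore"
)

// batch is a simplified stand-in for *timestreamwrite.WriteRecordsInput.
type batch struct{ id int }

// writeToTimestream is a placeholder for the plugin's actual write call.
func writeToTimestream(b batch) error {
	fmt.Println("writing batch", b.id)
	return nil
}

// writeAllBounded never has more than maxWriteRoutines writes in flight.
func writeAllBounded(batches []batch, maxWriteRoutines int64) error {
	ctx := context.Background()
	sem := semaphore.NewWeighted(maxWriteRoutines)

	for _, b := range batches {
		// Blocks until one of the maxWriteRoutines slots is free.
		if err := sem.Acquire(ctx, 1); err != nil {
			return err
		}
		go func(b batch) {
			defer sem.Release(1)
			_ = writeToTimestream(b) // error handling elided in this sketch
		}(b)
	}

	// Acquiring the full weight waits for all in-flight writes to finish.
	return sem.Acquire(ctx, maxWriteRoutines)
}

func main() {
	_ = writeAllBounded([]batch{{1}, {2}, {3}}, 2)
}
```

Acquiring the full weight at the end is a common way to wait for all workers without a separate WaitGroup.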
Apologies for the delay in replying to this; we're catching up a little on outstanding PRs.
Yup, putting an upper bound on the concurrent requests sounds like a good idea; please proceed with that.
Added the upper bound on the concurrency and introduced a configuration parameter for it, so users can decide on the level of concurrency themselves.
// On partial failures, Telegraf will reject the entire batch of metrics and
// retry. writeToTimestream will return retryable exceptions only.
err, _ := <-errs
You need to read from this channel len(writeRecordsInputs) times; then you can drop the WaitGroup, because the reads act as a natural block.
On line 355, we are only adding to the errs channel if err != nil.
Reading it n times currently just blocks forever when no error occurs.
I can try removing the != nil check and see if it works uninterrupted.
Yes, that sounds reasonable, to remove the nil check and always return the result from writeToTimestream.
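Sketching that out, a minimal version of the pattern being agreed on here, with simplified stand-in types and names (writeAll, the int batches, and the error text are illustrative, not the plugin's actual code); concurrency bounding is omitted to keep the focus on counting the channel reads:

```go
package main

import "fmt"

// writeToTimestream stands in for the plugin's write call; with the nil
// check removed, every goroutine sends its result, nil or not.
func writeToTimestream(batch int) error {
	if batch%3 == 0 {
		return fmt.Errorf("retryable error on batch %d", batch)
	}
	return nil
}

// writeAll reads exactly len(batches) results from the channel, which acts
// as the natural block, so no WaitGroup is needed.
func writeAll(batches []int) error {
	errs := make(chan error, len(batches)) // buffered so senders never block

	for _, b := range batches {
		go func(b int) {
			errs <- writeToTimestream(b) // send nil results too
		}(b)
	}

	var firstErr error
	for range batches {
		if err := <-errs; err != nil && firstErr == nil {
			firstErr = err
		}
	}
	return firstErr
}

func main() {
	fmt.Println(writeAll([]int{1, 2, 3, 4, 5}))
}
```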
I have added a range over the channel. It exits on the first encountered error. But since the channel is already closed before that, it should still be able to be garbage collected, right?
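For reference, a hedged sketch of the close-then-range variant being described, again with illustrative names rather than the plugin's actual code:

```go
package main

import (
	"fmt"
	"sync"
)

// writeToTimestream stands in for the plugin's write call.
func writeToTimestream(batch int) error {
	if batch == 3 {
		return fmt.Errorf("retryable error on batch %d", batch)
	}
	return nil
}

func writeAllRange(batches []int) error {
	errs := make(chan error, len(batches))
	var wg sync.WaitGroup

	for _, b := range batches {
		wg.Add(1)
		go func(b int) {
			defer wg.Done()
			errs <- writeToTimestream(b)
		}(b)
	}

	// Close the channel only after every goroutine has sent its result;
	// buffered values remain readable after close.
	wg.Wait()
	close(errs)

	// Range over the results and return on the first error. Exiting early
	// leaves unread values behind, but since the channel is closed and no
	// longer referenced afterwards, it is eligible for garbage collection.
	for err := range errs {
		if err != nil {
			return err
		}
	}
	return nil
}

func main() {
	fmt.Println(writeAllRange([]int{1, 2, 3, 4}))
}
```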
!signed-cla
Thanks so much for the pull request!
Nirmesh, CCLA-wise I believe you're waiting on the Influx folks to update a database on their side.
!signed-cla
Any update on this PR?
Hi, sorry for the delay in addressing this. I'm picking it up again now and will update the pull request with everything addressed.
Force-pushed from 3f01a9d to 46bb44b
I've spent some time today testing the artifacts, and they clearly perform better than the existing Telegraf binaries. I haven't thrown full production load at it, but it is now surviving our pre-prod throughput tests, which version 1.21.1 manifestly doesn't. In a side-by-side test with another metrics platform we are getting very similar results, so we don't seem to be dropping metrics. Looking forward to this getting merged.
Force-pushed from b66c1e1 to 834a9fe
📦 Looks like new artifacts were built from this PR.
@powersj Fixed the lint error. Are all checks passing now? I have rebased it on top of the latest changes as well.
Required for all PRs: