Fix awscloudwatch worker allocation #38953
Conversation
Pinging @elastic/elastic-agent (Team:Elastic-Agent)
This pull request does not have a backport label.
To fix up this pull request, you need to add the backport labels for the needed branches.
This pull request is now in conflicts. Could you fix it? 🙏
```go
// main loop doesn't have to block on the workers
// while distributing new data.
workRequestChan:  make(chan struct{}),
workResponseChan: make(chan workResponse, 10),
```
Is this 10 chosen for a reason? Or just good practice?
The choice to buffer it is because it lets the polling loop send multiple responses in one scheduler interval, but the choice of 10 is just an arbitrary small number. (Heuristically, I think of it as: a buffer of 10 means roughly 90% less contention in the main loop than the synchronous case. I don't know exactly how much contention the synchronous case has, but eliminating ~90% of it should be enough to avoid any bottleneck.)
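A minimal, hedged sketch of the request/response pattern being discussed: the `workRequestChan`/`workResponse` names come from the diff above, but the `poller`, `worker`, and `distribute` names, the log-group payload, and everything else here are assumptions for illustration, not the actual cloudwatch input code.

```go
package main

import (
	"fmt"
	"time"
)

// workResponse stands in for the real work item handed to a worker.
type workResponse struct {
	logGroup string
}

type poller struct {
	workRequestChan  chan struct{}     // unbuffered: a worker signals it is idle
	workResponseChan chan workResponse // buffered: the main loop can queue several tasks
}

func newPoller() *poller {
	return &poller{
		workRequestChan:  make(chan struct{}),
		workResponseChan: make(chan workResponse, 10),
	}
}

// worker repeatedly announces that it is idle, then processes whatever task
// the main loop hands back.
func (p *poller) worker() {
	for {
		p.workRequestChan <- struct{}{}
		resp := <-p.workResponseChan
		fmt.Println("processing", resp.logGroup)
	}
}

// distribute hands out one task per pending work request. Because the
// response channel is buffered, the send rarely blocks, so several tasks
// can be queued within a single scheduler interval.
func (p *poller) distribute(tasks []workResponse) {
	for _, t := range tasks {
		<-p.workRequestChan // wait until some worker asks for work
		p.workResponseChan <- t
	}
}

func main() {
	p := newPoller()
	go p.worker()
	p.distribute([]workResponse{{logGroup: "group-a"}, {logGroup: "group-b"}})
	time.Sleep(100 * time.Millisecond) // give the worker time to drain the queue
}
```

With this shape, the buffer size (10 here, as in the diff) only bounds how far ahead the distributing loop can run; correctness doesn't depend on the exact number.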
Fix a bug in cloudwatch worker allocation that could cause data loss (#38918). The previous behavior wasn't really tested, since worker tasks were computed in cloudwatchPoller's polling loop, which required live AWS connections. So in addition to the basic logical fix, I did some refactoring to cloudwatchPoller that makes the task iteration visible to unit tests.

(cherry picked from commit deece39)

Co-authored-by: Fae Charlton <[email protected]>
…jects (#39353)

A large cleanup in the `aws-s3` input, reorganizing the file structure and splitting internal APIs into additional helpers. This change is meant to have no functional effect; it is strictly a cleanup and reorganization in preparation for future changes. The hope is that the new layout makes initialization steps and logical dependencies clearer. The main changes are:

- Make `s3Poller` and `sqsReader` into standalone input objects, `s3PollerInput` and `sqsReaderInput`, that implement the `v2.Input` interface, instead of interleaving the two implementations within the same object.
  * Choose the appropriate input in `(*s3InputManager).Create` based on configuration.
  * Move the associated internal API out of the shared `input.go` into the new `s3_input.go` and `sqs_input.go`, while leaving `s3.go` and `sqs.go` for auxiliary helpers.
  * Give each input a copy of `config` and `awsConfig`, and remove redundant struct fields that simply shadowed fields already in those configs.
- In `sqsReaderInput`, use a fixed set of worker goroutines and track task allocation via channel-based work requests instead of creating ephemeral workers via the previous custom semaphore implementation (similar to the [recent cloudwatch cleanup](#38953)); a hedged sketch of this pattern appears after this list.
  * Delete `aws.Sem`, since this was its last remaining caller.
- Collect the helpers related to approximate message count polling into a helper object, `messageCountMonitor`, so their role in the input is clearer.
- Generally, break larger steps up into smaller helper functions.
- Generally, collect initialization dependencies in the same place so the sequencing is clearer.
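For illustration only, a hedged sketch of the "fixed set of worker goroutines fed over a channel" idea described above, as opposed to spawning an ephemeral goroutine per message behind a semaphore. The names (`runWorkers`, `handleMessage`, `message`) are hypothetical and not the actual `sqsReaderInput` API.

```go
package main

import (
	"fmt"
	"sync"
)

type message struct{ body string }

func handleMessage(msg message) {
	fmt.Println("handled:", msg.body)
}

// runWorkers starts a fixed number of goroutines up front and feeds them over
// a channel, instead of creating a short-lived goroutine for every message
// gated by a semaphore.
func runWorkers(workerCount int, msgs <-chan message) {
	var wg sync.WaitGroup
	for i := 0; i < workerCount; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for msg := range msgs {
				handleMessage(msg)
			}
		}()
	}
	wg.Wait()
}

func main() {
	msgs := make(chan message, 3)
	for i := 0; i < 3; i++ {
		msgs <- message{body: fmt.Sprintf("msg-%d", i)}
	}
	close(msgs)
	runWorkers(2, msgs)
}
```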
Fix a bug in cloudwatch worker allocation that could cause data loss (#38918).
The previous behavior wasn't really tested, since worker tasks were computed in cloudwatchPoller's polling loop, which required live AWS connections. So in addition to the basic logical fix, I did some refactoring to cloudwatchPoller that makes the task iteration visible to unit tests.
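As a rough illustration of what "making the task iteration visible to unit tests" can look like in general: a pure function that computes per-log-group tasks from plain inputs can be asserted on directly, with no AWS connection. The `pollingTask` and `splitTasks` names below are hypothetical, not the PR's actual API.

```go
package main

import (
	"fmt"
	"time"
)

type pollingTask struct {
	logGroup   string
	start, end time.Time
}

// splitTasks produces one task per log group covering the same time window.
// Because it takes plain values and returns plain values, a unit test can
// check that every log group gets exactly one task and no window is skipped.
func splitTasks(logGroups []string, start, end time.Time) []pollingTask {
	tasks := make([]pollingTask, 0, len(logGroups))
	for _, group := range logGroups {
		tasks = append(tasks, pollingTask{logGroup: group, start: start, end: end})
	}
	return tasks
}

func main() {
	now := time.Now()
	for _, t := range splitTasks([]string{"group-a", "group-b"}, now.Add(-time.Minute), now) {
		fmt.Printf("%s: %v to %v\n", t.logGroup, t.start, t.end)
	}
}
```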
Checklist
- I have made corresponding changes to the documentation
- I have made corresponding changes to the default configuration files
- I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc

Related issues

- awscloudwatch input drops data #38918