[filebeat][gcs] - Added retry config & refactored the default config options #41699
Conversation
This pull request does not have a backport label.
To fixup this pull request, you need to add the backport labels for the needed
Pinging @elastic/security-service-integrations (Team:Security-Service Integrations)
There are two distinct changes being made here; the refactor and the addition of a new feature. I think they should be in separate PRs.
```diff
@@ -90,6 +92,17 @@ type jsonCredentialsConfig struct {
 	AccountKey string `config:"account_key"`
 }
 
+type retryConfig struct {
+	// MaxAttempts is the maximum number of retry attempts, defaults to 3.
+	MaxAttempts int `config:"max_attempts" validate:"min=0"`
```
Why `min=0`? Is this attempts or retries? If it's retries, I would rename this `MaxRetries` and then a zero makes sense, otherwise it's saying don't try the first time.
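A minimal sketch of the distinction (the `fetch` helper is hypothetical, not from the PR): with a field named `MaxRetries`, zero still permits the initial attempt, whereas `max_attempts = 0` reads as "never try at all".

```go
// download retries a failed fetch up to maxRetries times after the
// initial attempt; fetch is a hypothetical stand-in for the real work.
func download(obj string, maxRetries int) error {
	var err error
	for attempt := 0; attempt <= maxRetries; attempt++ {
		if err = fetch(obj); err == nil {
			return nil
		}
	}
	return err
}
```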
```diff
+	// MaxBackOffDuration is the maximum value of the retry period, defaults to 30 seconds.
+	MaxBackOffDuration time.Duration `config:"max_backoff_duration" validate:"min=2"`
+	// BackOffMultiplier is the factor by which the retry period increases. It should be greater than 1 and defaults to 2.
+	BackOffMultiplier float64 `config:"backoff_multiplier" validate:"min=2"`
```
Does go-ucfg allow float constraints? Is a value greater than one but less than two a valid configuration in principle? If it is, should we not allow that?
CEL also uses ``Limit *float64 `config:"limit"` `` in its resource config, as do a couple of other inputs. There is a float unpacker present: `FloatUnpacker`. As for whether we should allow it or not, if the SDK allows it then why not extend the same feature set to the end user?
That's not what I'm asking. I'm asking if min can be a float constraint.
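If it helps, the question can be settled empirically; a minimal sketch using go-ucfg's yaml helper (the field and the fractional bound here are made up purely for the test):

```go
package main

import (
	"fmt"

	"github.com/elastic/go-ucfg/yaml"
)

type probe struct {
	Multiplier float64 `config:"backoff_multiplier" validate:"min=1.5"`
}

func main() {
	// 1.2 is below the fractional bound; if min supports float
	// constraints, Unpack should return a validation error here.
	cfg, err := yaml.NewConfig([]byte("backoff_multiplier: 1.2"))
	if err != nil {
		panic(err)
	}
	var p probe
	fmt.Println(cfg.Unpack(&p))
}
```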
```diff
 	MaxWorkers int `config:"max_workers,omitempty" validate:"max=5000"`
 	// Poll - Defines if polling should be performed on the input bucket source.
-	Poll *bool `config:"poll,omitempty"`
+	Poll bool `config:"poll,omitempty"`
 	// PollInterval - Defines the maximum amount of time to wait before polling for the next batch of objects from the bucket.
-	PollInterval *time.Duration `config:"poll_interval,omitempty"`
+	PollInterval time.Duration `config:"poll_interval,omitempty"`
 	// ParseJSON - Informs the publisher whether to parse & objectify json data or not. By default this is set to
 	// false, since it can get expensive dealing with highly nested json data.
-	ParseJSON *bool `config:"parse_json,omitempty"`
+	ParseJSON bool `config:"parse_json,omitempty"`
 	// BucketTimeOut - Defines the maximum time that the sdk will wait for a bucket api response before timing out.
-	BucketTimeOut *time.Duration `config:"bucket_timeout,omitempty"`
+	BucketTimeOut time.Duration `config:"bucket_timeout,omitempty"`
```
Why were these changed?
explanation below
```diff
+This attribute can be used to configure a list of sub attributes that directly control how the input should behave when a download for a file/object fails or gets interrupted. The list of sub attributes are as follows :-
+
+1. `max_attempts`: This attribute defines the maximum number of retries that should be attempted for a failed download. The default value for this is `3`.
```
Suggested change:
```diff
-1. `max_attempts`: This attribute defines the maximum number of retries that should be attempted for a failed download. The default value for this is `3`.
+1. `max_retries`: This attribute defines the maximum number of retries that should be attempted for a failed download. The default value for this is `3`.
```
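For context on how the retry sub-attributes could plausibly reach the SDK (a hedged sketch, not necessarily this PR's actual wiring; it assumes `cloud.google.com/go/storage` and `gax-go`, and `applyRetry` is a made-up helper):

```go
package gcsretry

import (
	"time"

	"cloud.google.com/go/storage"
	"github.com/googleapis/gax-go/v2"
)

// retryConfig mirrors the fields added in this PR.
type retryConfig struct {
	MaxAttempts        int
	MaxBackOffDuration time.Duration
	BackOffMultiplier  float64
}

// applyRetry hands the knobs to the SDK's per-bucket retryer.
func applyRetry(client *storage.Client, name string, rc retryConfig) *storage.BucketHandle {
	return client.Bucket(name).Retryer(
		storage.WithMaxAttempts(rc.MaxAttempts),
		storage.WithBackoff(gax.Backoff{
			Max:        rc.MaxBackOffDuration, // cap on the retry period
			Multiplier: rc.BackOffMultiplier,  // growth factor per retry
		}),
	)
}
```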
```diff
-The input can be configured to work with and without polling, though currently, if polling is disabled it will only
-perform a one time passthrough, list the file contents and end the process. Polling is generally recommented for most cases
-even though it can get expensive with dealing with a very large number of files.
+The input can be configured to work with and without polling, though if polling is disabled, it will only perform a one time passthrough, list the file contents and end the process.
```
Suggested change:
```diff
-The input can be configured to work with and without polling, though if polling is disabled, it will only perform a one time passthrough, list the file contents and end the process.
+The input can be configured to work with and without polling, though if polling is disabled it will only perform a single collection of data, list the file contents and end the process.
```
The change `s/passthrough/…/` significantly changes the semantics, but I don't think the original meaning is correct; passthrough is a noun that doesn't describe an action. Though I'm unclear why this is being changed, since it does not appear to be related to the logic change here.
```diff
 different values, thus offering extensive flexibility and customization. Examples <<bucket-overrides,below>> show this behavior.
 
 On receiving this config the google cloud storage input will connect to the service and retrieve a `Storage Client` using the given `bucket_name` and
 `auth.credentials_file`, then it will spawn two main go-routines, one for each bucket. After this each of these routines (threads) will initialize a scheduler
-which will in turn use the `max_workers` value to initialize an in-memory worker pool (thread pool) with `3` `workers` available. Basically that equates to two instances of a worker pool,
-one per bucket, each having 3 workers. These `workers` will be responsible for performing `jobs` that process a file (in this case read and output the contents of a file).
+which will in turn use the `max_workers` value to initialize an in-memory worker pool (thread pool) with `3` `workers` available. Basically that equates to two instances of a worker pool, one per bucket, each having 3 workers. These `workers` will be responsible for performing `jobs` that process a file (in this case read and output the contents of a file).
```
Do we need to change these in this PR?
I have replied with an explanation and option below.
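The quoted passage describes one goroutine per bucket, each running a bounded pool of `max_workers` job runners; a minimal sketch of that shape (`bucket`, `object`, `listObjects` and `process` are illustrative stand-ins, not the input's actual types):

```go
// run fans out one scheduler goroutine per bucket; each bounds its
// in-flight jobs with a semaphore of size maxWorkers.
func run(buckets []bucket, maxWorkers int) {
	var wg sync.WaitGroup
	for _, b := range buckets {
		wg.Add(1)
		go func(b bucket) {
			defer wg.Done()
			sem := make(chan struct{}, maxWorkers) // the worker pool
			var jobs sync.WaitGroup
			for _, o := range listObjects(b) {
				sem <- struct{}{}
				jobs.Add(1)
				go func(o object) { // a job: read and output one file
					defer func() { jobs.Done(); <-sem }()
					process(o)
				}(o)
			}
			jobs.Wait()
		}(b)
	}
	wg.Wait()
}
```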
```diff
@@ -213,7 +211,7 @@ This is a specific subfield of a bucket. It specifies the bucket name.
 
 This attribute defines the maximum amount of time after which a bucket operation will give and stop if no response is recieved (example: reading a file / listing a file).
 It can be defined in the following formats : `{{x}}s`, `{{x}}m`, `{{x}}h`, here `s = seconds`, `m = minutes` and `h = hours`. The value `{{x}}` can be anything we wish.
-If no value is specified for this, by default its initialized to `50 seconds`. This attribute can be specified both at the root level of the configuration as well at the bucket level. The bucket level values will always take priority and override the root level values if both are specified. The value of `bucket_timeout` that should be used depends on the size of the files and the network speed. If the timeout is too low, the input will not be able to read the file completely and `context_deadline_exceeded` errors will be seen in the logs. If the timeout is too high, the input will wait for a long time for the file to be read, which can cause the input to be slow. The ratio between the `bucket_timeout` and `poll_interval` should be considered while setting both the values. A low `poll_interval` and a very high `bucket_timeout` can cause resource utilization issues as schedule ops will be spawned every poll iteration. If previous poll ops are still running, this could result in concurrently running ops and so could cause a bottleneck over time.
+If no value is specified for this, by default its initialized to `120 seconds`. This attribute can be specified both at the root level of the configuration as well at the bucket level. The bucket level values will always take priority and override the root level values if both are specified. The value of `bucket_timeout` that should be used depends on the size of the files and the network speed. If the timeout is too low, the input will not be able to read the file completely and `context_deadline_exceeded` errors will be seen in the logs. If the timeout is too high, the input will wait for a long time for the file to be read, which can cause the input to be slow. The ratio between the `bucket_timeout` and `poll_interval` should be considered while setting both the values. A low `poll_interval` and a very high `bucket_timeout` can cause resource utilization issues as schedule ops will be spawned every poll iteration. If previous poll ops are still running, this could result in concurrently running ops and so could cause a bottleneck over time.
```
What is being changed here?
`initialized to 50 seconds` -> `initialized to 120 seconds`
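To make the failure mode in the quoted paragraph concrete: the input presumably bounds each bucket operation with a `bucket_timeout`-derived context, so an undersized timeout surfaces as `context.DeadlineExceeded`. A sketch (the reader API is the real GCS SDK; the surrounding wiring is assumed):

```go
func readObject(ctx context.Context, bkt *storage.BucketHandle, name string, timeout time.Duration) ([]byte, error) {
	// Bound the whole read by the configured bucket_timeout.
	ctx, cancel := context.WithTimeout(ctx, timeout)
	defer cancel()

	r, err := bkt.Object(name).NewReader(ctx)
	if err != nil {
		// A timeout too small for the object size/network shows up
		// here as the context_deadline_exceeded errors noted above.
		return nil, err
	}
	defer r.Close()
	return io.ReadAll(r)
}
```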
This pull request is now in conflicts. Could you fix it? 🙏
@efd6, So in order to have the retries working effectively, I had to introduce the …
The pattern to achieve this is to prefill the default config. Zero values in the serialised config do not over-write the value in the passed-in default.
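A minimal sketch of that pattern with go-ucfg (the yaml helper is just for the demo; the fields echo the refactored config):

```go
package main

import (
	"fmt"

	"github.com/elastic/go-ucfg/yaml"
)

type config struct {
	Poll       bool    `config:"poll"`
	MaxWorkers int     `config:"max_workers"`
	Multiplier float64 `config:"backoff_multiplier"`
}

func main() {
	// Prefill the defaults; Unpack only overwrites what the user set.
	c := config{Poll: true, MaxWorkers: 3, Multiplier: 2}

	raw, err := yaml.NewConfig([]byte("max_workers: 10"))
	if err != nil {
		panic(err)
	}
	if err := raw.Unpack(&c); err != nil {
		panic(err)
	}
	// Poll and Multiplier keep their defaults; MaxWorkers is overridden.
	fmt.Printf("%+v\n", c) // {Poll:true MaxWorkers:10 Multiplier:2}
}
```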
Type of change
Proposed commit message
Checklist
- [ ] I have made corresponding change to the default configuration files
- [ ] I have added an entry in `CHANGELOG.next.asciidoc` or `CHANGELOG-developer.next.asciidoc`.

Disruptive User Impact
Author's Checklist
How to test this PR locally
Related issues
Use cases
Screenshots
Logs