
[filebeat][gcs] - Added retry config & refactored the default config options #41699

Closed
wants to merge 3 commits into main from gcs/retry_config

Conversation

@ShourieG ShourieG (Contributor) commented Nov 20, 2024

Type of change

  • Enhancement

Proposed commit message

  • Added retry config.
  • Refactored the default config options.
  • Updated docs accordingly and fixed some existing errors.
  • Added tests.
  • Removed an unused log parameter from the fetchStorageClient method.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
    - [ ] I have made corresponding changes to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

Disruptive User Impact

Author's Checklist


How to test this PR locally

Related issues

Use cases

Screenshots

Logs

@ShourieG ShourieG requested a review from a team as a code owner November 20, 2024 10:02
@botelastic botelastic bot added the needs_team label Nov 20, 2024
mergify bot commented Nov 20, 2024

This pull request does not have a backport label.
If this is a bug or security fix, could you label this PR @ShourieG? 🙏.
For such, you'll need to label your PR with:

  • The upcoming major version of the Elastic Stack
  • The upcoming minor version of the Elastic Stack (if you're not pushing a breaking change)

To fix up this pull request, you need to add the backport labels for the needed branches, such as:

  • backport-8.\d is the label to automatically backport to the 8.\d branch, where \d is the digit

mergify bot commented Nov 20, 2024

backport-8.x has been added to help with the transition to the new branch 8.x.
If you don't need it, please add the backport-skip label and remove the backport-8.x label.

@mergify mergify bot added the backport-8.x label Nov 20, 2024
@ShourieG ShourieG requested a review from efd6 November 20, 2024 10:03
@ShourieG ShourieG added the Team:Security-Service Integrations label Nov 20, 2024
@elasticmachine (Collaborator)

Pinging @elastic/security-service-integrations (Team:Security-Service Integrations)

@ShourieG ShourieG added the enhancement, Filebeat, and input:GCS labels Nov 20, 2024
@botelastic botelastic bot removed the needs_team label Nov 20, 2024
@efd6 efd6 (Contributor) left a comment

There are two distinct changes being made here; the refactor and the addition of a new feature. I think they should be in separate PRs.

@@ -90,6 +92,17 @@ type jsonCredentialsConfig struct {
 	AccountKey string `config:"account_key"`
 }
 
+type retryConfig struct {
+	// MaxAttempts is the maximum number of retry attempts, defaults to 3.
+	MaxAttempts int `config:"max_attempts" validate:"min=0"`
Contributor

Why min=0? Is this attempts or retries? If it's retries, I would rename this MaxRetries and then a zero makes sense, otherwise it's saying don't try the first time.

+	// MaxBackOffDuration is the maximum value of the retry period, defaults to 30 seconds.
+	MaxBackOffDuration time.Duration `config:"max_backoff_duration" validate:"min=2"`
+	// BackOffMultiplier is the factor by which the retry period increases. It should be greater than 1 and defaults to 2.
+	BackOffMultiplier float64 `config:"backoff_multiplier" validate:"min=2"`
Contributor

Does go-ucfg allow float constraints? Is a value greater than one but less than two a valid configuration in principle? If it is, should we not allow that?

Contributor Author (@ShourieG) commented Nov 29, 2024

CEL also uses `Limit *float64` with `config:"limit"` in its resource config, as do a couple of other inputs. There is a float unpacker present (FloatUnpacker).

As for whether we should allow it: if the SDK allows it, why not extend the same feature set to the end user?

Contributor

That's not what I'm asking. I'm asking if min can be a float constraint.
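(For reference, a minimal sketch of one alternative if the `min` tag cannot express a strict float bound: go-ucfg calls a `Validate` method on unpacked structs, so the constraint can be enforced there. This is illustrative only, not the PR's implementation.)

package gcs

import "fmt"

// Illustrative only: enforce backoff_multiplier > 1 after unpacking,
// independent of whether the validate tag supports float bounds.
func (c *retryConfig) Validate() error {
	if c.BackOffMultiplier <= 1 {
		return fmt.Errorf("backoff_multiplier must be greater than 1, got %v", c.BackOffMultiplier)
	}
	return nil
}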

Comment on lines +31 to +40
 	MaxWorkers int `config:"max_workers,omitempty" validate:"max=5000"`
 	// Poll - Defines if polling should be performed on the input bucket source.
-	Poll *bool `config:"poll,omitempty"`
+	Poll bool `config:"poll,omitempty"`
 	// PollInterval - Defines the maximum amount of time to wait before polling for the next batch of objects from the bucket.
-	PollInterval *time.Duration `config:"poll_interval,omitempty"`
+	PollInterval time.Duration `config:"poll_interval,omitempty"`
 	// ParseJSON - Informs the publisher whether to parse & objectify json data or not. By default this is set to
 	// false, since it can get expensive dealing with highly nested json data.
-	ParseJSON *bool `config:"parse_json,omitempty"`
+	ParseJSON bool `config:"parse_json,omitempty"`
 	// BucketTimeOut - Defines the maximum time that the sdk will wait for a bucket api response before timing out.
-	BucketTimeOut *time.Duration `config:"bucket_timeout,omitempty"`
+	BucketTimeOut time.Duration `config:"bucket_timeout,omitempty"`
Contributor

Why were these changed?

Contributor Author

explanation below


This attribute can be used to configure a list of sub-attributes that directly control how the input should behave when a download for a file/object fails or gets interrupted. The list of sub-attributes is as follows:

1. `max_attempts`: This attribute defines the maximum number of retries that should be attempted for a failed download. The default value for this is `3`.
Contributor

Suggested change
1. `max_attempts`: This attribute defines the maximum number of retries that should be attempted for a failed download. The default value for this is `3`.
1. `max_retries`: This attribute defines the maximum number of retries that should be attempted for a failed download. The default value for this is `3`.
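(As an aside, a hedged sketch of how these documented retry attributes could map onto the GCS Go SDK's per-bucket retryer. The Retryer, WithBackoff, and WithMaxAttempts calls are from cloud.google.com/go/storage; the mapping itself is an assumption and may not match how this PR wires the options.)

package gcs

import (
	"cloud.google.com/go/storage"
	"github.com/googleapis/gax-go/v2"
)

// Illustrative mapping of the retry settings onto the SDK's retryer;
// not necessarily how this PR applies them.
func applyRetry(bkt *storage.BucketHandle, cfg retryConfig) *storage.BucketHandle {
	return bkt.Retryer(
		storage.WithMaxAttempts(cfg.MaxAttempts), // max_attempts / max_retries
		storage.WithBackoff(gax.Backoff{
			Max:        cfg.MaxBackOffDuration, // max_backoff_duration
			Multiplier: cfg.BackOffMultiplier,  // backoff_multiplier
		}),
	)
}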

-The input can be configured to work with and without polling, though currently, if polling is disabled it will only
-perform a one time passthrough, list the file contents and end the process. Polling is generally recommented for most cases
-even though it can get expensive with dealing with a very large number of files.
+The input can be configured to work with and without polling, though if polling is disabled, it will only perform a one time passthrough, list the file contents and end the process.
Contributor

Suggested change
The input can be configured to work with and without polling, though if polling is disabled, it will only perform a one time passthrough, list the file contents and end the process.
The input can be configured to work with and without polling, though if polling is disabled it will only perform a single collection of data, list the file contents and end the process.

The change s/passthrough/…/ significantly changes the semantics, but I don't think the original meaning is correct; passthrough is a noun that doesn't describe an action.

Though I'm unclear why this is being changed since it does not appear to be related to the logic change here.

Comment on lines +67 to +71
 different values, thus offering extensive flexibility and customization. Examples <<bucket-overrides,below>> show this behavior.
 
 On receiving this config the google cloud storage input will connect to the service and retrieve a `Storage Client` using the given `bucket_name` and
 `auth.credentials_file`, then it will spawn two main go-routines, one for each bucket. After this each of these routines (threads) will initialize a scheduler
-which will in turn use the `max_workers` value to initialize an in-memory worker pool (thread pool) with `3` `workers` available. Basically that equates to two instances of a worker pool,
-one per bucket, each having 3 workers. These `workers` will be responsible for performing `jobs` that process a file (in this case read and output the contents of a file).
+which will in turn use the `max_workers` value to initialize an in-memory worker pool (thread pool) with `3` `workers` available. Basically that equates to two instances of a worker pool, one per bucket, each having 3 workers. These `workers` will be responsible for performing `jobs` that process a file (in this case read and output the contents of a file).
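(For readers, a simplified sketch of the worker-pool behaviour the paragraph above describes: one bounded pool per bucket, sized by max_workers, with each job reading and publishing one object. This is illustrative only, not the input's actual scheduler code.)

package gcs

import "sync"

// Illustrative only: process a bucket's objects with at most maxWorkers
// concurrent jobs, where each job reads and publishes a single object.
func processBucket(objects []string, maxWorkers int, job func(name string)) {
	var wg sync.WaitGroup
	sem := make(chan struct{}, maxWorkers) // bounds concurrency to maxWorkers
	for _, name := range objects {
		wg.Add(1)
		sem <- struct{}{}
		go func(name string) {
			defer wg.Done()
			defer func() { <-sem }()
			job(name)
		}(name)
	}
	wg.Wait()
}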
Contributor

Do we need to change these in this PR?

Contributor Author

have replied with an explanation and option below

@@ -213,7 +211,7 @@ This is a specific subfield of a bucket. It specifies the bucket name.

 This attribute defines the maximum amount of time after which a bucket operation will give and stop if no response is recieved (example: reading a file / listing a file).
 It can be defined in the following formats : `{{x}}s`, `{{x}}m`, `{{x}}h`, here `s = seconds`, `m = minutes` and `h = hours`. The value `{{x}}` can be anything we wish.
-If no value is specified for this, by default its initialized to `50 seconds`. This attribute can be specified both at the root level of the configuration as well at the bucket level. The bucket level values will always take priority and override the root level values if both are specified. The value of `bucket_timeout` that should be used depends on the size of the files and the network speed. If the timeout is too low, the input will not be able to read the file completely and `context_deadline_exceeded` errors will be seen in the logs. If the timeout is too high, the input will wait for a long time for the file to be read, which can cause the input to be slow. The ratio between the `bucket_timeout` and `poll_interval` should be considered while setting both the values. A low `poll_interval` and a very high `bucket_timeout` can cause resource utilization issues as schedule ops will be spawned every poll iteration. If previous poll ops are still running, this could result in concurrently running ops and so could cause a bottleneck over time.
+If no value is specified for this, by default its initialized to `120 seconds`. This attribute can be specified both at the root level of the configuration as well at the bucket level. The bucket level values will always take priority and override the root level values if both are specified. The value of `bucket_timeout` that should be used depends on the size of the files and the network speed. If the timeout is too low, the input will not be able to read the file completely and `context_deadline_exceeded` errors will be seen in the logs. If the timeout is too high, the input will wait for a long time for the file to be read, which can cause the input to be slow. The ratio between the `bucket_timeout` and `poll_interval` should be considered while setting both the values. A low `poll_interval` and a very high `bucket_timeout` can cause resource utilization issues as schedule ops will be spawned every poll iteration. If previous poll ops are still running, this could result in concurrently running ops and so could cause a bottleneck over time.
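(A hedged sketch of the behaviour described above: `bucket_timeout` effectively bounds each bucket operation with a context deadline, which is why too small a value surfaces as `context_deadline_exceeded` on large objects. Object and NewReader are standard cloud.google.com/go/storage calls; the helper itself is illustrative, not the input's code.)

package gcs

import (
	"context"
	"io"
	"time"

	"cloud.google.com/go/storage"
)

// Illustrative only: read one object under the configured bucket timeout.
// Reads that exceed the deadline fail with context.DeadlineExceeded.
func readObject(ctx context.Context, bkt *storage.BucketHandle, name string, bucketTimeout time.Duration) ([]byte, error) {
	ctx, cancel := context.WithTimeout(ctx, bucketTimeout)
	defer cancel()

	r, err := bkt.Object(name).NewReader(ctx)
	if err != nil {
		return nil, err
	}
	defer r.Close()
	return io.ReadAll(r)
}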
Contributor

What is being changed here?

Contributor Author

initialized to 50 seconds -> initialized to 120 seconds

mergify bot commented Nov 25, 2024

This pull request is now in conflicts. Could you fix it? 🙏
To fix up this pull request, you can check it out locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b gcs/retry_config upstream/gcs/retry_config
git merge upstream/main
git push upstream gcs/retry_config

@ShourieG ShourieG (Contributor Author) commented Nov 29, 2024

There are two distinct changes being made here; the refactor and the addition of a new feature. I think they should be in separate PRs.

@efd6, so in order to have the retries working effectively, I had to introduce defaultConfig(), which warranted some config changes/cleanup. I don't think go-ucfg allows default values in the config tags; I would need to write a custom Unpack method to achieve that, and it would become unnecessarily complex. Keeping the boolean values as pointers was more of a detriment in this scenario and they were unnecessary, so I decided to clean those up along with the default values, which led to the documentation changes. I agree I should have separated the PRs once I realised the config was going to require a small rework. I can still separate it if you want: I'll turn this into the refactor PR and have a separate PR for the actual change.

@ShourieG ShourieG (Contributor Author)

@efd6, I'm closing this PR. I've created a separate PR only for the cleanup/update to docs here. Once that is merged, I'll have the retry changes up.

@ShourieG ShourieG closed this Nov 29, 2024
@efd6 efd6 (Contributor) commented Dec 1, 2024

I don't think go-ucfg allows default values in the config tags, I would need to write a custom Unpack method to achieve that and it would become unnecessarily complex.

The pattern to achieve this is to prefill the default config. Zero values in the serialised config do not over-write the value in the passed-in default.
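(A minimal sketch of that prefill pattern, assuming the Beats-style conf.C/Unpack flow; the Retry field name is hypothetical, and the default values shown are the ones mentioned in this PR's docs.)

package gcs

import (
	"time"

	conf "github.com/elastic/elastic-agent-libs/config"
)

// Illustrative only: start from a prefilled default config so that options
// absent from the user's configuration keep their default values.
func defaultConfig() config {
	return config{
		BucketTimeOut: 120 * time.Second,
		Retry: retryConfig{ // "Retry" field name is assumed for illustration
			MaxAttempts:        3,
			MaxBackOffDuration: 30 * time.Second,
			BackOffMultiplier:  2,
		},
	}
}

func readConfig(cfg *conf.C) (config, error) {
	c := defaultConfig()
	if err := cfg.Unpack(&c); err != nil { // options not present in cfg retain defaults
		return config{}, err
	}
	return c, nil
}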

Labels: backport-8.x, enhancement, Filebeat, input:GCS, Team:Security-Service Integrations
Development

Successfully merging this pull request may close these issues.

Add configurable retry policy for GCS integration connection failures
3 participants