filebeat: input v2 compat uses random ID for CheckConfig #41585
Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane) |
This pull request does not have a backport label.
To fixup this pull request, you need to add the backport labels for the needed
The CheckConfig function validates a configuration by creating and immediately discarding an input. However, a potential conflict arises when CheckConfig is used with autodiscover in Kubernetes. Autodiscover accumulates configuration changes and applies them in batches. This can be problematic if a stop event for a pod is closely followed by a start event for the same pod (e.g., during a pod restart) before the inputs are reloaded. In this scenario, autodiscover might attempt to validate the configuration for the start event while the input for the pod is already running. This would lead the filestream input manager to see two inputs with the same ID, triggering a log warning. Although this situation generates a warning, it doesn't result in data duplication, because the second input is only created to validate the configuration and is later discarded. Also, the reload process ensures only new inputs are created; any input already running won't be duplicated.
12cc1ab to 32f04e3
CHANGELOG.next.asciidoc
Outdated
@@ -48,6 +48,7 @@ https://github.com/elastic/beats/compare/v8.8.1\...main[Check the HEAD diff]
- Change log.file.path field in awscloudwatch input to nested object. {pull}41099[41099]
- Remove deprecated awscloudwatch field from Filebeat. {pull}41089[41089]
- The performance of ingesting SQS data with the S3 input has improved by up to 60x for queues with many small events. `max_number_of_messages` config for SQS mode is now ignored, as the new design no longer needs a manual cap on messages. Instead, use `number_of_workers` to scale ingestion rate in both S3 and SQS modes. The increased efficiency may increase network bandwidth consumption, which can be throttled by lowering `number_of_workers`. It may also increase number of events stored in memory, which can be throttled by lowering the configured size of the internal queue. {pull}40699[40699]
- Fixes filestream logging the error "filestream input with ID 'ID' already exists, this will lead to data duplication[...]" on Kubernetes when using autodiscover,
The issue/PR links are missing. Also, the line ends in a comma; did you forget to add/commit the rest of the line?
filebeat/input/v2/compat/compat.go
Outdated
}

// using math/rand for performance, generate a 0-9 string
err = testCfg.SetString("inputID", -1, inputID+strconv.Itoa(rand.Intn(10)))
Question:
Do you think adding a single digit at the end of the string is enough? I believe it's better to use something longer to avoid the chance of collision.
I agree with the fix that in the sense that
However, the scenario you described to test the PR makes me wonder if the Kubernetes autodiscover is working as expected. Starting a new pod should not re-trigger the input start for an existing pod. That reminds me of an old issue where pod start/stop events were emitted when processing other Kubernetes events: #34717. I wonder if there was a regression, or there is another cause for this behaviour now.
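To make the collision concern concrete: a single random digit gives only 10 possible suffixes, so two validation inputs created for the same ID at about the same time would pick the same suffix with probability 1/10. Below is a minimal sketch of a longer suffix. It assumes a hypothetical helper named `checkConfigID`, which is not part of Beats, and keeps `math/rand` as in the diff above; what was finally merged in the PR may differ.

```go
package main

import (
	"fmt"
	"math/rand"
)

// checkConfigID is a hypothetical helper (not from the Beats code base).
// It appends 16 hex characters taken from a random 64-bit value, so the
// number of possible suffixes grows from 10 to 2^64.
// Note: math/rand is not cryptographically secure, and before Go 1.20 the
// global source is deterministic unless it is seeded explicitly.
func checkConfigID(inputID string) string {
	return fmt.Sprintf("%s-%016x", inputID, rand.Uint64())
}

func main() {
	// The ID below is only an illustration.
	fmt.Println(checkConfigID("my-filestream-id"))
}
```

The suffix only needs to keep the throwaway validation input from matching an ID that is already registered, so predictability is not a concern here; it is the size of the suffix space that matters.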
…ts' into 31767-filestream-id-already-exists
That is a good point. But in the scenario used to reproduce this issue, it isn't a new pod start triggering a start for an existing pod. It's the same pod: a stop event and then a start event are triggered while it's on
In order to investigate a possible issue with autodiscover, first it'd be necessary to find out if the start/stop events for the pod in
Anyway, it's another issue.
95c0b33 to 67bcd91
…ts' into 31767-filestream-id-already-exists
…1641)
The CheckConfig function validates a configuration by creating and immediately discarding an input. However, a potential conflict arises when CheckConfig is used with autodiscover in Kubernetes. Autodiscover accumulates configuration changes and applies them in batches. This can be problematic if a stop event for a pod is closely followed by a start event for the same pod (e.g., during a pod restart) before the inputs are reloaded. In this scenario, autodiscover might attempt to validate the configuration for the start event while the input for the pod is already running. This would lead the filestream input manager to see two inputs with the same ID, triggering a log warning. Although this situation generates a warning, it doesn't result in data duplication, because the second input is only created to validate the configuration and is later discarded. Also, the reload process ensures only new inputs are created; any input already running won't be duplicated.
(cherry picked from commit 697ede4)
Co-authored-by: Anderson Queiroz <[email protected]>
…CheckConfig (#41642)
* filebeat: input v2 compat uses random ID for CheckConfig (#41585)
The CheckConfig function validates a configuration by creating and immediately discarding an input. However, a potential conflict arises when CheckConfig is used with autodiscover in Kubernetes. Autodiscover accumulates configuration changes and applies them in batches. This can be problematic if a stop event for a pod is closely followed by a start event for the same pod (e.g., during a pod restart) before the inputs are reloaded. In this scenario, autodiscover might attempt to validate the configuration for the start event while the input for the pod is already running. This would lead the filestream input manager to see two inputs with the same ID, triggering a log warning. Although this situation generates a warning, it doesn't result in data duplication, because the second input is only created to validate the configuration and is later discarded. Also, the reload process ensures only new inputs are created; any input already running won't be duplicated.
(cherry picked from commit 697ede4)
---------
Co-authored-by: Anderson Queiroz <[email protected]>
Proposed commit message
filebeat: input v2 compat uses random ID for CheckConfig
The CheckConfig function validates a configuration by creating and immediately discarding an input. However, a potential conflict arises when CheckConfig is used with autodiscover in Kubernetes.
Autodiscover accumulates configuration changes and applies them in batches. This can be problematic if a stop event for a pod is closely followed by a start event for the same pod (e.g., during a pod restart) before the inputs are reloaded. In this scenario, autodiscover might attempt to validate the configuration for the start event while the input for the pod is already running. This would lead the filestream input manager to see two inputs with the same ID, triggering a log warning.
Although this situation generates a warning, it doesn't result in data duplication, because the second input is only created to validate the configuration and is later discarded. Also, the reload process ensures only new inputs are created; any input already running won't be duplicated.
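The scenario described above can be sketched with a toy model. This is not the filestream input manager: the `manager` type, its methods, and the `randomize` flag are all hypothetical names made up for illustration. The sketch only shows why validating a config under the ID of an already-running input produces a warning, and why validating it under a randomized ID does not.

```go
package main

import (
	"fmt"
	"math/rand"
)

// manager is a toy stand-in for an input manager that tracks input IDs.
// It is hypothetical and does not mirror the Beats implementation.
type manager struct {
	ids      map[string]struct{}
	warnings []string
}

func newManager() *manager {
	return &manager{ids: map[string]struct{}{}}
}

// create registers id. If the ID is already registered it records a
// warning and reports false, leaving the existing entry untouched.
func (m *manager) create(id string) bool {
	if _, ok := m.ids[id]; ok {
		m.warnings = append(m.warnings, fmt.Sprintf("input with ID %q already exists", id))
		return false
	}
	m.ids[id] = struct{}{}
	return true
}

// remove unregisters id.
func (m *manager) remove(id string) { delete(m.ids, id) }

// checkConfig models validation: create an input, then discard it.
// With randomize set, the throwaway input gets a random suffix so it
// cannot match the ID of an input that is still running.
func (m *manager) checkConfig(id string, randomize bool) {
	if randomize {
		id = fmt.Sprintf("%s-%016x", id, rand.Uint64())
	}
	if m.create(id) {
		m.remove(id) // discard only what this call registered
	}
}

func main() {
	m := newManager()
	m.create("pod-a") // the input for the pod is still running

	m.checkConfig("pod-a", false) // start event validated before the reload
	fmt.Println("warnings without random suffix:", len(m.warnings)) // 1

	m.warnings = nil
	m.checkConfig("pod-a", true)
	fmt.Println("warnings with random suffix:", len(m.warnings)) // 0
}
```

In both cases the running input stays registered exactly once, which matches the point that the warning does not imply data duplication.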
Checklist
[ ] I have made corresponding changes to the documentation
[ ] I have made corresponding change to the default configuration files
CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.
Disruptive User Impact
How to test this PR locally
Run a kind cluster: kind create cluster
kubectl config use-context kind-kind
DEV=true SNAPSHOT=true TEST_PLATFORMS="linux/amd64" TEST_PACKAGES="docker" mage package
kind load docker-image docker.elastic.co/beats/filebeat:9.0.0-SNAPSHOT
kubectl apply -f k8s.yaml
k8s.yaml
kubectl run busybox1 --image=busybox, kubectl run busybox2 --image=busybox
Related issues
filestream input logs an error when an existing input is reloaded with the same ID #31767
Use cases
filebeat on Kubernetes using autodiscover