-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
enhancement(file source, kubernetes_logs source): add rotate_wait_secs config option #18904
enhancement(file source, kubernetes_logs source): add rotate_wait_secs config option #18904
Conversation
👷 Deploy request for vector-project pending review.Visit the deploys page to approve it
|
✅ Deploy Preview for vrl-playground ready!
To edit notification comments on pull requests, go to your Netlify site configuration. |
c697731
to
39dcfb1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The general approach looks good to me. I left some comments on some small details.
src/sources/file.rs
Outdated
/// How long to keep an open handle to a rotated log file. | ||
#[serde_as(as = "serde_with::DurationMilliSeconds<u64>")] | ||
#[configurable(metadata(docs::type_unit = "milliseconds"))] | ||
#[serde(default = "default_rotate_wait_ms")] | ||
pub rotate_wait_ms: Duration, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For here and in the Kubernetes Logs source, rotate_wait_secs
seems more appropriate than rotate_wait_ms
as my understanding is that a reasonable value for this option is greater than 1 second. What do you think?
39dcfb1
to
f1a3393
Compare
@dsmith3197 I've addressed your feedback, PTAL. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for addressing my previous feedback. There are still some small adjustments we need to make before we can merge this. Once you address the comments, can you rerun make generate component-docs
to update the generated cue files as well?
Thank you again for the contribution! I'm sure many people will find this a useful enhancement.
88da3d1
to
cf90612
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks again! We can approve and merge this once you rerun make generate-component-docs
and commit the changes.
cf90612
to
f46466c
Compare
Done. Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@syedriko apologies, I see this one never merged either because it needs a changelog entry. Do you mind adding one? See https://github.com/vectordotdev/vector/blob/master/changelog.d/README.md
Head branch was pushed to by a user without write access
f46466c
to
74fbd55
Compare
Done. |
Thanks! |
Head branch was pushed to by a user without write access
30cf446
to
cfc344c
Compare
* enhancement(kubernetes_logs source) For the file and kubernetes_logs sources, introduced a new configuration variable, rotate_wait_secs, defaulting to practical infinity. Out of the box, this default effectively turns this feature off. rotate_wait_secs determines for how long Vector keeps trying to read from a log file that has been deleted (most likely due to log rotation, hence the name of the variable). Once that time span has expired, Vector stops reading from and closes the file descriptor of the deleted file, thus allowing the OS to reclaim the storage space occupied by the file. This behavior is similar to that of Fluentd's tail plugin: https://docs.fluentd.org/input/tail#rotate_wait Addresses issue (vectordotdev#18863) Co-authored-by: Jesse Szwedko <[email protected]>
146d918
to
38cc798
Compare
Regression Detector ResultsRun ID: be200d69-8e23-4fed-9ae1-d67811999976 Performance changes are noted in the perf column of each table:
No significant changes in experiment optimization goalsConfidence level: 90.00% There were no significant changes in experiment optimization goals at this confidence level and effect size tolerance.
|
perf | experiment | goal | Δ mean % | Δ mean % CI |
---|---|---|---|---|
➖ | syslog_splunk_hec_logs | ingress throughput | +2.18 | [+2.10, +2.25] |
➖ | syslog_humio_logs | ingress throughput | +1.97 | [+1.83, +2.11] |
➖ | socket_to_socket_blackhole | ingress throughput | +0.93 | [+0.86, +1.00] |
➖ | syslog_loki | ingress throughput | +0.92 | [+0.86, +0.99] |
➖ | splunk_hec_route_s3 | ingress throughput | +0.56 | [+0.07, +1.05] |
➖ | http_text_to_http_json | ingress throughput | +0.46 | [+0.32, +0.60] |
➖ | http_elasticsearch | ingress throughput | +0.40 | [+0.33, +0.48] |
➖ | syslog_log2metric_splunk_hec_metrics | ingress throughput | +0.40 | [+0.26, +0.54] |
➖ | otlp_grpc_to_blackhole | ingress throughput | +0.33 | [+0.25, +0.42] |
➖ | datadog_agent_remap_datadog_logs_acks | ingress throughput | +0.23 | [+0.14, +0.32] |
➖ | http_to_s3 | ingress throughput | +0.17 | [-0.11, +0.45] |
➖ | http_to_http_noack | ingress throughput | +0.13 | [+0.03, +0.22] |
➖ | datadog_agent_remap_datadog_logs | ingress throughput | +0.10 | [+0.02, +0.18] |
➖ | fluent_elasticsearch | ingress throughput | +0.08 | [-0.39, +0.56] |
➖ | http_to_http_json | ingress throughput | +0.05 | [-0.03, +0.12] |
➖ | splunk_hec_indexer_ack_blackhole | ingress throughput | +0.00 | [-0.14, +0.15] |
➖ | splunk_hec_to_splunk_hec_logs_acks | ingress throughput | -0.00 | [-0.16, +0.16] |
➖ | splunk_hec_to_splunk_hec_logs_noack | ingress throughput | -0.02 | [-0.13, +0.09] |
➖ | enterprise_http_to_http | ingress throughput | -0.10 | [-0.17, -0.03] |
➖ | syslog_regex_logs2metric_ddmetrics | ingress throughput | -0.17 | [-0.28, -0.07] |
➖ | syslog_log2metric_humio_metrics | ingress throughput | -0.29 | [-0.39, -0.19] |
➖ | http_to_http_acks | ingress throughput | -0.45 | [-1.76, +0.86] |
➖ | syslog_log2metric_tag_cardinality_limit_blackhole | ingress throughput | -0.52 | [-0.63, -0.40] |
➖ | datadog_agent_remap_blackhole | ingress throughput | -0.59 | [-0.67, -0.50] |
➖ | file_to_blackhole | egress throughput | -0.60 | [-2.93, +1.74] |
➖ | datadog_agent_remap_blackhole_acks | ingress throughput | -0.83 | [-0.93, -0.74] |
➖ | otlp_http_to_blackhole | ingress throughput | -3.10 | [-3.25, -2.96] |
Explanation
A regression test is an A/B test of target performance in a repeatable rig, where "performance" is measured as "comparison variant minus baseline variant" for an optimization goal (e.g., ingress throughput). Due to intrinsic variability in measuring that goal, we can only estimate its mean value for each experiment; we report uncertainty in that value as a 90.00% confidence interval denoted "Δ mean % CI".
For each experiment, we decide whether a change in performance is a "regression" -- a change worth investigating further -- if all of the following criteria are true:
-
Its estimated |Δ mean %| ≥ 5.00%, indicating the change is big enough to merit a closer look.
-
Its 90.00% confidence interval "Δ mean % CI" does not contain zero, indicating that if our statistical model is accurate, there is at least a 90.00% chance there is a difference in performance between baseline and comparison variants.
-
Its configuration does not mark it "erratic".
Regression Detector ResultsRun ID: 4fb152f8-b0e6-40cf-821d-40aaa7e83ebd Performance changes are noted in the perf column of each table:
No significant changes in experiment optimization goalsConfidence level: 90.00% There were no significant changes in experiment optimization goals at this confidence level and effect size tolerance.
|
perf | experiment | goal | Δ mean % | Δ mean % CI |
---|---|---|---|---|
➖ | syslog_log2metric_humio_metrics | ingress throughput | +2.80 | [+2.70, +2.90] |
➖ | splunk_hec_route_s3 | ingress throughput | +1.98 | [+1.48, +2.49] |
➖ | syslog_humio_logs | ingress throughput | +0.70 | [+0.58, +0.82] |
➖ | syslog_regex_logs2metric_ddmetrics | ingress throughput | +0.57 | [+0.46, +0.67] |
➖ | datadog_agent_remap_blackhole_acks | ingress throughput | +0.25 | [+0.16, +0.34] |
➖ | http_to_s3 | ingress throughput | +0.24 | [-0.05, +0.52] |
➖ | http_to_http_noack | ingress throughput | +0.14 | [+0.06, +0.23] |
➖ | http_to_http_json | ingress throughput | +0.04 | [-0.04, +0.11] |
➖ | splunk_hec_to_splunk_hec_logs_acks | ingress throughput | -0.00 | [-0.16, +0.16] |
➖ | splunk_hec_indexer_ack_blackhole | ingress throughput | -0.00 | [-0.15, +0.14] |
➖ | datadog_agent_remap_blackhole | ingress throughput | -0.01 | [-0.11, +0.09] |
➖ | splunk_hec_to_splunk_hec_logs_noack | ingress throughput | -0.03 | [-0.14, +0.09] |
➖ | enterprise_http_to_http | ingress throughput | -0.11 | [-0.17, -0.04] |
➖ | http_text_to_http_json | ingress throughput | -0.23 | [-0.35, -0.11] |
➖ | syslog_loki | ingress throughput | -0.25 | [-0.31, -0.20] |
➖ | otlp_grpc_to_blackhole | ingress throughput | -0.32 | [-0.41, -0.24] |
➖ | fluent_elasticsearch | ingress throughput | -0.37 | [-0.84, +0.10] |
➖ | syslog_log2metric_splunk_hec_metrics | ingress throughput | -0.38 | [-0.55, -0.22] |
➖ | syslog_splunk_hec_logs | ingress throughput | -0.39 | [-0.47, -0.32] |
➖ | syslog_log2metric_tag_cardinality_limit_blackhole | ingress throughput | -0.40 | [-0.50, -0.30] |
➖ | http_elasticsearch | ingress throughput | -0.50 | [-0.57, -0.43] |
➖ | datadog_agent_remap_datadog_logs_acks | ingress throughput | -0.65 | [-0.73, -0.58] |
➖ | datadog_agent_remap_datadog_logs | ingress throughput | -1.02 | [-1.11, -0.94] |
➖ | socket_to_socket_blackhole | ingress throughput | -1.19 | [-1.25, -1.13] |
➖ | otlp_http_to_blackhole | ingress throughput | -2.00 | [-2.14, -1.86] |
➖ | http_to_http_acks | ingress throughput | -2.30 | [-3.60, -0.99] |
➖ | file_to_blackhole | egress throughput | -2.44 | [-4.83, -0.06] |
Explanation
A regression test is an A/B test of target performance in a repeatable rig, where "performance" is measured as "comparison variant minus baseline variant" for an optimization goal (e.g., ingress throughput). Due to intrinsic variability in measuring that goal, we can only estimate its mean value for each experiment; we report uncertainty in that value as a 90.00% confidence interval denoted "Δ mean % CI".
For each experiment, we decide whether a change in performance is a "regression" -- a change worth investigating further -- if all of the following criteria are true:
-
Its estimated |Δ mean %| ≥ 5.00%, indicating the change is big enough to merit a closer look.
-
Its 90.00% confidence interval "Δ mean % CI" does not contain zero, indicating that if our statistical model is accurate, there is at least a 90.00% chance there is a difference in performance between baseline and comparison variants.
-
Its configuration does not mark it "erratic".
cc @wanjunlei |
…config option (vectordotdev#18904) * enhancement(file source) * enhancement(kubernetes_logs source) For the file and kubernetes_logs sources, introduced a new configuration variable, rotate_wait_secs, defaulting to practical infinity. Out of the box, this default effectively turns this feature off. rotate_wait_secs determines for how long Vector keeps trying to read from a log file that has been deleted (most likely due to log rotation, hence the name of the variable). Once that time span has expired, Vector stops reading from and closes the file descriptor of the deleted file, thus allowing the OS to reclaim the storage space occupied by the file. This behavior is similar to that of Fluentd's tail plugin: https://docs.fluentd.org/input/tail#rotate_wait Addresses issue (vectordotdev#18863) Co-authored-by: Jesse Szwedko <[email protected]>
For the file and kubernetes_logs sources, introduced a new configuration variable, rotate_wait_secs, of type Duration, defaulting to practical infinity. Out of the box, this default effectively turns this feature off. rotate_wait_secs determines for how long vector keeps trying to read from a log file that has been deleted (most likely due to log rotation, hence the name of the variable). Once that time span has expired, vector stops reading from and closes the file descriptor of the deleted file, thus allowing the OS to reclaim the storage space occupied by the file. This behavior is similar to that of Fluentd's tail plugin: https://docs.fluentd.org/input/tail#rotate_wait .
Addresses #18863