PrometheusDuplicateTimestamps errors with log_to_metrics filter starting in fluent-bit 3.1.5 #9413
Comments
@reneeckstein are you facing the same issue with v3.1.8? (we have some fixes in place for a similar problem)
@edsiper Yes, we are facing the same issue in fluent-bit v3.1.8. I'm looking forward to v3.1.9; I noticed two metrics-related commits on the master branch.
This is happening on 3.0.3 in our K8s cluster. We have tried rolling out each version up to and including the latest 3.1.9, but are still getting the same issue. If we use the following script:

at the metrics path, then we see the following: `Handling connection for 2020`

What is strange is that the fluent-bit helm chart uses the path `/api/v2/metrics/prometheus` for the service monitor, which is even worse: `Handling connection for 2020`
@edsiper @cosmo0920 See `fluent-bit/src/flb_input_chunk.c`, line 1534 (commit 50ff1fa).
Consider the following configuration:
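The exact configuration is not reproduced here; a minimal sketch of the kind of setup this thread describes, using hypothetical input and metric names, could look like this:

```
[SERVICE]
    flush 1

[INPUT]
    name  dummy
    dummy {"message": "hello"}

[FILTER]
    name               log_to_metrics
    match              *
    tag                log_metric
    metric_mode        counter
    metric_name        count_all_messages
    metric_description Hypothetical counter of all messages

[OUTPUT]
    name  prometheus_exporter
    match log_metric
    host  0.0.0.0
    port  2021
```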
With a build later than 3.1.5, this results in the following metrics output:
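The output itself was not captured here, but the duplication described would look roughly like this in a single scrape (illustrative, using the hypothetical metric name above with the counter prefix the filter adds):

```
# HELP log_metric_counter_count_all_messages Hypothetical counter of all messages
# TYPE log_metric_counter_count_all_messages counter
log_metric_counter_count_all_messages 1
log_metric_counter_count_all_messages 2
```

Both samples belong to the same series in one exposition, which is exactly what Prometheus flags as a duplicate timestamp.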
As shown, the metric is duplicated, with an increased (correct) value. I found that for the first two metric appends, two chunks are created, because the first one is busy/locked in this section: `fluent-bit/src/flb_input_chunk.c`, lines 1127 to 1129 (commit 50ff1fa).
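For context, here is a tiny runnable toy model of the behavior those lines describe (hypothetical types and names, not fluent-bit source): the chunk lookup skips busy/locked chunks, so a concurrent append falls through and creates a second chunk for the same metric.

```c
#include <stdio.h>
#include <stdbool.h>

/* Hypothetical stand-in for an input chunk; not the fluent-bit struct. */
struct chunk { int id; bool busy; };

/* Return the first non-busy chunk, or NULL so the caller creates one.
 * This mirrors the busy/locked check at the cited lines. */
static struct chunk *find_available(struct chunk *chunks, int n) {
    for (int i = 0; i < n; i++) {
        if (chunks[i].busy) {
            continue;   /* skip: chunk is being flushed/locked */
        }
        return &chunks[i];
    }
    return NULL;
}

int main(void) {
    struct chunk chunks[] = { {1, true} };   /* first chunk still busy */
    if (find_available(chunks, 1) == NULL) {
        puts("no available chunk: a second chunk would be created");
    }
    return 0;
}
```

If the second metric append arrives while the first chunk is still in that state, it lands in a fresh chunk, and the exporter later exposes two copies of the same series.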
I'm not sure why this happens. If there is more time between the metric appends, the issue does not occur; the same is true when the new interval option is enabled. The doubled metrics seem to have started appearing with this commit: "out_prometheus_exporter: Handle multiply concatenated metrics type of events". With the configuration above, I now get:

This output is incorrect, as it seems the second metric append from … However, if I change …

In this case, the metric updates normally within the log_to_metrics filter, and because the chunks are now created and not busy (for some reason), everything works as expected. The first update, however, is "skipped" (the expected values would be 2 and then 3, not 1 and 3). @edsiper, do you have any ideas on how to proceed?
@edsiper Any update on this issue? We checked fluent-bit 3.1.10, but it still has the same issue.
I am also interested in updates; I'm facing the same issue.
In my setup at least, I was able to fix the problem by updating to Fluent Bit 3.2 and setting `Flush_Interval_Sec 10`. Lower values might work too.
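A sketch of where that option would go, assuming it is set on the log_to_metrics filter (hypothetical names again):

```
[FILTER]
    name               log_to_metrics
    match              *
    tag                log_metric
    metric_mode        counter
    metric_name        count_all_messages
    metric_description Hypothetical counter of all messages
    flush_interval_sec 10
```

A longer flush interval presumably gives the first chunk time to be flushed before the next append, which would match the earlier observation that more time between appends avoids the duplication.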
Ping @edsiper @cosmo0920, can you please look at this? To me this looks like a general issue that currently only surfaces in the log_to_metrics plugin because of its potentially high rate of metric updates. From my understanding, this can happen with any other metrics plugin as well, as long as the update rate is high enough.
Bug Report
Describe the bug
After upgrading from fluent-bit 3.1.4 to 3.1.5, all our k8s clusters started reporting PrometheusDuplicateTimestamps errors.
The Prometheus metric `prometheus_target_scrapes_sample_duplicate_timestamp_total` is increasing (i.e. `rate(prometheus_target_scrapes_sample_duplicate_timestamp_total[5m]) > 0`). Prometheus is logging a lot of warnings like this:
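The warning text itself was not captured here; the duplicate-timestamp warning Prometheus prints during scrapes looks along these lines (illustrative field values):

```
level=warn ts=2024-09-20T12:00:00.000Z caller=scrape.go msg="Error on ingesting samples with different value but same timestamp" num_dropped=2
```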
To Reproduce
Scrape the additional `/metrics` endpoint on port 2021.

Expected behavior
No duplicate metrics on the additional `/metrics` endpoint for the log_to_metrics feature (usually on port 2021), no warnings in Prometheus logs, no PrometheusDuplicateTimestamps errors.

Screenshots
Your Environment
Additional context
It is just very annoying that every k8s cluster with this common configuration reports PrometheusDuplicateTimestamps errors.