Latest Prometheus releases report duplicate metrics in the rundeck exporter #108

Closed
BaCaRoZzo opened this issue Jan 9, 2025 · 10 comments

@BaCaRoZzo

BaCaRoZzo commented Jan 9, 2025

See prometheus/prometheus#14089 for another example of this issue.

Newer releases of Prometheus log warnings about duplicate metrics from the exporter:

ts=2025-01-09T15:01:16.644Z caller=scrape.go:1820 level=warn component="scrape manager" scrape_pool=rundeck target=http://rundeck/metrics msg="Error on ingesting samples with different value but same timestamp" num_dropped=7
ts=2025-01-09T15:06:15.441Z caller=scrape.go:1820 level=warn component="scrape manager" scrape_pool=rundeck target=http://rundeck/metrics msg="Error on ingesting samples with different value but same timestamp" num_dropped=7
ts=2025-01-09T15:21:14.829Z caller=scrape.go:1820 level=warn component="scrape manager" scrape_pool=rundeck target=http://rundeck/metrics msg="Error on ingesting samples with different value but same timestamp" num_dropped=14
ts=2025-01-09T15:31:14.470Z caller=scrape.go:1820 level=warn component="scrape manager" scrape_pool=rundeck target=http://rundeck/metrics msg="Error on ingesting samples with different value but same timestamp" num_dropped=14
ts=2025-01-09T15:36:13.986Z caller=scrape.go:1820 level=warn component="scrape manager" scrape_pool=rundeck target=http://rundeck/metrics msg="Error on ingesting samples with different value but same timestamp" num_dropped=7
ts=2025-01-09T15:46:14.199Z caller=scrape.go:1820 level=warn component="scrape manager" scrape_pool=rundeck target=http://rundeck/metrics msg="Error on ingesting samples with different value but same timestamp" num_dropped=7
ts=2025-01-09T15:51:14.408Z caller=scrape.go:1820 level=warn component="scrape manager" scrape_pool=rundeck target=http://rundeck/metrics msg="Error on ingesting samples with different value but same timestamp" num_dropped=14
ts=2025-01-09T16:16:14.381Z caller=scrape.go:1820 level=warn component="scrape manager" scrape_pool=rundeck target=http://rundeck/metrics msg="Error on ingesting samples with different value but same timestamp" num_dropped=7
ts=2025-01-09T16:21:13.518Z caller=scrape.go:1820 level=warn component="scrape manager" scrape_pool=rundeck target=http://rundeck/metrics msg="Error on ingesting samples with different value but same timestamp" num_dropped=7

Running the piped command from prometheus/prometheus#14089 (comment) in the linked issue confirms that some metrics are repeated within a single scrape. The metrics that repeat for me are:

rundeck_project_execution_duration_seconds
rundeck_project_execution_status
rundeck_project_start_timestamp

The duplicates sometimes carry the same value and sometimes different values. The duplication is intermittent: it happens for a few scrapes and then disappears. The following graph shows how often it happened in the last hour, based on the prometheus_target_scrapes_sample_duplicate_timestamp_total metric:

[Image: graph of prometheus_target_scrapes_sample_duplicate_timestamp_total over the last hour]
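
For readers who want to check a /metrics payload for the same symptom without the shell pipeline, here is a minimal sketch in Python. It is not the command from the linked issue; the function name `find_duplicate_series`, the sample payload, and the label names `project_name` and `job_name` are made up for illustration. It treats name plus label text as the series identity, so the same labels in a different order would count as different series.

```python
import re
from collections import defaultdict

# One sample line: metric name, optional {labels}, value, optional timestamp.
SAMPLE_RE = re.compile(
    r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{.*\})?\s+(\S+)(?:\s+(-?\d+))?$'
)


def find_duplicate_series(exposition_text):
    """Return {series: [values]} for every series that appears more than once."""
    seen = defaultdict(list)
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and # HELP / # TYPE comments.
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue
        name, labels, value, _timestamp = match.groups()
        seen[name + (labels or "")].append(value)
    return {series: values for series, values in seen.items() if len(values) > 1}


# Hypothetical payload: the label names and values are illustrative only.
sample = """# TYPE rundeck_project_execution_status gauge
rundeck_project_execution_status{project_name="demo",job_name="backup"} 1
rundeck_project_execution_status{project_name="demo",job_name="backup"} 0
rundeck_project_start_timestamp{project_name="demo",job_name="backup"} 1736434876000
"""

duplicates = find_duplicate_series(sample)
for series, values in duplicates.items():
    print(series, values)
```

A series listed with two different values is the case Prometheus reports as "different value but same timestamp", because both samples receive the scrape timestamp when no explicit timestamp is given.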

We are currently using release 2.7.0 of the exporter, although I don't see any change in its newer releases that would help with this problem. Since we alert on prometheus_target_scrapes_sample_duplicate_timestamp_total, it is causing a lot of noise in our notification channels.

Any idea what could be causing the issue?

@phsmith
Owner

phsmith commented Jan 9, 2025

Hi @BaCaRoZzo, thanks for reporting this.

OK, I've just tried this in my local environment, and it looks like it only happens when the RUNDECK_PROJECTS_EXECUTIONS_CACHE or --rundeck.projects.executions.cache option is passed to the exporter. It looks like this behavior was hidden in Prometheus versions lower than 2.52.0.

I need to look into it further.
Will keep you posted.

@phsmith
Owner

phsmith commented Jan 11, 2025

@BaCaRoZzo, can you confirm your Prometheus version?

I ask because after two days of testing (enabling and disabling the exporter cache and playing with the cache TTL), I've only been able to reproduce the duplicate metrics twice. I've also upgraded my Prometheus to version 3.1.0 and set it to scrape the metrics every 15s.

Could you try the latest Prometheus version?

@BaCaRoZzo
Author

Hi @phsmith,

I'm using 2.55.1, the latest 2.x release, because we planned to move to the latest 2.x series first to simplify the migration to 3.x.

There's no plan to migrate soon, but I can give it a try and see if that solves the issue.

Which version showed the problem for you?

@phsmith
Owner

phsmith commented Jan 11, 2025

Got it! I was running Prometheus 2.52.0 when I got the error. Let me try with the same version as you.

By the way, I've noticed that the duplication can be eliminated by having a label in the metrics that is updated each time the exporter metrics are scraped. I'm just working out the best way to do this.

@phsmith
Owner

phsmith commented Jan 13, 2025

I was able to confirm that Prometheus v2.55.1 (left) has the problem, but v3.1.0 doesn't (right):

@BaCaRoZzo
Author

@phsmith so is this a problem with Prometheus itself, with the exporter, or both? Looking at the issue I linked, the problem seemed to be on the exporter side. However, your message hints at a problem on the Prometheus side. 🤔

As I said, we don't plan to update to 3.x soon; we are focusing on other, more pressing tasks at the moment. I'll definitely update in a few months, but I can't commit to a date.

So, if a fix for 2.x is coming - assuming that makes sense from your POV - it would be really appreciated.

@phsmith
Owner

phsmith commented Jan 13, 2025

Yes, I found that the duplicate-metrics problem shows up on the Prometheus side in version 2.x, but it is mainly caused by the exporter cache option, which sends the same metrics until the cache is invalidated.

I've found a way to fix this on the exporter side and will send a fix tonight.

@BaCaRoZzo
Author

@phsmith thanks a lot.

phsmith added a commit that referenced this issue Jan 13, 2025
* chore: update requirements.txt

* docs: update CHANGELOG.md

* fix(#108): add timestamp to project_executions metrics
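
The commit message says a timestamp was added to the project_executions metrics, but the thread does not show how. The following is a minimal sketch of one possible reading, not the exporter's actual code: the Prometheus text exposition format allows an optional timestamp in milliseconds after the value, so each cached execution can carry its own time instead of inheriting the scrape time. The helper `format_sample`, the `executions` data, and the `job_name` label are hypothetical, and label-value escaping is omitted for brevity.

```python
def format_sample(name, labels, value, timestamp_ms=None):
    """Render one text-exposition line, with an optional explicit timestamp (ms)."""
    label_text = ",".join(f'{key}="{val}"' for key, val in sorted(labels.items()))
    line = f"{name}{{{label_text}}} {value}"
    if timestamp_ms is not None:
        line += f" {int(timestamp_ms)}"
    return line


# Hypothetical cached executions of the same job (illustrative values only).
executions = [
    {"job": "backup", "status_value": 1, "ended_ms": 1736434876000},
    {"job": "backup", "status_value": 0, "ended_ms": 1736435176000},
]

lines = [
    format_sample(
        "rundeck_project_execution_status",
        {"job_name": execution["job"]},
        execution["status_value"],
        execution["ended_ms"],
    )
    for execution in executions
]
print("\n".join(lines))
```

With explicit timestamps, the two samples of the same series no longer share one timestamp, which is the condition the warning in the original report describes.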
@phsmith
Owner

phsmith commented Jan 13, 2025

@BaCaRoZzo, I've just released the exporter version v2.8.4 with the fix for this issue.

Please give it a try when you have the chance.

@BaCaRoZzo
Author

@phsmith apologies for the long wait.

The fix seems to be effective. The new exporter has been deployed for a few hours and the alert has not fired since. On that basis, I think we can close this issue.

Thanks so much for your prompt response and the quick fix!
