When scraping in Alloy clustering mode, a duplicate label error occurs with 3 or more replicas #1006
Comments
This problem occurs when using Alloy clustering mode with 3 replicas.
This may not be related to the issue, and I still want to look into it more deeply, but I've noticed that you are reaching into an internal metrics path of an Alloy exporter with this:
This is not advised, as it relies on an internal implementation detail. Could you try the supported way, similar to the examples in our documentation? To be specific, you shouldn't need to set the metrics path yourself.
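For reference, the supported pattern from the documentation looks roughly like the sketch below: the exporter's exported targets are wired straight into prometheus.scrape, so no metrics path is configured by hand. The component labels and the Mimir URL here are placeholders, not values taken from this thread.

```
// Run the node (unix) exporter inside the same Alloy process.
prometheus.exporter.unix "local" { }

// Scrape it by consuming the exporter's exported targets directly;
// the metrics path is resolved internally, so nothing is hard-coded.
prometheus.scrape "local_unix" {
  targets    = prometheus.exporter.unix.local.targets
  forward_to = [prometheus.remote_write.mimir.receiver]
}

// Placeholder Mimir endpoint, included only to make the sketch complete.
prometheus.remote_write "mimir" {
  endpoint {
    url = "http://mimir-gateway/api/v1/push"
  }
}
```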
I prefer the Prometheus pull method. The guide you provided seems to describe a push method, where metrics are delivered directly to Mimir from each collection target server. Sorry if I misunderstood; I am pulling from the target servers via the Alloy agent.
The example that I included in my previous comment uses the pull method. Only the targets are passed to prometheus.scrape. BTW, you may also be affected by this issue: #1009 - but there is a simple workaround for it, so try that too :)
I may have misunderstood, but I would like to ask one more question. On my collection-target EC2 instances (A Group, B Group), the Alloy agent runs in non-cluster mode. On a separate EKS cluster, there are two Alloy agent pods (X Group) running in cluster mode.
If I want the Alloy agent pods (X Group) on EKS to collect the unix, process, and cadvisor metrics from the collection-target EC2 instances (A Group, B Group), am I not forced to declare the metrics path? After EC2 discovery, I confirmed that only /metrics is read when prometheus.scrape runs. /metrics only contains Alloy's own metrics; the unix, process, and cadvisor metrics are not there. That is why I declared the unix, process, and cadvisor paths separately in addition to /metrics.
I understand that the method you mentioned only works when the exporter and the scrape run on the same server, in the same Alloy agent process.
// Configure a prometheus.scrape component to collect process_exporter metrics.
If I'm wrong or you have different design guidelines, please let me know; I'd really appreciate it.
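For the setup described here (clustered collectors on EKS scraping exporters that run inside remote, non-clustered Alloy agents on EC2), declaring the address and metrics path via discovery.relabel might look roughly like the sketch below. The region, port, and the /metrics/unix path are assumptions for illustration, not the reporter's actual values; the relabeled output would then feed a clustered prometheus.scrape component.

```
// Sketch only: assumes the remote Alloy agents expose the unix exporter
// on a known, stable path; the port and path below are hypothetical.
discovery.ec2 "instances" {
  region = "ap-northeast-2"          // hypothetical region
  port   = 12345                     // hypothetical Alloy listen port on the targets
}

discovery.relabel "unix_targets" {
  targets = discovery.ec2.instances.targets

  // Override the default /metrics path for this exporter's endpoint.
  rule {
    replacement  = "/metrics/unix"   // hypothetical path
    target_label = "__metrics_path__"
  }
}
```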
@thampiotr I applied this and I am no longer getting errors with 3 replicas. However, I do not understand why the instance label affects clustering. My flow is: EC2 discovery -> relabel -> scrape. Shouldn't the calculation be the same if each collection target carries the same instance label across the 3 Alloy pods in cluster mode? Lastly, as you commented above, is there any other way to collect metrics from the collection targets besides using the metrics path as I described, given that the collection targets (non-cluster mode) and the collectors (cluster mode) are installed separately on different EC2 instances?
Thanks for closing this, I'm happy it worked eventually!
I have described this failure mode in more detail in this issue. The instance label would be different between instances, and thus the hashing would be different too, breaking an important assumption in clustering. For anyone encountering this or a similar problem in the future, check this issue for a workaround and a potential future fix: #1009
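As a rough illustration of the hashing point above (not necessarily the exact workaround from #1009), every clustered Alloy pod needs to derive the same label set, including instance, for a given target. One way is to take it from the target's own discovery metadata, for example:

```
// Sketch: derive `instance` from the discovered target's own metadata, so
// every clustered Alloy pod computes the same hash for the same target.
// Assumes a discovery.ec2 component labelled "instances" exists.
discovery.relabel "stable_instance" {
  targets = discovery.ec2.instances.targets

  rule {
    source_labels = ["__meta_ec2_private_ip"]
    target_label  = "instance"
  }
}
```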
What's wrong?
I am using an AWS environment, and the Alloy agent exports metrics through the unix, process, and cadvisor exporters on each of about 100 servers.
The Alloy v1.1.0 agent in Kubernetes performs EC2 discovery -> relabel (to save EC2 tags) -> scrape -> Mimir remote write.
The Alloy agent, deployed as a StatefulSet in Kubernetes, collects metrics without errors at a replica count of 2.
However, when the replica count becomes 3, a duplicate label error starts to occur on the last replica pod.
The metrics still seem to be collected, but this error appears countless times and feels like a fatal problem to me.
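For context, the pipeline described above (EC2 discovery -> relabel -> scrape -> Mimir write, with clustering enabled on the scrape component) looks roughly like the sketch below. The region, tag name, and endpoint URL are placeholders rather than the reporter's actual configuration, which is not included in this thread.

```
discovery.ec2 "instances" {
  region = "ap-northeast-2"                  // placeholder region
}

discovery.relabel "ec2_tags" {
  targets = discovery.ec2.instances.targets

  // Save an EC2 tag as a regular label (tag name is a placeholder).
  rule {
    source_labels = ["__meta_ec2_tag_Name"]
    target_label  = "ec2_name"
  }
}

prometheus.scrape "nodes" {
  targets    = discovery.relabel.ec2_tags.output
  forward_to = [prometheus.remote_write.mimir.receiver]

  // Distribute the discovered targets across the StatefulSet replicas.
  clustering {
    enabled = true
  }
}

prometheus.remote_write "mimir" {
  endpoint {
    url = "http://mimir-nginx/api/v1/push"   // placeholder URL
  }
}
```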
Reference Link
#784
Steps to reproduce
Deploy the alloy and mimir-distributed Helm charts as a StatefulSet.
System information
Linux 6.1.84-99.169.amzn2023.x86_64
Software version
Alloy v1.1.0
Configuration
Logs