Remove unbounded cardinality internal metric labels #15426

Open
jszwedko opened this issue Dec 1, 2022 · 7 comments
Labels
domain: observability Anything related to monitoring/observing Vector domain: sinks Anything related to the Vector's sinks domain: sources Anything related to the Vector's sources Epic Larger, user-centric issue that contains multiple sub-issues

Comments

@jszwedko
Member

jszwedko commented Dec 1, 2022

Some Vector components publish internal metrics with unbounded tag cardinality. These include:

  • the file source tags metrics with a dynamic file tag
  • kubernetes_logs tags metrics with a dynamic file tag
  • TODO: identify other sources

v0.24.0 mitigated this by adding the ability to expire these high-cardinality metrics, but I still think we should go further and remove these labels (or let users opt into them). That would avoid unexpected cardinality issues both in Vector itself and in the downstream systems that receive its internal telemetry.
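To make the difference between expiring and removing labels concrete, here is a minimal Python sketch of a metrics registry that keys each series by metric name plus tag set, with optional expiry of series that have not been updated recently. Everything here (`MetricRegistry`, the metric and tag values, a new file path on every tick) is hypothetical and only models the behavior described above. It is not Vector's implementation.

```python
from collections import defaultdict


class MetricRegistry:
    """Hypothetical in-memory registry keyed by (metric name, sorted tag pairs).

    Each distinct tag combination is a separate series. If expire_secs is set,
    series not updated within that window are dropped on the next sweep.
    """

    def __init__(self, expire_secs=None):
        self.expire_secs = expire_secs
        self.values = defaultdict(int)  # series key -> counter value
        self.last_seen = {}             # series key -> last update time

    def increment(self, name, tags, now, by=1):
        key = (name, tuple(sorted(tags.items())))
        self.values[key] += by
        self.last_seen[key] = now

    def sweep(self, now):
        # Without expiry, every series ever created is kept forever.
        if self.expire_secs is None:
            return
        stale = [k for k, t in self.last_seen.items() if now - t > self.expire_secs]
        for k in stale:
            del self.values[k]
            del self.last_seen[k]

    def series_count(self):
        return len(self.values)


def simulate(expire_secs):
    reg = MetricRegistry(expire_secs)
    for t in range(1000):
        # Hypothetical log rotation: a brand-new file path on every tick.
        tags = {"component_id": "my_file_source", "file": f"/var/log/app/app.log.{t}"}
        reg.increment("events_total", tags, now=t)
        reg.sweep(now=t)
    return reg.series_count()


print(simulate(None))  # 1000: one series per file ever seen
print(simulate(60))    # 61: only files updated in the last 60 ticks
```

With expiry, the series count is bounded by how many files are active within the expiry window rather than by how many files have ever been seen. Dropping the file tag (or making it opt-in) bounds it further, to one series per component regardless of file churn.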

Ref:

@jszwedko jszwedko added domain: observability Anything related to monitoring/observing Vector domain: sources Anything related to the Vector's sources domain: transforms Anything related to Vector's transform components domain: sinks Anything related to the Vector's sinks Epic Larger, user-centric issue that contains multiple sub-issues and removed domain: transforms Anything related to Vector's transform components labels Dec 1, 2022
@pachico

pachico commented Mar 24, 2023

Count me among the users affected by this problem.
The cardinality of our internal metrics is much higher (x10) than that of the rest of our metrics, and we run hundreds of thousands of active series.

@dmuth

dmuth commented Apr 19, 2023

Subscribing. I wonder if this might be related to #16895, which I filed a few weeks ago: I can make Vector slow down in strange ways when a single sink writes to 10,000+ files.

@eposinitskiy

We are facing this issue with the fluent source: metrics are emitted with a peer_addr tag that includes the port.
In our case, the Vector aggregator sits behind an AWS NLB, and peer_addr always contains a different port; apparently the NLB doesn't keep connections open. On the fluentd side we have double-checked keepalive and don't see the connection being re-opened frequently.

This eventually results in 300K+ unique time series on the Vector side.
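Why a tag that embeds the client port grows without bound can be shown with a small Python sketch that counts the distinct values such a tag would take across many reconnects. The forwarder IPs, the reconnect counts, and the assumption that each reconnect takes the next port in the Linux default ephemeral range (32768 to 60999) are all illustrative; real port allocation and NLB behavior differ.

```python
# Assumption: client ports come from the Linux default ephemeral range.
EPHEMERAL_LOW, EPHEMERAL_HIGH = 32768, 60999


def distinct_tag_values(connections, include_port):
    """Count distinct values a peer_addr-style tag would take over these connections."""
    if include_port:
        return len({f"{ip}:{port}" for ip, port in connections})
    return len({ip for ip, _ in connections})


def simulate_reconnects(forwarders, reconnects_per_forwarder):
    """Each reconnect uses the next ephemeral port, wrapping around the range."""
    span = EPHEMERAL_HIGH - EPHEMERAL_LOW + 1
    return [
        (ip, EPHEMERAL_LOW + (i % span))
        for ip in forwarders
        for i in range(reconnects_per_forwarder)
    ]


forwarders = ["10.0.1.10", "10.0.1.11", "10.0.1.12"]  # hypothetical upstream hosts
conns = simulate_reconnects(forwarders, 10_000)
print(distinct_tag_values(conns, include_port=True))   # 30000
print(distinct_tag_values(conns, include_port=False))  # 3
```

Every metric carrying such a tag is multiplied by that count, so a handful of metrics per connection quickly reaches the hundreds of thousands of series reported above.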

@dsmith3197
Contributor

Note that peer_addr was removed in the v0.34.0 release (#15426).

@dsmith3197
Contributor

The file tag has also been made opt-in (#19084).

The key tag for the throttle transform has been made opt-in (#19083).
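The opt-in approach can be sketched as a tag builder that attaches the unbounded tag only when the user asks for it; the same pattern applies to the throttle transform's key tag. This is a hypothetical Python illustration: `component_tags`, `include_file_tag`, and the paths are illustrative names, not Vector's actual code or configuration option.

```python
def component_tags(component_id, file_path, include_file_tag=False):
    """Build tags for a per-file metric; the unbounded file tag is added only on opt-in."""
    tags = {"component_id": component_id}
    if include_file_tag:  # hypothetical opt-in flag, off by default
        tags["file"] = file_path
    return tags


def count_series(paths, include_file_tag):
    """Count distinct tag sets (i.e. series) produced for the given file paths."""
    series = {
        tuple(sorted(component_tags("my_file_source", p, include_file_tag).items()))
        for p in paths
    }
    return len(series)


paths = [f"/var/log/app/app.log.{i}" for i in range(500)]
print(count_series(paths, include_file_tag=False))  # 1
print(count_series(paths, include_file_tag=True))   # 500
```

Defaulting the flag to off keeps cardinality bounded for everyone, while users who genuinely need per-file or per-key breakdowns can still enable them knowingly.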

@dsmith3197
Contributor

From our analysis, the above covers all of the internal metric labels with unbounded cardinality.

Please report here if you find others.

@jszwedko jszwedko reopened this Dec 21, 2023
@jszwedko
Member Author

Reopening this since we found another potential one: #19447 (comment)
