
Spanmetrics generation is not compatible with the Grafana APM dashboard / Tempo metrics generation #1973

Closed
DMarby opened this issue Dec 3, 2022 · 6 comments
Labels
stale Used for stale issues / PRs

Comments


DMarby commented Dec 3, 2022

Hi,

I've been attempting to use spanmetrics generated by the agent with the new Grafana APM dashboard, and it seems like the metrics generated by the agent are not compatible with the ones Tempo generates in a few ways:

  • Latency is in milliseconds, while the metrics generated by Tempo use seconds
  • The labels are different: service_name instead of service, operation instead of span_name, and additionally Tempo adds status_message

I'm not sure if this should be addressed in the agent or in Tempo, or if it's even meant to be compatible, but the documentation seems to indicate that the APM dashboard should work with spanmetrics generation from the agent:
https://grafana.com/docs/tempo/latest/metrics-generator/app-performance-mgmt/#requirements-to-enable-the-apm-dashboard

It seems like the labels for Tempo were partially changed fairly recently to address this, but they still differ: #1444
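
For illustration, the label mismatch above could in principle be papered over with standard Prometheus relabeling, assuming the agent's generated metrics pass through a Prometheus-style remote_write. This is a minimal, untested sketch (the endpoint is a placeholder, and relabeling cannot address the milliseconds-vs-seconds difference):

```yaml
remote_write:
  - url: https://prometheus.example.com/api/v1/write   # placeholder endpoint
    write_relabel_configs:
      # Copy the agent-style labels onto the names the APM dashboard expects...
      - source_labels: [service_name]
        target_label: service
      - source_labels: [operation]
        target_label: span_name
      # ...then drop the originals. Note: these rules apply to every series sent
      # through this remote_write, not only the spanmetrics ones.
      - regex: service_name|operation
        action: labeldrop
```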


rfratto commented Dec 5, 2022

cc @mapno do you have any input on the desired behavior here?


mapno commented Dec 7, 2022

> cc @mapno do you have any input on the desired behavior here?

We intend to keep Tempo's implementation as close to OTel as possible, with the objective of both being compatible. I believe the points you raise here are an oversight on our part. Thanks for reporting @DMarby.

@mapno mapno transferred this issue from grafana/agent Jan 5, 2023
joe-elliott commented

@kovrus I heard that you were working on a solution to this. Anything you can share to help us understand where this is headed?

devrimdemiroz commented

Same here...


kovrus commented Feb 15, 2023

There will be a new OpenTelemetry connector component, spanmetrics, which, I guess, will or can later be integrated into Grafana Agent Flow. Currently this component is in the collector contrib code base but not yet enabled; it is essentially a copy of the spanmetrics processor (to be precise, it shares its code base). It is a good time to make some changes to improve this component, and there are open issues, including open-telemetry/opentelemetry-collector-contrib#18760, covering the changes we are planning to make in the new component.

It is worth mentioning that this component should not have any Prometheus-specific naming conventions, etc., since it is an OTel component that receives OTel spans and exports OTel metrics. It can be combined with any exporters or processors later to do all the relevant conversions. For example, I guess, for the case described here we can use the spanmetrics connector with the Prometheus remote write exporter, which will convert the OTel data model to the Prometheus data model and export the generated metrics.
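
For a rough picture of that wiring (a sketch only, since the connector was not yet enabled in contrib releases at the time of this comment; the receiver choice and endpoint are placeholders), the connector sits between a traces pipeline and a metrics pipeline, with the Prometheus remote write exporter doing the conversion to the Prometheus data model:

```yaml
receivers:
  otlp:
    protocols:
      grpc: {}

connectors:
  # Consumes spans and emits OTel metrics.
  spanmetrics: {}

exporters:
  prometheusremotewrite:
    endpoint: https://prometheus.example.com/api/v1/write   # placeholder

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [spanmetrics]            # the connector is the traces pipeline's exporter...
    metrics:
      receivers: [spanmetrics]            # ...and the metrics pipeline's receiver
      exporters: [prometheusremotewrite]
```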

So coming to the above-mentioned issues:

> Latency is in milliseconds, while the metrics generated by Tempo use seconds

This should be addressed by adding support for the Exponential Histogram and using the seconds unit.

> The labels are different: service_name instead of service, operation instead of span_name, and additionally Tempo adds status_message

Issues have been created for renaming the operation label to span.name (span_name after it is exported by the prw exporter), dropping _total from the generated counter metric name (_total will be added by the prw exporter), etc. The service.name label is probably there to stay, since it will be part of the generated metrics' resource attributes (if metrics are exported with the prw exporter, it will be used to create the job Prometheus label).
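
As a side note on the service.name point: if a plain service_name label (rather than only the job label) is wanted on the generated series, the prw exporter can copy resource attributes onto each series. A hedged sketch, assuming the contrib prometheusremotewrite exporter's resource_to_telemetry_conversion option:

```yaml
exporters:
  prometheusremotewrite:
    endpoint: https://prometheus.example.com/api/v1/write   # placeholder
    # Copies resource attributes (e.g. service.name -> service_name)
    # onto every exported series as ordinary labels.
    resource_to_telemetry_conversion:
      enabled: true
```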

github-actions bot commented

This issue has been automatically marked as stale because it has not had any activity in the past 60 days.
The next time this stale check runs, the stale label will be removed if there is new activity. The issue will be closed after 15 days if there is no new activity.
Please apply the keepalive label to exempt this issue.

@github-actions github-actions bot added the stale label on Apr 20, 2023
@github-actions github-actions bot closed this as not planned on May 6, 2023