Reporter for OTel Metrics #691
@AlexanderWert (or anyone) naive question -- is the intention here that a user will add the OpenTelemetry metrics API/SDK to their project, use that to generate metrics, and then
My understanding is that if our agent is installed, it will intercept the OTel metrics and send them using the traditional endpoint. But you raise a great question: what stops us from just configuring the OTel SDK to use the OTLP endpoint? Seems like that's a valid solution and much quicker to implement?
A couple of the benefits of using the intake/v2 protocol:
A single connection could also be achieved by using OTLP/HTTP. To get consistent metadata while using OTLP, we would need to implement https://github.com/elastic/apm-dev/issues/769#issuecomment-1226839134
Thanks for the insight both @axw, @jackshirazi. Follow-up question for anyone -- OTel has six metric types (Counter, Asynchronous Counter, Histogram, Asynchronous Gauge, UpDownCounter, Asynchronous UpDownCounter). Has anyone done the work yet to map how these data types would be represented in the
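For context, here is a rough sketch (my assumption, not an agreed spec) of how those six instrument kinds might collapse into the value shapes that intake-v2 metricsets already understand (counter, gauge, histogram). All names here are illustrative:

```javascript
// Sketch: collapsing OTel's six instrument kinds into three intake-v2
// value shapes. Assumption: UpDownCounters map best to gauges because
// their values are not monotonic. Not a spec -- just a starting point.
const INSTRUMENT_TO_INTAKE = {
  Counter: 'counter',
  AsynchronousCounter: 'counter',
  UpDownCounter: 'gauge',             // can decrease, so not a monotonic counter
  AsynchronousUpDownCounter: 'gauge',
  AsynchronousGauge: 'gauge',
  Histogram: 'histogram',             // intake-v2 histograms use {values, counts}
};

function intakeTypeFor(instrumentKind) {
  const t = INSTRUMENT_TO_INTAKE[instrumentKind];
  if (!t) throw new Error(`unknown OTel instrument kind: ${instrumentKind}`);
  return t;
}
```

The asynchronous variants differ only in how values are collected (callback vs. synchronous recording), so they would presumably not need distinct representations on the wire.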
I assume that work is effectively encoded in APM Server's support for OTLP incoming metrics, but I haven't looked. Here is the limited (only tried with a gauge so far) code from my past OnWeek doing this for the node.js APM agent: https://github.com/elastic/apm-agent-nodejs/blob/trentm/onweek3-rearch-metrics/lib/metrics2/index.js#L87-L91 Basically I'm assuming/hoping:
I haven't sanity checked any of that though. A lingering concern/TODO that I had from my OnWeek trial usage of the OTel JS Metrics SDK was what, if any, implications there are with temporality and resets and gaps from the OTel Metrics data model. I'm very naive here.
I think this is fine for now, but we should eventually send the type (counter/gauge) too.
There are two types of histograms: plain old histogram and exponential histogram. The more I think about this, the more I think that sending the metrics as OTLP/HTTP and having the server combine it with metadata would be the way to go. That may also not be that simple for agents though, as it would involve changing the way requests are sent (as multipart content, with one part being metadata and one part being the OTLP protobuf).
Would it be possible (and maybe easier for agents) to "just" enrich the metrics with metadata on the agent-side and then reuse the OTLP reporter (or some modified version of it)? Though it would not solve the problem of having an additional connection.
It would be possible. That would involve translating our metadata to OTel resource attributes. Maybe it's not too bad?
I suppose this bit is language-dependent. For Go I expect we can pass in a
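To make the "translating our metadata to OTel resource attributes" idea concrete, a minimal JS sketch follows. The semantic-convention key choices and the `labels.` prefix for unmapped fields are my assumptions, not an agreed mapping:

```javascript
// Sketch: flatten intake-v2 metadata into OTel resource attributes.
// Keys follow OTel semantic conventions where an obvious mapping exists;
// the `labels.` prefix for leftovers is a hypothetical choice.
function metadataToResourceAttributes(metadata) {
  const attrs = {};
  const svc = metadata.service || {};
  if (svc.name) attrs['service.name'] = svc.name;
  if (svc.version) attrs['service.version'] = svc.version;
  if (svc.environment) attrs['deployment.environment'] = svc.environment;
  if (svc.agent && svc.agent.name) attrs['telemetry.sdk.name'] = svc.agent.name;
  for (const [k, v] of Object.entries(metadata.labels || {})) {
    attrs[`labels.${k}`] = v; // assumption: prefix fields with no semconv home
  }
  return attrs;
}
```

The awkward part, as noted above, is the long tail of metadata fields that have no standard resource-attribute equivalent.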
Excellent question, Alan, and sorry for jumping in late. But now is definitely the right time to talk about the different options and their trade-offs. Initially, I was thinking that we'd convert the metrics to the intake v2 format. I'm sure that there are some missing pieces and that we'll need to extend the schema to be able to capture all types of metrics. Let me try to summarize the pros and cons.

Send OTel metrics via Intake v2
Send OTel metrics via OTLP
I'm still leaning towards sending OTel metrics via intake v2, but there are lots of unknowns on both sides.
@axw Is there code in apm-server that is doing the reverse of this (translating OTel resource attributes into our metadata) to support OTLP intake? I'm starting to look at the Node.js agent code for this PoC and would be interested in cribbing from that code if it exists.
As an alternative to passing metadata via a multipart message, what about passing the whole metadata JSON as a single Resource Attribute -- named (Update: We'd need to modify APM server's OTLP resource attribute code to handle that "special" attribute, of course.) I'm able to do this with the OTel JS metrics exporter code easily. Here is a pretty-printed protobuf example (sent from the Node.js agent) showing this working: https://gist.github.com/trentm/c93951b4c163b49a1776584adc5ab3c3#file-metrics-data-out-L122-L127 OpenTelemetry does describe configurable limits on attributes
However, resource attributes are exempt:
What do others think about this way of getting metadata from APM agents to APM server via OTLP?
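A minimal sketch of the single-resource-attribute idea above. The attribute name is a placeholder (the comment deliberately leaves the final name open), and the server-side handling of this "special" attribute would be new code:

```javascript
// Sketch: carry the entire intake-v2 metadata object as one resource
// attribute. The attribute name below is hypothetical, not the proposed name.
const METADATA_ATTR = 'metadata.json.placeholder';

function buildResourceAttributes(metadata) {
  // Resource attributes are exempt from the OTel configurable attribute
  // limits (per the spec text quoted above), so a large JSON string
  // should not be truncated by the SDK.
  return { [METADATA_ATTR]: JSON.stringify(metadata) };
}

function parseMetadataAttr(attrs) {
  // Server side: peel the special attribute back off into metadata.
  return JSON.parse(attrs[METADATA_ATTR]);
}
```

The attraction is that it's a lossless round trip: no per-field mapping table to maintain on either side.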
[AlexW]
[Andrew]
I investigated this for the Node.js agent here: elastic/apm-agent-nodejs#2954 (comment) And then after the fact I realized that Andrew had already pointed this out: "but there may still be multiple connections due to the long-lived nature of our streaming requests".
@trentm nice idea. Sending the metadata as a resource attribute is certainly a lot simpler than what I had in mind. Regarding multiple connections: I think I missed some words before, and that should only apply to HTTP/1.1. In HTTP/2, the requests would be multiplexed as multiple streams. Would it be feasible for the Node.js agent to use the http2 module's Compatibility API for making requests to APM Server? If so, then for TLS connections to APM Server (e.g. in Elastic Cloud), the agent should be able to negotiate HTTP/2 and minimise the number of connections.
My understanding is that the Compatibility API is about the server-side -- i.e. supporting creation of an HTTP server in node that can handle incoming HTTP/1.1 and HTTP/2 requests. But, yes, I can look into getting the Node.js agent to use HTTP/2 for its requests. Going this route will potentially be a lot more work:
Showing my HTTP/2 newb-ness, TIL about "GOAWAY" frames. |
Here is a bit of a status update. I haven't looked at this for a week and a half. The "easy" path so far, from my investigation, is as follows. This requires very little work on the Node.js APM agent side:
Here are the open questions I intended to look into. Some of these are language-specific, some not. To use the intake-v2 API for sending OTel Metrics:
To use one of the OTLP flavours:
@JonasKunz I understand that you are starting to look at this PoC for the Java agent as well. Let me know if the above makes sense and/or if there is anything we could work on together.
@axw I started looking into this. Correct me if I'm wrong: APM server itself does not support HTTP/2, but the cloud proxy does? Or at least APM server does not when I'm accessing it via
Local APM server (running via Tilt) failing on an attempted HTTP/2 request:
An APM server in cloud supporting HTTP/2 via ALPN negotiation (use
I found https://github.com/elastic/apm-server/blob/main/dev_docs/otel.md#muxing-grpc-and-http11 and I suspect I'm hitting this:
I was attempting to use h2c for non-gRPC.
Crazy idea: what about going gRPC for intake-v2 data?
@trentm as you've found, we do support HTTP/2 but it more or less requires TLS.
I think that's fair to say.
It can't assume HTTP/2 support, e.g. because there could be a reverse proxy. ALPN is the expected way of dealing with this, but I can't comment on Node.js support.
Good point, I hadn't thought about the Lambda extension. It doesn't currently support OTLP at all. That's another con for the OTLP approach/pro for intake-v2.
@graphaelli looked into this a couple of years ago, and I think @marclop may have looked at it recently too. IIRC, one issue we found is that protobuf is slower to encode in Node.js than JSON, by virtue of the runtime having native JSON encoding support. I don't know if that has changed. If not, it's another con for the OTLP approach for metrics -- but probably not such a big deal if limited to metrics, which would not be high throughput.
That's probably still true. I have a note to do some CPU load comparisons.
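A sketch of how such a CPU comparison could be structured. The hand-rolled binary encoder below is a stand-in (an assumption) for a real protobuf encoder; it only illustrates the measurement approach, not actual OTLP cost:

```javascript
// Micro-benchmark sketch: native JSON.stringify vs a toy binary encoder
// for a small metricset. The binary encoder is NOT protobuf -- it is a
// placeholder to show the shape of the comparison.
function encodeBinary(samples) {
  // naive layout: 4-byte count, then one 8-byte little-endian double per value
  const buf = Buffer.alloc(4 + samples.length * 8);
  buf.writeUInt32LE(samples.length, 0);
  samples.forEach((v, i) => buf.writeDoubleLE(v, 4 + i * 8));
  return buf;
}

function bench(name, fn, iters = 10000) {
  const start = process.hrtime.bigint();
  for (let i = 0; i < iters; i++) fn();
  const ns = Number(process.hrtime.bigint() - start);
  return { name, nsPerOp: ns / iters };
}

const samples = Array.from({ length: 64 }, (_, i) => i * 1.5);
const results = [
  bench('JSON', () => JSON.stringify({ samples })),
  bench('binary', () => encodeBinary(samples)),
];
console.log(results);
```

A real comparison would swap `encodeBinary` for an actual OTLP protobuf encoder (e.g. the one in the OTel JS exporter) over realistic metric payloads.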
I might be a little late to the party, but I started investigating things from the Java side yesterday. I'm trying the "easy path" as a starter as well: use the OTel metrics SDK + OTLP exporter.

Dependency Sizes
HTTP/2 communication
So to summarize, it seems very hard to get both exporters running on the same TCP connection via HTTP/2. I was therefore thinking of the same middle ground between converting the data to IntakeV2 and sending the data to the APM server's OTLP endpoint:
I was thinking of just sending the OTLP protobuf messages via the IntakeV2 API (via something like a
I stumbled across the fact that there is a JSON protobuf encoding, though it is experimental and I haven't looked deeper into it yet. Is this maybe an option?
Yah, that is a possible option. I can access that code in the OTel Metrics SDK without cheating. It would perhaps be sketchy to rely on stability of this JSON encoding until it is "stable". The proposal here, then, might be:
This would mean:
Update: For users, it means they need to have an updated agent and APM server, and the agent cannot send those "otelresourcemetrics" until it has successfully sniffed that the APM server version is new enough. If the APM agent converted to "metricsets" and used intake-v2, then the user just needs an updated agent version -- which is slightly nicer.
Encoding OTel metrics inside intake-v2 is an interesting idea. Given that the JSON encoding is experimental, I'm a bit leery of depending on it. Base64-encoded protobuf feels like it could be a bit of a pain for debugging, but technically fine, and it should be more stable.
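A sketch of what the base64-protobuf-inside-intake-v2 variant could look like on the wire. The ndjson line key, field names, and encoding label are hypothetical -- the discussion above has not settled them:

```javascript
// Sketch: wrap serialized OTLP ResourceMetrics bytes as base64 inside a
// new intake-v2 ndjson line type. All key names are placeholders.
function makeIntakeLine(protobufBytes) {
  return JSON.stringify({
    otel_resource_metrics: {
      encoding: 'protobuf/base64',   // hypothetical; lets the server dispatch
      data: Buffer.from(protobufBytes).toString('base64'),
    },
  }) + '\n';
}

function parseIntakeLine(line) {
  // Server side: recover the raw protobuf bytes for the existing OTLP
  // translation code to consume.
  const payload = JSON.parse(line).otel_resource_metrics;
  return Buffer.from(payload.data, 'base64');
}
```

This keeps the single intake-v2 connection and the existing metadata line, at the cost of ~33% base64 overhead on the metrics payload and harder-to-eyeball debugging.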
With this change the APM agent adds support for using OTel Metrics in two ways:

1. The `@opentelemetry/api` package is instrumented so that usage of `.metrics.getMeterProvider(...)` will use one provided by the APM agent, if the user hasn't explicitly provided one of their own. This implementation uses a version of the OTel Metrics SDK included with the APM agent. It is configured to send metrics to APM server by default. This allows a user to use the OTel Metrics API and get metrics sent to APM server without any configuration.
2. Alternatively, the `@opentelemetry/sdk-metrics` package is instrumented so that when a user creates a `MeterProvider`, the APM agent will add a `MetricReader` that sends metrics to APM server. This allows a user to configure metrics as they wish using the OTel Metrics SDK and then automatically get those metrics sent to APM server.

This also adds some grouping in ".ci/tav.json" for the TAV workflow to avoid the 256-job GH Actions limit. I'm not yet sure if those will work well.

Closes: #2954
Refs: elastic/apm#691
Refs: elastic/apm#742 (spec PR)
Description
Agents provide a metrics reporter that sends the metrics from the metric registry provided by the OTel metrics SDK to APM Server.
The specifics of the implementation may be agent-specific, but the goal is that users have as little to configure as possible.
Unlike our OTel tracing bridge, metrics will not be implemented as a bridge that translates the OTel API calls into calls to the internal metrics registry. Instead, we rely on the OTel metrics SDK to provide the implementation for the OTel metrics API. Agents will only provide a custom reporter that may be registered automatically (for languages that allow for instrumentation) or programmatically.
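The "custom reporter" shape described above could look roughly like the following, sketched without the real OTel SDK dependency (in practice this would be a `MetricReader`/`MetricExporter` registered with the SDK). All names and the metricset shape details are illustrative:

```javascript
// Sketch: a periodic reporter that pulls from a collect() callback and
// hands serialized intake-v2-style metricsets to a send() function.
// Hypothetical stand-in for an OTel MetricReader; names are not a spec.
class ApmMetricReporter {
  constructor({ collect, send, intervalMs = 30000 }) {
    this.collect = collect;   // () => [{name, value}, ...] from the SDK
    this.send = send;         // (serializedLine) => void, e.g. intake-v2 writer
    this.intervalMs = intervalMs;
    this.timer = null;
  }

  flush() {
    const metrics = this.collect();
    const metricset = {
      metricset: {
        timestamp: Date.now() * 1000, // intake-v2 timestamps are microseconds
        samples: Object.fromEntries(
          metrics.map((m) => [m.name, { value: m.value }])
        ),
      },
    };
    this.send(JSON.stringify(metricset));
  }

  start() { this.timer = setInterval(() => this.flush(), this.intervalMs); }
  stop() { clearInterval(this.timer); }
}
```

The key design point from the description: the OTel metrics SDK owns aggregation and the API implementation, and the agent contributes only this last-mile reporting step, registered automatically where instrumentation allows.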
Spec Issue
Agent Issues