feat: Reporter for OTel Metrics #3152

Merged
merged 58 commits into from
Apr 28, 2023
Changes from 53 commits
Commits
289722f
feat: a proof-of-concept providing an OTel global meter provider usin…
trentm Nov 22, 2022
eb87d72
first otel metrics provider impl (I've only sanity checked Counter us…
trentm Feb 6, 2023
5af565d
Merge branch 'main' into trentm/otel-metrics-poc
trentm Feb 6, 2023
3f5fbf0
cleaning up use case names
trentm Feb 7, 2023
10c9784
drop the PoC dirs of work
trentm Feb 7, 2023
df7b887
initial test case for otel metrics (just the API usage use case, and …
trentm Feb 7, 2023
e8fe3d0
testing tweaks, don't start metrics or register the OTel MeterProvide…
trentm Feb 7, 2023
ba8bc52
rejigger to no longer use otel.metrics.setGlobalMeterProvider()
trentm Feb 8, 2023
f7dcd8b
dump stdout/stderr if this test fails (e.g. on Windows in CI) to try …
trentm Feb 8, 2023
2152d14
fix 'make check'
trentm Feb 8, 2023
16db977
flail debug output to try to grok Windows test failure in Jenkins CI
trentm Feb 9, 2023
cf749b0
test: skip an expectation from exit value of subprocess we are SIGTER…
trentm Feb 9, 2023
e3f2edd
get both use cases from the spec working: instrumenting otel/api or o…
trentm Feb 9, 2023
73aaa4e
fix 'make check'
trentm Feb 9, 2023
3ad752a
lots of improvements to the examples
trentm Feb 10, 2023
6df92f9
test: use undici.request because .fetch is only in Node 16.8+
trentm Feb 10, 2023
f4ba0bb
oh 'make check', I sacrifice these lines for thee
trentm Feb 10, 2023
5955d74
use Delta (for some instrument types) aggregation temporality per our…
trentm Feb 10, 2023
0d98be9
test for aggregation temporality for 'counter'
trentm Feb 10, 2023
0a70604
add observable counter meter type
trentm Mar 7, 2023
ba79d6a
Merge branch 'main' into trentm/otel-metrics-poc
trentm Mar 7, 2023
5a32ed0
use the 'Async ...' instrument naming, as used in the lang-generic OT…
trentm Mar 7, 2023
6025c8c
restore code commented out for dev, fixes 'make check'
trentm Mar 7, 2023
8d5d9f0
add Async Gauge metric type
trentm Mar 8, 2023
4edabbb
updowncounter metric type
trentm Mar 8, 2023
07894bb
turn off dev/debugging ignoring metrics, they are needed for activati…
trentm Mar 8, 2023
015c263
observable UpDownCounter support (modulo possible otel js metrics sdk…
trentm Mar 9, 2023
39a4d25
histograms
trentm Mar 10, 2023
1a1cac2
test handling of attr-sets for separate metricsets
trentm Mar 11, 2023
1d60bc5
labels: drop array-valued attrs and warn; cleaning out XXXs
trentm Mar 28, 2023
cd29e56
agent.flush() on forceFlush or shutdown
trentm Mar 29, 2023
80e082c
an example using the OTel NodeSDK for metrics setup and shutdown
trentm Mar 29, 2023
2651fe0
Merge branch 'main' into trentm/otel-metrics-poc
trentm Apr 3, 2023
05a2320
fix lint; bump to latest otel deps
trentm Apr 3, 2023
a4b09be
sdk-metrics ver guard (v1.11.0); separate metricsets per instrumentat…
trentm Apr 4, 2023
3cb2700
forgot to include this file in previous commit
trentm Apr 4, 2023
b28b182
add the disableMetrics config var and impl it for OTel Metrics and th…
trentm Apr 4, 2023
ecf5ea6
fix lint
trentm Apr 4, 2023
8a5ebb5
test: update mock to have required new method for 'disableMetrics' work
trentm Apr 4, 2023
fab0bad
quick play with Exponential Histograms; warn when we drop a metric of…
trentm Apr 4, 2023
94e141d
drop exporting OTel Metric 'unit'; not in spec, not sure how the valu…
trentm Apr 4, 2023
6eb79f7
otel metrics docs
trentm Apr 7, 2023
fcd7ffb
some tweaks on the docs
trentm Apr 13, 2023
9c8d6a9
doc updates; document customMetricsHistogramBoundaries config var
trentm Apr 13, 2023
9337153
test: add and improve TAV testing of @opentelemetry/api and sdk-metrics
trentm Apr 16, 2023
4e6055d
Merge branch 'main' into trentm/otel-metrics-poc
trentm Apr 17, 2023
56a82c3
update to latest sdk-metrics (v1.12.0); fix 'npm run test:deps' for r…
trentm Apr 17, 2023
9d16af2
fix tests (need lazy 'npm install' now with separate package); add de…
trentm Apr 17, 2023
74c5563
clean up otel metrics examples area
trentm Apr 17, 2023
1cfc4d5
drop the XXX
trentm Apr 17, 2023
1d501d0
flail debugging CI windows failure
trentm Apr 17, 2023
ffd8828
Revert "flail debugging CI windows failure"
trentm Apr 17, 2023
e630cc5
another flail to attempt to avoid spurious test failures on windows
trentm Apr 17, 2023
3feae2b
drop unneeded config object guard (from review feedback)
trentm Apr 27, 2023
0062d04
cleaning up final todos now that the spec is merged
trentm Apr 27, 2023
a474d69
doc, comment, log message improvements/corrections with final review
trentm Apr 28, 2023
751b2e7
Merge branch 'main' into trentm/otel-metrics-poc
trentm Apr 28, 2023
1eb8ba1
changelog entry
trentm Apr 28, 2023
13 changes: 5 additions & 8 deletions .ci/tav.json
Original file line number Diff line number Diff line change
Expand Up @@ -4,11 +4,10 @@
"@elastic/elasticsearch",
"@elastic/elasticsearch-canary",
"@hapi/hapi",
"@koa/router",
"@opentelemetry/api",
"@opentelemetry/sdk-metrics",
"apollo-server-express",
"aws-sdk",
"bluebird",
"cassandra-driver",
"elasticsearch",
"express",
Expand All @@ -17,25 +16,23 @@
"fastify",
"finalhandler",
"generic-pool",
"got",
"graphql",
"handlebars",
"ioredis",
"knex",
"koa-router",
"memcached",
"mimic-response",
"mongodb",
"mongodb-core",
"mysql",
"mysql2",
"next",
"pg",
"pug",
"redis",
"restify",
"tedious",
"undici",
"ws"
"ws",
"@koa/router,koa-router",
"handlebars,pug",
"bluebird,got,mimic-response"
]
}
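
The grouped entries above (for example `"@koa/router,koa-router"`) reduce the number of module entries in the TAV job matrix; the workflow comment in this PR cites a job matrix limit of 256. The following is a minimal sketch of that arithmetic, not the workflow's actual `prepare-matrix` logic. The Node.js version list and the helper names `matrixSize` and `expandEntry` are hypothetical.

```javascript
// Sketch only: the real matrix is built by the workflow's prepare-matrix job.
const MATRIX_JOB_LIMIT = 256 // limit quoted in the tav.yml workflow comment

// The number of matrix jobs is (node versions) x (module entries).
function matrixSize (nodeVersions, moduleEntries) {
  return nodeVersions.length * moduleEntries.length
}

// A grouped entry such as "handlebars,pug" names several modules for one job.
function expandEntry (entry) {
  return entry.split(',')
}

// Hypothetical inputs, not the repository's actual lists.
const nodeVersions = ['14', '16', '18', '20']
const ungrouped = [
  '@koa/router', 'koa-router', 'handlebars', 'pug',
  'bluebird', 'got', 'mimic-response'
]
const grouped = [
  '@koa/router,koa-router', 'handlebars,pug', 'bluebird,got,mimic-response'
]

console.log(matrixSize(nodeVersions, ungrouped)) // 4 versions x 7 entries
console.log(matrixSize(nodeVersions, grouped)) // 4 versions x 3 entries
console.log(expandEntry(grouped[2])) // modules tested together in one job
console.log(matrixSize(nodeVersions, grouped) <= MATRIX_JOB_LIMIT)
```

Grouping trades per-module isolation in CI output for a smaller matrix, which is presumably why only a few modules are combined.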
3 changes: 2 additions & 1 deletion .eslintrc.json
Original file line number Diff line number Diff line change
Expand Up @@ -15,6 +15,7 @@
"/.nyc_output",
"/build",
"node_modules",
"elastic-apm-node.js",
"/examples/esbuild/dist",
"/examples/typescript/dist",
"/examples/nextjs",
Expand All @@ -30,6 +31,6 @@
"/test/types/transpile/index.js",
"/test/types/transpile-default/index.js",
"/test_output",
"/tmp"
"tmp"
]
}
16 changes: 16 additions & 0 deletions .github/dependabot.yml
Original file line number Diff line number Diff line change
Expand Up @@ -16,6 +16,22 @@ updates:
reviewers:
- "elastic/apm-agent-node-js"

- package-ecosystem: "npm"
directory: "/test/opentelemetry-bridge"
schedule:
interval: "weekly"
open-pull-requests-limit: 5
reviewers:
- "elastic/apm-agent-node-js"

- package-ecosystem: "npm"
directory: "/test/opentelemetry-metrics/fixtures"
schedule:
interval: "weekly"
open-pull-requests-limit: 5
reviewers:
- "elastic/apm-agent-node-js"

- package-ecosystem: "npm"
directory: "/examples/opentelemetry-bridge"
schedule:
3 changes: 3 additions & 0 deletions .github/workflows/tav.yml
Original file line number Diff line number Diff line change
Expand Up @@ -41,6 +41,9 @@ jobs:
max-parallel: 30
fail-fast: false
matrix:
# A job matrix limit is 256. We do some grouping of TAV modules to
# stay under that limit.
# https://docs.github.com/en/actions/learn-github-actions/usage-limits-billing-and-administration
node: ${{ fromJSON(needs.prepare-matrix.outputs.versions) }}
module: ${{ fromJSON(needs.prepare-matrix.outputs.modules) }}
steps:
2 changes: 1 addition & 1 deletion .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -12,6 +12,6 @@
/build
node_modules
/test/benchmarks/.tmp
/tmp
tmp
/examples/*/dist
.next
10 changes: 0 additions & 10 deletions .tav.yml
Original file line number Diff line number Diff line change
Expand Up @@ -529,13 +529,3 @@ undici:
commands: node test/instrumentation/modules/undici/undici.test.js
node: '>=12.18'

"@opentelemetry/api":
versions: '>=1.0.0 <1.5.0'
node: '>=8.0.0'
commands:
- node test/opentelemetry-bridge/OTelBridgeNonRecordingSpan.test.js
- node test/opentelemetry-bridge/OTelBridgeRunContext.test.js
- node test/opentelemetry-bridge/active-span-and-context-interop.test.js
- node test/opentelemetry-bridge/fixtures.test.js
- node test/opentelemetry-bridge/interface-ContextManager.test.js
- node test/opentelemetry-bridge/otel-bridge-feature.test.js
217 changes: 164 additions & 53 deletions docs/api-opentelemetry.asciidoc
Original file line number Diff line number Diff line change
Expand Up @@ -6,27 +6,26 @@ endif::[]
[[opentelemetry-bridge]]
== OpenTelemetry bridge

NOTE: Added as experimental in v3.34.0.
To enable it, set <<opentelemetry-bridge-enabled, `opentelemetryBridgeEnabled`>> to `true`.
NOTE: Integration with the OpenTelemetry Tracing API was added as experimental in v3.34.0.
Integration with the OpenTelemetry Metrics API was added as experimental in v3.45.0.

The Elastic APM OpenTelemetry bridge allows one to use the vendor-neutral
https://opentelemetry.io/docs/instrumentation/js/api/[OpenTelemetry Tracing API]
(https://www.npmjs.com/package/@opentelemetry/api[`@opentelemetry/api`]) to
manually instrument your code, and have the Elastic Node.js APM agent handle
those API calls. This allows one to use the Elastic APM agent for tracing,
without any vendor lock-in from adding manual tracing using the APM agent's own
<<api,public API>>.
https://opentelemetry.io/docs/instrumentation/js/[OpenTelemetry API]
(https://www.npmjs.com/package/@opentelemetry/api[`@opentelemetry/api`]) in
your code, and have the Elastic Node.js APM agent handle those API calls.
This allows one to use the Elastic APM agent for tracing and metrics without any
vendor lock-in from adding manual tracing or custom metrics using the APM
agent's own <<api,public API>>.


[float]
[[otel-getting-started]]
=== Getting started
[[otel-tracing-api]]
=== Using the OpenTelemetry Tracing API

The goal of the OpenTelemetry bridge is to allow using the OpenTelemetry API
with the APM agent. ① First, you will need to add those dependencies to your
project. The minimum required OpenTelemetry API version is 1.0.0; see
<<compatibility-opentelemetry,the OpenTelemetry compatibility section>> for the
current maximum supported API version. For example:
① First, you will need to add the Elastic APM agent and OpenTelemetry API
dependencies to your project. The minimum required OpenTelemetry API version is
1.0.0; see <<compatibility-opentelemetry,the OpenTelemetry compatibility section>>
for the current maximum supported API version. For example:

[source,bash]
----
Expand All @@ -41,15 +40,14 @@ your application code):
----
export ELASTIC_APM_SERVER_URL='<url of your APM server>'
export ELASTIC_APM_SECRET_TOKEN='<secret token for your APM server>' # or ELASTIC_APM_API_KEY=...
export ELASTIC_APM_OPENTELEMETRY_BRIDGE_ENABLED=true
export ELASTIC_APM_OPENTELEMETRY_BRIDGE_ENABLED=true <1>
export NODE_OPTIONS='-r elastic-apm-node/start.js' # Tell node to preload and start the APM agent
node my-app.js
----
<1> Future versions may drop this config var and enable usage of the tracing API by default.

Or, alternatively, you can configure and start the APM agent at the top of your
application code as follows. (Note: For automatic instrumentations to function
properly, this must be executed before other `require` statements and
application code.)
application code:

[source,js]
----
Expand All @@ -62,11 +60,11 @@ require('elastic-apm-node').start({
// Application code ...
----

NOTE: These examples show the minimal configuration. See <<configuration,the full APM agent configuration reference>> for other configuration options.
See <<configuration,the full APM agent configuration reference>> for other configuration options.

③ Finally, you can use the OpenTelemetry API for any manual tracing in your code.
For example, the following script uses
https://open-telemetry.github.io/opentelemetry-js-api/interfaces/tracer.html#startactivespan[Tracer#startActiveSpan()]
③ Finally, you can use the https://open-telemetry.github.io/opentelemetry-js/modules/_opentelemetry_api.html[OpenTelemetry API]
for any manual tracing in your code. For example, the following script uses
https://open-telemetry.github.io/opentelemetry-js/interfaces/_opentelemetry_api.Tracer.html#startActiveSpan[Tracer#startActiveSpan()]
to trace an outgoing HTTPS request:

[source,js]
Expand All @@ -89,62 +87,175 @@ tracer.startActiveSpan('makeRequest', span => {
----

The APM agent source code repository includes
https://github.com/elastic/apm-agent-nodejs/tree/main/examples/opentelemetry-bridge[some examples using the OpenTelemetry bridge].
https://github.com/elastic/apm-agent-nodejs/tree/main/examples/opentelemetry-bridge[some examples using the OpenTelemetry tracing bridge].


[float]
[[otel-metrics-api]]
=== Using the OpenTelemetry Metrics API

① As above, install the needed dependencies. The minimum required OpenTelemetry
API version is 1.3.0, when metrics were added; see <<compatibility-opentelemetry,the OpenTelemetry compatibility section>>
for the current maximum supported API version. For example:

[source,bash]
----
npm install --save elastic-apm-node @opentelemetry/api
----

② Configure and start the APM agent. This can be done completely with
environment variables -- as shown below -- or in code. (See <<starting-the-agent>>
and <<configuration,the full APM agent configuration reference>> for other
configuration options.)

[source,bash]
----
export ELASTIC_APM_SERVER_URL='<url of your APM server>'
export ELASTIC_APM_SECRET_TOKEN='<secret token for your APM server>' # or ELASTIC_APM_API_KEY=...
export NODE_OPTIONS='-r elastic-apm-node/start.js' # Tell node to preload and start the APM agent
node my-app.js
----

③ Finally, you can use the OpenTelemetry Metrics API to
https://open-telemetry.github.io/opentelemetry-js/interfaces/_opentelemetry_api.Meter.html[create metrics].
The APM agent will periodically ship those metrics to your Elastic APM
deployment, where you can visualize them in Kibana.

[source,js]
----
// otel-metrics-hello-world.js <1>
const { createServer } = require('http')
const otel = require('@opentelemetry/api')

const meter = otel.metrics.getMeter('my-meter')
const numReqs = meter.createCounter('num_requests', { description: 'number of HTTP requests' })

const server = createServer((req, res) => {
numReqs.add(1)
req.resume()
req.on('end', () => {
res.end('pong\n')
})
})
server.listen(3000, () => {
console.log('listening at http://127.0.0.1:3000/')
})
----
<1> The full example is https://github.com/elastic/apm-agent-nodejs/blob/main/examples/opentelemetry-metrics/otel-metrics-hello-world.js[here].


[float]
[[otel-metrics-sdk]]
==== Using the OpenTelemetry Metrics SDK

The Elastic APM agent also supports exporting metrics to APM server when the
OpenTelemetry Metrics *SDK* is being used directly. You might want to use
the OpenTelemetry Metrics SDK to use a https://opentelemetry.io/docs/reference/specification/metrics/sdk/#view[`View`]
to configure histogram bucket sizes, to set up a Prometheus exporter, or for
other reasons. For example:

[source,js]
----
// use-otel-metrics-sdk.js <1>
const otel = require('@opentelemetry/api')
const { MeterProvider } = require('@opentelemetry/sdk-metrics')
const { PrometheusExporter } = require('@opentelemetry/exporter-prometheus')

const exporter = new PrometheusExporter({ host: '127.0.0.1', port: 3001 })
const meterProvider = new MeterProvider()
meterProvider.addMetricReader(exporter)
otel.metrics.setGlobalMeterProvider(meterProvider)

const meter = otel.metrics.getMeter('my-meter')
const latency = meter.createHistogram('latency', { description: 'Response latency (s)' })
// ...
----
<1> The full example is https://github.com/elastic/apm-agent-nodejs/blob/main/examples/opentelemetry-metrics/use-otel-metrics-sdk.js[here].


[float]
[[otel-metrics-conf]]
==== OpenTelemetry Metrics configuration

A few configuration options can be used to control OpenTelemetry Metrics support.

- Specific metrics names can be filtered out via the <<disable-metrics>> configuration option.
- Integration with the OpenTelemetry Metrics API can be disabled via the <<disable-instrumentations,`disableInstrumentations: '@opentelemetry/api'`>> configuration option.
- Integration with the OpenTelemetry Metrics SDK can be disabled via the <<disable-instrumentations,`disableInstrumentations: '@opentelemetry/sdk-metrics'`>> configuration option.
- All metrics support in the APM agent can be disabled via the <<metrics-interval,`metricsInterval: '0s'`>> configuration option.
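
As a rough illustration of the options listed above, the agent options might look like the following. This is a sketch under assumptions: the option names come from the list above, but the value formats (in particular an array for `disableMetrics`) should be checked against the configuration reference. The variable names `agentOpts` and `noMetricsOpts` are hypothetical, and the objects would be passed to `require('elastic-apm-node').start(...)`.

```javascript
// Sketch only: pass one of these objects to require('elastic-apm-node').start(opts).
const agentOpts = {
  // Filter out a specific metric name, here the counter from the earlier
  // hello-world example. The array value format is an assumption.
  disableMetrics: ['num_requests'],
  // Disable integration with the OpenTelemetry Metrics SDK.
  disableInstrumentations: '@opentelemetry/sdk-metrics'
}

// Alternatively, disable all metrics support in the APM agent.
const noMetricsOpts = {
  metricsInterval: '0s'
}

console.log(agentOpts, noMetricsOpts)
```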


[float]
[[otel-architecture]]
=== Bridge architecture

The OpenTelemetry bridge works similarly to the
https://github.com/open-telemetry/opentelemetry-js[OpenTelemetry JS SDK]. It
registers Tracer and ContextManager providers with the OpenTelemetry API.
Subsequent `@opentelemetry/api` calls in user code will call into those
providers. The APM agent translates from OpenTelemetry to Elastic APM semantics
and sends tracing data to your APM server for full support in
The OpenTelemetry Tracing bridge works similarly to the
https://github.com/open-telemetry/opentelemetry-js/tree/main/packages/opentelemetry-sdk-trace-node/[OpenTelemetry Node.js Trace SDK].
It registers Tracer and ContextManager providers with the OpenTelemetry API.
Subsequent `@opentelemetry/api` calls in user code will use those providers.
The APM agent translates from OpenTelemetry to Elastic APM semantics and sends
tracing data to your APM server for full support in
https://www.elastic.co/apm[Elastic Observability's APM app].

Here are a couple examples of semantic translations: The first entry span of a
Some examples of semantic translations: The first entry span of a
service (e.g. an incoming HTTP request) will be converted to an
{apm-guide-ref}/data-model-transactions.html[Elastic APM `Transaction`],
subsequent spans are mapped to
{apm-guide-ref}/data-model-spans.html[Elastic APM `Span`]. OpenTelemetry Span
{apm-guide-ref}/data-model-spans.html[Elastic APM `Span`s]. OpenTelemetry Span
attributes are translated into the appropriate fields in Elastic APM's data
model.

The only difference, from the user's point of view, is in the setup of tracing.
Instead of setting up the OpenTelemetry JS SDK, one sets up the APM agent
as <<otel-getting-started,described above>>.
as <<otel-tracing-api,described above>>.

---

The OpenTelemetry Metrics support is slightly different. If your code uses
just the Metrics *API*, then the APM agent provides a full MeterProvider so
that metrics are accumulated and sent to APM server. If your code uses the
Metrics *SDK*, then the APM agent adds a MetricReader to your MeterProvider
to send metrics on to APM server. This allows you to use the APM agent as
either an easy setup for using metrics or in conjunction with your existing
OpenTelemetry Metrics configuration.

[float]
[[otel-caveats]]
=== Caveats
Not all features of the OpenTelemetry API are supported.

[float]
[[otel-metrics]]
===== Metrics
This bridge only supports the tracing API.
The Metrics API is currently not supported.
Not all features of the OpenTelemetry API are supported. This section describes
any limitations and differences.

[float]
[[otel-span-links]]
===== Span Link Attributes
[[otel-caveats-tracing]]
===== Tracing

Adding links when
https://open-telemetry.github.io/opentelemetry-js-api/interfaces/tracer.html[starting a span]
*is* currently supported, but any span link *attributes are silently dropped*.
- Span Link Attributes. Adding links when https://open-telemetry.github.io/opentelemetry-js/interfaces/\_opentelemetry_api.Tracer.html[starting a span] is supported, but any added span link *attributes* are silently dropped.
- Span events (https://open-telemetry.github.io/opentelemetry-js/interfaces/_opentelemetry_api.Span.html#addEvent[`Span#addEvent()`]) are not currently supported. Events will be silently dropped.
- https://open-telemetry.github.io/opentelemetry-js/classes/_opentelemetry_api.PropagationAPI.html[Propagating baggage] within or outside the process is not supported. Baggage items are silently dropped.

[float]
[[otel-span-events]]
===== Span Events
Span events (https://open-telemetry.github.io/opentelemetry-js-api/interfaces/span.html#addevent[`Span#addEvent()`])
is not currently supported. Events will be silently dropped.
[[otel-caveats-metrics]]
===== Metrics

- Metrics https://opentelemetry.io/docs/reference/specification/metrics/data-model/#exemplars[exemplars] are not supported.
- https://opentelemetry.io/docs/reference/specification/metrics/data-model/#summary-legacy[Summary metrics] are not supported.
- https://opentelemetry.io/docs/reference/specification/metrics/data-model/#exponentialhistogram[Exponential Histograms] are not yet supported.
- The `sum`, `count`, `min` and `max` within the OpenTelemetry histogram data are discarded.
- The default histogram bucket boundaries are different from the OpenTelemetry default. They provide better resolution. They can be configured with the <<custom-metrics-histogram-boundaries>> configuration option.
- Metrics label names are dedotted (`s/\./_/g`) in APM server to avoid possible mapping collisions in Elasticsearch.
- The default Aggregation Temporality used differs from the OpenTelemetry default -- preferring
*delta*-temporality (nicer for visualizing in Kibana) to cumulative-temporality.
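
To illustrate the temporality difference mentioned in the last item: with cumulative temporality each reported value is the running total since the start, while with delta temporality each reported value is the change since the previous report. The sketch below only shows that relationship for a counter. It is not how the APM agent or the OpenTelemetry SDK computes it, and the helper name `toDeltas` is hypothetical.

```javascript
// Convert cumulative counter readings into per-interval deltas.
function toDeltas (cumulativeReadings) {
  const deltas = []
  let previous = 0
  for (const reading of cumulativeReadings) {
    deltas.push(reading - previous)
    previous = reading
  }
  return deltas
}

// Hypothetical running totals of a counter at four reporting intervals.
const cumulative = [3, 7, 7, 12]
console.log(toDeltas(cumulative)) // [ 3, 4, 0, 5 ]
```

A delta series shows the per-interval rate directly, whereas a cumulative series needs a derivative to recover it.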

Metrics support requires an APM server >=7.11 -- for earlier APM server
versions, metrics with label names including `.`, `*`, or `"` will get dropped.
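
The dedotting substitution quoted above (`s/\./_/g`) and the set of characters that cause drops on older APM servers can be sketched in JavaScript as follows. The dedotting itself happens in APM server, so this only illustrates the rule; the function names `dedotLabelName` and `hasProblemChars` are hypothetical.

```javascript
// Equivalent of the `s/\./_/g` substitution: replace every "." with "_".
function dedotLabelName (name) {
  return name.replace(/\./g, '_')
}

// True if a label name contains `.`, `*`, or `"`, the characters that the
// text above says cause metrics to be dropped by APM server versions <7.11.
function hasProblemChars (name) {
  return /[.*"]/.test(name)
}

console.log(dedotLabelName('http.response.status')) // http_response_status
console.log(hasProblemChars('http.response.status')) // true
console.log(hasProblemChars('latency')) // false
```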

// XXX Temporality link to spec when merged: https://github.com/elastic/apm/pull/742/files#diff-a04e98daf311e4b4d6a186717a32577382b938c32ebcfc3a73f3b322e584532eR16


[float]
[[otel-baggage]]
===== Baggage
https://open-telemetry.github.io/opentelemetry-js-api/classes/propagationapi.html[Propagating baggage]
within or outside the process is not supported. Baggage items are silently
dropped.
[[otel-caveats-logs]]
===== Logs

The OpenTelemetry Logs API is currently not supported -- only the Tracing and
Metrics APIs are.