-
Notifications
You must be signed in to change notification settings - Fork 14
Official means to put the full trace context back #74
Comments
I don't understand what As for being able to be called before and after, isn't that the reason we use pdict for context? That way the library you are instrumenting doesn't have to be modified to be passed context, instead you start a span in the before call and end it in the after through the process dictionary. And even though the library is not directly passed a context if it calls out to another library that is also instrumented it will be able to create children. Your concern is simply that you don't trust the context to be intact when you get to the 'end' call? Like if the children don't get end'ed and you end up calling an end to one of them because they are the current span still? |
Oh, sorry, Yes, my concern is that if someone I call doesn't end their spans in exactly the right way, I can't end mine the right way unless I use -spec with_span(opentelemetry:tracer(), opentelemetry:span_name(),
ot_span:start_opts(), ot_tracer:traced_fun(T)) -> T.
with_span(Tracer, SpanName, Opts, Fun) ->
%% starting the span makes it active in the `tracer_ctx'
SpanCtx = start_inactive_span(Tracer, SpanName, Opts),
PreviousTracerCtx = current_ctx(Tracer),
set_span(Tracer, SpanCtx),
try
Fun(SpanCtx)
after
%% passing TracerCtx directly ensures that this `end_span' ends the span started
%% in this function and sets the active span to the previous active tracer ctx.
%% If spans in `Fun()' were started and not finished properly they will be
%% abandoned and it be up to the `ot_span_sweeper' to eventually remove them.
_ = end_span(Tracer, SpanCtx),
ot_ctx:set_value(?TRACER_KEY, ?TRACER_CTX, PreviousTracerCtx)
end. We're exporting If I use |
Specifically, I propose:
In this repo, that'd be something like this but without any syntax errors? diff --git src/ot_tracer.erl src/ot_tracer.erl
index eb80d7a..f2c3c83 100644
--- src/ot_tracer.erl
+++ src/ot_tracer.erl
@@ -24,2 +24,3 @@
with_span/4,
+ set_ctx/2,
current_ctx/1,
@@ -51,2 +52,3 @@
-callback current_ctx(opentelemetry:tracer()) -> tracer_ctx().
+-callback set_ctx(opentelemetry:tracer(), tracer_ctx()) -> ok.
-callback current_span_ctx(opentelemetry:tracer()) -> opentelemetry:span_ctx().
@@ -80,5 +82,5 @@ end_span(Tracer={Module, _}) ->
--spec end_span(opentelemetry:tracer(), ot_tracer:tracer_ctx()) -> boolean() | {error, term()}.
-end_span(Tracer={Module, _}, TracerCtx) ->
- Module:end_span(Tracer, TracerCtx).
+-spec end_span(opentelemetry:tracer(), ot_tracer:span_ctx()) -> boolean() | {error, term()}.
+end_span(Tracer={Module, _}, SpanCtx) ->
+ Module:end_span(Tracer, SpanCtx).
@@ -88,2 +90,6 @@ current_ctx(Tracer={Module, _}) ->
+-spec set_ctx(opentelemetry:tracer(), ot_tracer:tracer_ctx()) -> ok.
+set_ctx(Tracer={Module, _}, ) ->
+ Module:current_ctx(Tracer).
+
-spec current_span_ctx(opentelemetry:tracer()) -> opentelemetry:span_ctx(). In open-telemetry/opentelemetry-erlang, something like this? diff --git src/ot_tracer_default.erl src/ot_tracer_default.erl
index 6f05f77..2e5a1f2 100644
--- src/ot_tracer_default.erl
+++ src/ot_tracer_default.erl
@@ -29,2 +29,3 @@
current_ctx/1,
+ set_ctx/2,
current_span_ctx/1,
@@ -133,2 +134,7 @@ current_ctx(_Tracer) ->
+-spec set_ctx(opentelemetry:tracer(), ot_tracer:tracer_ctx()) -> ok.
+set_ctx(_Tracer, PreviousTracerCtx) ->
+ ot_ctx:set_value(?TRACER_KEY, ?TRACER_CTX, PreviousTracerCtx),
+ ok.
+
span_module({_, #tracer{span_module=SpanModule}}) -> End result: anyone who can't use %% Wrapping execution in `current_ctx`,`start_span` and `set_ctx`, `end_span' pairs
%% ensures we end the correct span and restore the correct tracer context.
%% If spans in between those pairs were started and not finished properly they will be
%% abandoned and it be up to the `ot_span_sweeper' to eventually remove them. |
Meanwhile OT-Python removed that API surface entirely open-telemetry/opentelemetry-python#420 citing the new Context spec. Note the spec's emphasis on the calls' availability to OT instrumentation library authors as well as the internal implementation:
… and their care around ensuring we can restore context even if people stuff up:
|
TL;DR of my take from the SIG meeting:
|
Note, I didn't mean a map of span contexts to trace contexts. That would be bad for garbage collection and memory copying. Instead we would have a process dict like:
Ending a span either takes no argument and uses So if your code created But there is a clear memory leak here where unended spans stick around forever, along with their list of ancestors (trace context). Maybe it is worth doing and simply documenting this potential leak while offering a Tough call... |
Yikes didn't see this a month ago; sorry. Staying zoomed out, strikes me this helps OT instrumentation library authors deal with tricky integration situations, which I keep forgetting was my original goal. Double-checking: call If you're not all 😬 I'm all 😀 as that author because the obvious method is correct. Relief! Zooming in a little: I'm comforted knowing the span count will tend to stay predictable unless someone repeatedly starts a span, crashes out of that call stack without ending it, rescues before the process dies, and lives to start another span again. That'll balloon with the current context storage, too. If a map of span contexts to trace contexts is bad for GC and memory copying, but using the process dictionary to map span contexts to trace contexts is OK, what data structure is the process dictionary using? Can we use it, too? Feels weird taking that much key-space in a shared data structure. How slow is too slow, re |
I think we should try my example of keying on the span ctx. I realized even the current solution is bad if the number of levels of spans is large because its all a single pdict value meaning lots of copying and GC. |
Or should context be something like:
And Ending a span sets the |
Instead of Ending a span by passing a To do what you need would instead be like: CtxToken = ot_ctx:current(),
%% if a current context exists this creates a copy of it so we still have the correlation_context
NewCtxToken = ot_ctx:new(),
SpanCtx1 = ot_tracer:start_span(span1),
ot_span:end_span(SpanCtx1),
ot_ctx:set_context(CtxToken). |
TL;DR:
ot_tracer:end_span
will cause problems for any of their callers usingot_tracer:end_span
as the stack unwinds, until we hit anot_tracer:with_span
which can put the full context backot_tracer:set_ctx/2
fromot_tracer.with_span/3
would let us keep our “help implement automatic scope switching and define higher level APIs by SDK components” fix while also covering “and OpenTelemetry instrumentation libraries” at the end of the sentence (both quotes from the context spec; emphasis mine)Split out from #26 while we're discussing renaming things and other API changes #73…
As someone writing OpenTelemetry integrations for packages I don't control, the current API gives me excellent support for when those packages provide a way to get between then and their back end eg. configurable behaviour.
ot_tracer:with_span/3
or/4
wrapped around the next call will restore 100% of the trace context.If the package instead gives me a chance to get called before and after a significant operation, I can't see a safe way to restore the trace context afterward.
ot_tracer:current_ctx/0
will give it to me beforehand, but there's no way to put it back without reverse-engineering the tracer and context manager and hoping nobody changes either.Elixir demo:
The text was updated successfully, but these errors were encountered: