[api] Fix Baggage context leak issue #5208

CodeBlanch · 2024-01-10T17:28:14Z

Fixes #3257

Changes

Adds nullable annotations for the Baggage API.
Removes the wrapper object to prevent context leaks.

Merge requirement checklist

CONTRIBUTING guidelines followed (license requirements, nullable enabled, static analysis, etc.)
Unit tests added/updated
Appropriate CHANGELOG.md files updated for non-trivial changes

codecov · 2024-01-10T17:33:24Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (6250307) 83.38% compared to head (6a4870f) 83.09%.
Report is 7 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #5208      +/-   ##
==========================================
- Coverage   83.38%   83.09%   -0.29%     
==========================================
  Files         297      271      -26     
  Lines       12531    11956     -575     
==========================================
- Hits        10449     9935     -514     
+ Misses       2082     2021      -61

Flag	Coverage Δ
unittests	`?`
unittests-Solution-Experimental	`83.08% <100.00%> (?)`
unittests-Solution-Stable	`83.03% <100.00%> (?)`

Flags with carried forward coverage won't be shown. Click here to find out more.

Files	Coverage Δ
src/OpenTelemetry.Api/Baggage.cs	`100.00% <100.00%> (ø)`

... and 32 files with indirect coverage changes

src/OpenTelemetry.Api/CHANGELOG.md

reyang

Need to understand if there is a valid use case for TryGetBaggage, and when should the user choose GetBaggage vs. TryGetBaggage.

src/OpenTelemetry.Api/CHANGELOG.md

alanwest · 2024-01-10T23:26:51Z

test/OpenTelemetry.Api.Tests/BaggageTests.cs

-        Assert.Equal(4, Baggage.Current.Count);
+        Assert.Equal(2, Baggage.Current.Count);
        Assert.Equal("value1", Baggage.GetBaggage("key1"));
-        Assert.Equal("value2", Baggage.GetBaggage("key2"));
-        Assert.Equal("value3", Baggage.GetBaggage("key3"));


To some, this could seem like a very nuanced and perhaps unintuitive behavioral difference that stems from a call to an async method versus a synchronous method call. For documentation purposes, I suggest adding a synchronous test that looks almost identical just to further highlight the nuanced behavior. Something like:

[Fact] public void NonAsyncLocalTests() { Baggage.SetBaggage("key1", "value1"); InnerTask(); Baggage.SetBaggage("key4", "value4"); Assert.Equal(4, Baggage.Current.Count); Assert.Equal("value1", Baggage.GetBaggage("key1")); Assert.Equal("value2", Baggage.GetBaggage("key2")); Assert.Equal("value3", Baggage.GetBaggage("key3")); Assert.Equal("value4", Baggage.GetBaggage("key4")); static void InnerTask() { Baggage.SetBaggage("key2", "value2"); Baggage.SetBaggage("key3", "value3"); Assert.Equal(3, Baggage.Current.Count); Assert.Equal("value1", Baggage.GetBaggage("key1")); Assert.Equal("value2", Baggage.GetBaggage("key2")); Assert.Equal("value3", Baggage.GetBaggage("key3")); } }

That said, though... while I understand the nuanced behavior, how would we want to explain this difference to end users? Can you add some additional description of the why in the PR description? I don't think as it stands it will be super clear to folks how this is more conformant with the specificaton.

Sharing my speculation on what I think (most?) users would intuitively expect in few different scenarios. Admittedly these things can be very subjective, however, this is based on my own experience supporting product features similar to the baggage api.

public void BaggageBehaviorWithSyncCall() { Baggage.SetBaggage("key1", "value1"); InnerTask(); Assert.Equal(2, Baggage.Current.Count); Assert.Equal("value1", Baggage.GetBaggage("key1")); Assert.Equal("value2", Baggage.GetBaggage("key2")); static void InnerTask() { Baggage.SetBaggage("key2", "value2"); } } public async Task BaggageBehaviorWithAsyncCall() { Baggage.SetBaggage("key1", "value1"); await InnerTask(); Assert.Equal(2, Baggage.Current.Count); Assert.Equal("value1", Baggage.GetBaggage("key1")); Assert.Equal("value2", Baggage.GetBaggage("key2")); static async Task InnerTask() { Baggage.SetBaggage("key2", "value2"); await Task.Yield(); } } public async Task BaggageBehaviorWithTaskRun() { Baggage.SetBaggage("key1", "value1"); Task.Run(InnerTask).GetAwaiter().GetResult(); Assert.Equal(1, Baggage.Current.Count); Assert.Equal("value1", Baggage.GetBaggage("key1")); static void InnerTask() { Baggage.SetBaggage("key2", "value2"); } }

I think my Task.Run scenario represents what folks are after in #3257 and I agree that it is the ideal behavior in this specific scenario most of the time - i.e., a parallel operation should ideally have it's own context. I'd go as far to say that this expectation is sometimes not limited to just baggage, but also to trace context, etc.

When it comes to async/await calls, we're often not talking about a parallel operation in the CPU-bound kind of sense, but rather an I/O-bound operation that is logically considered part of the same unit of work as the caller. In this way, I find that people expect this scenario to behave in the same way as though they had made a synchronous call.

When it comes to async/await calls, we're often not talking about a parallel operation in the CPU-bound kind of sense, but rather an I/O-bound operation that is logically considered part of the same unit of work as the caller. In this way, I find that people expect this scenario to behave in the same way as though they had made a synchronous call.

This is also subjective and murky - for example, an async/await call to an authentication library which interacts with backend services. Here is my suggestion when we encounter API design challenges that are subjective:

Follow the specification, if the spec does not make sense, propose a fix to the spec (which is good forcing function to make people think deeper).

Prioritize fail-fast. In this case, if the async/await call set the Baggage inside a Task and the developers were expecting the Baggage update from the caller's view, they will immediately notice things didn't work then figure out why. On the other side, context/baggage leakage is very hard to troubleshoot and would normally be discovered at much later stage of the development (and is normally a privacy/security incident).

Provide explicit way for people to achieve special scenarios, e.g. here are some potential solutions Baggage.Current gets shared/overwritten when running multiple parallel tasks? #3257 (comment)

Follow the specification, if the spec does not make sense, propose a fix to the spec (which is good forcing function to make people think deeper).

Fully agree with all of these principles, but I think this is the most pertinent one at the moment. I am not yet convinced that this PR brings us in closer alignment with the spec, but I do agree that it may be because the spec needs to be fixed if folks think this PR is a more correct behavior. As it stands, I do not find the new behavior proposed in this PR is any less murky or subjective.

Related, there exists critique that OpenTelemetry .NET does not implement the Context spec at all. See: open-telemetry/oteps#232 (comment).

I think we should hold on this PR and do some deeper thinking.

Provide explicit way for people to achieve special scenarios, e.g. here are some potential solutions Baggage.Current gets shared/overwritten when running multiple parallel tasks? #3257 (comment)

I support @CodeBlanch's workaround here #3257 (comment)

there is a workaround you can use right now: Call RuntimeContext.SetValue("otel.baggage", null)

This is effectively how Activity.Current is designed. If I spin up parallel work and wish to force a new trace I can set Activity.Current to null at the start of my new thread. To me, it makes more sense to try and keep the behavior of Baggage.Current aligned with the behavior of Activity.Current.

We could be more creative. Here is an idea: https://github.com/CodeBlanch/opentelemetry-dotnet/compare/api-baggage-context...CodeBlanch:opentelemetry-dotnet:api-baggage-context-flow?expand=1

What I'm doing there is add an enum so users can explicitly control the behavior.

public async Task InnerTask() { // Explicitly fork baggage for downstream call Baggage.SetBaggage("key2", "value2", flowBehavior: BaggageFlowBehavior.Fork); await CallSomething(); } public async Task InnerTask() { // Modify baggage as it works today. This will flow back Baggage.SetBaggage("key2", "value2", flowBehavior: BaggageFlowBehavior.ModifyCurrent); await CallSomething(); } public async Task InnerTask() { // Whatever the default behavior is. Right now it is Fork Baggage.SetBaggage("key2", "value2"); await CallSomething(); }

Here is another creative solution: https://github.com/CodeBlanch/opentelemetry-dotnet/compare/api-baggage-context...CodeBlanch:opentelemetry-dotnet:api-baggage-context-mutate?expand=1

This is sort of an attempt to do what Reiley proposed ("Get a handle of the current baggage and pass the handle to other methods"):

Baggage.SetBaggage("appInstanceId", "abcd"); await ReadTenantIdFromDatabase(); // Baggage.Current here will contain "appInstanceId" & "tenantId" async Task ReadTenantIdFromDatabase() { // Simulate async call to DB await Task.Yield(); Baggage.MutateCurrentBaggage(b => b.SetBaggage("tenantId", "1234")); }

alanwest

Before moving forward with this PR, I think we need to have more foundational discussions with regards to how OpenTelemetry .NET aligns (or does not align) with the Context specification - baggage only being one part of the broader requirements.

#5208 (comment)

liutiexing · 2024-01-15T13:17:34Z

@alanwest @CodeBlanch @reyang
Async calls are so common in Azure services. We can't call RuntimeContext.SetValue everywhere when we use Baggage APIs. These workarounds will be everywhere and destroy the maintainability.
We need an Immutable container (container changes inside an async Task are invisible in parent Task).
Spec said The Baggage container MUST be immutable, so that the containing Context also remains immutable.
I think current implementation of Baggage is partially mutable.
Baggage.Current is based on AsyncLocal. AsyncLocal is so fundamental to the CLR and so broadly used in various libraries that it is de facto the .NET standard for how to propagate ambient context. If we are to have different mechanisms and semantics for capturing and propagating ambient context, the APIs should be different enough to prevent any confusion. The OTel Baggage uses AsyncLocal, so it inherits some of the behavior but then defeats the rest sneakily through that "holder" hack.
I found OpenTelemetry C++ just provides Baggage, but no Baggage.Current. I think this is because C++ has no standard de facto parallel mechanism. If we have a dispute about thread/parallel behavior, mabe we can provide only Baggage without Baggage.Current (split Baggage and thread/parallel behavior), let the users implements their own Baggage.Current.

Not sure, according to my understanding, PropagationContext is the implementation of Context in Spec.

alanwest · 2024-01-15T19:46:23Z

Spec said The Baggage container MUST be immutable, so that the containing Context also remains immutable.

💯 I did not make this clear in my initial input on this PR, but I fully agree that mistakes have been made in the design of OpenTelemetry's baggage API. The idea that the baggage API has semantics similar to HttpContext.Current is a faulty premise and is definitely a violation of the specification.

I think current implementation of Baggage is partially mutable.

This is the part that bugs me as well. This PR seeks to address this, but I think it still falls short. For example, I'm uneasy with the fact that the following code behaves differently whether MakeHttpCall is an async method or not.

void Main() {
  MakeHttpCall();
  // Baggage.Current contains key/value at this point in the execution.
  // However, if the method were `async Task MakeHttpCall()` it will not contain key/value.
}

void MakeHttpCall() {
  Baggage.SetBaggage("key", "value");
}

Most importantly, I'm uneasy with the fact that, even with this PR, Baggage still has mutable properties.

Baggage.Current is based on AsyncLocal. AsyncLocal is so fundamental to the CLR and so broadly used in various libraries that it is de facto the .NET standard for how to propagate ambient context.

Baggage.Current and our underlying use of AsyncLocal definitely play a role, but I do not think they are the root of our design problems. I think the design of OpenTelemetry .NET's API is flawed. This is a rough idea, but I think the following is a better API design and would achieve spec compliance:

void Main() {
  using (var scope1 = ContextBuilder.NewContext().WithBaggage("key1", "key1").StartScope()) {
    // Baggage.Current contains key1/value1
    using (var scope2 = ContextBuilder.NewContext().WithBaggage({"key2", "value2"}).StartScope()) {;
      // Baggage.Current only contains key2/value2 here.
      
      // If you also wanted key1/value1 in baggage here, then it would need to be explicit.
      // Something like:
      // var scope2 = ContextBuilder.FromContext(scope1).WithBaggage({"key2", "value2"})
      // The context of scope2 would be derived from scope1 and key2/value2 would be additive
    }
    
    // scope2 has been disposed and scope1 has been restored, and
    // Baggage.Current contains key1/value1
  }  
}

In this new design, the baggage API would still have a Baggage.Current and it would still use AsyncLocal to propagate the scope across thread boundaries, but the important part is that Baggage would not be mutable under this design.

I have had some more time to consider this PR, and while I do not think it addresses our spec compliance problems, I believe that it does improve on the behaviors of our non-compliant baggage API, and I am comfortable considering the current behavior buggy. That said, it is a breaking change and I suspect the new behavior will be undesirable to a different set of our end users.

In order to truly fix our spec compliance issues in our existing APIs, I think would require us to make source-breaking changes, and I think we should avoid this.

For this reason, I will approve this PR, but I think we should consider how we can introduce new, actual spec-compliant baggage APIs in a future PR. The old APIs could remain, but would ideally be removed in a future major version.

alanwest

Approving based on my comments here #5208 (comment).

CodeBlanch · 2024-01-16T19:00:04Z

I merged this because we have enough approvals but I'm not sure anyone's thinking is totally correct 😄 I'll provide my thoughts just for fun.

The Baggage container MUST be immutable, so that the containing Context also remains immutable.

Context in this context (🤣) is describing a structure IMO.

public readonly struct Context
{
    public SpanContext SpanContext { get; }
    public Baggage Baggage { get; }
}

public readonly struct SpanContext {}
public readonly struct Baggage {}

If two threads have the same context instance, they shouldn't impact each other:

// thread 1
var newContext = someContext.SetBaggage("b2, "v2");
// Will never see b3

// thread 2
var newContext = someContext.SetBaggage("b3, "v3");
// Will never see b2

We don't have the Context structure. We have public readonly struct ActivityContext and public readonly struct Baggage. But both DO have the immutable behaviors so the above will hold true. I don't think the Baggage API is fundamentally flawed in that regard.

Now we have this thing...

public readonly struct Baggage
{
    public static Baggage Current { get; set; }
}

Baggage.Current is an ambient way to access the current "Context" \ Baggage.

This is where it gets less clear for me. The spec seems to bless that languages MAY provide some kind of global ambient thing if the language has support for it. But it doesn't prescribe exactly how it should work.

Baggage.Current.getter is like the spec Get current Context. No issues here I don't think.

Baggage.Current.setter performs some of what "Attach" and "Detach" are supposed to do (by the spec). If I could do it over again, we probably wouldn't have that setter. We could have public static IDisposable Attach(Baggage baggage) instead (which would be much closer to the spec).

The question for me is what happens when that "Attach" (setter) is called. Today the ambient context is like a scoped service. The spec says it should...

Associates a Context with the caller's current execution unit.

What is "execution unit" meaning here? Not sure the spec defines that clearly. Scope of the request could be the execution unit. In .NET we have AsyncLocal and ExecutionContext, that could also be the "execution unit" and that's what this PR does. Since Activity.Current also behaves this way, probably good to be consistent.

PS: If we say in .NET our "execution unit" is going to be ExecutionContext then I don't know if we need the global "attach" and "detach" operations because those APIs exist on ExecutionContext already ("capture" and "restore") 🤷

This reverts commit 90a9c6c.

danelson · 2024-01-17T18:12:38Z

Adding some end user context as mentioned in the SIG. It is my understanding that after this PR was merged the following unit test which used to pass will now fail.

public async Task BaggageBehaviorWithAsyncCall()
{
    Baggage.SetBaggage("key1", "value1");
    await InnerTask();
    Assert.Equal(2, Baggage.Current.Count); // will instead be 1
    Assert.Equal("value1", Baggage.GetBaggage("key1"));
    Assert.Equal("value2", Baggage.GetBaggage("key2")); // will not exist

    static async Task InnerTask()
    {
        Baggage.SetBaggage("key2", "value2");
        await Task.Yield();
    }
}

This is going to cause a lot of issues for my organization. At my organization it is very common to execute a similar workflow as above where a value is retrieved in a nested async method (say from the database) and then that value is immediately set on the baggage. Requiring changes in method signatures to pass the information up the call stack will be quite an undertaking. If this was the behavior from the beginning I could understand, but changing this now seems pretty severe to me.

Remove the wrapper object to prevent Baggage context leaks.

5101dd3

CodeBlanch added the pkg:OpenTelemetry.Api Issues related to OpenTelemetry.Api NuGet package label Jan 10, 2024

CodeBlanch requested a review from a team January 10, 2024 17:28

CHANGELOG patch.

f0cf502

reyang reviewed Jan 10, 2024

View reviewed changes

src/OpenTelemetry.Api/CHANGELOG.md Outdated Show resolved Hide resolved

reyang requested changes Jan 10, 2024

View reviewed changes

CodeBlanch added 3 commits January 10, 2024 14:07

Remove TryGetBaggage.

1607820

CHANGELOG update.

6cd16a0

Warning cleanup.

8b1bc93

reyang approved these changes Jan 10, 2024

View reviewed changes

reyang reviewed Jan 10, 2024

View reviewed changes

src/OpenTelemetry.Api/CHANGELOG.md Outdated Show resolved Hide resolved

CHANGELOG updates.

570385f

alanwest reviewed Jan 10, 2024

View reviewed changes

Code review.

5ebf697

alanwest requested changes Jan 11, 2024

View reviewed changes

Test tweak.

386204b

CodeBlanch marked this pull request as draft January 12, 2024 21:26

CodeBlanch mentioned this pull request Jan 12, 2024

[api] Add Baggage.Attach API #5218

Closed

4 tasks

alanwest approved these changes Jan 15, 2024

View reviewed changes

Merge branch 'main' into api-baggage-context

6a4870f

CodeBlanch marked this pull request as ready for review January 16, 2024 17:46

CodeBlanch merged commit 90a9c6c into open-telemetry:main Jan 16, 2024
37 checks passed

CodeBlanch deleted the api-baggage-context branch January 16, 2024 18:23

CodeBlanch added a commit to CodeBlanch/opentelemetry-dotnet that referenced this pull request Jan 17, 2024

Revert "[api] Fix Baggage context leak issue (open-telemetry#5208)"

bdaa443

This reverts commit 90a9c6c.

CodeBlanch mentioned this pull request Jan 17, 2024

[revert] [api] Fix Baggage context leak issue #5227

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[api] Fix Baggage context leak issue #5208

[api] Fix Baggage context leak issue #5208

CodeBlanch commented Jan 10, 2024 •

edited

Loading

codecov bot commented Jan 10, 2024 •

edited

Loading

reyang left a comment

alanwest Jan 10, 2024 •

edited

Loading

alanwest Jan 11, 2024 •

edited

Loading

reyang Jan 11, 2024 •

edited

Loading

alanwest Jan 11, 2024

CodeBlanch Jan 11, 2024

CodeBlanch Jan 11, 2024

alanwest left a comment

liutiexing commented Jan 15, 2024 •

edited

Loading

alanwest commented Jan 15, 2024 •

edited

Loading

alanwest left a comment

CodeBlanch commented Jan 16, 2024

danelson commented Jan 17, 2024

[api] Fix Baggage context leak issue #5208

[api] Fix Baggage context leak issue #5208

Conversation

CodeBlanch commented Jan 10, 2024 • edited Loading

Changes

Merge requirement checklist

codecov bot commented Jan 10, 2024 • edited Loading

Codecov Report

reyang left a comment

Choose a reason for hiding this comment

alanwest Jan 10, 2024 • edited Loading

Choose a reason for hiding this comment

alanwest Jan 11, 2024 • edited Loading

Choose a reason for hiding this comment

reyang Jan 11, 2024 • edited Loading

Choose a reason for hiding this comment

alanwest Jan 11, 2024

Choose a reason for hiding this comment

CodeBlanch Jan 11, 2024

Choose a reason for hiding this comment

CodeBlanch Jan 11, 2024

Choose a reason for hiding this comment

alanwest left a comment

Choose a reason for hiding this comment

liutiexing commented Jan 15, 2024 • edited Loading

alanwest commented Jan 15, 2024 • edited Loading

alanwest left a comment

Choose a reason for hiding this comment

CodeBlanch commented Jan 16, 2024

danelson commented Jan 17, 2024

CodeBlanch commented Jan 10, 2024 •

edited

Loading

codecov bot commented Jan 10, 2024 •

edited

Loading

alanwest Jan 10, 2024 •

edited

Loading

alanwest Jan 11, 2024 •

edited

Loading

reyang Jan 11, 2024 •

edited

Loading

liutiexing commented Jan 15, 2024 •

edited

Loading

alanwest commented Jan 15, 2024 •

edited

Loading