-
Notifications
You must be signed in to change notification settings - Fork 787
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[api] Fix Baggage context leak issue #5208
[api] Fix Baggage context leak issue #5208
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #5208 +/- ##
==========================================
- Coverage 83.38% 83.09% -0.29%
==========================================
Files 297 271 -26
Lines 12531 11956 -575
==========================================
- Hits 10449 9935 -514
+ Misses 2082 2021 -61
Flags with carried forward coverage won't be shown. Click here to find out more.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to understand if there is a valid use case for TryGetBaggage
, and when should the user choose GetBaggage
vs. TryGetBaggage
.
Assert.Equal(4, Baggage.Current.Count); | ||
Assert.Equal(2, Baggage.Current.Count); | ||
Assert.Equal("value1", Baggage.GetBaggage("key1")); | ||
Assert.Equal("value2", Baggage.GetBaggage("key2")); | ||
Assert.Equal("value3", Baggage.GetBaggage("key3")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To some, this could seem like a very nuanced and perhaps unintuitive behavioral difference that stems from a call to an async method versus a synchronous method call. For documentation purposes, I suggest adding a synchronous test that looks almost identical just to further highlight the nuanced behavior. Something like:
[Fact]
public void NonAsyncLocalTests()
{
Baggage.SetBaggage("key1", "value1");
InnerTask();
Baggage.SetBaggage("key4", "value4");
Assert.Equal(4, Baggage.Current.Count);
Assert.Equal("value1", Baggage.GetBaggage("key1"));
Assert.Equal("value2", Baggage.GetBaggage("key2"));
Assert.Equal("value3", Baggage.GetBaggage("key3"));
Assert.Equal("value4", Baggage.GetBaggage("key4"));
static void InnerTask()
{
Baggage.SetBaggage("key2", "value2");
Baggage.SetBaggage("key3", "value3");
Assert.Equal(3, Baggage.Current.Count);
Assert.Equal("value1", Baggage.GetBaggage("key1"));
Assert.Equal("value2", Baggage.GetBaggage("key2"));
Assert.Equal("value3", Baggage.GetBaggage("key3"));
}
}
That said, though... while I understand the nuanced behavior, how would we want to explain this difference to end users? Can you add some additional description of the why in the PR description? I don't think as it stands it will be super clear to folks how this is more conformant with the specificaton.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sharing my speculation on what I think (most?) users would intuitively expect in few different scenarios. Admittedly these things can be very subjective, however, this is based on my own experience supporting product features similar to the baggage api.
public void BaggageBehaviorWithSyncCall()
{
Baggage.SetBaggage("key1", "value1");
InnerTask();
Assert.Equal(2, Baggage.Current.Count);
Assert.Equal("value1", Baggage.GetBaggage("key1"));
Assert.Equal("value2", Baggage.GetBaggage("key2"));
static void InnerTask()
{
Baggage.SetBaggage("key2", "value2");
}
}
public async Task BaggageBehaviorWithAsyncCall()
{
Baggage.SetBaggage("key1", "value1");
await InnerTask();
Assert.Equal(2, Baggage.Current.Count);
Assert.Equal("value1", Baggage.GetBaggage("key1"));
Assert.Equal("value2", Baggage.GetBaggage("key2"));
static async Task InnerTask()
{
Baggage.SetBaggage("key2", "value2");
await Task.Yield();
}
}
public async Task BaggageBehaviorWithTaskRun()
{
Baggage.SetBaggage("key1", "value1");
Task.Run(InnerTask).GetAwaiter().GetResult();
Assert.Equal(1, Baggage.Current.Count);
Assert.Equal("value1", Baggage.GetBaggage("key1"));
static void InnerTask()
{
Baggage.SetBaggage("key2", "value2");
}
}
I think my Task.Run
scenario represents what folks are after in #3257 and I agree that it is the ideal behavior in this specific scenario most of the time - i.e., a parallel operation should ideally have it's own context. I'd go as far to say that this expectation is sometimes not limited to just baggage, but also to trace context, etc.
When it comes to async/await
calls, we're often not talking about a parallel operation in the CPU-bound kind of sense, but rather an I/O-bound operation that is logically considered part of the same unit of work as the caller. In this way, I find that people expect this scenario to behave in the same way as though they had made a synchronous call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When it comes to
async/await
calls, we're often not talking about a parallel operation in the CPU-bound kind of sense, but rather an I/O-bound operation that is logically considered part of the same unit of work as the caller. In this way, I find that people expect this scenario to behave in the same way as though they had made a synchronous call.
This is also subjective and murky - for example, an async/await call to an authentication library which interacts with backend services. Here is my suggestion when we encounter API design challenges that are subjective:
- Follow the specification, if the spec does not make sense, propose a fix to the spec (which is good forcing function to make people think deeper).
- Prioritize fail-fast. In this case, if the async/await call set the Baggage inside a Task and the developers were expecting the Baggage update from the caller's view, they will immediately notice things didn't work then figure out why. On the other side, context/baggage leakage is very hard to troubleshoot and would normally be discovered at much later stage of the development (and is normally a privacy/security incident).
- Provide explicit way for people to achieve special scenarios, e.g. here are some potential solutions Baggage.Current gets shared/overwritten when running multiple parallel tasks? #3257 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Follow the specification, if the spec does not make sense, propose a fix to the spec (which is good forcing function to make people think deeper).
Fully agree with all of these principles, but I think this is the most pertinent one at the moment. I am not yet convinced that this PR brings us in closer alignment with the spec, but I do agree that it may be because the spec needs to be fixed if folks think this PR is a more correct behavior. As it stands, I do not find the new behavior proposed in this PR is any less murky or subjective.
Related, there exists critique that OpenTelemetry .NET does not implement the Context spec at all. See: open-telemetry/oteps#232 (comment).
I think we should hold on this PR and do some deeper thinking.
Provide explicit way for people to achieve special scenarios, e.g. here are some potential solutions Baggage.Current gets shared/overwritten when running multiple parallel tasks? #3257 (comment)
I support @CodeBlanch's workaround here #3257 (comment)
there is a workaround you can use right now: Call RuntimeContext.SetValue("otel.baggage", null)
This is effectively how Activity.Current
is designed. If I spin up parallel work and wish to force a new trace I can set Activity.Current
to null at the start of my new thread. To me, it makes more sense to try and keep the behavior of Baggage.Current
aligned with the behavior of Activity.Current
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could be more creative. Here is an idea: https://github.com/CodeBlanch/opentelemetry-dotnet/compare/api-baggage-context...CodeBlanch:opentelemetry-dotnet:api-baggage-context-flow?expand=1
What I'm doing there is add an enum so users can explicitly control the behavior.
public async Task InnerTask()
{
// Explicitly fork baggage for downstream call
Baggage.SetBaggage("key2", "value2", flowBehavior: BaggageFlowBehavior.Fork);
await CallSomething();
}
public async Task InnerTask()
{
// Modify baggage as it works today. This will flow back
Baggage.SetBaggage("key2", "value2", flowBehavior: BaggageFlowBehavior.ModifyCurrent);
await CallSomething();
}
public async Task InnerTask()
{
// Whatever the default behavior is. Right now it is Fork
Baggage.SetBaggage("key2", "value2");
await CallSomething();
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is another creative solution: https://github.com/CodeBlanch/opentelemetry-dotnet/compare/api-baggage-context...CodeBlanch:opentelemetry-dotnet:api-baggage-context-mutate?expand=1
This is sort of an attempt to do what Reiley proposed ("Get a handle of the current baggage and pass the handle to other methods"):
Baggage.SetBaggage("appInstanceId", "abcd");
await ReadTenantIdFromDatabase();
// Baggage.Current here will contain "appInstanceId" & "tenantId"
async Task ReadTenantIdFromDatabase()
{
// Simulate async call to DB
await Task.Yield();
Baggage.MutateCurrentBaggage(b => b.SetBaggage("tenantId", "1234"));
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before moving forward with this PR, I think we need to have more foundational discussions with regards to how OpenTelemetry .NET aligns (or does not align) with the Context specification - baggage only being one part of the broader requirements.
@alanwest @CodeBlanch @reyang Not sure, according to my understanding, PropagationContext is the implementation of Context in Spec. |
💯 I did not make this clear in my initial input on this PR, but I fully agree that mistakes have been made in the design of OpenTelemetry's baggage API. The idea that the baggage API has semantics similar to
This is the part that bugs me as well. This PR seeks to address this, but I think it still falls short. For example, I'm uneasy with the fact that the following code behaves differently whether void Main() {
MakeHttpCall();
// Baggage.Current contains key/value at this point in the execution.
// However, if the method were `async Task MakeHttpCall()` it will not contain key/value.
}
void MakeHttpCall() {
Baggage.SetBaggage("key", "value");
} Most importantly, I'm uneasy with the fact that, even with this PR, Baggage still has mutable properties.
void Main() {
using (var scope1 = ContextBuilder.NewContext().WithBaggage("key1", "key1").StartScope()) {
// Baggage.Current contains key1/value1
using (var scope2 = ContextBuilder.NewContext().WithBaggage({"key2", "value2"}).StartScope()) {;
// Baggage.Current only contains key2/value2 here.
// If you also wanted key1/value1 in baggage here, then it would need to be explicit.
// Something like:
// var scope2 = ContextBuilder.FromContext(scope1).WithBaggage({"key2", "value2"})
// The context of scope2 would be derived from scope1 and key2/value2 would be additive
}
// scope2 has been disposed and scope1 has been restored, and
// Baggage.Current contains key1/value1
}
} In this new design, the baggage API would still have a I have had some more time to consider this PR, and while I do not think it addresses our spec compliance problems, I believe that it does improve on the behaviors of our non-compliant baggage API, and I am comfortable considering the current behavior buggy. That said, it is a breaking change and I suspect the new behavior will be undesirable to a different set of our end users. In order to truly fix our spec compliance issues in our existing APIs, I think would require us to make source-breaking changes, and I think we should avoid this. For this reason, I will approve this PR, but I think we should consider how we can introduce new, actual spec-compliant baggage APIs in a future PR. The old APIs could remain, but would ideally be removed in a future major version. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approving based on my comments here #5208 (comment).
I merged this because we have enough approvals but I'm not sure anyone's thinking is totally correct 😄 I'll provide my thoughts just for fun.
public readonly struct Context
{
public SpanContext SpanContext { get; }
public Baggage Baggage { get; }
}
public readonly struct SpanContext {}
public readonly struct Baggage {} If two threads have the same context instance, they shouldn't impact each other: // thread 1
var newContext = someContext.SetBaggage("b2, "v2");
// Will never see b3
// thread 2
var newContext = someContext.SetBaggage("b3, "v3");
// Will never see b2 We don't have the Now we have this thing... public readonly struct Baggage
{
public static Baggage Current { get; set; }
}
This is where it gets less clear for me. The spec seems to bless that languages MAY provide some kind of global ambient thing if the language has support for it. But it doesn't prescribe exactly how it should work.
The question for me is what happens when that "Attach" (setter) is called. Today the ambient context is like a scoped service. The spec says it should...
What is "execution unit" meaning here? Not sure the spec defines that clearly. Scope of the request could be the execution unit. In .NET we have PS: If we say in .NET our "execution unit" is going to be |
This reverts commit 90a9c6c.
Adding some end user context as mentioned in the SIG. It is my understanding that after this PR was merged the following unit test which used to pass will now fail. public async Task BaggageBehaviorWithAsyncCall()
{
Baggage.SetBaggage("key1", "value1");
await InnerTask();
Assert.Equal(2, Baggage.Current.Count); // will instead be 1
Assert.Equal("value1", Baggage.GetBaggage("key1"));
Assert.Equal("value2", Baggage.GetBaggage("key2")); // will not exist
static async Task InnerTask()
{
Baggage.SetBaggage("key2", "value2");
await Task.Yield();
}
} This is going to cause a lot of issues for my organization. At my organization it is very common to execute a similar workflow as above where a value is retrieved in a nested async method (say from the database) and then that value is immediately set on the baggage. Requiring changes in method signatures to pass the information up the call stack will be quite an undertaking. If this was the behavior from the beginning I could understand, but changing this now seems pretty severe to me. |
Fixes #3257
Changes
Merge requirement checklist
CHANGELOG.md
files updated for non-trivial changes