Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add request metrics to System.Net.Http #85447

Closed
wants to merge 4 commits into from

Conversation

JamesNK
Copy link
Member

@JamesNK JamesNK commented Apr 27, 2023

Addresses #84978. Work in progress.

Adds:

  • request-duration counter
  • current-requests counter
  • HttpMessageInvoker.Meter property
  • HttpRequestMessage.MetricsTags property
  • Client connection counters
  • Moves request stop event for counters to response body finished reading/request abort.

@dotnet-issue-labeler
Copy link

Note regarding the new-api-needs-documentation label:

This serves as a reminder for when your PR is modifying a ref *.cs file and adding/modifying public APIs, please make sure the API implementation in the src *.cs file is documented with triple slash comments, so the PR reviewers can sign off that change.

@ghost ghost assigned JamesNK Apr 27, 2023
@ghost
Copy link

ghost commented Apr 27, 2023

Tagging subscribers to this area: @dotnet/ncl
See info in area-owners.md if you want to be subscribed.

Issue Details

Addresses #84978. Work in progress.

Adds:

  • request-duration counter
  • current-requests counter
  • HttpMessageInvoker.Meter property
  • HttpRequestMessage.MetricsTags property
  • Client connection counters
  • Moves request stop event for counters to response body finished reading/request abort.
Author: JamesNK
Assignees: -
Labels:

area-System.Net.Http

Milestone: -

@JamesNK JamesNK requested review from reyang and tarekgh April 27, 2023 07:36
Copy link
Member

@antonfirsov antonfirsov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also wonder if @MihaZupan has any thoughts?

Comment on lines 121 to +122
HttpTelemetry.Log.RequestStart(request);
metrics.RequestStart(request);
Copy link
Member

@antonfirsov antonfirsov Apr 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With this, we will face all the overhead of both HttpTelemetry and Metrics if any of them is enabled.

I would do a second check for HttpTelemetry.Log.IsEnabled(), and fine-grained checks of <instrument>.Enabled within HttpMetrics methods.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make sure these checks exist in HttpTelemetry as well?
That is, change places like RequestStart/Stop in HttpTelemetry to

Interlocked.Increment(ref _startedRequests);

if (IsEnabled(...))
{
    RequestStart(
        request.RequestUri.Scheme,
        request.RequestUri.IdnHost...);
}

The counter fields should continue to be updated, but we don't have to read the Uri properties/serialize the event if the EventSource isn't enabled.


if (response is not null)
{
tags.Add("status-code", (int)response.StatusCode); // Boxing?
Copy link
Member

@antonfirsov antonfirsov Apr 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TagList looks like a boxing factory. I wounder if this concern was considered during the design of #56936. If it's concerning enough we may consider caching the objects, but that would come with its' own overhead.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One option is having a cache of boxed commonly used status codes.

if (response is not null)
{
tags.Add("status-code", (int)response.StatusCode); // Boxing?
tags.Add("protocol", $"HTTP/{response.Version}"); // Hacky
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hacky because it relies on formatting implementation? Shall we do something like this to avoid the formatting allocation?

static string GetProtocolName6(Version httpVersion) => (httpVersion.Major, httpVersion.Minor) switch
{
    (1, 1) => "HTTP/1.1",
    (2, 0) => "HTTP/2.0",
    (3, 0) => "HTTP/3.0",
    _ => "unknown"
};

Copy link
Member Author

@JamesNK JamesNK Apr 29, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hacky because it allocates, some values aren't right*, and it's simple to improve such as your snippet.

*HTTP/2 would be better than HTTP/2.0.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should avoid allocation. We wouldn't want our services to pay the allocation cost for something which our library will anyways drop and not emit. Protocol is one of them

#pragma warning disable SA1129 // Do not use default value type constructor
var tags = new TagList();
#pragma warning restore SA1129 // Do not use default value type constructor
InitializeCommonTags(ref tags, request);
Copy link
Member

@antonfirsov antonfirsov Apr 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because of redirects you may log different tags here than at RequestStart, we need to snapshot the initial values.

Copy link
Member Author

@JamesNK JamesNK Apr 29, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Properties on HttpRequestMessage are updated with a response redirect?

In that case, what request values should be used?

Options:

  1. Don't use the initial request values anywhere. That means we don't include tags on current-requests and have a dimensionless counter. The request-duration counter uses the HttpRequestMessage state based on when SendAsync returns.
  2. Use initial request values with current-requests and request-duration. Ignore request message state after SendAsync returns. Maybe have an extra counter to check if the URL changes between sending the request and receiving the response to record redirect counts.
  3. A mix. Initial values on current-requests and potentially updated values on request-duration.

No perfect choice. I like option 2, but curious about what client networking experts think and what current diagnostics does.

Copy link
Member

@antonfirsov antonfirsov May 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Properties on HttpRequestMessage are updated with a response redirect?

Yes, the request is being reused to save allocations, but I think this is a fundamental problem, regardless of our implementation.

what current diagnostics does.

I don't think this concern exists in case of existing counters. The only ones where we have something similar to tags are the <protocol>-connections-current-total and the <protocol>-requests-queue-duration counters. In both cases the protocol version is clear at the moment we increment the counter.

Use initial request values with current-requests and request-duration. Ignore request message state after SendAsync returns.

Wouldn't this be unexpected to users? For example by using HttpVersionPolicy.RequestVersionOrLower/RequestVersionOrHigher and negotiating lower/higher versions, a user would not log the actual HTTP versions being used.

No perfect choice.

Isn't it also an option to record multiple requests on redirects? (Even though it means that the implementation would be bound to SocketsHttpHandler and other handlers would need to implement the metrics separately.)

I have opened open-telemetry/opentelemetry-specification#3462, hoping that OTel folks can help clarifying this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this concern exists in case of existing counters.

Yeah existing counters don't have request-specific info on them, but I was thinking about what is in logs and the activity.

Copy link
Member

@MihaZupan MihaZupan May 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with Anton that relying on the request message's version can be quite misleading.
YARP for example uses a default version of 2.0 with a RequestVersionOrLower policy.
If we only use the request version, the user would only see 2.0 counters even if they're actually only talking 1.1 to the server.

The unfortunate thing is that we'll only know what version is actually used when we try sending the request to the wire internally, or from a higher-level perspective, once you get back the response message object.

It may make sense to consider moving this logic further down into SocketsHttpHandler, especially if you will also want to track when the response content has been read completely.

Copy link
Member Author

@JamesNK JamesNK May 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because the version isn't known when a request starts, it's not included as a tag on current-requests. And request-duration uses HttpResponseMessage.Version for this information. This PR already has that behavior.

{
// TODO: Should the Meter and HttpMetrics be static and shared by default?
get => _meter ??= new Meter("System.Net.Http");
set
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tarekgh @JamesNK - I mentioned in an earlier email but it may have gotten missed, what do you guys think about consuming an IMeterFactory instead of a Meter? Then the Meter can be produced:

   _meter = _factory == null ? new Meter("System.Net.Http") : factory.CreateMeter("System.Net.Http");

This eliminates the need for the caller to understand what meter name is expected.

This implies that IMeterFactory gets defined in System.Diagnostics.DiagnosticSource.dll which I don't think would be an issue but I might be overlooking something.

Copy link
Member Author

@JamesNK JamesNK May 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was expecting IMeterFactory to be in Microsoft.Extensions.Metrics and so not available here. That's part of why I chose this approach.

The other reason for exposing the meter is it makes consuming the metrics for just one client quite simple:

var invoker = new HttpMessageInvoker();
var currentRequestsRecorder = new InstrumentRecorder<double>(invoker.Meter, "request-duration");

But that only works if each client/invoker has its own Meter and counters copy, which might not be desirable.

Copy link
Member

@noahfalk noahfalk May 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was expecting IMeterFactory to be in Microsoft.Extensions.Metrics and so not available here

Initially I imagined it there too. Do you feel like it is important for IMeterFactory to not be in System.Diagostics.Metrics?

The other reason for exposing the meter is it makes consuming the metrics for just one client quite simple:

If users were injecting a factory instead the pattern wouldn't be much different:

var IMeterFactory factory = ...
var invoker = new HttpMessageInvoker();
invoker.Factory = factory; // separately, maybe we should be using a constructor overload rather than a property setter?
var currentRequestsRecorder = new InstrumentRecorder<double>(factory, "System.Net.Http", "request-duration");

If the app developer wasn't injecting the factory and using the default Meter then I wouldn't expect them to write tests for that as it would effectively just be testing runtime behavior and not their own code.

Copy link
Member

@antonfirsov antonfirsov May 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can also be a Func<Meter> on HttpMessageInvoker, so then we don't have to worry about the dependencies, and IMeterFactory (if an interface is needed at all) can live in Microsoft.Extensions.* where the DI-friendly things go usually.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Logically IMeterFactory naturally belongs to the Microsoft.extensions more because we'll have another extension method to register this interface to the DI. But technically nothing can stop us having it in DianosticSource library. I am thinking in the future when exposing more stuff like MeterBuilder or anything else, at that time it will look strange having IMeterFactory in one place and other related interfaces in other place.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can also be a Func<Meter> on HttpMessageInvoker

Func<string,Meter> is the signature we'd need for a similar effect as IMeterFactory, but thats no obstacle. My other motivation for IMeterFactory instead of something like Func is so that .NET types are following the same guidance we are planning to give to everyone else. Its not written up in our official docs yet, but the basic pattern is:

  1. Define a parameter on your constructor of type IMeterFactory
  2. Use the factory to create the Meter.

While other patterns are certainly possible (such as passing in a Meter or passing in a Func<string,Meter>), many of them aren't as easy to use with DI. I don't expect folks will be creating HttpMessageInvokers from a DI container so its fair to argue it doesn't matter if we pick a pattern that is DI-friendly. I think the counter-argument is that if the DI-friendly pattern becomes a convention then devs will anticipate that pattern regardless whether the type is created from a DI-container and APIs that don't follow the pattern feel unexpected and awkward. For example most people expect that if your type logs via ILogger then your constructor has an ILogger constructor parameter. Writing a class that has a method SetLogger(ILogger) or a constructor parameter Func<ILogger> functionally would work too, but it looks unusual because it doesn't follow the convention.

Logically IMeterFactory naturally belongs to the Microsoft.extensions

Tarek and I chatted earlier. I agree that it felt intuitive for me as well initially for IMeterFactory to be in the Microsoft.Extensions namespace, but I don't think its the only reasonable association. The general guidance with DI is that the types being created from the DI container should not have any tight coupling to the DI container (for example don't have your constructor use IServiceProvider as input argument). This allows using those types without DI because a caller could always manually instantiate the same arguments. This perspective suggests that any of the types that show up in the constructor parameters, such as IMeterFactory, should not be treated as tightly coupled to DI. If the type isn't coupled to DI the fact that it would land in Microsoft.Extensions when all the other Meter types are in System.Diagnostics didn't feel so intuitive any more.

Ultimately I assume its rare for .NET developers to manually create an HttpMessageInvoker so if folks think its not helpful to follow the IMeterFactory pattern on this one I'll be fine with it. If the pattern is also going to show up on HttpClient I might push a little harder due to more exposure, but still I expect the main usage for injecting a Meter will come from HttpClientFactory internally rather than .NET devs manually injecting one.

Copy link
Member

@antonfirsov antonfirsov May 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ultimately I assume its rare for .NET developers to manually create an HttpMessageInvoker [...]. If the pattern is also going to show up on HttpClient [...]

HttpMessageInvoker is the base class for HttpClient so everything that goes into HttpMessageInvoker goes into HttpClient. I'm treating them as synonyms in this discussion.

Define a parameter on your constructor of type IMeterFactory

The pattern with HttpClient is to consume configuration/dependencies through properties; the only parameter being passed to the constructor is the HttpMessageHandler. It would look very weird if new ctr. overloads would appear taking an IMeterFactory, especially if most users don't need it. (In general, System.Net.Http wasn't designed to be DI-friendly, so making a occasional intrusive decisions following a different design approach would create too much disharmony IMO.)

I expect the main usage for injecting a Meter will come from HttpClientFactory internally rather than .NET devs manually injecting one

In that case, wouldn't it be straightforward to make HttpClientFactory's responsibility to wrap IMeterFactory into a Func<string, Meter> when initializing HttpClient?

If the type isn't coupled to DI the fact that it would land in Microsoft.Extensions when all the other Meter types are in System.Diagnostics didn't feel so intuitive any more.

Even if it's not tightly coupled to DI, the question is if there are any other reasons to make it land in System.Diagnostics other than making DI use-cases easier? Also: are there any other System.* libraries that may end up consuming IMeterFactory?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ultimately I assume its rare for .NET developers to manually create an HttpMessageInvoker so if folks think its not helpful to follow the IMeterFactory pattern on this one I'll be fine with it. If the pattern is also going to show up on HttpClient I might push a little harder due to more exposure, but still I expect the main usage for injecting a Meter will come from HttpClientFactory internally rather than .NET devs manually injecting one.

As @antonfirsov mentioned, HttpMessageInvoker = HttpClient. The client inherits from the invoker. The design needs to work for both.

Manually creating HttpClient is very common. I expect customizing the meter on a client will be much rarer. The primary case for using it will be Microsoft.Extensions.Http.

The way I imagined this hanging together is Microsoft.Extensions.Http has a dependency on Microsoft.Extensions.Metrics. IMeterFactory is injected into HttpClientFactory and sets a meter on the created HttpClient from the factory.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HttpMessageInvoker is the base class for HttpClient

Ah! I didn't catch that relationship. Thanks for the important info.

The pattern with HttpClient is to consume configuration/dependencies through properties

OK. If HttpClient already has that pattern then I guess we have clashing patterns, but it probably makes sense for HttpClient's pattern to take priority.

In that case, wouldn't it be straightforward to make HttpClientFactory's responsibility to wrap IMeterFactory into a Func<string, Meter> when initializing HttpClient

I agree it doesn't take a lot of code to do that wrapping, but assume it was already the plan that IMeterFactory was in System.Diagnostics.DiagnosticSource.dll - would there be any reason to wrap it in the first place?

are there any other System.* libraries that may end up consuming IMeterFactory?

I don't have any concrete plan to point to, but other forms of IO like files or pipes seems like potential candidates for metrics in the System.* namespaces. Its also possible we'd want System.Runtime metrics reported via a factory-produced Meter.

the question is if there are any other reasons to make it land in System.Diagnostics other than making DI use-cases easier

I don't think we need more reasons to meet the bar, but these are the pros/cons as I see it:

  • [Pro]: More libraries will have IMeterFactory accessible to them if they aren't able to take a dependency on Microsoft.Extensions. Even in libraries where there isn't a strict rule against such dependencies, library authors are still likely to prefer fewer dependencies.
  • [Pro]: Assuming we also have IMeterFactory in the System.Diagnostics.Metrics namespace then consumers only need one namespace in their code rather than two.
  • [Con?]: There is the idea that any type used to make the DI scenario easier (IMeterFactory) belongs in M.E.something.dll. However aside from the idea feeling natural, I haven't seen any concrete benefit to organizing it that way in this case.

When I look at the balance scales I see it as small upside and effectively no downside.

}
if (request.HasTags)
{
foreach (var customTag in request.MetricsTags)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to HttpRequest properties not being stable, these tags might also not be stable across the duration of the request. Your EnrichmentHandler test is an example that deliberately changes them.

I think one scenario we need to sort out is the case where the caller to HttpClient is some 3rd party library that isn't specifying tags but the app developer does want to get all their traffic classified. Currently R9 is handling this case by having the app developer create a mapping from outbound URIs -> other metadata, then they register a handler in the HttpClientFactory to execute that mapping and generate the tagged telemetry. If R9 is going to switch over to use these built-in metrics rather than continuing to add their own then they will need some way to dynamically inject those tags. If the injection mechanism will continue to be a handler (similar to your EnrichmentHandler), then perhaps we need to move the metric collection point back to the DiagnosticsHandler where it is now so that other handlers have a chance to update those tags before they are captured. (I know I suggested doing it at the front earlier, I'm not sure my idea was a good one now that I consider this scenario)

@dpk83 - you probably want to take a look at this if you haven't already

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another question, will the same metrics support be added for net framework or is it limited to net core only?

Copy link
Member Author

@JamesNK JamesNK May 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand the concern here. R9 has a DelegatingHandler that is responsible for recording to the metric. The handler also executes enrichers to build up a collection of custom tags.

The difference in this prototype is recording to the metric and gathering custom tags happens in different places. The handler writes them onto the HttpRequestMessage, and they are read off the message later when needed.

My main concern with this approach is making sure there is the opportunity for an enrichment handler to add tags after SendAsync has returned. Right now the metrics are recorded in the HttpMessageInvoker. That means the code after SendAsync in each delegating handler runs before the counter is recorded. Metrics at this layer works, but it doesn't record separate HTTP requests if a Polly handler is added for retries. The solution is to move metrics to DiaganosticsHandler. That handler is always inside a retry handler, so each HTTP request is recorded, but we lose the ability to do enrichment after SendAsync returns. DiaganosticsHandler is added after any user-injected handlers (such as enrichment), so metrics are recorded before code after SendAsync runs.

A solution that supports both goals could be to wait to record the HTTP duration counter value when an HTTP request has completed both of these actions:

  1. The HttpRequestMessage exits the HTTP pipeline
  2. The HttpResponseMessage finishes reading the response

It would be slightly odd that DiaganosticsHandler "starts" metrics for an HTTP request (e.g. starts the request duration timer, increments current request count), and exiting HttpMessageInvoker "finishes" it (e.g. records request duration with enriched tags and decreases current request count).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for confusion, upon re-reading the code I see that I misunderstood it earlier and my comment was based on that misunderstanding. I was incorrectly worried that custom tags would cause the current-requests increment operation to be recorded with different tags than the decrement which would unbalance it. Now I see that custom tags are only used for the duration measurement and my concern was unfounded.

While I do still care about the scenario I described above with 3rd party libraries, I think this design would be able to handle it.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few points:

  • Quite often caller doesn't have access to the HttpRequestMessage, third party library is a very good example of this scenario. In complex organizations where there are hundreds and thousands of services a common libraries are used quite a lot and this option just doesn't work. We devised IDownstreamDependencyMetadata for that very purpose.

  • We are now supporting both DelegatingHandler based auto collection of metrics as well as DiagnosticsSource based approach because DelegatingHandler wasn't sufficient for a lot of cases.

  • In order to consume this metrics directly emitted from .NET and provide the transformation and the dependency metadata looks like we will still continue to use DelegatingHandler and DiagnosticsSource so I don't see how we can take consume and build onto the metrics this component is going to emit without incurring additional perf overhead.

  • Is this support is going to be ported back for .NET Framework?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The solution is to move metrics to DiaganosticsHandler

Note that DiagnosticsHandler is now strongly coupled with distributed tracing, and disabling distributed tracing either by by setting setting DOTNET_SYSTEM_NET_HTTP_ENABLEACTIVITYPROPAGATION=0 or by setting socketsHttpHandler.ActivityHeadersPropagator = null would completely disable it.

Wouldn't it be better to implement metrics in a separate MetricsHandler that would live either before or after DiagnosticsHandler in the chain?

Copy link
Member Author

@JamesNK JamesNK May 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't know about that switch. Good to know.

If a separate handler is better, then do it. The difference we're considering is the same: counters at the HttpMessageInvoker layer vs counters at an inner handler.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see how we can take consume and build onto the metrics this component is going to emit without incurring additional perf overhead.

What perf overhead are you worried about @dpk83?

While I don't think R9 strictly has to use this counter for correct behavior, from the consumer's perspective I'd really hope to describe the R9 stuff to 3P devs as an enhancement and not another orthogonal telemetry mechanism. I am leaving out here 1P back-compat where we probably want a different schema. For example I don't think it would be good if 3P devs see duplicate measurements. That would mean that R9 either needs to add tags to this instrument or suppress the instrument and replicate it, including the new enrichment tags. Making sure an alternate implementation stays in-sync feels possible, but error-prone and more work intensive.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another question, will the same metrics support be added for net framework or is it limited to net core only?

I think this got answered during our meeting earlier, but reproducing it here - the new metrics would only be present in .NET 8 onwards. This code doesn't flow downlevel either to framework or to older versions of .NET Core.

private Meter? _meter;
private HttpMetrics? _metrics;

#pragma warning disable CS3003 // Type is not CLS-compliant
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be able to use [CLSCompliant(false)] instead of these pragmas

public Meter Meter
{
// TODO: Should the Meter and HttpMetrics be static and shared by default?
get => _meter ??= new Meter("System.Net.Http");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should make sure only one instance is ever created in a race.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree the property isn't thread-safe, but how much do we care about thread safety? It should only be set when the message invoker is created.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way I read it this field may be null when you start sending requests, and the SendAsync will lazy-init it, no?

If you try sending multiple requests at the same time, you will end up creating more than one instance with the ??=.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd go farther, we should use a static so there is only one instance (in the non-DI case). Creating a new instance for every client adds a bunch of unnecessary overhead.

[MemberNotNull(nameof(_metrics))]
private void EnsureMetrics()
{
_metrics ??= new HttpMetrics(Meter);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to Meter, this should guard against using different instances in a race


public void RequestStop(HttpRequestMessage request, HttpResponseMessage? response, Exception? unhandledException, long startTimestamp, long currentTimestamp)
{
if (_currentRequests.Enabled || _requestsDuration.Enabled)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should likely remember and match the Enabled decision made in RequestStart, otherwise the numbers may be off if you a) start some requests b) enable metrics c) those requests finish.

The existing EventSource counters guard against that (e.g., HttpClient passes a bool telemetryStarted flag to FinishSend).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Yesterday I went through the new counters added to ASP.NET Core and added protection for that.

A unit test for this scenario is starting to listen to a meter after an HTTP request has started, then assert that current-requests wasn't decreased by 1.

Comment on lines +39 to +41
#pragma warning disable SA1129 // Do not use default value type constructor
var tags = new TagList();
#pragma warning restore SA1129 // Do not use default value type constructor
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#pragma warning disable SA1129 // Do not use default value type constructor
var tags = new TagList();
#pragma warning restore SA1129 // Do not use default value type constructor
TagList tags = default;

Does this work to avoid the warning?

Comment on lines +88 to +95
if (requestUri.Scheme is not null)
{
tags.Add("scheme", requestUri.Scheme);
}
if (requestUri.Host is not null)
{
tags.Add("host", requestUri.Host);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (requestUri.Scheme is not null)
{
tags.Add("scheme", requestUri.Scheme);
}
if (requestUri.Host is not null)
{
tags.Add("host", requestUri.Host);
}
tags.Add("scheme", requestUri.Scheme);
tags.Add("host", requestUri.HostNameType == UriHostNameType.IPv6 ? $"[{requestUri.IdnHost}]" : requestUri.IdnHost);

Do you want this to be the IP even if we know the host (the header is present)?

Copy link
Member Author

@JamesNK JamesNK May 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know. OpenTelemetry has the host tag - called net.peer.name - in its semantic conventions. We should check with them for recommended behavior.

https://opentelemetry.io/docs/reference/specification/metrics/semantic_conventions/http-metrics/#metric-httpclientduration

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is your thought around using IdnHost over Host? I'm not well versed on the topic but from what I read in the docs the difference is that IdnHost escapes some characters with Punycode. I would expect handlers of Metric data to respect the full unicode character set and the strings are likely to get exposed to humans via dashboards and queries where escape sequences are more awkward so Host seemed more appropriate. Looking at R9 they've been using Host as the tag value so we've got some real world validation that Host satisfies the goal.

Do you want this to be the IP?

I think want the name whenever it has been provided (which I assume is what Host does).

@dpk83 - You may have considered this in the past, any thoughts on this one?

HttpTelemetry.Log.RequestStart(request);
metrics.RequestStart(request);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this logic does not apply to HttpClient.
If you want to make that work, you need to also instrument StartSend and FinishSend in HttpClient.

Comment on lines 121 to +122
HttpTelemetry.Log.RequestStart(request);
metrics.RequestStart(request);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make sure these checks exist in HttpTelemetry as well?
That is, change places like RequestStart/Stop in HttpTelemetry to

Interlocked.Increment(ref _startedRequests);

if (IsEnabled(...))
{
    RequestStart(
        request.RequestUri.Scheme,
        request.RequestUri.IdnHost...);
}

The counter fields should continue to be updated, but we don't have to read the Uri properties/serialize the event if the EventSource isn't enabled.

#pragma warning disable SA1129 // Do not use default value type constructor
var tags = new TagList();
#pragma warning restore SA1129 // Do not use default value type constructor
InitializeCommonTags(ref tags, request);
Copy link
Member

@MihaZupan MihaZupan May 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with Anton that relying on the request message's version can be quite misleading.
YARP for example uses a default version of 2.0 with a RequestVersionOrLower policy.
If we only use the request version, the user would only see 2.0 counters even if they're actually only talking 1.1 to the server.

The unfortunate thing is that we'll only know what version is actually used when we try sending the request to the wire internally, or from a higher-level perspective, once you get back the response message object.

It may make sense to consider moving this logic further down into SocketsHttpHandler, especially if you will also want to track when the response content has been read completely.


_requestsDuration = _meter.CreateHistogram<double>(
"request-duration",
unit: "s",
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If unit is specified here then changing the unit will mean we will have no option but to create a new instrument as the requirement for us is to use ms. I know currently the unit here is just an indication but we never know how the backend or some exporter might end up using it for.

Copy link
Member

@noahfalk noahfalk May 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the issue specifically that the unit was specified? For example if the code didn't specify a unit but still captured measurements in seconds would that solve the problem? My understanding is that it would not.

I think emitting the measurement in seconds (and marking the unit that way) is the right thing for .NET to do because it appears to be the overall approach industry at large is standardizing on. However I also get that not all consumers are aligned behind a single standard and in particular we have MS 1P customers who want ms measurements for compat with their existing practice. I believe we need to invest in techniques that let us present different views of the data to different customer segments. A couple ideas how that might happen:

  1. Creating a new instrument with different naming that proxies a scaled version of the measurements and only exporting data from the new instrument.
  2. Ignoring this instrument and continuing to collect a separate measurement via a different handler or DiagnosticSource
  3. Doing work in OTel support changing the units on an existing instrument before it is reported, perhaps as part of a View
  4. Doing a similar feature as (3) but in the runtime further building on the config and pipeline work

(1) or (2) seem like they might be our shortest-term options but (3) and (4) are probably where I'd hope we end up eventually.

if (response is not null)
{
tags.Add("status-code", (int)response.StatusCode); // Boxing?
tags.Add("protocol", $"HTTP/{response.Version}"); // Hacky
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should avoid allocation. We wouldn't want our services to pay the allocation cost for something which our library will anyways drop and not emit. Protocol is one of them

}
if (request.HasTags)
{
foreach (var customTag in request.MetricsTags)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another question, will the same metrics support be added for net framework or is it limited to net core only?

}
if (request.HasTags)
{
foreach (var customTag in request.MetricsTags)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few points:

  • Quite often caller doesn't have access to the HttpRequestMessage, third party library is a very good example of this scenario. In complex organizations where there are hundreds and thousands of services a common libraries are used quite a lot and this option just doesn't work. We devised IDownstreamDependencyMetadata for that very purpose.

  • We are now supporting both DelegatingHandler based auto collection of metrics as well as DiagnosticsSource based approach because DelegatingHandler wasn't sufficient for a lot of cases.

  • In order to consume this metrics directly emitted from .NET and provide the transformation and the dependency metadata looks like we will still continue to use DelegatingHandler and DiagnosticsSource so I don't see how we can take consume and build onto the metrics this component is going to emit without incurring additional perf overhead.

  • Is this support is going to be ported back for .NET Framework?

}
if (unhandledException is not null)
{
tags.Add("exception-name", unhandledException.GetType().FullName);
Copy link
Member

@antonfirsov antonfirsov May 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JamesNK are we sure exception-name is valuable information without further details? It would be either HttpRequestException or TaskCanceledException. Shouldn't we instead push for #76644, and if we can make it for 8.0, log the error codes?

@antonfirsov
Copy link
Member

antonfirsov commented May 18, 2023

@dpk83 @JamesNK @noahfalk I've been doing further analysis and prototyping on this, primarily focused on the request-duration counter, and unfortunately it looks like there is no solution that would be perfect on all fronts.

We have 3 options to implement metrics for HTTP:

  1. In HttpMessageInvoker/HttpClient.
  2. In a separate MetricsHandler within the handler chain.
  3. With code that is tightly coupled to SocketsHttpHandler and System.Net.Http internals.

I'm providing a comparison of these 3 options amongst the following requirements:

(A) Handler/platform independence
(B) Complexity of the implementation
(C) Whether it's possible to do enrichment with info from response headers
(D) Whether the implementation logs a separate HTTP span for each attempt to send a request when doing retries or redirection, as recommended by open-telemetry/opentelemetry-specification#3462. Note this has an impact on the correctness of the tags we are logging. If we are not logging separate spans, then according to open-telemetry/opentelemetry-specification#3462 (comment), the http.url MUST be the originally requested URL meaning that for consistence it is reasonable to also log tags like scheme, protocol before sending the request. In this case, we may end up logging misleading values for protocol and scheme since we won't be able to detect redirects or the outcome of version negotiation.
(E) Whether we are able to log the correct request-duration when HttpCompletionOption.ResponseHeadersRead is used and the response stream is being read after the completion of HttpClient.SendAsync. (Assuming that we don't want to hook the logging calls into HttpResponseMessage / HttpContent disposal, which won't be fool-proof, though it still might be an option.)

The 3 potential solutions would also differ in API shape.

1. Implementing metrics in HttpMessageInvoker

This is essentially the current state of the WIP PR.

(A) ✔️ Completely decoupled from SocketsHttpHandler, would work for Mobile+WASM+WinHttp without any additional work.
(B) ✔️ Simple.
(C) ✔️ Response headers can be used for enrichment.
(D) ❌ Going against the recommendation, a single HTTP span is logged for the entire HttpClient call. The logged tags may be incorrect.
(E) ❌ There is no way to detect response stream completion after HttpClient.SendAsync returns. request-duration may not be correct when HttpCompletionOption.ResponseHeadersRead is used.

API

2. Implementing metrics in a separate MetricsHandler

Prototype: https://github.com/dotnet/runtime/compare/main...antonfirsov:af/httpclient-metrics-MetricsHandler?expand=1

(A) Just like DiagnosticsHandler, the code is coupled with System.Net.Http internals but not SocketsHttpHandler. It's possible to adapt the solution to Mobile and WASM, but not external handlers.
(B) ✔️ Simple
(C) ❌ Response headers can NOT be used for enrichment in a DelegatingHandler. The logging is done at the moment MetricsHandler returns.
(D) ✔️ Logging separate HTTP spans for each attempt. The logged tags are correct.
(E) ❌ There is no way to detect response stream completion after HttpClient.SendAsync returns. request-duration may not be correct when HttpCompletionOption.ResponseHeadersRead is used.

API

3. Implementing metrics in SocketsHttpHandler and internals

Prototype: https://github.com/dotnet/runtime/compare/main...antonfirsov:af/httpclient-metrics-integrated?expand=1

This implements the idea from #85447 (comment), which is only possible in code that is tightly coupled to our internals.

(A) ❌ Tightly coupled to System.Net.Http internals at a level that fully featured reimplementation in mobile/wasm handlers is impossible. We can consider additional API-s that would enable proper implementation of all features for any handler.
(B) ❌ Very complicated/hacky implementation. We need a couple of weeks to get it right.
(C) ✔️ Response headers can be inspected for enrichment.
(D) ✔️ Logging separate HTTP spans for each attempt. The logged tags are correct.
(E) ✔️ Can detect response stream completion after HttpClient.SendAsync returns. request-duration is always correct. (Which might be even noisy with long running gRPC requests ...)

API


Personally, I'm in favor of option 2. (MetricsHandler), but not sure how important are the requirements (C) and (E).

@JamesNK
Copy link
Member Author

JamesNK commented May 19, 2023

I'm also in favor of option 2 - MetricsHandler - but I don't want to rule out detecting the end of the response body.

See https://gist.github.com/JamesNK/9cd342b3a547bc2f7963d8de5c18385c as an example of customizing the response HttpContent to detect when it's complete. This allocation heavy: a new HttpContent, HttpContentHeaders, and maybe a new wrapping Stream depending on how you're reading the content.

It might be possible to make this more lightweight with access to internal HttpClient APIs.

This scenario - caring about the response end - has recently come up with gRPC - grpc/grpc-dotnet#2133 cc @MihaZupan. Maybe an API could be added to HttpResponseMessage, which is called when the response is ended (basically, it's triggered on dispose).

For example:

public class HttpResponseMessage
{
    public CancellationToken ResponseEnded { get; }
}

var httpClient = new HttpMessageInvoker(new SocketsHttpHandler());

var response = await httpClient.SendAsync(new HttpRequestMessage(HttpMethod.Get, "https://www.google.com"), CancellationToken.None);
response.ResponseEnded.Register(() => Console.WriteLine("Response ended"));

var content = response.Content.ReadAsStringAsync();
Console.WriteLine(content);

API options:

  • CancellationToken might not be the right thing. I used it because it is similar to ASP.NET Core's HttpContext.RequestAborted token.
  • HttpResponseMessage.OnCompleted(Action callback) may be preferable. It's like ASP.NET Core's HttpResponse.OnCompleted.

These would be called when the HttpResponseMessage is disposed (I think that always happens when the response is "done"). If the response is already disposed of when used, the callback runs immediately.

@antonfirsov
Copy link
Member

antonfirsov commented May 19, 2023

What about the requirement (C) - using response headers for enrichment? Even if we manage to hook response end logging into HttpContent and the user code does HttpCompletionOption.ResponseHeadersRead to delay it, it won't help when the content is empty. I assume this is the reason you recommended two criterias in your #85447 (comment) -- note that implementing that needs tight coupling. Are we ok letting that requirement go, or at least accepting that it's unsolvable when the content is empty?

Maybe an API could be added to HttpResponseMessage

Can you open an API proposal for this? Even if it's related implementation-wise, it's a separate feature.

@noahfalk
Copy link
Member

@antonfirsov - You did a really nice job of investigating and presenting the tradeoffs there 👍. Thanks for the thorough treatment!

Personally, I'm in favor of option 2. (MetricsHandler), but not sure how important are the requirements (C) and (E).

My impression is that (C) was a "maybe someone would find it useful if they could do this" but I am not aware we identified any specific case where it was certain to be useful. I give it low weight in our choice.

For (E) I don't believe it was ever a requirement in the past and I am fine to keep it that way. Even if it was trivial to implement such that the duration did capture the entire response for HttpCompletionOption.ResponseHeadersRead, I suspect it would not be what most developers preferred to measure. I have no objection to James' proposal that other scenarios might benefit from observing the end of the response stream, I just wouldn't propose using it for this metrics duration measurement scenario.

I'm happy with option (2) as well. I am assuming MetricsHandler remains adjacent to DiagnosticsHandler in the handler pipeline, as you currently prototyped it.

If it matters I'd be equally happy if the Meter was inside the DiagnosticsHandler from a scenario perspective, but I assume you specifically chose a new handler because it simplified the implementation.

@JamesNK
Copy link
Member Author

JamesNK commented May 19, 2023

What about the requirement (C) - using response headers for enrichment? Even if we manage to hook response end logging into HttpContent and the user code does HttpCompletionOption.ResponseHeadersRead to delay it, it won't help when the content is empty. I assume this is the reason you recommended two criterias in your #85447 (comment) -- note that implementing that needs tight coupling. Are we ok letting that requirement go, or at least accepting that it's unsolvable when the content is empty?

I assume requirement C is achievable.

Why is it unsolvable if there is no response content? I agree that it's more complicated to require both things to be true, and will require changes outside of a MetricsHandler to support, but we're building this into HttpClient.

  • We know when the response has finished reading (I'm assuming it can be done more efficiently than my gist if there is access to internal API)s.
  • And we know when the response leaves the HttpClient/HttpMessageInvoker.

When both this things have happened a callback that's registered by the MetricsHandler can be triggered.

I'll create an issue for an API that allows you to detect when a HttpResponseMessage is complete. I wonder if the complexity should be built into that API: a response is considered complete when it has finished reading and exited the pipeline API. Then the MetricsHandler would go back to being relatively simple to add.

Edit: HttpResponseMessage complete event proposal - #86502

@JamesNK
Copy link
Member Author

JamesNK commented May 20, 2023

@cakescience I noticed this as well. Prometheus.net exports metrics by combining the meter and counter names. For example, request-duration in the System.Net.Http meter becomes system_net_http_request_duration. Meanwhile, OpenTelemetry has decided to use the counter name and put the meter name as a tag instead.

I created an issue a month ago about it - open-telemetry/opentelemetry-dotnet#4424 - but I don't expect anything to change since it's an agreed-upon OTel standard.

If we leave .NET counters with duplicate names (e.g. the server has current-requests and the client has current-requests) then we'll force people to update their Prometheus queries to filter based on the meter name. I think metrics counter names will have to be uniquely identifiable just by using their name to make them easy to use.

For example:

  • request-duration -> http-client-request-duration
  • current-requests -> http-client-current-requests

@antonfirsov
Copy link
Member

antonfirsov commented May 20, 2023

Regarding #85447 (comment):

Why is it unsolvable if there is no response content?

Regardless if we decide to define an API or to implement EOF detection internally, my main concern is the same as I described in #86502 (comment): the only way to detect an empty stream in order to signal an EOF immediately is to do it from within the underlying platform handler. This means that an important part of the core logic cannot live in a separate MetricsHandler and we are actually doing option (3) from my list of options. Please correct me if I'm wrong or missing something.

@antonfirsov
Copy link
Member

Or maybe we can leverage HttpContent.TryComputeLength()?

@JamesNK JamesNK closed this Jun 6, 2023
@karelz karelz added this to the 8.0.0 milestone Jul 3, 2023
@ghost ghost locked as resolved and limited conversation to collaborators Aug 18, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants