Move AsyncBatchingWorkQueue usage in telemetry to TelemetryLogging level #73287
Conversation
Doing this as I noticed a large (10x) difference in the number of requestcounter and requestduration events in our dashboard. These events go through the system in slightly different fashions: requestcounter goes through standard telemetry calls on disposal, whereas requestduration goes through the aggregated telemetry logging. Both are intended to aggregate multiple calls into a single message, but the cadence at which they send telemetry differs. The requestduration path flushes both on project/VS close and every 30 minutes, whereas the requestcounter path only flushes on project/VS close. I've noticed that VS shutdown is now more abrupt than previously, often not giving our disposers a chance to send out telemetry. This is why I believe there is such a large discrepancy in the telemetry numbers for these methods, when they should be the same. This PR allows the requestcounter messages to also be sent out every 30 minutes in case the disposal code path isn't executed.
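For readers unfamiliar with the mechanism, the behavior being added boils down to "flush periodically as well as on dispose." Below is a minimal, self-contained sketch of that idea; `PeriodicTelemetryFlusher` is a hypothetical name and this is not the actual Roslyn AsyncBatchingWorkQueue-based implementation, just an illustration of the intended cadence.

```csharp
// Illustrative sketch only (not the PR's actual code): flush accumulated
// telemetry every 30 minutes in addition to flushing on dispose, so an abrupt
// shutdown that skips disposal loses at most the last 30 minutes of counts.
using System;
using System.Threading;
using System.Threading.Tasks;

internal sealed class PeriodicTelemetryFlusher : IDisposable
{
    private readonly Action _flush;
    private readonly CancellationTokenSource _cts = new();

    public PeriodicTelemetryFlusher(Action flush)
    {
        _flush = flush;
        _ = RunAsync(_cts.Token); // fire-and-forget periodic loop
    }

    private async Task RunAsync(CancellationToken token)
    {
        using var timer = new PeriodicTimer(TimeSpan.FromMinutes(30));
        try
        {
            while (await timer.WaitForNextTickAsync(token).ConfigureAwait(false))
                _flush();
        }
        catch (OperationCanceledException)
        {
            // Shutting down; the Dispose path performs the final flush.
        }
    }

    public void Dispose()
    {
        _cts.Cancel();
        _flush(); // best-effort final flush on the disposal path
    }
}
```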
public static void SetLogProvider(ITelemetryLogProvider logProvider)
public static event EventHandler<EventArgs>? Flushed;

public static void SetLogProvider(ITelemetryLogProvider logProvider, IAsynchronousOperationListener asyncListener)
{
    s_logProvider = logProvider;
Should this throw if there's already a log provider?
done!
undid as it caused test failures
TelemetryLogging.Flushed -= OnFlushed;
}

private void OnFlushed(object? sender, EventArgs e)
I don't remember if the queries are resilient to multiple events for the same server for the same session. I assume they are since I don't think we drill down into the session in particular, but I can't remember.
The queries I know of don't drill into the session, so they are resilient. If we find some that do use the session, it seems feasible to change them to allow multiple items from a session.
}

private void OnFlushed(object? sender, EventArgs e)
{
    foreach (var kvp in _requestCounters)
    {
        TelemetryLogging.Log(FunctionId.LSP_RequestCounter, KeyValueLogMessage.Create(LogType.Trace, m =>
I was trying to figure out why these didn't use TelemetryLogging.LogAggregated, but it's because we're not using a bucket-based aggregation here, right? We're just logging pure sums.
Wondering if we should have another variant of LogAggregated that does a sum or something. But maybe a change for a later date.
Yes, that's exactly right. I considered that as part of this, but it seemed like overkill for now, especially as I know we have more upcoming work in this area as part of potentially moving towards OTel.
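For context, a sum-style variant along the lines discussed above might look roughly like this. This is purely hypothetical: `SummingTelemetryLog` is not an existing type, and a real version would live behind TelemetryLogging rather than stand alone.

```csharp
// Hypothetical sketch of a sum-style aggregation: callers report deltas, and a
// single event per metric carrying the running total is emitted at flush time.
using System;
using System.Collections.Concurrent;

internal sealed class SummingTelemetryLog
{
    private readonly ConcurrentDictionary<string, long> _sums = new();

    public void Add(string metricName, long delta)
        => _sums.AddOrUpdate(metricName, delta, (_, current) => current + delta);

    public void Flush(Action<string, long> postEvent)
    {
        foreach (var (metricName, total) in _sums)
            postEvent(metricName, total);
        _sums.Clear();
    }
}
```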
@@ -112,5 +135,19 @@ public static void LogAggregated(FunctionId functionId, KeyValueLogMessage logMe
public static void Flush()
{
    s_logProvider?.Flush();

    Flushed?.Invoke(null, EventArgs.Empty);
Not quite sure I get the point of the events.
public const string KeyName = "Name";
public const string KeyValue = "Value";
public const string KeyLanguageName = "LanguageName";
public const string KeyMetricName = "MetricName";

public static void SetLogProvider(ITelemetryLogProvider logProvider)
public static event EventHandler<EventArgs>? Flushed;
What is the event for?
RequestTelemetryLogger not only uses TelemetryLogging for aggregated telemetry, but also accumulates its own counts that it sends to telemetry itself.
RequestTelemetryLogger uses Dispose both to notify this object to Flush and to fire the telemetry counts it has been accumulating. However, Dispose isn't a reliable mechanism by which to fire telemetry, as the process might be terminated before we get the opportunity to act.
To handle that, the aggregating telemetry code previously fired off telemetry every 30 minutes, using an ABWQ. This PR moves that out to the TelemetryLogging level, but it's still not hooked into the telemetry that fires due to the accounting in RequestTelemetryLogger.
This event allows RequestTelemetryLogger to hook into that ABWQ-driven telemetry firing, so that when it happens RequestTelemetryLogger can also fire off the telemetry it has been accumulating.
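In code terms, the hook-up described above looks roughly like this. It is a simplified sketch of the pattern, not the exact Roslyn source; `ReportAccumulatedCounters` is a placeholder for the counter-posting logic.

```csharp
// Simplified sketch: the logger flushes its own accumulated counts whenever
// TelemetryLogging's periodic flush fires, so it no longer depends solely on
// Dispose running before the process exits.
using System;

internal sealed class RequestTelemetryLogger : IDisposable
{
    public RequestTelemetryLogger()
    {
        TelemetryLogging.Flushed += OnFlushed;
    }

    private void OnFlushed(object? sender, EventArgs e)
        => ReportAccumulatedCounters(); // send the counts gathered so far

    public void Dispose()
    {
        TelemetryLogging.Flushed -= OnFlushed;
        ReportAccumulatedCounters(); // best-effort final report on disposal
    }

    private void ReportAccumulatedCounters()
    {
        // Placeholder: post FunctionId.LSP_RequestCounter events here.
    }
}
```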
Flush();

return ValueTaskFactory.CompletedTask;
return ImmutableInterlocked.GetOrAdd(ref _aggregatingLogs, functionId, functionId => new AggregatingTelemetryLog(_session, functionId, bucketBoundaries));
How often is this called? Consider using a static lambda here to avoid allocations.
called a lot, nice catch!
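For reference, the allocation-free shape suggested here would look roughly like the snippet below. It is a sketch based on the diff above; the tuple packing and the `id`/`arg` parameter names are illustrative, not the exact change that was committed.

```csharp
// Sketch: pass the captured state through GetOrAdd's factory-argument overload
// so the lambda can be static and allocates no closure on the hot path.
return ImmutableInterlocked.GetOrAdd(
    ref _aggregatingLogs,
    functionId,
    static (id, arg) => new AggregatingTelemetryLog(arg.session, id, arg.bucketBoundaries),
    (session: _session, bucketBoundaries));
```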
{
    s_logProvider = logProvider;

    if (s_postTelemetryQueue is null)
💡 Use an InterlockedOperations.Initialize helper here.
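For readers unfamiliar with the helper being suggested, an Initialize-style helper generally reduces to a CompareExchange wrapper. Here is a standalone sketch of that idea; it is not the actual Roslyn helper, whose exact shape may differ.

```csharp
// Standalone sketch of an Initialize-style helper: publish the candidate value
// only if the field is still null, and return whichever instance won the race.
using System.Threading;

internal static class InterlockedOperationsSketch
{
    public static T Initialize<T>(ref T? target, T value) where T : class
        => Interlocked.CompareExchange(ref target, value, null) ?? value;
}
```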
if (null == Interlocked.CompareExchange(ref s_postTelemetryQueue, postTelemetryQueue, null))
{
    // We created the work queue in use. Add an item into it to kick things off.
    s_postTelemetryQueue?.AddWork();
💡 It's fine if this gets called more than once. The work queue will aggregate duplicate requests into a single one.
2) As AddWork batches work together, there's no need to protect against multiple calls.
3) Get rid of the closure allocation in a commonly called telemetry method.
Doing this as I noticed a large (10x) difference in the number of requestcounter and requestduration events in our dashboard. These events go through the system in slightly different fashions: requestcounter goes through standard telemetry calls on disposal, whereas requestduration goes through the aggregated telemetry logging.
Both are intended to aggregate multiple calls into a single message, but the cadence at which they send telemetry differs. They both flush on project/VS close, but the requestduration method also flushes every 30 minutes.
I've noticed that VS shutdown is now more abrupt than previously, often not giving our disposers a chance to send out telemetry. This is why I believe there is such a large discrepancy in the telemetry numbers for these methods, when they should be the same. This PR allows the requestcounter messages to also be sent out every 30 minutes in case the disposal code path isn't executed.