-
Notifications
You must be signed in to change notification settings - Fork 501
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Query performance issues between 3.9.0 and 3.17.0 #2270
Comments
Thanks for creating this issue @dpiessens! Could you include the diagnostics for a slow query for both 3.8 and 3.17-preview? |
@dpiessens if you can also provide a way to reproduce the issue it would be very helpful to root cause the issue. |
Apologies it was 3.9.0 not 3.8.0. Here's an example of the baseline diagnostics, I did redact URLs and container names. Here's a trace from 3.9.0 {
"DiagnosticVersion": "2",
"Summary": {
"StartUtc": "2021-03-02T14:06:20.4003215Z",
"TotalElapsedTimeInMs": 11.7576,
"UserAgent": "cosmos-netstandard-sdk/3.9.0|3.9.0|30681|X64|Linux 5.4.0-1031-azure 32 18.|.NET 5.0.3|",
"TotalRequestCount": 2,
"FailedRequestCount": 0
},
"Context": [{
"Id": "CreateQueryPipeline",
"ElapsedTimeInMs": 2.0086
}, {
"Id": "Microsoft.Azure.Cosmos.Handlers.RetryHandler",
"HandlerElapsedTimeInMs": 1.8598
}, {
"Id": "Microsoft.Azure.Cosmos.Handlers.RouterHandler",
"HandlerElapsedTimeInMs": 1.8553
}, {
"Id": "Microsoft.Azure.Cosmos.Handlers.TransportHandler",
"HandlerElapsedTimeInMs": 0.2206
}, {
"Id": "AggregatedClientSideRequestStatistics",
"ContactedReplicas": [],
"RegionsContacted": [],
"FailedReplicas": []
}, {
"Id": "Microsoft.Azure.Cosmos.GatewayStoreModel",
"ElapsedTimeInMs": 1.7965
}, {
"Id": "GatewayRequestTime",
"ElapsedTimeInMs": 1.6982
}, {
"Id": "PointOperationStatistics",
"ActivityId": null,
"ResponseTimeUtc": "2021-03-02T14:06:20.4022321Z",
"StatusCode": 200,
"SubStatusCode": 0,
"RequestCharge": 0.0,
"RequestUri": "db/reveal/colls/*******",
"RequestSessionToken": null,
"ResponseSessionToken": null
}, {
"PKRangeId": "94",
"StartUtc": "2021-03-02T14:06:20.4023608Z",
"QueryMetric": "totalExecutionTimeInMs=3.33;queryCompileTimeInMs=0.39;queryLogicalPlanBuildTimeInMs=0.11;queryPhysicalPlanBuildTimeInMs=0.37;queryOptimizationTimeInMs=0.02;VMExecutionTimeInMs=2.09;indexLookupTimeInMs=0.89;documentLoadTimeInMs=0.65;systemFunctionExecuteTimeInMs=0.00;userFunctionExecuteTimeInMs=0.00;retrievedDocumentCount=31;retrievedDocumentSize=245648;outputDocumentCount=31;outputDocumentSize=15401;writeOutputTimeInMs=0.10;indexUtilizationRatio=1.00",
"IndexUtilization": "",
"ClientCorrelationId": "02b10690-70c9-4485-b222-81b5c570298c",
"Context": [{
"Id": "AggregatedClientSideRequestStatistics",
"ContactedReplicas": [],
"RegionsContacted": ["https://**********.documents.azure.com:443/"],
"FailedReplicas": []
}, {
"Id": "Microsoft.Azure.Cosmos.Handlers.RetryHandler",
"HandlerElapsedTimeInMs": 9.1108
}, {
"Id": "Microsoft.Azure.Cosmos.Handlers.RouterHandler",
"HandlerElapsedTimeInMs": 9.1067
}, {
"Id": "Microsoft.Azure.Cosmos.Handlers.TransportHandler",
"HandlerElapsedTimeInMs": 0.201
}, {
"Id": "Microsoft.Azure.Documents.ServerStoreModel",
"ElapsedTimeInMs": 9.0283
}, {
"Id": "StoreResponseStatistics",
"StartTimeUtc": "2021-03-02T14:06:20.4024462Z",
"ResponseTimeUtc": "2021-03-02T14:06:20.4113962Z",
"ElapsedTimeInMs": 8.95,
"ResourceType": "Document",
"OperationType": "Query",
"LocationEndpoint": "https://**********.documents.azure.com:443/",
"StoreResult": "StorePhysicalAddress: rntbd:/***********.documents.azure.com:14071/apps/1b020ad4-35b5-405c-817f-54701bf5cd52/services/acd83627-ccde-40f9-8ca4-261837813e6c/partitions/3c3faf19-dcc9-4f08-b65f-41625e86bc59/replicas/132587311508741895s/, LSN: 3711341, GlobalCommittedLsn: 3711340, PartitionKeyRangeId: 94, IsValid: True, StatusCode: 200, SubStatusCode: 0, RequestCharge: 7.41, ItemLSN: -1, SessionToken: -1#3711341, UsingLocalLSN: True, TransportException: null"
}]
}]
} Dumping the trace in 3.17.0-preview gives me this error: System.ArgumentNullException: Value cannot be null. (Parameter 'chars')
at System.Text.UTF8Encoding.GetByteCount(String chars)
at Microsoft.Azure.Cosmos.Json.JsonWriter.WriteStringValue(String value)
at Microsoft.Azure.Cosmos.Tracing.TraceWriter.TraceDatumJsonWriter.Visit(PointOperationStatisticsTraceDatum pointOperationStatisticsTraceDatum)
at Microsoft.Azure.Cosmos.Tracing.TraceWriter.WriteTraceDatum(IJsonWriter writer, Object value)
at Microsoft.Azure.Cosmos.Tracing.TraceWriter.TraceJsonWriter.WriteTrace(IJsonWriter writer, ITrace trace)
at Microsoft.Azure.Cosmos.Tracing.TraceWriter.TraceJsonWriter.WriteTrace(IJsonWriter writer, ITrace trace)
at Microsoft.Azure.Cosmos.Tracing.TraceWriter.TraceJsonWriter.WriteTrace(IJsonWriter writer, ITrace trace)
at Microsoft.Azure.Cosmos.Tracing.TraceWriter.TraceJsonWriter.WriteTrace(IJsonWriter writer, ITrace trace)
at Microsoft.Azure.Cosmos.Tracing.TraceWriter.TraceJsonWriter.WriteTrace(IJsonWriter writer, ITrace trace)
at Microsoft.Azure.Cosmos.Tracing.TraceWriter.TraceJsonWriter.WriteTrace(IJsonWriter writer, ITrace trace)
at Microsoft.Azure.Cosmos.Tracing.TraceWriter.TraceJsonWriter.WriteTrace(IJsonWriter writer, ITrace trace)
at Microsoft.Azure.Cosmos.Tracing.TraceWriter.TraceJsonWriter.WriteTrace(IJsonWriter writer, ITrace trace)
at Microsoft.Azure.Cosmos.Tracing.TraceWriter.TraceJsonWriter.WriteTrace(IJsonWriter writer, ITrace trace)
at Microsoft.Azure.Cosmos.Diagnostics.CosmosTraceDiagnostics.WriteTraceToJsonWriter(JsonSerializationFormat jsonSerializationFormat)
at Microsoft.Azure.Cosmos.Diagnostics.CosmosTraceDiagnostics.ToJsonString() |
I was still able to get overall client elapsed time and it's between 22 and 27 ms for 3.17.0 |
I'd like to try to get traces from both first. This code is at the core of a lot of moving parts and would be a multi-day project to create an example that replicates what it does. If the traces don't point to the issue then I'll be happy to create one. |
I'm working on a 3.17 release #2247 which should be done today. This new releases will contain the fix for the ArgumentNullException. |
@dpiessens 3.17.0-preview1 is released. |
Thanks @j82w, I updated to preview 1 and got a trace. This trace executed the same query, but on a different ID so it hit a different server side partition. Let me know if anything obvious stands out that I should tweak: {
"name": "FeedIterator Read Next Async",
"id": "ced01b61-0762-47b2-9f96-dc715c0e6ee0",
"component": "Unknown",
"caller info": {
"member": "ReadNextWithRootTraceAsync",
"file": "FeedIteratorInlineCore.cs",
"line": 48
},
"start time": "09:46:22:240",
"duration in milliseconds": 21.0619,
"data": {},
"children": [{
"name": "Create Query Pipeline",
"id": "a129d9ed-a372-457f-8b70-4e88b4aa1e68",
"component": "Query",
"caller info": {
"member": "TryCreateCoreContextAsync",
"file": "CosmosQueryExecutionContextFactory.cs",
"line": 85
},
"start time": "09:46:22:240",
"duration in milliseconds": 2.5604,
"data": {},
"children": [{
"name": "Get Container Properties",
"id": "efc67f67-3c40-4b18-8df0-35b591d6545e",
"component": "Transport",
"caller info": {
"member": "GetCachedContainerPropertiesAsync",
"file": "ClientContextCore.cs",
"line": 349
},
"start time": "09:46:22:240",
"duration in milliseconds": 0.0119,
"data": {},
"children": [{
"name": "Get Collection Cache",
"id": "4a5fc555-d8c6-4ea5-8da8-0f0f7d24f3ad",
"component": "Routing",
"caller info": {
"member": "GetCollectionCacheAsync",
"file": "DocumentClient.cs",
"line": 542
},
"start time": "09:46:22:240",
"duration in milliseconds": 0.0016,
"data": {},
"children": []
}]
}, {
"name": "Gateway QueryPlan",
"id": "e128ce0f-004f-4377-9530-f0a9771f402f",
"component": "Query",
"caller info": {
"member": "GetQueryPlanThroughGatewayAsync",
"file": "QueryPlanRetriever.cs",
"line": 112
},
"start time": "09:46:22:240",
"duration in milliseconds": 0.4017,
"data": {},
"children": []
}, {
"name": "Microsoft.Azure.Cosmos.Handlers.RequestInvokerHandler",
"id": "4671a334-115e-4af8-9c9c-410717d163cc",
"component": "RequestHandler",
"caller info": {
"member": "SendAsync",
"file": "RequestInvokerHandler.cs",
"line": 148
},
"start time": "09:46:22:240",
"duration in milliseconds": 2.3294,
"data": {},
"children": [{
"name": "Microsoft.Azure.Cosmos.Handlers.RequestInvokerHandler",
"id": "88de0897-25ec-4b45-b0a3-91905abd84e6",
"component": "RequestHandler",
"caller info": {
"member": "SendAsync",
"file": "RequestInvokerHandler.cs",
"line": 51
},
"start time": "09:46:22:240",
"duration in milliseconds": 2.3144,
"data": {},
"children": [{
"name": "Send Async",
"id": "60ae3d73-4d90-4902-92f1-3d99dda58a0f",
"component": "RequestHandler",
"caller info": {
"member": "SendAsync",
"file": "RequestHandler.cs",
"line": 50
},
"start time": "09:46:22:240",
"duration in milliseconds": 2.2835,
"data": {
"CPU Load History": {
"CPU History": "(2021-03-02T21:45:05.7947927Z 100.000), (2021-03-02T21:45:15.6891312Z 100.000), (2021-03-02T21:45:25.6716779Z 97.755), (2021-03-02T21:45:45.6840788Z 100.000), (2021-03-02T21:45:55.7350774Z 95.455), (2021-03-02T21:46:15.7287876Z 100.000)"
}
},
"children": [{
"name": "Send Async",
"id": "f67e1e06-c66e-436d-9a28-9ecca557415b",
"component": "RequestHandler",
"caller info": {
"member": "SendAsync",
"file": "RequestHandler.cs",
"line": 50
},
"start time": "09:46:22:240",
"duration in milliseconds": 2.2773,
"data": {},
"children": [{
"name": "Send Async",
"id": "43c0f323-f2fd-42d1-9116-981b905c4b43",
"component": "RequestHandler",
"caller info": {
"member": "SendAsync",
"file": "AbstractRetryHandler.cs",
"line": 23
},
"start time": "09:46:22:240",
"duration in milliseconds": 2.2742,
"data": {},
"children": [{
"name": "Send Async",
"id": "5784f620-b9b3-44f5-a925-874f6eb87d27",
"component": "RequestHandler",
"caller info": {
"member": "SendAsync",
"file": "RequestHandler.cs",
"line": 50
},
"start time": "09:46:22:240",
"duration in milliseconds": 2.2653,
"data": {},
"children": [{
"name": "Microsoft.Azure.Cosmos.GatewayStoreModel Transport Request",
"id": "cc8582f6-21b3-426d-8717-204ba853c431",
"component": "Transport",
"caller info": {
"member": "ProcessMessageAsync",
"file": "TransportHandler.cs",
"line": 87
},
"start time": "09:46:22:240",
"duration in milliseconds": 2.2059,
"data": {
"User Agent": "cosmos-netstandard-sdk/3.17.0|3.17.1|01|X64|Linux 5.4.0-1031-azure 32 18.|.NET 5.0.3|",
"Client Side Request Stats": {
"Id": "AggregatedClientSideRequestStatistics",
"ContactedReplicas": [],
"RegionsContacted": [],
"FailedReplicas": [],
"AddressResolutionStatistics": [],
"StoreResponseStatistics": []
},
"PointOperationStatisticsTraceDatum": {
"Id": "PointOperationStatistics",
"ActivityId": null,
"ResponseTimeUtc": "2021-03-02T21:46:22.2427593Z",
"StatusCode": 200,
"SubStatusCode": 0,
"RequestCharge": 0,
"RequestUri": "dbs/reveal/colls/*******",
"ErrorMessage": null,
"RequestSessionToken": null,
"ResponseSessionToken": null
}
},
"children": []
}]
}]
}]
}]
}]
}]
}, {
"name": "Get Partition Key Ranges",
"id": "561ec0c8-122f-48df-a3dc-591f5c3ad5ad",
"component": "Routing",
"caller info": {
"member": "GetTargetPartitionKeyRangesAsync",
"file": "CosmosQueryClientCore.cs",
"line": 240
},
"start time": "09:46:22:242",
"duration in milliseconds": 0.032,
"data": {},
"children": []
}]
}, {
"name": "MoveNextAsync",
"id": "03a8f150-9a88-47bc-9a75-f16f1b6c5d64",
"component": "Pagination",
"caller info": {
"member": "MoveNextAsync",
"file": "CrossPartitionRangePageAsyncEnumerator.cs",
"line": 114
},
"start time": "09:46:22:242",
"duration in milliseconds": 18.4254,
"data": {},
"children": [{
"name": "Prefetching",
"id": "b9a9499a-3234-43ac-a053-fe0714e4f64b",
"component": "Pagination",
"caller info": {
"member": "PrefetchInParallelAsync",
"file": "ParallelPrefetch.cs",
"line": 31
},
"start time": "09:46:22:242",
"duration in milliseconds": 18.3498,
"data": {},
"children": [{
"name": "Prefetch",
"id": "1660d23a-ab1f-4c64-88aa-704c97b92904",
"component": "Pagination",
"caller info": {
"member": "PrefetchAsync",
"file": "BufferedPartitionRangePageAsyncEnumerator.cs",
"line": 54
},
"start time": "09:46:22:242",
"duration in milliseconds": 18.3004,
"data": {},
"children": [{
"name": "[05C1E7FFFFFFFE,05C1E93F5D0DD0) move next",
"id": "39837de7-a964-47d5-b13e-872b81afb6b4",
"component": "Pagination",
"caller info": {
"member": "MoveNextAsync",
"file": "PartitionRangePageAsyncEnumerator.cs",
"line": 49
},
"start time": "09:46:22:242",
"duration in milliseconds": 18.2933,
"data": {
"Query Metrics": "Retrieved Document Count : 9 Retrieved Document Size : 72,156 bytes Output Document Count : 9 Output Document Size : 5,767 bytes Index Utilization : 100.00 % Total Query Execution Time : 11.24 milliseconds Query Preparation Time : 1.03 milliseconds Index Lookup Time : 1.90 milliseconds Document Load Time : 7.65 milliseconds Runtime Execution Times : 0.00 milliseconds Document Write Time : 0.05 milliseconds Index Utilization Information Utilized Single Indexes Potential Single Indexes Utilized Composite Indexes Potential Composite Indexes"
},
"children": [{
"name": "Microsoft.Azure.Cosmos.Handlers.RequestInvokerHandler",
"id": "440227a4-d376-4da0-b8b4-aec6900df583",
"component": "RequestHandler",
"caller info": {
"member": "SendAsync",
"file": "RequestInvokerHandler.cs",
"line": 148
},
"start time": "09:46:22:243",
"duration in milliseconds": 17.9901,
"data": {},
"children": [{
"name": "Get Collection Cache",
"id": "077d3811-e1b1-4ae3-aab3-e393d6517fb2",
"component": "Routing",
"caller info": {
"member": "GetCollectionCacheAsync",
"file": "DocumentClient.cs",
"line": 542
},
"start time": "09:46:22:243",
"duration in milliseconds": 0.0009,
"data": {},
"children": []
}, {
"name": "Try Get Overlapping Ranges",
"id": "fcc9baac-7078-49fa-9cde-2af2c3e23ffb",
"component": "Routing",
"caller info": {
"member": "TryGetOverlappingRangesAsync",
"file": "PartitionKeyRangeCache.cs",
"line": 49
},
"start time": "09:46:22:243",
"duration in milliseconds": 0.0098,
"data": {},
"children": []
}, {
"name": "Microsoft.Azure.Cosmos.Handlers.RequestInvokerHandler",
"id": "5d2d935a-a3e0-49e0-b52f-fb114cd80322",
"component": "RequestHandler",
"caller info": {
"member": "SendAsync",
"file": "RequestInvokerHandler.cs",
"line": 51
},
"start time": "09:46:22:243",
"duration in milliseconds": 17.9389,
"data": {},
"children": [{
"name": "Send Async",
"id": "b3fedb6d-806e-407f-9d13-169fec787e28",
"component": "RequestHandler",
"caller info": {
"member": "SendAsync",
"file": "RequestHandler.cs",
"line": 50
},
"start time": "09:46:22:243",
"duration in milliseconds": 17.9299,
"data": {
"CPU Load History": {
"CPU History": "(2021-03-02T21:45:05.7947927Z 100.000), (2021-03-02T21:45:15.6891312Z 100.000), (2021-03-02T21:45:25.6716779Z 97.755), (2021-03-02T21:45:45.6840788Z 100.000), (2021-03-02T21:45:55.7350774Z 95.455), (2021-03-02T21:46:15.7287876Z 100.000)"
}
},
"children": [{
"name": "Send Async",
"id": "6e2b26e0-8576-476f-8f94-8871bd7e2d8d",
"component": "RequestHandler",
"caller info": {
"member": "SendAsync",
"file": "RequestHandler.cs",
"line": 50
},
"start time": "09:46:22:243",
"duration in milliseconds": 17.926,
"data": {},
"children": [{
"name": "Send Async",
"id": "3f7b1211-c41d-466b-8056-068630dcb85a",
"component": "RequestHandler",
"caller info": {
"member": "SendAsync",
"file": "AbstractRetryHandler.cs",
"line": 23
},
"start time": "09:46:22:243",
"duration in milliseconds": 17.9233,
"data": {},
"children": [{
"name": "Send Async",
"id": "9ef57489-b1cd-4a7f-b3cc-13c2534891a0",
"component": "RequestHandler",
"caller info": {
"member": "SendAsync",
"file": "RequestHandler.cs",
"line": 50
},
"start time": "09:46:22:243",
"duration in milliseconds": 17.9173,
"data": {},
"children": [{
"name": "Microsoft.Azure.Documents.ServerStoreModel Transport Request",
"id": "decc7268-c234-46f0-b0da-54f59aa38ff7",
"component": "Transport",
"caller info": {
"member": "ProcessMessageAsync",
"file": "TransportHandler.cs",
"line": 87
},
"start time": "09:46:22:243",
"duration in milliseconds": 17.8771,
"data": {
"User Agent": "cosmos-netstandard-sdk/3.17.0|3.17.1|01|X64|Linux 5.4.0-1031-azure 32 18.|.NET 5.0.3|",
"Client Side Request Stats": {
"Id": "AggregatedClientSideRequestStatistics",
"ContactedReplicas": [],
"RegionsContacted": ["https://**********.documents.azure.com/"],
"FailedReplicas": [],
"AddressResolutionStatistics": [],
"StoreResponseStatistics": [{
"ResponseTimeUTC": "2021-03-02T21:46:22.2609143Z",
"ResourceType": "Document",
"OperationType": "Query",
"LocationEndpoint": "https://**********.documents.azure.com/",
"StoreResult": "StorePhysicalAddress: rntbd://**********.documents.azure.com:14198/apps/1072af28-7265-4a2c-9208-a3ae5332dec2/services/fa4dd569-7f60-4935-9ade-412d6ff2de42/partitions/01123852-8264-4de5-8c62-07fa08e93cb4/replicas/132460929699733530s/, LSN: 3697338, GlobalCommittedLsn: 3697337, PartitionKeyRangeId: 117, IsValid: True, StatusCode: 200, SubStatusCode: 0, RequestCharge: 5.76, ItemLSN: -1, SessionToken: -1#3697338, UsingLocalLSN: True, TransportException: null"
}]
}
},
"children": []
}]
}]
}]
}]
}]
}]
}, {
"name": "Get Cosmos Element Response",
"id": "62a0895c-100c-4bc1-ad5e-dd91c8b9051a",
"component": "Json",
"caller info": {
"member": "GetCosmosElementResponse",
"file": "CosmosQueryClientCore.cs",
"line": 282
},
"start time": "09:46:22:261",
"duration in milliseconds": 0.2346,
"data": {},
"children": []
}]
}]
}]
}, {
"name": "[05C1E7FFFFFFFE,05C1E93F5D0DD0) move next",
"id": "c9eb1392-ee93-4211-8b36-93fa4cfa50c3",
"component": "Pagination",
"caller info": {
"member": "MoveNextAsync",
"file": "PartitionRangePageAsyncEnumerator.cs",
"line": 49
},
"start time": "09:46:22:261",
"duration in milliseconds": 0.007,
"data": {},
"children": [{
"name": "Prefetch",
"id": "4c087252-3717-4a2a-adcf-aa17f6aedd59",
"component": "Pagination",
"caller info": {
"member": "PrefetchAsync",
"file": "BufferedPartitionRangePageAsyncEnumerator.cs",
"line": 54
},
"start time": "09:46:22:261",
"duration in milliseconds": 0.0008,
"data": {},
"children": []
}]
}]
}]
} |
@timsander1 can you please take a look? @dpiessens are you doing a paging API? Can you share the code on how you are draining the query. |
Sure here's the code we're using: public virtual async Task<(IEnumerable<T> results, string sessionToken)> ExecuteQueryAsync<T>(IQueryable<T> query, CancellationToken cancellationToken = default(CancellationToken))
{
using (LogContext.PushProperty("operationName", "Query"))
using (LogContext.PushProperty("query", query.ToString()))
{
var results = new List<T>();
string sessionToken = null;
var options = new QueryRequestOptions { MaxConcurrency = -1, MaxBufferedItemCount = -1 };
using (var iterator = this.container.GetItemQueryStreamIterator(query.ToQueryDefinition(), null, options))
{
while (iterator.HasMoreResults)
{
Func<Task<ResponseMessage>> action = () => iterator.ReadNextAsync(cancellationToken);
using var response = await Execute(action, "Query");
using var jsondoc = await JsonDocument.ParseAsync(response.Content, cancellationToken: cancellationToken);
var queryResults = JsonConverter.Deserialize<T>(jsondoc.RootElement.GetProperty("Documents").EnumerateArray());
results.AddRange(queryResults);
sessionToken = response.Headers.Session;
}
}
return (results, sessionToken);
}
} One note on this the We didn't change that part of the code yet so there to limit perf change areas, but we're happy to move that over if you feel the newer query methods have better performance. |
Hi @dpiessens Would it be possible to send two traces for the same query with the same filter? The query metrics are different enough that it's hard to rule this out as a possible reason for the difference. Thanks! |
Yup, will get that to you later today.
…On Wed, Mar 3, 2021, 7:02 PM Tim Sander ***@***.***> wrote:
Hi @dpiessens <https://github.com/dpiessens>
Would it be possible to send two traces for the same query with the same
filter? The query metrics are different enough that it's hard to rule this
out as a possible reason for the difference.
Thanks!
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#2270 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAAP3GMFJUUTJTXBUVWDF2DTB3L35ANCNFSM4YMZ3SPA>
.
|
OK so I ran two full runs and wrote something to compare items down to the query. The result is that query calls on average increased 15 ms. This query was at the average: Baseline {
"DiagnosticVersion": "2",
"Summary": {
"StartUtc": "2021-03-05T03:33:01.2618255Z",
"TotalElapsedTimeInMs": 29.2932,
"UserAgent": "cosmos-netstandard-sdk|3.9.0|3.9.0|61579|X64|Linux 5.4.0-1031-azure 32 18.|.NET 5.0.3|",
"TotalRequestCount": 2,
"FailedRequestCount": 0
},
"Context": [{
"Id": "CreateQueryPipeline",
"ElapsedTimeInMs": 2.095
}, {
"Id": "Microsoft.Azure.Cosmos.Handlers.RetryHandler",
"HandlerElapsedTimeInMs": 1.8682
}, {
"Id": "Microsoft.Azure.Cosmos.Handlers.RouterHandler",
"HandlerElapsedTimeInMs": 1.8624
}, {
"Id": "Microsoft.Azure.Cosmos.Handlers.TransportHandler",
"HandlerElapsedTimeInMs": 0.2576
}, {
"Id": "AggregatedClientSideRequestStatistics",
"ContactedReplicas": [],
"RegionsContacted": [],
"FailedReplicas": []
}, {
"Id": "Microsoft.Azure.Cosmos.GatewayStoreModel",
"ElapsedTimeInMs": 1.7582
}, {
"Id": "GatewayRequestTime",
"ElapsedTimeInMs": 1.6436
}, {
"Id": "PointOperationStatistics",
"ActivityId": null,
"ResponseTimeUtc": "2021-03-05T03:33:01.2637739Z",
"StatusCode": 200,
"SubStatusCode": 0,
"RequestCharge": 0.0,
"RequestUri": "dbs/*****/colls/****",
"RequestSessionToken": null,
"ResponseSessionToken": null
}, {
"PKRangeId": "126",
"StartUtc": "2021-03-05T03:33:01.2640741Z",
"QueryMetric": "totalExecutionTimeInMs=20.91;queryCompileTimeInMs=0.43;queryLogicalPlanBuildTimeInMs=0.13;queryPhysicalPlanBuildTimeInMs=0.45;queryOptimizationTimeInMs=0.03;VMExecutionTimeInMs=19.56;indexLookupTimeInMs=1.57;documentLoadTimeInMs=17.29;systemFunctionExecuteTimeInMs=0.00;userFunctionExecuteTimeInMs=0.00;retrievedDocumentCount=18;retrievedDocumentSize=151748;outputDocumentCount=18;outputDocumentSize=11511;writeOutputTimeInMs=0.24;indexUtilizationRatio=1.00",
"IndexUtilization": "",
"ClientCorrelationId": "2e835ccf-4585-4006-9298-1f453b6b1129",
"Context": [{
"Id": "AggregatedClientSideRequestStatistics",
"ContactedReplicas": [],
"RegionsContacted": ["https://**********.documents.azure.com:443/"],
"FailedReplicas": []
}, {
"Id": "Microsoft.Azure.Cosmos.Handlers.RetryHandler",
"HandlerElapsedTimeInMs": 26.7368
}, {
"Id": "Microsoft.Azure.Cosmos.Handlers.RouterHandler",
"HandlerElapsedTimeInMs": 26.7322
}, {
"Id": "Microsoft.Azure.Cosmos.Handlers.TransportHandler",
"HandlerElapsedTimeInMs": 0.2232
}, {
"Id": "Microsoft.Azure.Documents.ServerStoreModel",
"ElapsedTimeInMs": 26.6797
}, {
"Id": "StoreResponseStatistics",
"StartTimeUtc": "2021-03-05T03:33:01.2642095Z",
"ResponseTimeUtc": "2021-03-05T03:33:01.2908277Z",
"ElapsedTimeInMs": 26.6182,
"ResourceType": "Document",
"OperationType": "Query",
"LocationEndpoint": "https://**********.documents.azure.com:443/",
"StoreResult": "StorePhysicalAddress: rntbd://cdb-ms-prod-eastus2-***.documents.azure.com:14440/apps/796f6a08-ae9e-4a5f-8014-6b8972ffefdc/services/b0c7544f-49fc-49d3-ae79-ef9df71474e3/partitions/5b8fb755-4144-4fda-8245-3e6f180ed766/replicas/132588272726853466s/, LSN: 3777274, GlobalCommittedLsn: 3777273, PartitionKeyRangeId: 126, IsValid: True, StatusCode: 200, SubStatusCode: 0, RequestCharge: 6.56, ItemLSN: -1, SessionToken: -1#3777274, UsingLocalLSN: True, TransportException: null"
}]
}]
} Latest {
"name": "FeedIterator Read Next Async",
"id": "4baf1d60-acee-4beb-aa3c-6fd37f277eab",
"component": "Unknown",
"caller info": {
"member": "ReadNextWithRootTraceAsync",
"file": "FeedIteratorInlineCore.cs",
"line": 48
},
"start time": "10:38:54:058",
"duration in milliseconds": 44.3216,
"data": {},
"children": [{
"name": "Create Query Pipeline",
"id": "0279b6ab-cd23-455a-b7d1-837047ed52ed",
"component": "Query",
"caller info": {
"member": "TryCreateCoreContextAsync",
"file": "CosmosQueryExecutionContextFactory.cs",
"line": 85
},
"start time": "10:38:54:058",
"duration in milliseconds": 3.1787,
"data": {},
"children": [{
"name": "Get Container Properties",
"id": "7cc3492a-f7eb-4029-88fa-329be4981076",
"component": "Transport",
"caller info": {
"member": "GetCachedContainerPropertiesAsync",
"file": "ClientContextCore.cs",
"line": 349
},
"start time": "10:38:54:058",
"duration in milliseconds": 0.0092,
"data": {},
"children": [{
"name": "Get Collection Cache",
"id": "c07c8bd1-bcc0-41cd-899d-fdd9e6cfd63d",
"component": "Routing",
"caller info": {
"member": "GetCollectionCacheAsync",
"file": "DocumentClient.cs",
"line": 542
},
"start time": "10:38:54:058",
"duration in milliseconds": 0.001,
"data": {},
"children": []
}]
}, {
"name": "Gateway QueryPlan",
"id": "bce69c10-abc3-44ed-be52-6733d9446317",
"component": "Query",
"caller info": {
"member": "GetQueryPlanThroughGatewayAsync",
"file": "QueryPlanRetriever.cs",
"line": 112
},
"start time": "10:38:54:058",
"duration in milliseconds": 0.2977,
"data": {},
"children": []
}, {
"name": "Microsoft.Azure.Cosmos.Handlers.RequestInvokerHandler",
"id": "f414fc24-a0c2-4f9b-8ab2-8a8594951503",
"component": "RequestHandler",
"caller info": {
"member": "SendAsync",
"file": "RequestInvokerHandler.cs",
"line": 148
},
"start time": "10:38:54:058",
"duration in milliseconds": 3.004,
"data": {},
"children": [{
"name": "Microsoft.Azure.Cosmos.Handlers.RequestInvokerHandler",
"id": "8ec8c65c-f356-4337-a84b-26bd0c252023",
"component": "RequestHandler",
"caller info": {
"member": "SendAsync",
"file": "RequestInvokerHandler.cs",
"line": 51
},
"start time": "10:38:54:058",
"duration in milliseconds": 2.9855,
"data": {},
"children": [{
"name": "Send Async",
"id": "1fc064a3-614d-4b71-afd4-350d32648bc4",
"component": "RequestHandler",
"caller info": {
"member": "SendAsync",
"file": "RequestHandler.cs",
"line": 50
},
"start time": "10:38:54:058",
"duration in milliseconds": 2.9796,
"data": {
"CPU Load History": {
"CPU History": "(2021-03-03T11:12:09.7821310Z 0.000), (2021-03-03T11:12:19.7807023Z 100.000), (2021-03-03T11:12:29.7818508Z 33.333), (2021-03-03T11:12:39.7828943Z 33.333), (2021-03-03T11:12:49.7805599Z 37.220), (2021-03-03T11:12:59.7806116Z 50.319)"
}
},
"children": [{
"name": "Send Async",
"id": "fc912e4a-fd49-418f-b283-9268b9b5c45c",
"component": "RequestHandler",
"caller info": {
"member": "SendAsync",
"file": "RequestHandler.cs",
"line": 50
},
"start time": "10:38:54:058",
"duration in milliseconds": 2.9748,
"data": {},
"children": [{
"name": "Send Async",
"id": "94cb9950-5127-47f8-afc6-184ed4f8b093",
"component": "RequestHandler",
"caller info": {
"member": "SendAsync",
"file": "AbstractRetryHandler.cs",
"line": 23
},
"start time": "10:38:54:058",
"duration in milliseconds": 2.9717,
"data": {},
"children": [{
"name": "Send Async",
"id": "40709019-a2b2-4bce-981c-393ce763a50b",
"component": "RequestHandler",
"caller info": {
"member": "SendAsync",
"file": "RequestHandler.cs",
"line": 50
},
"start time": "10:38:54:058",
"duration in milliseconds": 2.9651,
"data": {},
"children": [{
"name": "Microsoft.Azure.Cosmos.GatewayStoreModel Transport Request",
"id": "09e1b5b5-adc9-4533-97e2-99d597ea49b4",
"component": "Transport",
"caller info": {
"member": "ProcessMessageAsync",
"file": "TransportHandler.cs",
"line": 87
},
"start time": "10:38:54:058",
"duration in milliseconds": 2.9294,
"data": {
"User Agent": "cosmos-netstandard-sdk/3.17.0|3.17.1|01|X64|Linux 5.4.0-1031-azure 32 18.|.NET 5.0.3|",
"Client Side Request Stats": {
"Id": "AggregatedClientSideRequestStatistics",
"ContactedReplicas": [],
"RegionsContacted": [],
"FailedReplicas": [],
"AddressResolutionStatistics": [],
"StoreResponseStatistics": []
},
"PointOperationStatisticsTraceDatum": {
"Id": "PointOperationStatistics",
"ActivityId": null,
"ResponseTimeUtc": "2021-03-04T22:38:54.0611779Z",
"StatusCode": 200,
"SubStatusCode": 0,
"RequestCharge": 0,
"RequestUri": "dbs/*****/colls/*****",
"ErrorMessage": null,
"RequestSessionToken": null,
"ResponseSessionToken": null
}
},
"children": []
}]
}]
}]
}]
}]
}]
}, {
"name": "Get Partition Key Ranges",
"id": "c4b31db9-7ca4-4489-839c-309187ec6c16",
"component": "Routing",
"caller info": {
"member": "GetTargetPartitionKeyRangesAsync",
"file": "CosmosQueryClientCore.cs",
"line": 240
},
"start time": "10:38:54:061",
"duration in milliseconds": 0.0212,
"data": {},
"children": []
}]
}, {
"name": "MoveNextAsync",
"id": "540549f1-6217-4d5a-85e9-9f68ced9a622",
"component": "Pagination",
"caller info": {
"member": "MoveNextAsync",
"file": "CrossPartitionRangePageAsyncEnumerator.cs",
"line": 114
},
"start time": "10:38:54:061",
"duration in milliseconds": 41.0726,
"data": {},
"children": [{
"name": "Prefetching",
"id": "8796703e-32e5-4d44-a287-f2c40539c435",
"component": "Pagination",
"caller info": {
"member": "PrefetchInParallelAsync",
"file": "ParallelPrefetch.cs",
"line": 31
},
"start time": "10:38:54:061",
"duration in milliseconds": 41.0125,
"data": {},
"children": [{
"name": "Prefetch",
"id": "b18990d6-8eb7-4736-9057-aa3dee27588a",
"component": "Pagination",
"caller info": {
"member": "PrefetchAsync",
"file": "BufferedPartitionRangePageAsyncEnumerator.cs",
"line": 54
},
"start time": "10:38:54:061",
"duration in milliseconds": 40.9627,
"data": {},
"children": [{
"name": "[05C1E5C373C764,05C1E5FFFFFFFE) move next",
"id": "26a19e66-b841-4ca2-a8e4-928ee3df2aeb",
"component": "Pagination",
"caller info": {
"member": "MoveNextAsync",
"file": "PartitionRangePageAsyncEnumerator.cs",
"line": 49
},
"start time": "10:38:54:061",
"duration in milliseconds": 40.9557,
"data": {
"Query Metrics": "Retrieved Document Count: 16, Retrieved Document Size: 132, 449 bytes Output Document Count: 16, Output Document Size: 10, 238 bytes, Index Utilization 100.00 % Total Query Execution Time: 21.27 milliseconds, Query Preparation Time: 1.06 milliseconds Index Lookup Time: 1.58 milliseconds Document Load Time: 17.49 milliseconds Runtime Execution Times: 0.00 milliseconds Document Write Time: 0.25 milliseconds Index Utilization Information Utilized Single Indexes Potential Single Indexes Utilized Composite Indexes Potential Composite Indexes"
},
"children": [{
"name": "Microsoft.Azure.Cosmos.Handlers.RequestInvokerHandler",
"id": "65fab615-615d-443f-9d54-a2e0ad983e92",
"component": "RequestHandler",
"caller info": {
"member": "SendAsync",
"file": "RequestInvokerHandler.cs",
"line": 148
},
"start time": "10:38:54:061",
"duration in milliseconds": 40.4332,
"data": {},
"children": [{
"name": "Get Collection Cache",
"id": "cce8562a-e318-489e-bfb8-ddbbb6b7585c",
"component": "Routing",
"caller info": {
"member": "GetCollectionCacheAsync",
"file": "DocumentClient.cs",
"line": 542
},
"start time": "10:38:54:061",
"duration in milliseconds": 0.0007,
"data": {},
"children": []
}, {
"name": "Try Get Overlapping Ranges",
"id": "affdc18e-4717-4564-af95-b56d9ba3b5d6",
"component": "Routing",
"caller info": {
"member": "TryGetOverlappingRangesAsync",
"file": "PartitionKeyRangeCache.cs",
"line": 49
},
"start time": "10:38:54:061",
"duration in milliseconds": 0.008,
"data": {},
"children": []
}, {
"name": "Microsoft.Azure.Cosmos.Handlers.RequestInvokerHandler",
"id": "1f460122-e5db-4354-9473-f96848cf434f",
"component": "RequestHandler",
"caller info": {
"member": "SendAsync",
"file": "RequestInvokerHandler.cs",
"line": 51
},
"start time": "10:38:54:061",
"duration in milliseconds": 40.3982,
"data": {},
"children": [{
"name": "Send Async",
"id": "d5487e60-b44e-4538-bdd0-38f2d0c75151",
"component": "RequestHandler",
"caller info": {
"member": "SendAsync",
"file": "RequestHandler.cs",
"line": 50
},
"start time": "10:38:54:061",
"duration in milliseconds": 40.3917,
"data": {
"CPU Load History": {
"CPU History": "(2021-03-03T11:12:09.7821310Z 0.000), (2021-03-03T11:12:19.7807023Z 100.000), (2021-03-03T11:12:29.7818508Z 33.333), (2021-03-03T11:12:39.7828943Z 33.333), (2021-03-03T11:12:49.7805599Z 37.220), (2021-03-03T11:12:59.7806116Z 50.319)"
}
},
"children": [{
"name": "Send Async",
"id": "4bee8637-2b50-4464-b360-aeaf15c49ae7",
"component": "RequestHandler",
"caller info": {
"member": "SendAsync",
"file": "RequestHandler.cs",
"line": 50
},
"start time": "10:38:54:061",
"duration in milliseconds": 40.388,
"data": {},
"children": [{
"name": "Send Async",
"id": "7287901b-9dbc-40bc-8602-fd354caf1bfe",
"component": "RequestHandler",
"caller info": {
"member": "SendAsync",
"file": "AbstractRetryHandler.cs",
"line": 23
},
"start time": "10:38:54:061",
"duration in milliseconds": 40.3856,
"data": {},
"children": [{
"name": "Send Async",
"id": "533445a3-6377-4fc1-b0b2-de92f9e5ac61",
"component": "RequestHandler",
"caller info": {
"member": "SendAsync",
"file": "RequestHandler.cs",
"line": 50
},
"start time": "10:38:54:061",
"duration in milliseconds": 40.3797,
"data": {},
"children": [{
"name": "Microsoft.Azure.Documents.ServerStoreModel Transport Request",
"id": "67a59ec8-2a43-468e-b3e3-3d7a8f338aa7",
"component": "Transport",
"caller info": {
"member": "ProcessMessageAsync",
"file": "TransportHandler.cs",
"line": 87
},
"start time": "10:38:54:061",
"duration in milliseconds": 40.3399,
"data": {
"User Agent": "cosmos-netstandard-sdk/3.17.0|3.17.1|01|X64|Linux 5.4.0-1031-azure 32 18.|.NET 5.0.3|",
"Client Side Request Stats": {
"Id": "AggregatedClientSideRequestStatistics",
"ContactedReplicas": [],
"RegionsContacted": ["https://*********.documents.azure.com/"],
"FailedReplicas": [],
"AddressResolutionStatistics": [],
"StoreResponseStatistics": [{
"ResponseTimeUTC": "2021-03-04T22:38:54.1017619Z",
"ResourceType": "Document",
"OperationType": "Query",
"LocationEndpoint": "https://**********.documents.azure.com/",
"StoreResult": "StorePhysicalAddress: rntbd://cdb-ms-prod-eastus2-***.documents.azure.com:****/apps/796f6a08-ae9e-4a5f-8014-6b8972ffefdc/services/b0c7544f-49fc-49d3-ae79-ef9df71474e3/partitions/5b8fb755-4144-4fda-8245-3e6f180ed766/replicas/132587240566165890s/, LSN: 3763194, GlobalCommittedLsn: 3763193, PartitionKeyRangeId: 126, IsValid: True, StatusCode: 200, SubStatusCode: 0, RequestCharge: 6.33, ItemLSN: -1, SessionToken: -1 #3763194, UsingLocalLSN: True, TransportException: null"
}]
}
},
"children": []
}]
}]
}]
}]
}]
}]
}, {
"name": "Get Cosmos Element Response",
"id": "7fa43540-943a-4c62-9a4b-bdfb1ca5c2ba",
"component": "Json",
"caller info": {
"member": "GetCosmosElementResponse",
"file": "CosmosQueryClientCore.cs",
"line": 282
},
"start time": "10:38:54:101",
"duration in milliseconds": 0.4712,
"data": {},
"children": []
}]
}]
}]
}, {
"name": "[05C1E5C373C764,05C1E5FFFFFFFE) move next",
"id": "f08a1c5a-35a8-4bb5-88e4-ef5cbd6b3046",
"component": "Pagination",
"caller info": {
"member": "MoveNextAsync",
"file": "PartitionRangePageAsyncEnumerator.cs",
"line": 49
},
"start time": "10:38:54:102",
"duration in milliseconds": 0.0068,
"data": {},
"children": [{
"name": "Prefetch",
"id": "49b8698e-dd60-49a7-bcb4-7daf2d284a8a",
"component": "Pagination",
"caller info": {
"member": "PrefetchAsync",
"file": "BufferedPartitionRangePageAsyncEnumerator.cs",
"line": 54
},
"start time": "10:38:54:102",
"duration in milliseconds": 0.0006,
"data": {},
"children": []
}]
}]
}]
} It does appear the new version is doing a double read. |
@timsander1 here is a formatted version:
|
Looks like 90 percent of the time is spent over the network:
So moving to the transport team to investigate. |
@bchong95 please verify that the queries are the same, that same amount of data is being sent over the wire, and the same amount of time was spent on the backend. |
For background we are running through AKS with a virtual endpoint on that VNET to cosmos. The environment, files etc. between the calls were as identical as networking can be, The only difference between the two calls is the client library used. |
The queries are different but the output doc size is about the same: 3.9: Query had OutputDocumentCount=18;outputDocumentSize=11511 |
@dpiessens just to confirm the following questions
|
The query is actually the same, the results have to do with the process it's embedded in and the comparison. 3.17 ran first so it was loading the data. When it ran the query the record it was working on wasn't in the DB yet so not in the results. The target being loaded also had another record after this one so that was loaded. 3.9 was then run so it was re-loading the results. This affects the upsert and replace operations but those had no performance changes. Queries for both loads are identical aside from the 2 additional records, but I don't this this can account for the difference. Our process tracks the load, origin record, method name and the query. I made all of these items the key when I lined up comparisons so I'm completely confident the results are from the same query with the same parameters.
|
Just to update the issue based on the email thread. I’ve looked through a several of the diagnostics, and noticed that the call to get the query plans from gateway was taking longer on average on 3.17. Can you try making the following changes and let us know if you see any improvement?
|
So I updated our code with the two performance items you listed, and the lag is reduced. The average delta went from 15ms to 10ms so that is an improvement. I'm sending over the new traces so you can see the difference. This did leave to one question: Given that many of the queries access a single partition and specify that in the query, is there a way for you to add that into the Cosmos methods? Right now we're creating an IQueryable via |
The latest diagnostics provided shows a 2 to 3x increase in 3.17 SDK networking time vs 3.9 SDK time. Testing the SDK versions in an Azure VM showed they were roughly the same networking latency. Can you try running both SDK versions in the same container against the same Cosmos DB endpoint? At this point it seems likely the difference is caused by something in the environment. |
I attempted to put both version in the same container, but while possible it proved to be more of a challenge than the outcome is worth. At this point we will accept the timing for what it is and scale out to return to our previous throughput. |
Describe the bug
We're upgrading from 3.9.0 to 3.17.0-preview on a performance intensive product. We've noticed a 23 ms per call increase in query times between the 2 SDKs.
To Reproduce
Upgraded from 3.9.0 to 3.17.0-preview. Read the release notes on all releases to follow the upgrade notes. Added the following client performance items:
`
.WithConnectionModeDirect()
options.PortReuseMode = PortReuseMode.PrivatePortPool;
options.IdleTcpConnectionTimeout = TimeSpan.FromHours(1);
options.AllowBulkExecution = false;
`
Also converted query code from:
`
query.ToStreamIterator();
to
container.GetItemQueryStreamIterator(query.ToQueryDefinition(), null, new QueryRequestOptions { MaxConcurrency = -1, MaxBufferedItemCount = -1 }))
`
Expected behavior
Performance difference between the two versions to be nominal.
Actual behavior
Ran load tests and observed query durations take 20 ms longer with newer APIs and changes.
Environment summary
SDK Version: 3.17.0-preview
OS Version Linux containers
Additional context
Below is a graph of performance comparisons:
Yellow box is the 3.17.0-preview library upgrade with no code changes,
Green box is 3.9.0 pre-upgrade
Red box is 3.17.0-preview with performance related code changes made.
The text was updated successfully, but these errors were encountered: