From 93d843503e8fa47320054f438cc3f40926b9e08c Mon Sep 17 00:00:00 2001
From: Bruce Dawson
Date: Sat, 28 Jan 2017 23:15:34 -0800
Subject: [PATCH] Halve the buffer counts when tracing to circular buffers

On some machines (including my awesome 24-core 48-thread Z840
workstation), tracing to circular memory buffers has, for a long time,
been virtually useless. If tracing has been running for a while, then
saving a trace can take many minutes. It should not take longer than
30-60 seconds.

The problem has been reported to Microsoft. All I know is that
EtwpCopyLogHeader ends up calling ReadFile(1 MiB) hundreds of thousands
of times, even though the file being read is only about 600 MiB. And I
know that halving the number of buffers seems to help a lot.

And I know that you can run bin\metatrace.bat to trace the trace-saving
process, which *never* ceases to amuse me. This is what allowed me to
give an awesomely detailed bug report (see the EtwpCopyLogHeader
paragraph) and to find that the trace-saving process was CPU bound in
the kernel in memcpy. Yep, memcpy. That deserves repeating.
KernelBase.dll!ReadFile is CPU bound in memcpy for ~99% of the
trace-saving time. How cool/horrible is that?

The maximum kernel/user buffer sizes for circular-buffer memory tracing
in UIforETW used to be 600/100 MiB, leading to a 700 MiB trace before
compression. That is actually larger than desired in most cases, so
reducing this to 300/50 MiB could be good on multiple levels. We will
see.

I should really add options to the settings dialog to scale the buffer
sizes for different scenarios, but not today. Pull requests welcome, as
always.
---
 UIforETW/Support.cpp | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/UIforETW/Support.cpp b/UIforETW/Support.cpp
index 3f673b19..b87f7829 100644
--- a/UIforETW/Support.cpp
+++ b/UIforETW/Support.cpp
@@ -163,8 +163,12 @@ void CUIforETWDlg::TransferSettings(bool saving)
 // a larger boost.
 int CUIforETWDlg::BufferCountBoost(int requestCount) const
 {
+	// Saving traces from circular buffers in memory seems to be really
+	// slow on some (dual socket?) machines, and the 600 MiB on medium to
+	// large memory machines is excessive anyway - who wants traces that
+	// big? So, this neatly halves the buffer sizes.
 	if (tracingMode_ == kTracingToMemory)
-		return requestCount;
+		return requestCount / 2;
 
 	int numerator = 1;
 	int denominator = 1;
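The effect of the one-line change can be checked with a small, self-contained sketch. It is a hypothetical stand-in, not UIforETW code. The `TracingMode` enum, the free function `BufferCountBoost`, and the assumption of fixed 1 MiB buffers (so a buffer count equals MiB) are all illustrative. The sketch only shows that halving the requested counts takes the commit message's 600/100 MiB kernel/user maximums down to 300/50 MiB. That is a drop from 700 MiB to 350 MiB before compression.

```cpp
// Hypothetical stand-ins: UIforETW's real tracing-mode enum and the
// CUIforETWDlg member function are not reproduced here.
enum TracingMode { kTracingToMemory, kTracingToFile };

// Mirrors the patched early-out in CUIforETWDlg::BufferCountBoost: when
// tracing to circular memory buffers, the requested buffer count is
// halved (integer division, so odd counts round down). The patch cuts
// off the numerator/denominator boost used in other modes, so this
// sketch just returns the request unchanged there.
constexpr int BufferCountBoost(TracingMode mode, int requestCount)
{
	return mode == kTracingToMemory ? requestCount / 2 : requestCount;
}

// Assumption: 1 MiB buffers, so 600 kernel buffers correspond to the
// 600 MiB pre-patch kernel maximum quoted in the commit message.
static_assert(BufferCountBoost(kTracingToMemory, 600) == 300,
              "kernel: 600 -> 300 MiB");
static_assert(BufferCountBoost(kTracingToMemory, 100) == 50,
              "user: 100 -> 50 MiB");
static_assert(BufferCountBoost(kTracingToMemory, 600) +
              BufferCountBoost(kTracingToMemory, 100) == 350,
              "total: 700 -> 350 MiB");
```

Because the function is `constexpr`, the arithmetic is verified at compile time. Since only the count changes and not the per-buffer size, total memory scales linearly with it under the equal-size assumption.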