Record and analyze CPU performance counters using ETW
This change adds a batch file that records CPU performance counters on
every context switch, a Python script that analyzes the results, and a
test program that abuses the branch predictor to produce interesting
results.

Blog post to follow.
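
To illustrate the idea of sampling counters on every context switch, here is a minimal sketch of how such samples could be attributed to processes. It is not the Python script from this commit: the `SwitchSample` record, its fields, and `attribute_counts` are hypothetical names, and the sketch assumes each sample carries a cumulative per-CPU counter value plus the name of the process being switched in.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical record: one counter sample taken at a context switch.
struct SwitchSample
{
    int cpu;                  // CPU on which the switch happened
    std::string new_process;  // process being switched in
    std::uint64_t counter;    // cumulative counter value on that CPU
};

// Attributes the counter delta between two consecutive switches on the
// same CPU to the process that was running during that interval.
std::map<std::string, std::uint64_t>
attribute_counts(const std::vector<SwitchSample>& samples)
{
    struct CpuState
    {
        std::string running;    // process currently on this CPU
        std::uint64_t counter;  // counter value when it was switched in
    };

    std::map<int, CpuState> cpus;
    std::map<std::string, std::uint64_t> totals;

    for (const SwitchSample& s : samples)
    {
        auto it = cpus.find(s.cpu);
        if (it != cpus.end())
        {
            // Charge the elapsed counts to the process being switched out.
            totals[it->second.running] += s.counter - it->second.counter;
        }
        // Remember who is running now and the counter baseline.
        cpus[s.cpu] = CpuState{ s.new_process, s.counter };
    }
    return totals;
}
```

The first sample seen on each CPU only establishes a baseline, and the interval after the last sample is dropped, so totals for short traces are slightly low under this scheme.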
Bruce Dawson committed Nov 24, 2016
1 parent 092fa4d commit 3f57bef
Showing 9 changed files with 462 additions and 0 deletions.
1 change: 1 addition & 0 deletions LabScripts/ETWPMCDemo/.gitignore
@@ -0,0 +1 @@
pmc_counters_test.txt
30 changes: 30 additions & 0 deletions LabScripts/ETWPMCDemo/ConditionalCount/ConditionalCount.cpp
@@ -0,0 +1,30 @@
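#include <intrin.h>   // __rdtsc
#include <stdint.h>   // int64_t
#include <stdio.h>    // printf
#include <stdlib.h>   // rand
#include <string.h>   // strcmp
#include <string>
#include <algorithm>  // std::sort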

Trass3r Apr 3, 2017

misses #include <intrin.h>

randomascii Apr 3, 2017

Contributor

Feel free to submit a PR, or I'll get to it eventually.

int sum_array(unsigned char* p, size_t count);

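int main(int argc, char* argv[])
{
    int64_t start = __rdtsc();
    unsigned char buffer[8192];

    // Fill the buffer with pseudo-random bytes.
    for (auto& x : buffer)
    {
        x = (unsigned char)((rand() / 71) & 255);
    }

    // Optionally sort the buffer so the branch in sum_array is predictable.
    if (argc > 1 && strcmp(argv[1], "-sort") == 0)
        std::sort(buffer, buffer + sizeof(buffer));

    int64_t mid = __rdtsc();
    // 64-bit accumulator: 30000 sums of up to ~1M each would overflow int.
    int64_t total = 0;
    for (int i = 0; i < 30000; ++i)
    {
        total += sum_array(buffer, sizeof(buffer));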

Trass3r Apr 3, 2017

optimized out it seems

randomascii Apr 3, 2017

Contributor

That's odd. My etwpmc_record.bat script looks for the release build and I was able to get good results. The sum_array function was placed in a separate source file and LTCG was disabled in order to prevent the compiler from optimizing it away. I'll need more details.

Trass3r Apr 4, 2017

Hmm just let it upgrade to 2017, maybe that changed settings.

randomascii Apr 4, 2017

Contributor

I'll try with VS 2017. That would explain the intrin.h issue - they don't include it as aggressively.

If you open an issue regarding this then it will be easier to track fixing it.

    }
    int64_t end = __rdtsc();
    printf("%5.2f MCycles for initialization and %5.2f MCycles for conditional adding.\n", (mid - start) / 1e6, (end - mid) / 1e6);

    return 0;
}
28 changes: 28 additions & 0 deletions LabScripts/ETWPMCDemo/ConditionalCount/ConditionalCount.sln
@@ -0,0 +1,28 @@

Microsoft Visual Studio Solution File, Format Version 12.00
# Visual Studio 14
VisualStudioVersion = 14.0.25420.1
MinimumVisualStudioVersion = 10.0.40219.1
Project("{8BC9CEB8-8B4A-11D0-8D11-00A0C91BC942}") = "ConditionalCount", "ConditionalCount.vcxproj", "{A208226C-6D79-4317-B31E-732B6F57BE45}"
EndProject
Global
GlobalSection(SolutionConfigurationPlatforms) = preSolution
Debug|x64 = Debug|x64
Debug|x86 = Debug|x86
Release|x64 = Release|x64
Release|x86 = Release|x86
EndGlobalSection
GlobalSection(ProjectConfigurationPlatforms) = postSolution
{A208226C-6D79-4317-B31E-732B6F57BE45}.Debug|x64.ActiveCfg = Debug|x64
{A208226C-6D79-4317-B31E-732B6F57BE45}.Debug|x64.Build.0 = Debug|x64
{A208226C-6D79-4317-B31E-732B6F57BE45}.Debug|x86.ActiveCfg = Debug|Win32
{A208226C-6D79-4317-B31E-732B6F57BE45}.Debug|x86.Build.0 = Debug|Win32
{A208226C-6D79-4317-B31E-732B6F57BE45}.Release|x64.ActiveCfg = Release|x64
{A208226C-6D79-4317-B31E-732B6F57BE45}.Release|x64.Build.0 = Release|x64
{A208226C-6D79-4317-B31E-732B6F57BE45}.Release|x86.ActiveCfg = Release|Win32
{A208226C-6D79-4317-B31E-732B6F57BE45}.Release|x86.Build.0 = Release|Win32
EndGlobalSection
GlobalSection(SolutionProperties) = preSolution
HideSolutionNode = FALSE
EndGlobalSection
EndGlobal
154 changes: 154 additions & 0 deletions LabScripts/ETWPMCDemo/ConditionalCount/ConditionalCount.vcxproj
@@ -0,0 +1,154 @@
<?xml version="1.0" encoding="utf-8"?>
<Project DefaultTargets="Build" ToolsVersion="14.0" xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
<ItemGroup Label="ProjectConfigurations">
<ProjectConfiguration Include="Debug|Win32">
<Configuration>Debug</Configuration>
<Platform>Win32</Platform>
</ProjectConfiguration>
<ProjectConfiguration Include="Release|Win32">
<Configuration>Release</Configuration>
<Platform>Win32</Platform>
</ProjectConfiguration>
<ProjectConfiguration Include="Debug|x64">
<Configuration>Debug</Configuration>
<Platform>x64</Platform>
</ProjectConfiguration>
<ProjectConfiguration Include="Release|x64">
<Configuration>Release</Configuration>
<Platform>x64</Platform>
</ProjectConfiguration>
</ItemGroup>
<PropertyGroup Label="Globals">
<ProjectGuid>{A208226C-6D79-4317-B31E-732B6F57BE45}</ProjectGuid>
<Keyword>Win32Proj</Keyword>
<RootNamespace>ConditionalCount</RootNamespace>
<WindowsTargetPlatformVersion>8.1</WindowsTargetPlatformVersion>
</PropertyGroup>
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.Default.props" />
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'" Label="Configuration">
<ConfigurationType>Application</ConfigurationType>
<UseDebugLibraries>true</UseDebugLibraries>
<PlatformToolset>v140</PlatformToolset>
<CharacterSet>Unicode</CharacterSet>
</PropertyGroup>
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Release|Win32'" Label="Configuration">
<ConfigurationType>Application</ConfigurationType>
<UseDebugLibraries>false</UseDebugLibraries>
<PlatformToolset>v140</PlatformToolset>
<WholeProgramOptimization>false</WholeProgramOptimization>
<CharacterSet>Unicode</CharacterSet>
</PropertyGroup>
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Debug|x64'" Label="Configuration">
<ConfigurationType>Application</ConfigurationType>
<UseDebugLibraries>true</UseDebugLibraries>
<PlatformToolset>v140</PlatformToolset>
<CharacterSet>Unicode</CharacterSet>
</PropertyGroup>
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Release|x64'" Label="Configuration">
<ConfigurationType>Application</ConfigurationType>
<UseDebugLibraries>false</UseDebugLibraries>
<PlatformToolset>v140</PlatformToolset>
<WholeProgramOptimization>false</WholeProgramOptimization>
<CharacterSet>Unicode</CharacterSet>
</PropertyGroup>
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.props" />
<ImportGroup Label="ExtensionSettings">
</ImportGroup>
<ImportGroup Label="Shared">
</ImportGroup>
<ImportGroup Label="PropertySheets" Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">
<Import Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" />
</ImportGroup>
<ImportGroup Label="PropertySheets" Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">
<Import Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" />
</ImportGroup>
<ImportGroup Label="PropertySheets" Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">
<Import Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" />
</ImportGroup>
<ImportGroup Label="PropertySheets" Condition="'$(Configuration)|$(Platform)'=='Release|x64'">
<Import Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" />
</ImportGroup>
<PropertyGroup Label="UserMacros" />
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">
<LinkIncremental>true</LinkIncremental>
</PropertyGroup>
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">
<LinkIncremental>true</LinkIncremental>
</PropertyGroup>
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">
<LinkIncremental>false</LinkIncremental>
</PropertyGroup>
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Release|x64'">
<LinkIncremental>false</LinkIncremental>
</PropertyGroup>
<ItemDefinitionGroup Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">
<ClCompile>
<PrecompiledHeader>
</PrecompiledHeader>
<WarningLevel>Level3</WarningLevel>
<Optimization>Disabled</Optimization>
<PreprocessorDefinitions>WIN32;_DEBUG;_CONSOLE;%(PreprocessorDefinitions)</PreprocessorDefinitions>
</ClCompile>
<Link>
<SubSystem>Console</SubSystem>
<GenerateDebugInformation>true</GenerateDebugInformation>
</Link>
</ItemDefinitionGroup>
<ItemDefinitionGroup Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">
<ClCompile>
<PrecompiledHeader>
</PrecompiledHeader>
<WarningLevel>Level3</WarningLevel>
<Optimization>Disabled</Optimization>
<PreprocessorDefinitions>_DEBUG;_CONSOLE;%(PreprocessorDefinitions)</PreprocessorDefinitions>
</ClCompile>
<Link>
<SubSystem>Console</SubSystem>
<GenerateDebugInformation>true</GenerateDebugInformation>
</Link>
</ItemDefinitionGroup>
<ItemDefinitionGroup Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">
<ClCompile>
<WarningLevel>Level3</WarningLevel>
<PrecompiledHeader>
</PrecompiledHeader>
<Optimization>MaxSpeed</Optimization>
<FunctionLevelLinking>true</FunctionLevelLinking>
<IntrinsicFunctions>true</IntrinsicFunctions>
<PreprocessorDefinitions>WIN32;NDEBUG;_CONSOLE;%(PreprocessorDefinitions)</PreprocessorDefinitions>
</ClCompile>
<Link>
<SubSystem>Console</SubSystem>
<EnableCOMDATFolding>true</EnableCOMDATFolding>
<OptimizeReferences>true</OptimizeReferences>
<GenerateDebugInformation>true</GenerateDebugInformation>
</Link>
</ItemDefinitionGroup>
<ItemDefinitionGroup Condition="'$(Configuration)|$(Platform)'=='Release|x64'">
<ClCompile>
<WarningLevel>Level3</WarningLevel>
<PrecompiledHeader>
</PrecompiledHeader>
<Optimization>MaxSpeed</Optimization>
<FunctionLevelLinking>true</FunctionLevelLinking>
<IntrinsicFunctions>true</IntrinsicFunctions>
<PreprocessorDefinitions>NDEBUG;_CONSOLE;%(PreprocessorDefinitions)</PreprocessorDefinitions>
</ClCompile>
<Link>
<SubSystem>Console</SubSystem>
<EnableCOMDATFolding>true</EnableCOMDATFolding>
<OptimizeReferences>true</OptimizeReferences>
<GenerateDebugInformation>true</GenerateDebugInformation>
</Link>
</ItemDefinitionGroup>
<ItemGroup>
<Text Include="ReadMe.txt" />
</ItemGroup>
<ItemGroup>
<ClCompile Include="ConditionalCount.cpp" />
<ClCompile Include="OtherFile.cpp" />
</ItemGroup>
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.targets" />
<ImportGroup Label="ExtensionTargets">
</ImportGroup>
</Project>
@@ -0,0 +1,28 @@
<?xml version="1.0" encoding="utf-8"?>
<Project ToolsVersion="4.0" xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
<ItemGroup>
<Filter Include="Source Files">
<UniqueIdentifier>{4FC737F1-C7A5-4376-A066-2A32D752A2FF}</UniqueIdentifier>
<Extensions>cpp;c;cc;cxx;def;odl;idl;hpj;bat;asm;asmx</Extensions>
</Filter>
<Filter Include="Header Files">
<UniqueIdentifier>{93995380-89BD-4b04-88EB-625FBE52EBFB}</UniqueIdentifier>
<Extensions>h;hh;hpp;hxx;hm;inl;inc;xsd</Extensions>
</Filter>
<Filter Include="Resource Files">
<UniqueIdentifier>{67DA6AB6-F800-4c08-8B7A-83BB121AAD01}</UniqueIdentifier>
<Extensions>rc;ico;cur;bmp;dlg;rc2;rct;bin;rgs;gif;jpg;jpeg;jpe;resx;tiff;tif;png;wav;mfcribbon-ms</Extensions>
</Filter>
</ItemGroup>
<ItemGroup>
<Text Include="ReadMe.txt" />
</ItemGroup>
<ItemGroup>
<ClCompile Include="ConditionalCount.cpp">
<Filter>Source Files</Filter>
</ClCompile>
<ClCompile Include="OtherFile.cpp">
<Filter>Source Files</Filter>
</ClCompile>
</ItemGroup>
</Project>
40 changes: 40 additions & 0 deletions LabScripts/ETWPMCDemo/ConditionalCount/OtherFile.cpp
@@ -0,0 +1,40 @@
/*
This exists as a separate source file, and Link Time Code Generation is
disabled, in order to hide information from the optimizer so that it won't
realize that this function is pure and idempotent.
*/

#include <stddef.h>  // size_t

//#define UNWOUND

int sum_array(unsigned char* p, size_t count)
{
    int result = 0;
#ifdef UNWOUND
    // I expected this to have a higher percentage of mispredicted
    // branches because four out of five branches in the loop are
    // unpredictable (the loop-end branch is very predictable).
    // But this code somehow shows almost zero branch mispredicts,
    // and performance that is equivalent to the perfectly predicted
    // code. Very odd.
    for (size_t i = 0; i < count; i += 4)
    {
        if (p[i + 0] < 128)
            result += p[i + 0];
        if (p[i + 1] < 128)
            result += p[i + 1];
        if (p[i + 2] < 128)
            result += p[i + 2];
        if (p[i + 3] < 128)
            result += p[i + 3];
    }
#else
    for (size_t i = 0; i < count; i += 1)
    {
        // This test relies on this being implemented using a conditional
        // branch instruction.
        if (p[i] < 128)
            result += p[i];
    }
#endif
    return result;
}
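
The comment in the loop notes that the test relies on the `if` compiling to a conditional branch. As a rough illustration of what the alternative looks like, the sketch below contrasts the branchy loop with a hand-written branch-free version. The function names are hypothetical and not part of this commit. An optimizer may produce something similar on its own (for example with conditional moves or SIMD), which could explain near-zero mispredict counts, though that is an assumption rather than something verified here.

```cpp
#include <cassert>
#include <cstddef>

// Branchy version: same logic as the non-UNWOUND loop in sum_array().
int sum_below_128_branchy(const unsigned char* p, std::size_t count)
{
    int result = 0;
    for (std::size_t i = 0; i < count; ++i)
    {
        if (p[i] < 128)
            result += p[i];
    }
    return result;
}

// Hypothetical branch-free variant: the comparison yields 0 or 1, which is
// negated into an all-zeros or all-ones mask, so the loop body contains no
// data-dependent conditional branch.
int sum_below_128_branchless(const unsigned char* p, std::size_t count)
{
    int result = 0;
    for (std::size_t i = 0; i < count; ++i)
    {
        const int keep = -static_cast<int>(p[i] < 128);  // 0 or -1
        result += p[i] & keep;                           // p[i] or 0
    }
    return result;
}
```

If the mispredict counts look suspiciously low, inspecting the generated assembly for the branchy loop is the most direct way to check which form the compiler actually emitted.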
4 changes: 4 additions & 0 deletions LabScripts/ETWPMCDemo/ConditionalCount/ReadMe.txt
@@ -0,0 +1,4 @@
This demonstrates how pre-sorting an array can dramatically affect the number
of branch mispredicts. Pass -sort on the command line to sort the buffer before
summing. The exact behavior is highly dependent on your compiler and CPU, but
this is a good test case for exercising CPU performance counters such as
branch mispredict counts.
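
For readers without access to performance counters, the effect described above can be approximated with wall-clock timing. The following is a portable sketch, not the test program from this commit: `conditional_sum`, `timed_run`, `make_random_buffer`, and `RunResult` are hypothetical helpers, and it uses `std::chrono` instead of `__rdtsc`. Because everything lives in one translation unit, the compiler may inline or vectorize the loop and hide the difference, so treat any timings as indicative only.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <random>
#include <vector>

// Same conditional sum as sum_array() in OtherFile.cpp.
int conditional_sum(const std::vector<unsigned char>& data)
{
    int result = 0;
    for (unsigned char value : data)
    {
        if (value < 128)
            result += value;
    }
    return result;
}

struct RunResult
{
    long long total;      // accumulated sum, returned so the work is observable
    double milliseconds;  // elapsed wall-clock time
};

// Runs conditional_sum() repeatedly over the same buffer and times it.
RunResult timed_run(const std::vector<unsigned char>& data, int iterations)
{
    const auto start = std::chrono::steady_clock::now();
    long long total = 0;
    for (int i = 0; i < iterations; ++i)
        total += conditional_sum(data);
    const auto end = std::chrono::steady_clock::now();
    return { total,
             std::chrono::duration<double, std::milli>(end - start).count() };
}

// Fills a buffer with pseudo-random bytes from a fixed seed.
std::vector<unsigned char> make_random_buffer(std::size_t size, unsigned seed)
{
    std::mt19937 rng(seed);
    std::uniform_int_distribution<int> dist(0, 255);
    std::vector<unsigned char> data(size);
    for (auto& x : data)
        x = static_cast<unsigned char>(dist(rng));
    return data;
}
```

A typical use would be to build one buffer with `make_random_buffer(8192, seed)`, copy and `std::sort` the copy, then compare `timed_run` on both. The totals should match exactly, since sorting does not change the sum, while the elapsed times may differ if the loop is compiled with a conditional branch.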