-
Notifications
You must be signed in to change notification settings - Fork 12.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[InstrFDO][TypeProf] Implement binary instrumentation and profile read/write #66825
Changes from 36 commits
361f8f6
cb0246f
3c6ca04
3780bd9
9255867
8d8a45a
82fac8e
d54d059
ebf4c1d
52143e9
2350d7f
af34929
fcf92af
f5d12b5
17d941c
c000e64
5b09e43
313eb10
abcbc6d
8c725ef
1050a6b
9f01a30
a9deaec
66efde8
9ec0784
0d9abe5
8240524
60cd296
a7633ad
a61c8a4
4dbe23e
000d818
c57422b
df6eadf
c30888c
4a9e6c9
61e8292
fc1d2be
d8967d2
27239f0
0158523
2abdfe3
7ea9217
0a0c364
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -89,6 +89,9 @@ inline StringRef getInstrProfValueProfMemOpFuncName() { | |
/// Return the name prefix of variables containing instrumented function names. | ||
inline StringRef getInstrProfNameVarPrefix() { return "__profn_"; } | ||
|
||
/// Return the name prefix of variables containing virtual table profile data. | ||
inline StringRef getInstrProfVTableVarPrefix() { return "__profvt_"; } | ||
|
||
/// Return the name prefix of variables containing per-function control data. | ||
inline StringRef getInstrProfDataVarPrefix() { return "__profd_"; } | ||
|
||
|
@@ -110,6 +113,8 @@ inline StringRef getInstrProfNamesVarName() { | |
return "__llvm_prf_nm"; | ||
} | ||
|
||
inline StringRef getInstrProfVTableNamesVarName() { return "__llvm_prf_vnm"; } | ||
|
||
/// Return the name of a covarage mapping variable (internal linkage) | ||
/// for each instrumented source module. Such variables are allocated | ||
/// in the __llvm_covmap section. | ||
|
@@ -246,6 +251,9 @@ Error collectGlobalObjectNameStrings(ArrayRef<std::string> NameStrs, | |
Error collectPGOFuncNameStrings(ArrayRef<GlobalVariable *> NameVars, | ||
std::string &Result, bool doCompression = true); | ||
|
||
Error collectVTableStrings(ArrayRef<GlobalVariable *> VTables, | ||
std::string &Result, bool doCompression); | ||
|
||
/// Check if INSTR_PROF_RAW_VERSION_VAR is defined. This global is only being | ||
/// set in IR PGO compilation. | ||
bool isIRPGOFlagSet(const Module *M); | ||
|
@@ -288,14 +296,16 @@ inline StringRef getPGOFuncNameMetadataName() { return "PGOFuncName"; } | |
/// Return the PGOFuncName meta data associated with a function. | ||
MDNode *getPGOFuncNameMetadata(const Function &F); | ||
|
||
std::string getPGOName(const GlobalVariable &V, bool InLTO = false); | ||
|
||
/// Create the PGOFuncName meta data if PGOFuncName is different from | ||
/// function's raw name. This should only apply to internal linkage functions | ||
/// declared by users only. | ||
void createPGOFuncNameMetadata(Function &F, StringRef PGOFuncName); | ||
|
||
/// Check if we can use Comdat for profile variables. This will eliminate | ||
/// the duplicated profile variables for Comdat functions. | ||
bool needsComdatForCounter(const Function &F, const Module &M); | ||
bool needsComdatForCounter(const GlobalObject &GV, const Module &M); | ||
|
||
/// An enum describing the attributes of an instrumented profile. | ||
enum class InstrProfKind { | ||
|
@@ -429,20 +439,36 @@ uint64_t ComputeHash(StringRef K); | |
class InstrProfSymtab { | ||
public: | ||
using AddrHashMap = std::vector<std::pair<uint64_t, uint64_t>>; | ||
using RangeHashMap = | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Hmm, it's a little strange to have HashMap in the name and the underlying data structure is a vector. How about something like There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. As the other comments point out, I added a struct to replace nested pairs and used |
||
std::vector<std::pair<std::pair<uint64_t, uint64_t>, uint64_t>>; | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can you change the element type to a structure with 3 elements? E.g. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. done, and making the struct definition private to class 'InstrProfSymtab'. |
||
|
||
private: | ||
StringRef Data; | ||
uint64_t Address = 0; | ||
// Unique name strings. | ||
// Unique name strings. Used to ensure entries in MD5NameMap (a vector that's | ||
// going to be sorted) has unique MD5 keys in the first place. | ||
StringSet<> NameTab; | ||
// Records the unique virtual table names. This is used by InstrProfWriter to | ||
// write out an on-disk chained hash table of virtual table names. | ||
// InstrProfWriter stores per function profile data (keyed by function names) | ||
// so it doesn't use a StringSet for function names. | ||
StringSet<> VTableNames; | ||
// A map from MD5 keys to function name strings. | ||
std::vector<std::pair<uint64_t, StringRef>> MD5NameMap; | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Might be worth it to simplify this to use DenseMap too in a separate patch. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Sure. i'll send out another patch later. |
||
// A map from MD5 keys to virtual table definitions. Only populated when | ||
// building the Symtab from a module. | ||
std::vector<std::pair<uint64_t, GlobalVariable *>> MD5VTableMap; | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Having There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I changed the type to DenseMap. |
||
// A map from MD5 keys to function define. We only populate this map | ||
// when build the Symtab from a Module. | ||
std::vector<std::pair<uint64_t, Function *>> MD5FuncMap; | ||
// A map from function runtime address to function name MD5 hash. | ||
// This map is only populated and used by raw instr profile reader. | ||
AddrHashMap AddrToMD5Map; | ||
// A map from virtual table runtime address to function name MD5 hash. | ||
// This map is only populated and used by raw instr profile reader. | ||
// This is a different map from 'AddrToMD5Map' for readability and | ||
// debuggability. | ||
RangeHashMap VTableAddrRangeToMD5Map; | ||
bool Sorted = false; | ||
|
||
static StringRef getExternalSymbol() { | ||
|
@@ -481,9 +507,19 @@ class InstrProfSymtab { | |
|
||
/// \c NameStrings is a string composed of one of more sub-strings | ||
/// encoded in the format described in \c collectPGOFuncNameStrings. | ||
/// This method is a wrapper to \c readPGOFuncNameStrings method. | ||
/// This method is a wrapper to \c readAndDecodeStrings method. | ||
Error create(StringRef NameStrings); | ||
|
||
/// \c FuncNameStrings is a string composed of one or more encoded function | ||
/// name strings, and \c VTableNameStrings composes of one or more encoded | ||
/// vtable names. This function is a wrapper to \c readAndDecodeStrings | ||
/// method. | ||
Error create(StringRef FuncNameStrings, StringRef VTableNameStrings); | ||
|
||
/// Initialize 'this' with the set of vtable names encoded in | ||
/// \c CompressedVTableNames. | ||
Error initVTableNamesFromCompressedStrings(StringRef CompressedVTableNames); | ||
|
||
/// This interface is used by reader of CoverageMapping test | ||
/// format. | ||
inline Error create(StringRef D, uint64_t BaseAddr); | ||
|
@@ -496,32 +532,70 @@ class InstrProfSymtab { | |
|
||
/// Create InstrProfSymtab from a set of names iteratable from | ||
/// \p IterRange. This interface is used by IndexedProfReader. | ||
template <typename NameIterRange> Error create(const NameIterRange &IterRange); | ||
|
||
/// Update the symtab by adding \p FuncName to the table. This interface | ||
/// is used by the raw and text profile readers. | ||
Error addFuncName(StringRef FuncName) { | ||
if (FuncName.empty()) | ||
template <typename NameIterRange> | ||
Error create(const NameIterRange &IterRange); | ||
|
||
/// Create InstrProfSymtab from a set of function names and vtable | ||
/// names iteratable from \p IterRange. This interface is used by | ||
/// IndexedProfReader. | ||
template <typename FuncNameIterRange, typename VTableNameIterRange> | ||
Error create(const FuncNameIterRange &FuncIterRange, | ||
const VTableNameIterRange &VTableIterRange); | ||
|
||
Error addSymbolName(StringRef SymbolName) { | ||
if (SymbolName.empty()) | ||
return make_error<InstrProfError>(instrprof_error::malformed, | ||
"function name is empty"); | ||
auto Ins = NameTab.insert(FuncName); | ||
"symbol name is empty"); | ||
|
||
// Insert into NameTab so that MD5NameMap (a vector that will be sorted) | ||
// won't have duplicated entries in the first place. | ||
auto Ins = NameTab.insert(SymbolName); | ||
if (Ins.second) { | ||
MD5NameMap.push_back(std::make_pair( | ||
IndexedInstrProf::ComputeHash(FuncName), Ins.first->getKey())); | ||
IndexedInstrProf::ComputeHash(SymbolName), Ins.first->getKey())); | ||
Sorted = false; | ||
} | ||
return Error::success(); | ||
} | ||
|
||
/// The method name is kept since there are many callers. | ||
/// It just forwards to 'addSymbolName'. | ||
Error addFuncName(StringRef FuncName) { return addSymbolName(FuncName); } | ||
|
||
/// Adds VTableName as a known symbol, and inserts it to a map that | ||
/// tracks all vtable names. | ||
Error addVTableName(StringRef VTableName) { | ||
if (Error E = addSymbolName(VTableName)) | ||
return E; | ||
|
||
// Record VTableName. InstrProfWriter uses this map. The comment around | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. nit: "uses this set". There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. done. |
||
// class member explains why. | ||
VTableNames.insert(VTableName); | ||
return Error::success(); | ||
} | ||
|
||
const StringSet<> &getVTableNames() const { return VTableNames; } | ||
|
||
/// Map a function address to its name's MD5 hash. This interface | ||
/// is only used by the raw profiler reader. | ||
void mapAddress(uint64_t Addr, uint64_t MD5Val) { | ||
AddrToMD5Map.push_back(std::make_pair(Addr, MD5Val)); | ||
} | ||
|
||
/// Map the address range (i.e., [start_address, end_address)) of a variable | ||
/// to its names' MD5 hash. This interface is only used by the raw profile | ||
/// reader. | ||
void mapVTableAddress(uint64_t StartAddr, uint64_t EndAddr, uint64_t MD5Val) { | ||
VTableAddrRangeToMD5Map.push_back( | ||
std::make_pair(std::make_pair(StartAddr, EndAddr), MD5Val)); | ||
} | ||
|
||
/// Return a function's hash, or 0, if the function isn't in this SymTab. | ||
uint64_t getFunctionHashFromAddress(uint64_t Address); | ||
|
||
/// Return a vtable's hash, or 0 if the vtable doesn't exist in this SymTab. | ||
uint64_t getVTableHashFromAddress(uint64_t Address); | ||
|
||
/// Return function's PGO name from the function name's symbol | ||
/// address in the object file. If an error occurs, return | ||
/// an empty string. | ||
|
@@ -543,6 +617,8 @@ class InstrProfSymtab { | |
|
||
/// Return function from the name's md5 hash. Return nullptr if not found. | ||
inline Function *getFunction(uint64_t FuncMD5Hash); | ||
// Return vtable from the name's MD5 hash. Return nullptr if not found. | ||
inline GlobalVariable *getGlobalVariable(uint64_t GlobalVariableMD5Hash); | ||
|
||
/// Return the name section data. | ||
inline StringRef getNameData() const { return Data; } | ||
|
@@ -567,6 +643,21 @@ Error InstrProfSymtab::create(const NameIterRange &IterRange) { | |
return Error::success(); | ||
} | ||
|
||
template <typename FuncNameIterRange, typename VTableNameIterRange> | ||
Error InstrProfSymtab::create(const FuncNameIterRange &FuncIterRange, | ||
const VTableNameIterRange &VTableIterRange) { | ||
for (auto Name : FuncIterRange) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. const auto& to avoid a copy. Same for the loop below. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Spell out 'auto' by using 'StringRef'. |
||
if (Error E = addFuncName(Name)) | ||
return E; | ||
|
||
for (auto VTableName : VTableIterRange) | ||
if (Error E = addVTableName(VTableName)) | ||
return E; | ||
|
||
finalizeSymtab(); | ||
return Error::success(); | ||
} | ||
|
||
void InstrProfSymtab::finalizeSymtab() { | ||
if (Sorted) | ||
return; | ||
|
@@ -575,6 +666,13 @@ void InstrProfSymtab::finalizeSymtab() { | |
llvm::sort(AddrToMD5Map, less_first()); | ||
AddrToMD5Map.erase(std::unique(AddrToMD5Map.begin(), AddrToMD5Map.end()), | ||
AddrToMD5Map.end()); | ||
// VTable object address ranges should not overlap; so sort by either | ||
// beginning address or end address is fine. | ||
llvm::sort(VTableAddrRangeToMD5Map, less_first()); | ||
// std::unique uses == operator for std::pair. | ||
VTableAddrRangeToMD5Map.erase(std::unique(VTableAddrRangeToMD5Map.begin(), | ||
VTableAddrRangeToMD5Map.end()), | ||
VTableAddrRangeToMD5Map.end()); | ||
Sorted = true; | ||
} | ||
|
||
|
@@ -605,6 +703,19 @@ Function* InstrProfSymtab::getFunction(uint64_t FuncMD5Hash) { | |
return nullptr; | ||
} | ||
|
||
GlobalVariable * | ||
InstrProfSymtab::getGlobalVariable(uint64_t GlobalVariableMD5Hash) { | ||
finalizeSymtab(); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why do we need to finalizeSymtab when looking up a global var? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Indeed. With a DenseMap there is no need to call 'finalizeSymtab' (which sorts the vectors for look-up). Done. |
||
auto Result = | ||
llvm::lower_bound(MD5VTableMap, GlobalVariableMD5Hash, | ||
[](const std::pair<uint64_t, GlobalVariable *> &LHS, | ||
uint64_t RHS) { return LHS.first < RHS; }); | ||
|
||
if (Result != MD5VTableMap.end() && Result->first == GlobalVariableMD5Hash) | ||
return Result->second; | ||
return nullptr; | ||
} | ||
|
||
// To store the sums of profile count values, or the percentage of | ||
// the sums of the total count values. | ||
struct CountSumOrPercent { | ||
|
@@ -870,6 +981,8 @@ struct InstrProfRecord { | |
return ValueData->IndirectCallSites; | ||
case IPVK_MemOPSize: | ||
return ValueData->MemOPSizes; | ||
case IPVK_VTableTarget: | ||
return ValueData->VTableTargets; | ||
default: | ||
llvm_unreachable("Unknown value kind!"); | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we flip this condition and return early to reduce the nesting?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
I also updated the code such that 'IndirectCalls' is maintained iff 'Type' is 'kIndirectCall' and 'ProfiledAddresses' is maintained iff 'Type' is 'kVTableVal'.