-
Notifications
You must be signed in to change notification settings - Fork 12.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[clang][modules] Remove preloaded SLocEntries from PCM files #66962
Conversation
@llvm/pr-subscribers-clang @llvm/pr-subscribers-clang-modules ChangesThis commit removes the list of SLocEntry offsets to preload eagerly from PCM files. Commit introducing this functionality (258ae54) doesn't clarify why this would be more performant than the lazy approach used regularly. Currently, the only SLocEntry the reader is supposed to preload is the predefines buffer, but in my experience, it's not actually referenced in most modules, so the time spent deserializing its SLocEntry is wasted. This is especially noticeable in the dependency scanner, where this change brings 4.56% speedup on my benchmark. Full diff: https://github.com/llvm/llvm-project/pull/66962.diff 4 Files Affected:
diff --git a/clang/include/clang/Serialization/ASTBitCodes.h b/clang/include/clang/Serialization/ASTBitCodes.h
index 9e115f2a5cce3f9..6cbaff71cc33960 100644
--- a/clang/include/clang/Serialization/ASTBitCodes.h
+++ b/clang/include/clang/Serialization/ASTBitCodes.h
@@ -51,7 +51,7 @@ const unsigned VERSION_MAJOR = 29;
/// for the previous version could still support reading the new
/// version by ignoring new kinds of subblocks), this number
/// should be increased.
-const unsigned VERSION_MINOR = 0;
+const unsigned VERSION_MINOR = 1;
/// An ID number that refers to an identifier in an AST file.
///
@@ -524,13 +524,7 @@ enum ASTRecordTypes {
/// of source-location information.
SOURCE_LOCATION_OFFSETS = 14,
- /// Record code for the set of source location entries
- /// that need to be preloaded by the AST reader.
- ///
- /// This set contains the source location entry for the
- /// predefines buffer and for any file entries that need to be
- /// preloaded.
- SOURCE_LOCATION_PRELOADS = 15,
+ /// ID 15 used to be for source location entry preloads.
/// Record code for the set of ext_vector type names.
EXT_VECTOR_DECLS = 16,
diff --git a/clang/include/clang/Serialization/ModuleFile.h b/clang/include/clang/Serialization/ModuleFile.h
index 0af5cae6aebc375..48be8676cc26a4c 100644
--- a/clang/include/clang/Serialization/ModuleFile.h
+++ b/clang/include/clang/Serialization/ModuleFile.h
@@ -292,9 +292,6 @@ class ModuleFile {
/// AST file.
const uint32_t *SLocEntryOffsets = nullptr;
- /// SLocEntries that we're going to preload.
- SmallVector<uint64_t, 4> PreloadSLocEntries;
-
/// Remapping table for source locations in this module.
ContinuousRangeMap<SourceLocation::UIntTy, SourceLocation::IntTy, 2>
SLocRemap;
diff --git a/clang/lib/Serialization/ASTReader.cpp b/clang/lib/Serialization/ASTReader.cpp
index 0952244d037a77c..e796617455ac390 100644
--- a/clang/lib/Serialization/ASTReader.cpp
+++ b/clang/lib/Serialization/ASTReader.cpp
@@ -3226,7 +3226,6 @@ llvm::Error ASTReader::ReadASTBlock(ModuleFile &F,
case SOURCE_LOCATION_OFFSETS:
case MODULE_OFFSET_MAP:
case SOURCE_MANAGER_LINE_TABLE:
- case SOURCE_LOCATION_PRELOADS:
case PPD_ENTITIES_OFFSETS:
case HEADER_SEARCH_TABLE:
case IMPORTED_MODULES:
@@ -3576,18 +3575,6 @@ llvm::Error ASTReader::ReadASTBlock(ModuleFile &F,
ParseLineTable(F, Record);
break;
- case SOURCE_LOCATION_PRELOADS: {
- // Need to transform from the local view (1-based IDs) to the global view,
- // which is based off F.SLocEntryBaseID.
- if (!F.PreloadSLocEntries.empty())
- return llvm::createStringError(
- std::errc::illegal_byte_sequence,
- "Multiple SOURCE_LOCATION_PRELOADS records in AST file");
-
- F.PreloadSLocEntries.swap(Record);
- break;
- }
-
case EXT_VECTOR_DECLS:
for (unsigned I = 0, N = Record.size(); I != N; ++I)
ExtVectorDecls.push_back(getGlobalDeclID(F, Record[I]));
@@ -4417,16 +4404,6 @@ ASTReader::ASTReadResult ASTReader::ReadAST(StringRef FileName, ModuleKind Type,
for (ImportedModule &M : Loaded) {
ModuleFile &F = *M.Mod;
- // Preload SLocEntries.
- for (unsigned I = 0, N = F.PreloadSLocEntries.size(); I != N; ++I) {
- int Index = int(F.PreloadSLocEntries[I] - 1) + F.SLocEntryBaseID;
- // Load it through the SourceManager and don't call ReadSLocEntry()
- // directly because the entry may have already been loaded in which case
- // calling ReadSLocEntry() directly would trigger an assertion in
- // SourceManager.
- SourceMgr.getLoadedSLocEntryByID(Index);
- }
-
// Map the original source file ID into the ID space of the current
// compilation.
if (F.OriginalSourceFileID.isValid())
diff --git a/clang/lib/Serialization/ASTWriter.cpp b/clang/lib/Serialization/ASTWriter.cpp
index 65bee806d2c5571..3392243d7ac4ba7 100644
--- a/clang/lib/Serialization/ASTWriter.cpp
+++ b/clang/lib/Serialization/ASTWriter.cpp
@@ -838,7 +838,6 @@ void ASTWriter::WriteBlockInfoBlock() {
RECORD(METHOD_POOL);
RECORD(PP_COUNTER_VALUE);
RECORD(SOURCE_LOCATION_OFFSETS);
- RECORD(SOURCE_LOCATION_PRELOADS);
RECORD(EXT_VECTOR_DECLS);
RECORD(UNUSED_FILESCOPED_DECLS);
RECORD(PPD_ENTITIES_OFFSETS);
@@ -2137,7 +2136,6 @@ void ASTWriter::WriteSourceManagerBlock(SourceManager &SourceMgr,
// entry, which is always the same dummy entry.
std::vector<uint32_t> SLocEntryOffsets;
uint64_t SLocEntryOffsetsBase = Stream.GetCurrentBitNo();
- RecordData PreloadSLocs;
SLocEntryOffsets.reserve(SourceMgr.local_sloc_entry_size() - 1);
for (unsigned I = 1, N = SourceMgr.local_sloc_entry_size();
I != N; ++I) {
@@ -2213,9 +2211,6 @@ void ASTWriter::WriteSourceManagerBlock(SourceManager &SourceMgr,
Stream.EmitRecordWithBlob(SLocBufferAbbrv, Record,
StringRef(Name.data(), Name.size() + 1));
EmitBlob = true;
-
- if (Name == "<built-in>")
- PreloadSLocs.push_back(SLocEntryOffsets.size());
}
if (EmitBlob) {
@@ -2277,9 +2272,6 @@ void ASTWriter::WriteSourceManagerBlock(SourceManager &SourceMgr,
Stream.EmitRecordWithBlob(SLocOffsetsAbbrev, Record,
bytes(SLocEntryOffsets));
}
- // Write the source location entry preloads array, telling the AST
- // reader which source locations entries it should load eagerly.
- Stream.EmitRecord(SOURCE_LOCATION_PRELOADS, PreloadSLocs);
// Write the line table. It depends on remapping working, so it must come
// after the source location offsets.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in my experience, it's not actually referenced in most modules
Can you clarify what you looked at here? I assume you checked non-scanner modules as well?
I took a typical Xcode project and chose one TU that happens to transitively depend on 37 modules. I built it with implicit modules and scanned its dependencies while counting all
So saying most was incorrect. The data suggests that importers don't care about the predefines of ~43% of their transitive dependencies. Note that this number is higher when the cache is full, where the percentage only represents requests issued from the main TU. So it makes sense that deserializing predefines |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM other than my nit about the comment
The "Index/annotate-module.m" test fails with this patch. For run line 29, the previous input is:
while the current output is:
This seems to be the stack trace where things go wrong:
Here's my current understanding:
I need to dig a bit deeper to figure out what exactly needs fixing here. |
Ok, the comment below seems to suggests that the built-ins buffers are not considered to be part of any TU: bool SourceManager::isBeforeInTranslationUnit(SourceLocation LHS,
SourceLocation RHS) const {
// ...
if (isInTheSameTranslationUnit(LHS, RHS))
return X;
// If we arrived here, the location is either in a built-ins buffer or
// associated with global inline asm. PR5662 and PR22576 are examples.
// ...
} I changed |
What is the translated location you get here? It's hard for me to tell if the comment in |
When a module is built, its So the fact that we didn't remap the offset of the built-ins buffer meant it was not considered part of the imported module. This made It took me a while to figure out why that is desired. My understanding is that whether one offset is before another can't be just simple comparison of the numeric values. (More details on this are in Here's a little example of how this ends up working with modules:
The This is the reason the binary search in So the special case in @benlangmuir, let me know if this explanation makes sense. Also adding @sam-mccall as a reviewer, since he's worked in this area recently. |
Thanks for laying it out. I also find this easier to understand in your latest changes because the comments in |
Ripping out this feature sounds great, as does fixing the non-translation of builtin buffer locations. My only concern is that this will make isBeforeInTranslationUnit more expensive. A loc being in one of these special "" is the rare case, and the code recognizing them doesn't look fast (which didn't previously matter, as we only reached it rarely). That said, isBeforeInTranslationUnit is slow already, and I don't think it's hot in clang, moreso in tools like clang-tidy whose performance is less-critical. So let's make it right, and then optimize it again if problems arise. |
This commit removes the list of SLocEntry offsets to preload eagerly from PCM files. Commit introducing this functionality (258ae54) doesn't clarify why this would be more performant than the lazy approach used regularly. Currently, the only SLocEntry the reader is supposed to preload is the predefines buffer, but in my experience, it's not actually referenced in most modules, so the time spent deserializing its SLocEntry is wasted. This is especially noticeable in the dependency scanner, where this change brings 4.56% speedup on my benchmark.
I got around investigating "SemaCXX/warn-unsafe-buffer-usage-fixits-parm-span.cpp" and found out it uses this pattern:
This means Since I moved up the special-casing of built-ins buffer up in my last commit, it was happening before we got chance to walk up from |
Also, it seems that the code previously assumed that two
I started tracking the |
Turns out some clients are calling |
I don't understand why the tie-breaking code calls the previously visited |
a4b3d78
to
af8b797
Compare
It may not be a hot function in Clang itself, but it is quite hot for some downstream Clang forks. While at Coverity, I had to avoid direct use of it because it was far too slow. Anything that speeds it up would be appreciated. |
That's good to know, thanks! I think in it's current version, this patch shouldn't make that function slower, but I'll verify this after I get a green light on the semantic changes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for iterating! I find the current implementation much clearer.
The only thing I might quibble about is the "child" vs. "parent" terminology you changed: I think it's fairly ambiguous either way, because the node is the "child" from the perspective of a top-down include hierarchy, but it's the "parent" from the perspective of the bottom-up search. You could maybe change it to IncludedFile or something, but I don't feel very strongly about it. Child is no worse than parent so if you prefer child I don't think you need to change it.
Thanks for your patience!
I understand. My problem with using "parent" here is that we're using "parent" just a couple lines above to describe the opposite relationship:
So I believe framing all the relationships in terms of the top-down include/expansion hierarchy makes more sense than mixing them up with the bottom-up tree walk. |
@sam-mccall Do you have any additional feedback? You might want to check how this PR and #66966 impact performance for you. |
) This commit removes the list of SLocEntry offsets to preload eagerly from PCM files. Commit introducing this functionality (258ae54) doesn't clarify why this would be more performant than the lazy approach used regularly. Currently, the only SLocEntry the reader is supposed to preload is the predefines buffer, but in my experience, it's not actually referenced in most modules, so the time spent deserializing its SLocEntry is wasted. This is especially noticeable in the dependency scanner, where this change brings 4.56% speedup on my benchmark.
This commit removes the list of SLocEntry offsets to preload eagerly from PCM files. Commit introducing this functionality (258ae54) doesn't clarify why this would be more performant than the lazy approach used regularly.
Currently, the only SLocEntry the reader is supposed to preload is the predefines buffer, but in my experience, it's not actually referenced in most modules, so the time spent deserializing its SLocEntry is wasted. This is especially noticeable in the dependency scanner, where this change brings 4.56% speedup on my benchmark.