-
Notifications
You must be signed in to change notification settings - Fork 12.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[clang][deps] Load module map file from PCM #66389
Conversation
@llvm/pr-subscribers-clang @llvm/pr-subscribers-clang-modules ChangesLoading the module map path directly from the PCM is faster. Calling `ModuleMap::getModuleMapFileForUniquing()` causes deserialization of multiple `SLocEntries` and calling `ModuleMap::canonicalizeModuleMapPath()` issues an uncached call to `vfs::FileSystem::getRealPath()`. -- Full diff: https://github.com//pull/66389.diff5 Files Affected:
diff --git a/clang/lib/Frontend/CompilerInstance.cpp b/clang/lib/Frontend/CompilerInstance.cpp index 49c1d8e553a1df6..367e32e132cec87 100644 --- a/clang/lib/Frontend/CompilerInstance.cpp +++ b/clang/lib/Frontend/CompilerInstance.cpp @@ -1338,7 +1338,8 @@ static bool compileModule(CompilerInstance &ImportingInstance, Result = compileModuleImpl( ImportingInstance, ImportLoc, Module->getTopLevelModuleName(), FrontendInputFile(ModuleMapFilePath, IK, +Module->IsSystem), - ModMap.getModuleMapFileForUniquing(Module)->getName(), ModuleFileName); + ModMap.getModuleMapFileForUniquing(Module)->getNameAsRequested(), + ModuleFileName); } else { // FIXME: We only need to fake up an input file here as a way of // transporting the module's directory to the module map parser. We should @@ -1355,7 +1356,7 @@ static bool compileModule(CompilerInstance &ImportingInstance, Result = compileModuleImpl( ImportingInstance, ImportLoc, Module->getTopLevelModuleName(), FrontendInputFile(FakeModuleMapFile, IK, +Module->IsSystem), - ModMap.getModuleMapFileForUniquing(Module)->getName(), + ModMap.getModuleMapFileForUniquing(Module)->getNameAsRequested(), ModuleFileName, [&](CompilerInstance &Instance) { std::unique_ptr<llvm::MemoryBuffer> ModuleMapBuffer = diff --git a/clang/lib/Lex/ModuleMap.cpp b/clang/lib/Lex/ModuleMap.cpp index bee3a484f84bd47..acbac82d28df97e 100644 --- a/clang/lib/Lex/ModuleMap.cpp +++ b/clang/lib/Lex/ModuleMap.cpp @@ -1307,6 +1307,9 @@ void ModuleMap::setInferredModuleAllowedBy(Module *M, std::error_code ModuleMap::canonicalizeModuleMapPath(SmallVectorImpl<char> &Path) { + FileManager &FM = SourceMgr.getFileManager(); + FM.makeAbsolutePath(Path); + StringRef Dir = llvm::sys::path::parent_path({Path.data(), Path.size()}); // Do not canonicalize within the framework; the module map parser expects @@ -1317,8 +1320,7 @@ ModuleMap::canonicalizeModuleMapPath(SmallVectorImpl<char> &Path) { Dir = Parent; } - FileManager &FM = SourceMgr.getFileManager(); - auto DirEntry = FM.getDirectoryRef(Dir.empty() ? "." : Dir); + auto DirEntry = FM.getDirectoryRef(Dir); if (!DirEntry) return llvm::errorToErrorCode(DirEntry.takeError()); diff --git a/clang/lib/Serialization/ASTWriter.cpp b/clang/lib/Serialization/ASTWriter.cpp index 65bee806d2c5571..348d71d0e6ade98 100644 --- a/clang/lib/Serialization/ASTWriter.cpp +++ b/clang/lib/Serialization/ASTWriter.cpp @@ -1372,11 +1372,15 @@ void ASTWriter::WriteControlBlock(Preprocessor &PP, ASTContext &Context, Record.clear(); auto &Map = PP.getHeaderSearchInfo().getModuleMap(); - AddPath(WritingModule->PresumedModuleMapFile.empty() - ? Map.getModuleMapFileForUniquing(WritingModule) - ->getNameAsRequested() - : StringRef(WritingModule->PresumedModuleMapFile), - Record); + + if (!WritingModule->PresumedModuleMapFile.empty()) { + AddPath(WritingModule->PresumedModuleMapFile, Record); + } else { + auto FE = Map.getModuleMapFileForUniquing(WritingModule); + SmallString<128> ModuleMapPath = FE->getNameAsRequested(); + Map.canonicalizeModuleMapPath(ModuleMapPath); + AddPath(ModuleMapPath, Record); + } // Additional module map files. if (auto *AdditionalModMaps = diff --git a/clang/lib/Tooling/DependencyScanning/ModuleDepCollector.cpp b/clang/lib/Tooling/DependencyScanning/ModuleDepCollector.cpp index 5e028c4431fe90f..06359b40f2732e3 100644 --- a/clang/lib/Tooling/DependencyScanning/ModuleDepCollector.cpp +++ b/clang/lib/Tooling/DependencyScanning/ModuleDepCollector.cpp @@ -482,16 +482,9 @@ ModuleDepCollectorPP::handleTopLevelModule(const Module *M) { MD.ID.ModuleName = M->getFullModuleName(); MD.IsSystem = M->IsSystem; - ModuleMap &ModMapInfo = - MDC.ScanInstance.getPreprocessor().getHeaderSearchInfo().getModuleMap(); - - OptionalFileEntryRef ModuleMap = ModMapInfo.getModuleMapFileForUniquing(M); - - if (ModuleMap) { - SmallString<128> Path = ModuleMap->getNameAsRequested(); - ModMapInfo.canonicalizeModuleMapPath(Path); - MD.ClangModuleMapFile = std::string(Path); - } + assert(!M->PresumedModuleMapFile.empty() && + "Precompiled module missing presumed module map"); + MD.ClangModuleMapFile = M->PresumedModuleMapFile; serialization::ModuleFile *MF = MDC.ScanInstance.getASTReader()->getModuleManager().lookup( @@ -505,7 +498,10 @@ ModuleDepCollectorPP::handleTopLevelModule(const Module *M) { // handle it like normal. With explicitly built modules we don't need // to play VFS tricks, so replace it with the correct module map. if (StringRef(IFI.Filename).endswith("__inferred_module.map")) { - MDC.addFileDep(MD, ModuleMap->getName()); + auto FE = MDC.ScanInstance.getFileManager().getOptionalFileRef( + MD.ClangModuleMapFile); + assert(FE && "Module map of precompiled module still exists"); + MDC.addFileDep(MD, FE->getName()); return; } MDC.addFileDep(MD, IFI.Filename); diff --git a/clang/test/ClangScanDeps/modules-inferred-vfs-mod-map.m b/clang/test/ClangScanDeps/modules-inferred-vfs-mod-map.m new file mode 100644 index 000000000000000..d14779e73a51db3 --- /dev/null +++ b/clang/test/ClangScanDeps/modules-inferred-vfs-mod-map.m @@ -0,0 +1,63 @@ +// This test checks that we report the module map that allowed inferring using +// its on-disk path in file dependencies. + +// RUN: rm -rf %t +// RUN: split-file %s %t + +//--- frameworks/Inferred.framework/Headers/Inferred.h +//--- frameworks/Inferred.framework/Frameworks/Sub.framework/Headers/Sub.h +//--- real/module.modulemap +framework module * {} +//--- vfsoverlay.json.template +{ + "version": 0, + "case-sensitive": "false", + "use-external-names": true, + "roots": [ + { + "contents": [ + { + "external-contents": "DIR/real/module.modulemap", + "name": "module.modulemap", + "type": "file" + } + ], + "name": "DIR/frameworks", + "type": "directory" + } + ] +} +//--- tu.m +#include <Inferred/Inferred.h> + +//--- cdb.json.template +[{ + "directory": "DIR", + "file": "DIR/tu.m", + "command": "clang -fmodules -fimplicit-module-maps -fmodules-cache-path=DIR/cache -F DIR/frameworks -ivfsoverlay DIR/vfsoverlay.json -c DIR/tu.m -o DIR/tu.o" +}] + +// RUN: sed "s|DIR|%/t|g" %t/vfsoverlay.json.template > %t/vfsoverlay.json +// RUN: sed "s|DIR|%/t|g" %t/cdb.json.template > %t/cdb.json +// RUN: clang-scan-deps -compilation-database %t/cdb.json -format experimental-full > %t/result.json +// RUN: cat %t/result.json | sed 's:\\\\\?:/:g' | FileCheck %s -DPREFIX=%/t + +// CHECK: { +// CHECK-NEXT: "modules": [ +// CHECK-NEXT: { +// CHECK-NEXT: "clang-module-deps": [], +// CHECK-NEXT: "clang-modulemap-file": "[[PREFIX]]/frameworks/module.modulemap", +// CHECK-NEXT: "command-line": [ +// CHECK: ], +// CHECK-NEXT: "context-hash": "{{.*}}", +// CHECK-NEXT: "file-deps": [ +// CHECK-NEXT: "[[PREFIX]]/frameworks/Inferred.framework/Frameworks/Sub.framework/Headers/Sub.h", +// CHECK-NEXT: "[[PREFIX]]/frameworks/Inferred.framework/Headers/Inferred.h", +// CHECK-NEXT: "[[PREFIX]]/real/module.modulemap" +// CHECK-NEXT: ], +// CHECK-NEXT: "name": "Inferred" +// CHECK-NEXT: } +// CHECK-NEXT: ], +// CHECK-NEXT: "translation-units": [ +// CHECK: ] +// CHECK: } |
ModMapInfo.canonicalizeModuleMapPath(Path); | ||
MD.ClangModuleMapFile = std::string(Path); | ||
} | ||
assert(!M->PresumedModuleMapFile.empty() && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a comment in Module.h: /// Only non-empty when building from preprocessed source.
that should probably be deleted.
@@ -1307,6 +1307,9 @@ void ModuleMap::setInferredModuleAllowedBy(Module *M, | |||
|
|||
std::error_code | |||
ModuleMap::canonicalizeModuleMapPath(SmallVectorImpl<char> &Path) { | |||
FileManager &FM = SourceMgr.getFileManager(); | |||
FM.makeAbsolutePath(Path); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not 100% sure about this change: I seem to recall some folks are intentionally using relative paths in their invocations in order to be independent of a base path.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you concerned because this would make the second hash in implicitly-built PCM path depend on CWD? The other place this now gets used is in the ASTWriter
, but there it gets passed through its AddPath()
, which makes it relative if needed/requested.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would also impact explicit module invocations. I didn't realize the impact on path hash; good point that also seems like a concern. I didn't consider ASTWriter would make it relative, so at least that case wouldn't be an issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand how would this impact explicit module invocations? These can still use relative module map paths, it's just that the compiler will use the absolute canonical path internally (IIUC in an non-observable way).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the idea that because the paths used by the scanner is now indirected through the (implicit) pcm that whatever ASTWriter does to make the path relative will apply to the explicit module invocation? Before your current changes the scanner was using canonicalizeModuleMapPath
directly on the path in the invocation, which is what I was naively thinking of.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the idea that because the paths used by the scanner is now indirected through the (implicit) pcm that whatever ASTWriter does to make the path relative will apply to the explicit module invocation?
No, because ASTReader::ReadPath()
does the inverse of ASTWriter::AddPath()
.
I think we were both talking about different situations. You mean the explicit module invocations generated by the scanner? The canonical module map path would now be absolute. Previously, we'd generate relative paths for module maps found through a relative search paths, or besides includers that themselves were found through a relative path.
I'd be curious about the use-case for this, can you elaborate? Is it being able to reuse the same invocation in directories with different paths, potentially on different machines?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The canonical module map path would now be absolute. Previously, we'd generate relative paths for module maps found through a relative search paths, or besides includers that themselves were found through a relative path.
Yes, this change is what I'm concerned about.
I'd be curious about the use-case for this, can you elaborate? Is it being able to reuse the same invocation in directories with different paths, potentially on different machines?
My understanding is that this can be used to make the .pcms and command-lines to build them portable across machines, e.g. for caching builds in tools like bazel. I don't know how much overlap there is between this use case and using the dependency scanner.
Maybe we can flip it around: where exactly do we need an absolute path with your changes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The canonical module map path would now be absolute. Previously, we'd generate relative paths for module maps found through a relative search paths, or besides includers that themselves were found through a relative path.
Yes, this change is what I'm concerned about.
Sorry, I take that back. (It's been a while since I wrote this patch.) This function (and therefore the scanner) used to return absolute paths even before this patch because it always uses FileManager::getCanonicalName()
.
So the only thing this patch changes is what gets serialized into the PCM file.
When I started canonicalizing the module map path before storing it into the PCM, some tests started unexpectedly failing with ASTWriter
reporting inconsistencies.
When I looked into it, I found it was due to the fact that FileManager::getCanonicalName()
just forwards the path it's given to llvm::vfs::FileSystem::getRealPath()
. The problem is, the VFS might have a different idea as to what the CWD is, which breaks with relative module map paths and -working-directory
(used in tests/Modules/relative-import-path.c
).
Another issue that popped up was that this function calls replace_path_prefix(Path, Dir, CanonicalDir)
. If Dir
is empty, it just slaps the absolute real path for CWD in front of the filename without any separator.
Making the path absolute early on with FileManager
's CWD solved both of those issues. (No usage of VFS's CWD, no empty Dir
.)
So I think given all of that, my concerns are:
- Should we call
FileManager::makeAbsolutePath()
inModuleMap::canonicalizeModuleMapPath()
or inFileManager::getCanonicalName()
? - Is using
ASTWriter::AddPath()
with the real path problematic? It only performs textual prefix match with a directory path that's not necessarily the real path.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function (and therefore the scanner) used to return absolute paths even before this patch because it always uses FileManager::getCanonicalName().
Heh, my bad. As the person who wrote that code I should've realized this was already absolute.
Should we call FileManager::makeAbsolutePath() in ModuleMap::canonicalizeModuleMapPath() or in FileManager::getCanonicalName()?
I think the correct behaviour would be that FileManager::getCanonicalName
would use FixupRelativePath
to apply -working-directory
, but not use makeAbsolutePath
since that would be inconsistent with how it handles paths in other APIs. The VFS itself is responsible for resolving CWD in getRealPath
.
That being said, if this causes other issues I'm fine with your current approach in the meantime.
Is using ASTWriter::AddPath() with the real path problematic? It only performs textual prefix match with a directory path that's not necessarily the real path.
I don't know how we avoid that without the same issues that motivated canonicalizing in the first place. We don't want to be compiling separate modules for /a/Foo, /a/foo, /a/b/../foo, /symlink/foo, etc.
Closing. I want to achieve this by fixing the underlying issues:
|
Loading the module map path directly from the PCM is faster. Calling
ModuleMap::getModuleMapFileForUniquing()
causes deserialization of multipleSLocEntries
and callingModuleMap::canonicalizeModuleMapPath()
issues an uncached call tovfs::FileSystem::getRealPath()
.