-
Notifications
You must be signed in to change notification settings - Fork 12.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[clang-include-cleaner] Fix incorrect directory issue for writing files #111375
[clang-include-cleaner] Fix incorrect directory issue for writing files #111375
Conversation
If the current working directory of `clang-include-cleaner` differs from the directory of the input files specified in the compilation database, it doesn't adjust the input file paths properly. As a result, `clang-include-cleaner` either writes files to the wrong directory or fails to write files altogether. This pull request fixes the issue by checking whether the input file path is absolute. If it isn't, the path is concatenated with the directory of the input file specified in the compilation database. Fixes llvm#110843.
@llvm/pr-subscribers-clang-tools-extra Author: Byoungchan Lee (bc-lee) ChangesIf the current working directory of This pull request fixes the issue by checking whether the input file path is absolute. If it isn't, the path is concatenated with the directory of the input file specified in the compilation database. Fixes #110843. Full diff: https://github.com/llvm/llvm-project/pull/111375.diff 1 Files Affected:
diff --git a/clang-tools-extra/include-cleaner/tool/IncludeCleaner.cpp b/clang-tools-extra/include-cleaner/tool/IncludeCleaner.cpp
index 080099adc9a07a..1cb8b9c4ae99b7 100644
--- a/clang-tools-extra/include-cleaner/tool/IncludeCleaner.cpp
+++ b/clang-tools-extra/include-cleaner/tool/IncludeCleaner.cpp
@@ -173,11 +173,22 @@ class Action : public clang::ASTFrontendAction {
if (!HTMLReportPath.empty())
writeHTML();
+ // Source File's path relative to compilation database.
llvm::StringRef Path =
SM.getFileEntryRefForID(SM.getMainFileID())->getName();
assert(!Path.empty() && "Main file path not known?");
llvm::StringRef Code = SM.getBufferData(SM.getMainFileID());
+ auto Cwd = getCompilerInstance()
+ .getFileManager()
+ .getVirtualFileSystem()
+ .getCurrentWorkingDirectory();
+ if (Cwd.getError()) {
+ llvm::errs() << "Failed to get current working directory: "
+ << Cwd.getError().message() << "\n";
+ return;
+ }
+
auto Results =
analyze(AST.Roots, PP.MacroReferences, PP.Includes, &PI,
getCompilerInstance().getPreprocessor(), HeaderFilter);
@@ -201,8 +212,17 @@ class Action : public clang::ASTFrontendAction {
}
}
- if (!Results.Missing.empty() || !Results.Unused.empty())
- EditedFiles.try_emplace(Path, Final);
+ if (!Results.Missing.empty() || !Results.Unused.empty()) {
+ auto Sept = llvm::sys::path::get_separator();
+ // Adjust the path to be absolute if it is not.
+ std::string FullPath;
+ if (llvm::sys::path::is_absolute(Path))
+ FullPath = Path.str();
+ else
+ FullPath = Cwd.get() + Sept.str() + Path.str();
+
+ EditedFiles.try_emplace(FullPath, Final);
+ }
}
void writeHTML() {
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks a lot for the patch, and sorry for regressing this.
I think this is also going to regress behavior in some cases (e.g. working directory in compile commands might be virtual for some build systems like bazel).
When include-cleaner is invoked as clang-include-cleaner foo.cc
the user is definitely trying to run this on a file relative to directory in which we invoked include-cleaner, hence we should make sure it's adressed through that path going forward.
Can we instead update logic near the following in main
:
clang::tooling::ClangTool Tool(OptionsParser->getCompilations(),
OptionsParser->getSourcePathList());
instead of passing OptionsParser->getSourcePathList())
, we can turn all of these source paths into absolue paths, using the current-working-dir of the process, and then we don't need to worry about how they make their way into our ast-consumer.
Could you provide a minimal reproducer for this case if possible? Ideally, it should not depend on Google's internal Bazel setup or tools. From what I understand, Bazel itself doesn’t generate compilation databases, and third-party tools adjust the paths of raw compiler invocations to the correct ones. I tested two methods for generating compilation databases for Bazel projects in my demo repository (https://github.com/bc-lee/test-compilation-database), and both methods correctly generated the compilation databases, with no issues traversing over the Bazel build directory. For instance, when I ran kiron1/bazel-compile-commands in my demo repository, it generated the following Click to toggle[
{
"directory": "/home/leebc/.cache/bazel/_bazel_leebc/5ea2deb937e75f0bf29ba4c49931e67d/execroot/_main",
"command": "/usr/bin/gcc -U_FORTIFY_SOURCE -fstack-protector -Wall -Wunused-but-set-parameter -Wno-free-nonheap-object -fno-omit-frame-pointer -std=c++14 -MD -MF bazel-out/k8-fastbuild/bin/api/_objs/main/main.pic.d -frandom-seed=bazel-out/k8-fastbuild/bin/api/_objs/main/main.pic.o -fPIC -iquote . -iquote bazel-out/k8-fastbuild/bin -iquote external/bazel_tools -iquote bazel-out/k8-fastbuild/bin/external/bazel_tools -std=c++17 -fno-canonical-system-headers -Wno-builtin-macro-redefined -D__DATE__=\"redacted\" -D__TIMESTAMP__=\"redacted\" -D__TIME__=\"redacted\" -c api/main.cc -o bazel-out/k8-fastbuild/bin/api/_objs/main/main.pic.o",
"file": "api/main.cc",
"output": "bazel-out/k8-fastbuild/bin/api/_objs/main/main.pic.o"
}
] And If there are any other cases that I missed, please let me know. I’ll try to reproduce the issue and fix it. |
in our case this is symlinked to an immutable copy of the source file (to make sure builds don't act weird if you edit some sources while the build is running). hence it'll be a regression. What's the concern about not performing this directly as relative to the process' working directory? That approach doesn't rely on any assumptions about the build system, or filesystem layout hence feels much more robust. |
Some Bazel systems might have a read-only file system in the build directory. In such cases, the output file should be written to the original source file. Adjusted the logic to find the correct output directory based on the compilation database and the current working directory.
It’s unfortunate that we can't apply unified logic for determining the actual output file. It's especially disappointing given that I’m using GN and you’re using Bazel, both of which are build systems from Google. I’ve adjusted the logic to find the correct output file path before the compiler invocation starts. Please take another look and let me know if there's anything else I should consider. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks a lot for the revision!
One concern is, we can't really use getAllCompileCommands()
reliably either, not all compilation databases implement that, and even if they do this is gonna be an overkill. For example for chromium we'll iterate over tens of thousands of compile flags, just to modify a handful of files. I've made some suggestions in line.
Also I think it's easier if we put absolute-paths from CDB into the EditedFiles
and just remap them to absolute paths per tools working directory afterwards when we can.
@@ -305,7 +342,32 @@ int main(int argc, const char **argv) { | |||
} | |||
} | |||
|
|||
clang::tooling::ClangTool Tool(OptionsParser->getCompilations(), | |||
auto &CompilationDatabase = OptionsParser->getCompilations(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think this can be simplified with something like:
auto VFS = llvm::vfs::getRealFileSystem();
auto &CDB = OptionsParser->getCompilations();
std::map<std::string, std::string> CDBToAbsPaths;
for (auto &Source : OptionsParser->getSourcePathList()) {
llvm::SmallString<256> AbsPath(Source);
if (auto Err = VFS.makeAbsolute(AbsPath)) {
llvm::errs() << "Failed to get absolute path for " << Source << " : " << Err.message() << '\n';
return 1;
}
for(auto Cmd : CDB.getCompilecommands(AbsPath)) {
llvm::SmallString<256> CDBPath(Cmd.Filename);
llvm::sys::fs::make_absolute(Cmd.Directory, CDBPath);
CDBToAbsPaths[CDBPath] = AbsPath;
}
}
....
if (Edit) {
for (const auto &NameAndContent : Factory.editedFiles()) {
llvm::StringRef FileName = NameAndContent.first();
if (auto It = CDBToAbsPaths.find(FileName); It != CDBToAbsPaths.end())
FileName = It->second;
...
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't simplify it much, but I tried to explain my intent as clearly as possible. However, I feel the code is somewhat verbose.
@@ -173,6 +173,7 @@ class Action : public clang::ASTFrontendAction { | |||
if (!HTMLReportPath.empty()) | |||
writeHTML(); | |||
|
|||
// Source File's path relative to compilation database. | |||
llvm::StringRef Path = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we keep reporting with abs-paths here, i.e:
llvm::SmallString<256> AbsPath(SM.getFileEntryRefForID(SM.getMainFileID())->getName());
SM.getFileManager().makeAbsolutePath(AbsPath);
...
This way we can recognize these paths in main and perform the mapping accordingly.
- Store the compilation database file as absolute path - Instead of using `llvm::sys::fs::current_path`, use `FileSystem::makeAbsolute` to get the absolute path of various files - Add more comments to explain the intent of the code
✅ With the latest revision this PR passed the C/C++ code formatter. |
// CDBToAbsPaths is a map from the path in the compilation database to the | ||
// writable absolute path of the file. | ||
std::map<std::string, std::string> CDBToAbsPaths; | ||
if (Edit) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I don't think it saves much to do this only when Edit
is on, parsing C++ is way slower than anything we can do here over a couple of filenames.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Removed the check for Edit mode.
if (Cmds.empty()) { | ||
// Try with the original path. | ||
Cmds = CDB.getCompileCommands(Source); | ||
if (Cmds.empty()) { | ||
continue; | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this fallback isn't necessary, clang-invocation underneath will also use the AbsPath as-is, and it'll skip the file if it couldn't find any compile flags for it.
I think it's better to just fail early in this case, similar to above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. I checked, and even if we don't specify a compilation database, clang will create one using the FixedCompilationDatabase
class. So, I added an error and return early if the compilation database can't be found.
// We only need the first command to get the directory. | ||
auto Cmd = Cmds[0]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
again, underlying clang-invocation will run the compiler for every compile command we received. hence it isn't enough to just do this for the first command, as each command can have a different (Directory, Filename)
combination. also it doesn't hurt even if we get duplicates.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
if (llvm::sys::path::is_absolute(CDBPath)) { | ||
// If the path in the compilation database is already absolute, we don't | ||
// need to do anything. | ||
CDBToAbsPaths[static_cast<std::string>(CDBPath)] = | ||
static_cast<std::string>(AbsPath); | ||
} else { | ||
auto Sept = llvm::sys::path::get_separator(); | ||
// First, try to find the file based on the compilation database. | ||
llvm::Twine FilePathTwine = Directory + Sept + CDBPath; | ||
llvm::SmallString<256> FilePath; | ||
FilePathTwine.toVector(FilePath); | ||
// Check if it is writable. | ||
if (llvm::sys::fs::access(FilePath, llvm::sys::fs::AccessMode::Write) != | ||
std::error_code()) { | ||
// If not, try to find the file based on the current working | ||
// directory, as some Bazel invocations may not set the compilation | ||
// invocation's filesystem as non-writable. In such cases, we can | ||
// find the file based on the current working directory. | ||
FilePath = Source; | ||
if (auto EC = VFS->makeAbsolute(CDBPath)) { | ||
llvm::errs() << "Failed to get absolute path for " << CDBPath | ||
<< " : " << EC.message() << '\n'; | ||
return 1; | ||
} | ||
if (llvm::sys::fs::access(FilePath, | ||
llvm::sys::fs::AccessMode::Write) != | ||
std::error_code()) { | ||
llvm::errs() << "Failed to find a writable path for " << Source | ||
<< '\n'; | ||
return 1; | ||
} | ||
} | ||
CDBToAbsPaths[static_cast<std::string>(CDBPath)] = | ||
static_cast<std::string>(FilePath); | ||
} | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
llvm::sys::fs::make_absolute(Cmd.Directory, CDBPath);
already performs concatenation only if CDBPath
is not an absolute path, you don't re-implement all of this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
if (llvm::sys::path::is_absolute(CDBPath)) { | ||
// If the path in the compilation database is already absolute, we don't | ||
// need to do anything. | ||
CDBToAbsPaths[static_cast<std::string>(CDBPath)] = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we prefer std::string(CDBPath)
or CDBPath.str().str()
to static_cast
. (also for other occurences of static_cast
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
if (llvm::sys::fs::access(FilePath, llvm::sys::fs::AccessMode::Write) != | ||
std::error_code()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i don't see why we need to complicate the logic by checking for this. is there any reasons to not always map to file-path relative to current process?
i am pretty convinced that the user intention is to almost always edit the file path as referred to during invocation, e.g. cd /foo && clang-include-cleaner path/to/foo.cc
is always meant to edit /foo/path/to/foo.cc
no matter what file path we get from CDB.
i can see how this might also work, but I prefer to maintain code that has as few special cases as possible. so unless something is indeed breaking with this simplification, can you please get rid of this case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It turns out we don't need to check for this. I removed those checks.
@@ -48,3 +48,13 @@ int x = foo(); | |||
// RUN: clang-include-cleaner -edit --ignore-headers="foobar\.h,foo\.h" %t.cpp -- -I%S/Inputs/ | |||
// RUN: FileCheck --match-full-lines --check-prefix=EDIT2 %s < %t.cpp | |||
// EDIT2-NOT: {{^}}#include "foo.h"{{$}} | |||
|
|||
// RUN: mkdir -p $(dirname %t)/out |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can use %T
instead of dirname %t
, you can find full list in https://llvm.org/docs/CommandGuide/lit.html#substitutions, same below.
also it's safer to run a rm -rf %T
beforehand to make sure we're starting clean
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It turns out %T
is also deprecated. I replaced my shell invocation with %t.dir
, as this is the recommended method, as described in https://reviews.llvm.org/D69572. I also slightly modified the test invocation so that it works on Windows as well. (Tested on a local Windows machine.)
@@ -49,12 +49,15 @@ int x = foo(); | |||
// RUN: FileCheck --match-full-lines --check-prefix=EDIT2 %s < %t.cpp | |||
// EDIT2-NOT: {{^}}#include "foo.h"{{$}} | |||
|
|||
// This test has commands that rely on shell capabilities that won't execute | |||
// correctly on Windows e.g. subshell execution | |||
// REQUIRES: shell |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this doesn't just affect that particular run, affects the whole, can you move the test into its own file ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was explained in the previous comment. It now works on Windows as well.
- Removed the check for Edit mode. - Removed unnecessary fallback. - Various cleanups. - Make the test work on Windows as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for bearing with me, LGTM!
// CDBToAbsPaths is a map from the path in the compilation database to the | ||
// writable absolute path of the file. | ||
std::map<std::string, std::string> CDBToAbsPaths; | ||
// if Edit is enabled, `Factory.editedFiles()` will contain the final code, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you also drop the mention of Edit
from the comment?
// Try with the original path. | ||
Cmds = CDB.getCompileCommands(Source); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this fallback is still not needed. if we can't find the flags with absolute path, it's fine to bail out.
for (auto &Source : OptionsParser->getSourcePathList()) { | ||
llvm::SmallString<256> AbsPath(Source); | ||
if (auto Err = VFS->makeAbsolute(AbsPath)) { | ||
llvm::errs() << "Failed to get absolute path for " << Source << " : " | ||
<< Err.message() << '\n'; | ||
return 1; | ||
} | ||
std::vector<clang::tooling::CompileCommand> Cmds = | ||
CDB.getCompileCommands(AbsPath); | ||
if (Cmds.empty()) { | ||
// Try with the original path. | ||
Cmds = CDB.getCompileCommands(Source); | ||
if (Cmds.empty()) { | ||
// It should be found in the compilation database, even user didn't | ||
// specify the compilation database, the `FixedCompilationDatabase` will | ||
// create an entry from the arguments. So it is an error if we can't | ||
// find the compile commands. | ||
llvm::errs() << "No compile commands found for " << Source << '\n'; | ||
return 1; | ||
} | ||
} | ||
for (const auto &Cmd : Cmds) { | ||
llvm::SmallString<256> CDBPath(Cmd.Filename); | ||
std::string Directory(Cmd.Directory); | ||
llvm::sys::fs::make_absolute(Cmd.Directory, CDBPath); | ||
CDBToAbsPaths[std::string(CDBPath)] = std::string(AbsPath); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: you can extract all of this logic to a helper function:
llvm::Expected<std::map<...>> mapInputsToAbsPaths(...);
...
int main(...) {
...
auto CDBToAbsPaths = mapInputsToAbsPaths(CDB, Opts->OptionsParser->getSourcePathList());
if (!CDBToAbsPaths) {
return 1;
}
...
}
- Removed unnecessary mention of Edit - Removed fallback logic when the absolute path isn't found - Extracted mapInputsToAbsPaths as a separate function
Thanks for the review! |
btw, LMK if you don't have commit access and I should merge this for you |
I don't have commit access to LLVM. Please merge this PR. Thanks! |
I will fix the following error within an hour (currently away from keyboard).
|
…es (llvm#111375) If the current working directory of `clang-include-cleaner` differs from the directory of the input files specified in the compilation database, it doesn't adjust the input file paths properly. As a result, `clang-include-cleaner` either writes files to the wrong directory or fails to write files altogether. This pull request fixes the issue by adjusting the input file paths based on the directory specified in the compilation database. If that directory is not writable, `clang-include-cleaner` will write the output relative to the current working directory. Fixes llvm#110843.
…es (llvm#111375) If the current working directory of `clang-include-cleaner` differs from the directory of the input files specified in the compilation database, it doesn't adjust the input file paths properly. As a result, `clang-include-cleaner` either writes files to the wrong directory or fails to write files altogether. This pull request fixes the issue by adjusting the input file paths based on the directory specified in the compilation database. If that directory is not writable, `clang-include-cleaner` will write the output relative to the current working directory. Fixes llvm#110843.
If the current working directory of
clang-include-cleaner
differs from the directory of the input files specified in the compilation database, it doesn't adjust the input file paths properly. As a result,clang-include-cleaner
either writes files to the wrong directory or fails to write files altogether.This pull request fixes the issue by adjusting the input file paths based on the directory specified in the compilation database. If that directory is not writable,
clang-include-cleaner
will write the output relative to the current working directory.Fixes #110843.