-
Notifications
You must be signed in to change notification settings - Fork 4.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Windows: new_local_repository doesn't reevaluate glob #6351
Comments
@laszlocsomor there should be only one team label. Is that better for your team or for External dependencies? |
I cannot tell. We have the Windows expertise, the Ext.Deps team has the rule logic expertise. |
I spent some time with #7063 so I'm also investigating this bug now. |
More observation:
|
Found the culprit. It's complex, so I'll write it down in a document and paste here later. |
ReproLet's have two directories, "A" and "B". "A" contains a WORKSPACE and BUILD file, while "B" contains just a dummy file: /src/A/WORKSPACE:
/src/A/BUILD:
/src/B/a.txt:
BugIf we build "//:G" inside /src/A, then g.txt correctly lists external/B/a.txt and nothing else. If we add a new file /src/B/b.txt and rebuild :G, then g.txt should list external/B/a.txt and external/B/b.txt -- and it does on Linux but not on Windows. If we build a third time on Windows without doing anything, g.txt will contain the correct output. Analysisnew_local_repository creates symlinks to every file and directory in its root, e.g. external/B/a.txt is a symlink to /src/B/a.txt. This is done as part of computing the REPOSITORY_DIRECTORY SkyValue, which is done by evaluating the RepositoryDelegatorFunction, which calls into NewLocalRepositoryFunction.fetch, which creates these symlinks. On Linux these are all symlinks indeed, but on Windows only the directory symlinks are really symlinks, the files are copies (because we cannot create file symlinks). On both Linux and Windows, FILE_STATEs and DIRECTORY_LISTING_STATEs of paths under the external directory have special handling -- they all depend on the corresponding REPOSITORY_DIRECTORY. This dependency is established here and here, by the ExternalFilesHelper.maybeHandleExternalFile() --> RepositoryFunction.addExternalFilesDependencies() call chain, and it makes perfect sense, because it is the repository's SkyFunction that populates the external/B directory and creates the symlinks (or files) under it. Furthermore, the REPOSITORY_DIRECTORY depends on the DIRECTORY_LISTING_STATE of /src/B, so on both platforms Skyframe recomputes the REPOSITORY_DIRECTORY after we added /src/B/b.txt and correctly creates the new symlink (copy) under external/B/. This is all good. The difference is that on Linux the GLOB depends on external/B/a.txt while on Windows it doesn't, because on Linux it's a symlink but on Windows it's a regular file. The reason is a performance optimization: GLOB only depends on FILE_STATEs of symlinks and directories but not files, to avoid recomputing the glob when a matched file's contents change, but to dirty the glob when a directory (or symlink to one) is replaced by a file with the same name (which affects recursive globbing). On both Linux and Windows, the GLOB depends on the DIRECTORY_LISTING_STATE of external/B, and on Linux the GLOB also depends on the FILE_STATE of external/B/a.txt (because it's a symlink). This is a big difference: on Linux the GLOB is dirty but on Windows it's not. Upon closer inspection though, we notice that on Linux the GLOB is not recomputed but revalidated in the second build, and its SkyValue does not contain the newly added file -- however, the PACKAGE for "@B//" is recomputed and it does have the new glob contents and there's a new TARGET for @B//:b.txt -- what's going on?! Turns out there's some package reloading logic in PackageFunction that uses Skyframe's dirtiness information to decide which GLOBs to recompute, but doesn't actually recompute the GlobValue. On Linux the GLOB is dirtied in the second build but on Windows it's not, so on Linux we re-glob (and find external/B/b.txt) but on Windows we don't (even though external/B/b.txt is there). Why doesn't the GLOB depend on the DIRECTORY_LISTING_STATE of external/B though? It actually does. So why isn't this DIRECTORY_LISTING_STATE dirtied on Windows? It actually is! The problem is that the DIRECTORY_LISTING_STATE of external/B does not depend on the REPOSITORY_DIRECTORY of @b, because the RepositoryFunction.addExternalFilesDependencies() call doesn't add this dependency if the path is the same as the repository's root path, in order to avoid a dependency cycle. Which is cool for the FILE_STATE of /src/B, but unnecessary for DIRECTORY_LISTING_STATE of it -- that's the bug. If we change RepositoryFunction.addExternalFilesDependencies() to depend on the REPOSITORY_DIRECTORY for DIRECTORY_LISTING_STATEs, even of external/B itself, then everything works correctly on Windows. Becaues then adding the new file to /src/B will dirty the DIRECTORY_LISTING_STATE of /src/B, dirtying the REPOSITORY_DIRECTORY, dirtying the DIRECTORY_LISTING_STATE of /external/B, dirtying the GLOB, so the partial package reloading logic will re-compute that GLOB, even on Windows. Ironically the bug was there on Linux too, just never manifested… because on Linux, external/B always contains symlinks, so the GLOB depends on their FILE_STATEs, and thus depends on the REPOSITORY_DIRECTORY. |
In Skyframe, the DirectoryStateValue of an external repository's path (e.g. [output_root]/external/B for the repository "@b") will now depend on the RepositoryDirectoryValue. This ensures that adding files to the source directory will transitively invalidate this DirectoryStateValue and any GlobValue corresponding to a glob() in the external repository's root package. More info: bazelbuild#6351 (comment) Fixes bazelbuild#6351
Description of the problem / feature request:
If a "new_local_repository" rule with "build_file_content" creates a rule with a glob, and a rule in the main repo depends on this one, then Bazel on Windows won't detect changes of the glob. (It does on Linux.)
Bugs: what's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.
git clone https://github.com/laszlocsomor/projects
cd bazel/examples/repo_rule_glob_bug/main_repo
bazel build //foo:all
List the contents of "bazel-genfiles/foo/x.txt" and "bazel-genfiles/foo/y.txt": they will contain "external/external_repo/a.txt" and "foo/foo.txt" respectively, as they should.
Now create "b.txt" in the external_repo directory (next to "a.txt") and "foo/bar.txt" (next to "foo.txt"), and rebuild "//foo:all".
If you now list the contents of the output files, "bazel-genfiles/foo/x.txt" is stale, it sill contains only "external/external_repo/a.txt", whereas "bazel-genfiles/foo/y.txt" correctly contains "foo/foo.txt" and "foo/bar.txt".
The problem doesn't seem to manifest when using a "local_repository" rule and having a BUILD file in the external_repo directory.
What operating system are you running Bazel on?
Windows 10
What's the output of
bazel info release
?The text was updated successfully, but these errors were encountered: