'Exec failed due to IOException' error on tree artifact containing symlink to directory when --remote_download_outputs=toplevel #21171

Closed
ljessendk opened this issue Feb 1, 2024 · 6 comments
Labels: P2, soft-release-blocker, team-Remote-Exec, type: bug


ljessendk commented Feb 1, 2024

Description of the bug:

With --remote_download_outputs=toplevel (the default in Bazel 7), I get an IOException whenever I build a target that depends on a rule producing a tree artifact that contains a symlink to a directory, provided the outputs of the rule producing the tree artifact are fetched from the cache.

The error I get looks like this:
ERROR: /user/lsje/Lars/toplevel/project_a/BUILD:7:8: Executing genrule //:consume failed: Exec failed due to IOException: /user/lsje/.cache/bazel/_bazel_lsje/46fd854a93d2f0b0c14f3170daa62d94/execroot/_main/bazel-out/k8-fastbuild/bin/symlink_to_directory/symlink_to_dir (No such file or directory)

The error disappears if I do any of the following:

  • Set --remote_download_outputs=all
  • Build without cache
  • Build when cache is not populated

Which category does this issue belong to?

No response

What's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

tree_artifact_with_folder_symlink.bzl:

def _tree_artifact_with_folder_symlink(ctx):
    # Declare a tree artifact (directory output) named after the target.
    outdir = ctx.actions.declare_directory(ctx.attr.name)

    # Populate the tree with a real directory, a file inside it,
    # and a relative symlink pointing at that directory.
    ctx.actions.run_shell(
        outputs = [outdir],
        use_default_shell_env = True,
        command = "\n".join([
            "mkdir -p {outdir}/dir",
            "touch {outdir}/dir/file",
            "ln -s dir {outdir}/symlink_to_dir",
        ]).format(outdir = outdir.path),
    )

    return [
        DefaultInfo(files = depset([outdir])),
    ]

tree_artifact_with_folder_symlink = rule(
    implementation = _tree_artifact_with_folder_symlink,
)
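
For readers who want to look at the directory layout without running Bazel, the three shell commands from the action can be replayed in a scratch directory. This is only a sketch: the `scratch` and `outdir` variables are placeholders standing in for `outdir.path`, and nothing here exercises Bazel's caching. It shows that `symlink_to_dir` is a relative symlink that resolves to a directory, which is the same path component that appears at the end of the error message above.

```shell
# Replay the action's commands in a scratch directory (no Bazel involved).
set -eu
scratch="$(mktemp -d)"
outdir="$scratch/symlink_to_directory"   # placeholder for outdir.path

mkdir -p "$outdir/dir"
touch "$outdir/dir/file"
ln -s dir "$outdir/symlink_to_dir"       # relative target, resolved against $outdir

readlink "$outdir/symlink_to_dir"        # prints: dir
test -L "$outdir/symlink_to_dir"         # the entry itself is a symlink
test -d "$outdir/symlink_to_dir"         # and it resolves to a directory
ls "$outdir/symlink_to_dir"              # prints: file
```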

BUILD:

load(":tree_artifact_with_folder_symlink.bzl", "tree_artifact_with_folder_symlink")

tree_artifact_with_folder_symlink(
    name = "symlink_to_directory",
)

genrule(
    name = "consume",
    srcs = ["symlink_to_directory"],
    outs = ["dummy"],
    cmd = "touch $@",
)

.bazelrc:

build --disk_cache /tmp/cache

Steps to reproduce (with Bazel 7.0.2):

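rm -rf /tmp/cache      # start from an empty disk cache
bazel clean
bazel build :consume   # first build runs the actions and populates the cache
bazel clean
bazel build :consume   # second build fetches from the cache and hits the IOException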

Which operating system are you running Bazel on?

Fedora 37

What is the output of bazel info release?

release 7.0.2

If bazel info release returns development version or (@non-git), tell us how you built Bazel.

No response

What's the output of git remote get-url origin; git rev-parse HEAD ?

No response

Is this a regression? If yes, please try to identify the Bazel commit where the bug was introduced.

No response

Have you found anything relevant by searching the web?

#20415 seems to have a similar reproduction example except for the consuming rule.

Any other information, logs, or outputs that you want to share?

No response

@sgowroji sgowroji added the team-Remote-Exec Issues and PRs for the Execution (Remote) team label Feb 1, 2024
@oquenchil oquenchil added P2 We'll consider working on this in future. (Assignee optional) and removed untriaged labels Feb 6, 2024
philsc (Contributor) commented Feb 14, 2024

Very different reproduction, but the error is identical: aspect-build/rules_js#1412.
Possibly the same problem?

Again, I'm not sure what's causing this issue, but it makes our builds flaky and means we have to roll the 7.0.2 upgrade back to 6.2.

tjgq (Contributor) commented Feb 14, 2024

I haven't forgotten about this, but my plate is too full at the moment. I suspect it might be a simple fix that can still be done in time for 7.1.0. I'll add it as a soft release blocker.

@tjgq tjgq added this to the 7.1.0 release blockers milestone Feb 14, 2024
@iancha1992 iancha1992 added the soft-release-blocker Soft release blockers that are nice to have, but shouldn't block the release if it's the last one. label Feb 14, 2024
iancha1992 (Member) commented

@bazel-io fork 7.1.0

@iancha1992 iancha1992 removed this from the 7.1.0 release blockers milestone Feb 16, 2024
bazel-io pushed a commit to bazel-io/bazel that referenced this issue Feb 19, 2024
As explained in bazelbuild#20418, when a tree artifact contains a symlink to a directory, it is collected as a single TreeFileArtifact with DirectoryArtifactValue metadata. With this change, the symlink is followed and the directory expanded into its contents, which is more incrementally correct and removes a special case that tree artifact consumers would otherwise have to be aware of.

This also addresses bazelbuild#21171, which is due to the metadata for the directory contents not being available when building without the bytes, causing the input Merkle tree builder to fail. (While I could have fixed this by falling back to reading the directory contents from the filesystem, I prefer to abide by the principle that input metadata should be collected before execution; source directories are the other case where this isn't true, which I also regard as a bug.)

Fixes bazelbuild#20418.
Fixes bazelbuild#21171.

PiperOrigin-RevId: 608389141
Change-Id: I956f3f8a4b1bfd279091e179d1cba3cdd0e5019b
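
The difference between the two ways of collecting the tree described in the commit message can be approximated on a plain filesystem with `find`. This is an illustrative sketch only, not Bazel's implementation: the `tree`, `unexpanded`, and `expanded` names are made up for the example, and `find` knows nothing about TreeFileArtifact or DirectoryArtifactValue metadata. Without `-L`, the symlink shows up as a single leaf entry; with `-L`, it is followed and its contents are listed as individual files.

```shell
# Build the same layout as the repro's tree artifact in a scratch directory.
set -eu
scratch="$(mktemp -d)"
tree="$scratch/symlink_to_directory"
mkdir -p "$tree/dir"
touch "$tree/dir/file"
ln -s dir "$tree/symlink_to_dir"

# View 1: symlinks are not followed; the link is one opaque leaf entry.
unexpanded="$(cd "$tree" && find . \( -type f -o -type l \) | LC_ALL=C sort)"

# View 2: symlinks are followed; the linked directory expands into its files.
expanded="$(cd "$tree" && find -L . -type f | LC_ALL=C sort)"

printf 'not followed:\n%s\n' "$unexpanded"
# ./dir/file
# ./symlink_to_dir
printf 'followed:\n%s\n' "$expanded"
# ./dir/file
# ./symlink_to_dir/file
```

In the second view every leaf is a regular file, so a consumer never has to special-case a directory-valued entry, which is the simplification the commit message argues for.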
github-merge-queue bot pushed a commit that referenced this issue Feb 20, 2024
…actValue. (#21418)

Commit 4247c20

Co-authored-by: Googler <[email protected]>
iancha1992 (Member) commented

A fix for this issue has been included in Bazel 7.1.0 RC1. Please test out the release candidate and report any issues as soon as possible. Thanks!

matthewjh commented

@iancha1992 possible to re-open this one? I see the same issue on 7.1.0.

tjgq (Contributor) commented Mar 13, 2024

@matthewjh Can you provide a repro? There have been a few different issues with symptoms similar to this one. (For what it's worth, the repro provided by the initial post in this thread does not crash with 7.1.0.)
