Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix hardlink issue with whiteout deletes in the merge snapshotter. #4516

Merged
merged 1 commit into from
Jan 3, 2024

Conversation

gdlg
Copy link
Contributor

@gdlg gdlg commented Jan 2, 2024

This PR attempts to fix the following error during a merge snapshot:

Dockerfile:17
--------------------
  15 |     # This dummy COPY is important to ensure that a snapshot is taken before RUN is executed. Otherwise the snapshotter is never called.
  16 |
  17 | >>> RUN echo "Hello"
  18 |
--------------------
ERROR: failed to solve: failed to compute cache key: failed to apply diffs: 1 error occurred:
        * failed to handle changes: failed to hardlink during apply: failed to hardlink during apply: link /var/lib/buildkit/runc-overlayfs/snapshots/snapshots/100/fs/file1 /var/lib/buildkit/runc-overlayfs/snapshots/snapshots/100/fs/file2: no such file or directory

in the corresponding Dockerfile:

# syntax=docker/dockerfile:1.4.0

FROM busybox

RUN touch /file2

COPY --link empty-file /file1
# The link is important to make sure that only file2 is part of the base snapshot in mergeSnapshotter.Merge, but file1 is excluded.
# The names are chosen such that differ.overlayChanges is applied to /file1 before /file2; otherwise, everything works as intended.

RUN rm /file1 /file2
# both files must be deleted in the same RUN call

COPY --link --from=busybox /tmp /tmp
# This dummy COPY is important to ensure that a snapshot is taken before RUN is executed. Otherwise the snapshotter is never called.

RUN echo "Hello"

The file /file2 is included in the base snapshot by Merge() in snapshot/merge.go but not /file1. Due to this, when applying changes for /file1 in applyDelete(), the function decides to not create a whiteout for /file1 since it’s not part of the lower dirs from the base snapshot. It then tries to create a whiteout for /file2 (which is part of the lower dirs).

The issue occurs in applyHardlink() in snapshot/diffapply_unix.go which attempts to create a hardlink from /file2 to /file1 whiteouts in the destination layer. This attempt fails because we didn’t create any whiteout for /file1 (as it’s not needed). This hardlink occurs because /file1 and /file2 whiteouts in the upper layers share the same inode (even though they are created in different layers).

I think that it doesn’t make sense to try to create hardlinks between whiteouts in the destination layer because even if they share the same inode in the source, this is not an intended behaviour and there is no real relationship between the whiteouts in source layers. So in this PR, I propose to skip “delete changes” in the list of inodes used to create hardlinks. This causes applyHardlinks to create a cross-snapshot link to the original file (i.e. dest /file2 whiteout to src /file2 whiteout instead of doing dest /file2 to dest /file1).

@jedevc jedevc requested review from sipsma and tonistiigi January 2, 2024 11:57
@gdlg gdlg force-pushed the whiteout-hardlink-fix branch from b5ecae7 to a35951a Compare January 2, 2024 13:20
@sipsma
Copy link
Collaborator

sipsma commented Jan 2, 2024

Thanks for the fix here @gdlg, the change here LGTM. The only other thing is we should add integ test coverage for this case (probably around here).

If you are up for adding that it would be greatly appreciated.

@gdlg
Copy link
Contributor Author

gdlg commented Jan 2, 2024

Thanks @sipsma for the review. I’ll try to write a test but I tried to run the integration tests with ./hack/test integration and I have thousands of failing tests (even on master branch) like this one:

=== FAIL: solver TestJobsIntegration/TestParallelism/worker=containerd-rootless/max-parallelism=unlimited (15.38s)
    jobs_test.go:85:
                Error Trace:    /src/solver/jobs_test.go:85
                                                        /src/util/testutil/integration/run.go:91
                                                        /src/util/testutil/integration/run.go:205
                Error:          "12.459812827s" is not less than "10s"
                Test:           TestJobsIntegration/TestParallelism/worker=containerd-rootless/max-parallelism=unlimited
                Messages:       parallelism hindered

Not sure if that’s expected or I am doing something wrong?

@tonistiigi
Copy link
Member

@gdlg Not sure what is wrong in that case and if it is a specific test, worker or something with your config, but you can run a specific test/pkg or worker with https://github.com/moby/buildkit/blob/master/.github/CONTRIBUTING.md#run-the-unit--and-integration-tests commands.

Whiteouts may share the same inode which cause the snapshotter
to attempt to create a hardlink between two whiteouts in the destination layer.

The problem is that the first whiteout is only created if it is part of the lower layer from the base snapshot.
Otherwise, the whiteout is not created in the destination layer because it is not needed.

If the first whiteout is not created in the destination, attempting to create a hardlink for the second whiteout will fail.

To avoid this issue, this fix disables hardlinks for whiteouts.

Signed-off-by: Grégoire Payen de La Garanderie <[email protected]>
@gdlg gdlg force-pushed the whiteout-hardlink-fix branch from a35951a to bc646a6 Compare January 2, 2024 22:31
@gdlg
Copy link
Contributor Author

gdlg commented Jan 2, 2024

I have added a test to mergediff_test.go which reproduces the issue on the master branch:

    sandbox.go:131: failed to apply diffs: 1 error occurred:
    sandbox.go:131:     * failed to handle changes: failed to hardlink during apply: failed to hardlink during apply: link /tmp/bktest_containerd3393105126/root/io.containerd.snapshotter.v1.overlayfs/snapshots/54/fs/unmodifiedDir/deleteFile1 /tmp/bktest_containerd3393105126/root/io.containerd.snapshotter.v1.overlayfs/snapshots/54/fs/unmodifiedDir/deleteFile2: no such file or directory
    sandbox.go:131:
    sandbox.go:131:
    sandbox.go:131: 491 v0.13.0-beta1-258-g686c0ad25.m buildkitd --containerd-worker-gc=false --containerd-worker=true --containerd-worker-addr /tmp/bktest_containerd3393105126/containerd.sock --containerd-worker-labels=org.mobyproject.buildkit.worker.sandbox=true --oci-worker=false --config=/tmp/bktest_config3627173430/buildkitd.toml --root /tmp/bktest_buildkitd3103212136 --addr unix:///tmp/bktest_buildkitd3103212136/buildkitd.sock --debug
    sandbox.go:131: github.com/moby/buildkit/snapshot.(*applier).applyHardlink
    sandbox.go:131:     /src/snapshot/diffapply_unix.go:346
    sandbox.go:131: github.com/moby/buildkit/snapshot.(*applier).Apply
    sandbox.go:131:     /src/snapshot/diffapply_unix.go:244
    sandbox.go:131: github.com/moby/buildkit/snapshot.(*differ).overlayChanges.func1
    sandbox.go:131:     /src/snapshot/diffapply_unix.go:736
    sandbox.go:131: github.com/moby/buildkit/snapshot.(*differ).overlayChanges.Changes.func2
    sandbox.go:131:     /src/util/overlay/overlay_linux.go:224
    sandbox.go:131: path/filepath.walk
    sandbox.go:131:     /usr/local/go/src/path/filepath/path.go:477
    sandbox.go:131: path/filepath.walk
    sandbox.go:131:     /usr/local/go/src/path/filepath/path.go:501
    sandbox.go:131: path/filepath.walk
    sandbox.go:131:     /usr/local/go/src/path/filepath/path.go:501
    sandbox.go:131: path/filepath.Walk
    sandbox.go:131:     /usr/local/go/src/path/filepath/path.go:572
    sandbox.go:131: github.com/moby/buildkit/util/overlay.Changes
    sandbox.go:131:     /src/util/overlay/overlay_linux.go:160
    sandbox.go:131: github.com/moby/buildkit/snapshot.(*differ).overlayChanges
    sandbox.go:131:     /src/snapshot/diffapply_unix.go:690
    sandbox.go:131: github.com/moby/buildkit/snapshot.(*differ).HandleChanges
    sandbox.go:131:     /src/snapshot/diffapply_unix.go:593
    sandbox.go:131: github.com/moby/buildkit/snapshot.(*mergeSnapshotter).diffApply
    sandbox.go:131:     /src/snapshot/diffapply_unix.go:92
    sandbox.go:131: github.com/moby/buildkit/snapshot.(*mergeSnapshotter).Merge
    sandbox.go:131:     /src/snapshot/merge.go:149
    sandbox.go:131: github.com/moby/buildkit/cache.(*immutableRef).unlazyDiffMerge
    sandbox.go:131:     /src/cache/refs.go:1238
    sandbox.go:131: github.com/moby/buildkit/cache.(*immutableRef).unlazy.func1
    sandbox.go:131:     /src/cache/refs.go:1176
    sandbox.go:131: github.com/moby/buildkit/util/flightcontrol.(*call[...]).run
    sandbox.go:131:     /src/util/flightcontrol/flightcontrol.go:121
    sandbox.go:131: sync.(*Once).doSlow
    sandbox.go:131:     /usr/local/go/src/sync/once.go:74
    sandbox.go:131: sync.(*Once).Do
    sandbox.go:131:     /usr/local/go/src/sync/once.go:65
    sandbox.go:131: runtime.goexit
    sandbox.go:131:     /usr/local/go/src/runtime/asm_amd64.s:1650

Copy link
Member

@tonistiigi tonistiigi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assuming this is ready, remove the draft label.

@gdlg gdlg marked this pull request as ready for review January 3, 2024 10:07
@gdlg
Copy link
Contributor Author

gdlg commented Jan 3, 2024

I have marked the PR as ready but I wasn’t able to run the full integration test suite on my machine due to lack of resources which causes a lot of timeouts. Is the CI sufficient? Otherwise I could try to get hold of a more powerful machine to run the tests on my side.

@jedevc
Copy link
Member

jedevc commented Jan 3, 2024

Yup @gdlg don't worry, the CI is generally our source of truth for tests - no need to worry about running them all locally, I don't do that for any of my PRs.

@sipsma sipsma merged commit 6b92932 into moby:master Jan 3, 2024
60 checks passed
@gdlg gdlg deleted the whiteout-hardlink-fix branch January 4, 2024 16:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants