Unused unpacked snapshot left in content store after nerdctl system prune #2372
Comments
It looks to me like the issue when we
The image config has the garbage collection label to remove the snapshot: Additionally, the manifest for the image we just created is missing the
This is a bug in buildkit v0.11.x. It was patched in moby/buildkit#3972, which was included in buildkit v0.12.0, released just yesterday: https://github.com/moby/buildkit/releases/tag/v0.12.0
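For anyone who wants to check whether their installed buildkit already carries the fix, here is a minimal sketch. It assumes `buildctl --version` prints a line of the form `buildctl github.com/moby/buildkit v0.12.0 <commit>` and that `sort -V` is available (GNU coreutils or BusyBox); `version_ge` is a hypothetical helper name, not part of any tool. Pre-release tags are not handled specially.

```shell
#!/bin/sh
# version_ge A B: succeed when version A >= version B (a leading "v" is ignored).
version_ge() {
    a=${1#v}
    b=${2#v}
    # sort -V orders version strings; if B sorts first (or is equal), A >= B.
    [ "$(printf '%s\n%s\n' "$a" "$b" | sort -V | head -n 1)" = "$b" ]
}

# Only query buildctl when it is installed. The third field of its version
# line is assumed to be the version tag.
if command -v buildctl >/dev/null 2>&1; then
    installed=$(buildctl --version | awk '{print $3}')
    if version_ge "$installed" "v0.12.0"; then
        echo "buildkit $installed should include the fix"
    else
        echo "buildkit $installed predates v0.12.0; consider upgrading"
    fi
fi
```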
Is there any convenient workaround for this?
If your specific manifestation of the issue comes from buildkit, as documented by @ginglis13, then you should be fine with buildkit >= v0.12.
On the other hand, the same symptoms (
We have patched a number of cases, and we now forcefully ensure content is there on specific operations (save, commit, tag), but there certainly are more cases we have not covered, and the underlying problem is still very much here. #3513 has some context.
Unless you think your issue is exactly the same scenario as reported in this ticket, I suggest you open a new ticket with clear steps to reproduce and we can look into it. Hope that helps.
Thanks for the extra info @apostasie !
So far it has happened reproducibly on every Kubespray cluster I try to upgrade (to v2.23.3), specifically with this etcd_v3.5.10 image. I tried pruning images and deleting and re-pulling this one, but it didn't help. Maybe the issue is linked to the etcd v3.5.6 image that is currently in use. I'm not sure how the images are built.
To work around it, I am pulling the required image on a different system and then importing it, so the upgrade can proceed.
containerd version: v1.7.13 7c3aca7a610df76212171d200ca3811ff6096eb8
Just mentioning this in case it would be a useful debugging scenario; otherwise I'll just do the workaround on all our clusters and continue upgrading them.
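The pull-elsewhere-then-import workaround described above could look roughly like the sketch below. The image reference, archive name, and the `run`, `export_image`, and `import_image` helpers are hypothetical placeholders; the `k8s.io` namespace is an assumption based on where Kubernetes-managed images normally live in containerd. Set `DRY_RUN=1` to print the commands instead of running them.

```shell
#!/bin/sh
# Hypothetical placeholders: set IMAGE to the exact reference the upgrade needs.
IMAGE=${IMAGE:-registry.example.com/etcd:v3.5.10}
ARCHIVE=${ARCHIVE:-etcd-image.tar}
# Assumption: the cluster's images live in the k8s.io containerd namespace.
NAMESPACE=${NAMESPACE:-k8s.io}

# run: print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# On a healthy machine: pull the image and export it to a tar archive.
export_image() {
    run nerdctl --namespace "$NAMESPACE" pull "$IMAGE" &&
    run nerdctl --namespace "$NAMESPACE" save -o "$ARCHIVE" "$IMAGE"
}

# On the affected node, after copying the archive over (for example with scp):
import_image() {
    run nerdctl --namespace "$NAMESPACE" load -i "$ARCHIVE"
}
```

Calling `export_image` on the healthy host and `import_image` on the affected node (after transferring the archive) mirrors the manual steps; nothing runs until one of the functions is invoked.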
@rptaylor which version of nerdctl are you using (output of
nerdctl v1.4.0 is quite old, and unsupported. Fixes for the problems discussed have been shipped with v2.0. Is it possible for you to update to nerdctl v2.0.3 and try again?
Kernel is really old too (it's from 2018, right?). Is that an Oracle-maintained kernel? Alma is the new CentOS, right? It should not be a problem (and is definitely orthogonal to the issues here), but I wanted to point out that we do not have integration testing for that.
No, this issue blocks the upgrade to a newer cluster version (including containerd, nerdctl versions, all integrated together in the Kubespray version) because it prevents getting the newer etcd image. After I do the workaround, the upgrade can proceed to newer versions. This kernel is from October 2024, just a few months ago: https://almalinux.pkgs.org/8/almalinux-baseos-x86_64/kernel-4.18.0-553.27.1.el8_10.x86_64.rpm.html Anyway that's okay, thanks for looking into it!
I see. Well, the likely silver-lining for you is that nerdctl v2 should have a patch for your issue with
Thanks for the info. Off the top of my head, you may have issues with recursive read-only bind mounts, but I guess for the most part it should be fine? Keep us posted if you get a chance once you have completed your upgrade. Cheers.
Description
With changes introduced in finch#461, we will be defaulting to building images using the type=image format rather than the type=docker format. This change has exposed common test failures in finch image save and finch image load in both the finch and finch-core projects. This is effectively an issue that has presented itself with nerdctl image save and nerdctl image load.
The issue occurs when an image (A) is built with no tags and with type=image, the image and builder cache are pruned, and we attempt to pull and save an image (B) which was (A)'s base layer. While the content for (A) has been effectively removed, the unpacked snapshot of (A)'s base layer remains. When we pull (B), only its manifest, config, and index are pulled. The actual content is not, resulting in
Steps to reproduce the issue
note: these images are arm64. SHAs will vary on different platforms, but the process below should reproduce the issue in the same way.
nerdctl build --no-cache -f Dockerfile.with-build-arg --progress=plain --build-arg VERSION=3.13 .
In the output, see the following sha that represents the base layer of the image:
#5 sha256:25f523f0e93b2b5fa676c15d91b90f08ee4de7a160874e6c52ea452929d5a7cc 2.72MB / 2.72MB 0.3s done
We also see the output:
We can see our image looks... weird, since we didn't tag it:
Use ctr to inspect content
Remove the image you just built and inspect content again
nerdctl rmi 0f5d034dfcca # <- image id from step 1
Now, let’s do the same as 2. inspect content: All content is still there. This is because buildkit cached the content and the unmounted snapshot remains.
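To make the "inspect content" steps repeatable, a small helper can check whether a given digest is present in the content store. This is only a sketch: `digest_in_list` and `digest_in_store` are hypothetical helper names, and it assumes `ctr` is on PATH, that the caller is allowed to talk to containerd (often via sudo), and that the content lives in the namespace `ctr` uses by default.

```shell
#!/bin/sh
# digest_in_list DIGEST: read one digest per line on stdin and succeed when
# DIGEST is among them (exact, whole-line, fixed-string match).
digest_in_list() {
    grep -qxF -- "$1"
}

# digest_in_store DIGEST: check the containerd content store through ctr.
# "ctr content ls -q" prints only the blob digests, one per line.
digest_in_store() {
    ctr content ls -q | digest_in_list "$1"
}
```

For example, `digest_in_store sha256:25f523f0e93b2b5fa676c15d91b90f08ee4de7a160874e6c52ea452929d5a7cc && echo present || echo missing` reports whether the base layer blob from the build output is still in the store at each step.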
$ nerdctl system prune --all -f
This results in
and again follow inspect content:
Check out the remaining content:
note sha256:de51348d43 - this is the sha of the unpacked layer for the alpine:3.13 image:
we can still find that in sudo ctr snapshot ls:
Pull public.ecr.aws/docker/library/alpine:3.13, try to save it
You can see that even though the actual base layer doesn’t exist in the content store (it is unpacked as a snapshot), on pull of the image, we don’t pull that layer back into the containerd content store. We only pull the index, manifest, and config. Why? Because the snapshot for that layer, sha256:de51348d43, has already been unpacked and committed. For some reason, nerdctl/containerd thinks the layer still exists.
Describe the results you received and expected
I expected either nerdctl system prune --all -f to remove the snapshot sha256:de51348d43 that was unpacked during build
Or
What version of nerdctl are you using?
v1.4.0
Are you using a variant of nerdctl? (e.g., Rancher Desktop)
Finch/Lima, buildkit
Host information
Finch VM https://github.com/runfinch/finch