fix: properly reload persistent snapshotter data and restart services (#767)

Issue #, if available: re-verified #412
- Through extensive e2e test debugging, I noticed that soci and stargz
snapshotters weren't persisting data as expected. After debugging, I
found some context in these two PRs:
  - awslabs/soci-snapshotter#881
  - containerd/stargz-snapshotter#1526
Unfortunately, neither of them is deployed yet, so I've implemented a
hacky workaround for now. After this change, an image can be pulled and a
container run, the VM can be restarted, and then the container can be
started again.

*Description of changes:*
- Redo how BuildKit/Stargz/SOCI are related to containerd using
[systemd's `PartOf`](https://www.freedesktop.org/software/systemd/man/latest/systemd.unit.html#PartOf=).
This ensures that all of these services are restarted when containerd
is restarted; the lack of such restarts has caused errors in the past
- Create some missing directories that might throw errors in cloud-init
- Ensure that `SIGTERM` is used to kill the snapshotter services for now
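
The drop-in approach in the list above can be sketched as follows. This is a minimal illustration under stated assumptions, not the provisioning script itself: `UNIT_DIR` and `write_dropin` are hypothetical names, and the files are written to a scratch directory rather than `/usr/local/lib/systemd/system`, so no `sudo` is needed.

```shell
#!/bin/sh
# UNIT_DIR is a hypothetical scratch location standing in for
# /usr/local/lib/systemd/system, so the sketch runs without root.
UNIT_DIR="$(mktemp -d)"

# write_dropin is a hypothetical helper: $1 is the unit name, $2 is optional
# extra unit-file text appended after the [Unit] section.
write_dropin() {
  # -p makes the call idempotent: no error if the directory already exists.
  mkdir -p "$UNIT_DIR/$1.d"
  # %b expands the \n escapes contained in the optional second argument.
  printf '[Unit]\nPartOf=containerd.service\n%b' "${2:-}" > "$UNIT_DIR/$1.d/finch.conf"
}

write_dropin buildkit.service
write_dropin stargz-snapshotter.service '\n[Service]\nKillSignal=SIGTERM\n'
# Running it again is harmless because of mkdir -p and the truncating redirect.
write_dropin stargz-snapshotter.service '\n[Service]\nKillSignal=SIGTERM\n'

cat "$UNIT_DIR/stargz-snapshotter.service.d/finch.conf"
```

With `PartOf=containerd.service` in place, systemd propagates a stop or restart of containerd to the unit carrying the drop-in; it does not start that unit on its own.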

*Testing done:*
- manual testing



- [x] I've reviewed the guidance in CONTRIBUTING.md


#### License Acceptance

By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license.

---------

Signed-off-by: Justin Alvarez <[email protected]>
pendo324 authored Jan 29, 2024
1 parent 673c2a5 commit 700cb92
Showing 3 changed files with 51 additions and 10 deletions.
30 changes: 25 additions & 5 deletions finch.windows.yaml
@@ -62,7 +62,7 @@ provision:
 # https://github.com/containerd/nerdctl/blob/cffdf87ff4d648a5344eea1406bb95ca3ad7eaa4/extras/rootless/containerd-rootless.sh#L144-L146
 # XDG_DATA_HOME & ~/.local/share: https://github.com/containerd/nerdctl/blob/cffdf87ff4d648a5344eea1406bb95ca3ad7eaa4/extras/rootless/containerd-rootless.sh#L51
-mkdir ~/.local/share/containerd
+mkdir -p ~/.local/share/containerd
 sudo mount --bind /mnt/lima-finch/containerd ~/.local/share/containerd
 # https://github.com/containerd/nerdctl/blob/main/docs/dir.md#dataroot
@@ -78,13 +78,33 @@ provision:
 sudo mount --bind /mnt/lima-finch/cni-config ~/.config/cni
 # https://github.com/containerd/nerdctl/blob/cffdf87ff4d648a5344eea1406bb95ca3ad7eaa4/extras/rootless/containerd-rootless.sh#L148-L150
-sudo mkdir -p /mnt/lima-finch/cni
+sudo mkdir -p /mnt/lima-finch/cni /var/lib/cni
 sudo mount --bind /mnt/lima-finch/cni /var/lib/cni
 mkdir -p ~/.local/share/cni
-sudo mount --bind /mnt/lima-finch/cni ~/.local/share/cni
+sudo mount --bind /mnt/lima-finch/cni ~/.local/share/cni
+# https://github.com/containerd/stargz-snapshotter/blob/94b12086ace4119e86d2db0d6343d7c734b56671/cmd/containerd-stargz-grpc/main.go#L67C2-L67C2
+sudo mkdir -p /mnt/lima-finch/containerd-stargz-grpc/snapshotter/snapshots
+sudo mount --bind /mnt/lima-finch/containerd-stargz-grpc /var/lib/containerd-stargz-grpc
+# https://github.com/awslabs/soci-snapshotter/blob/335515f746f50c964ed48159257e1aeba04805b6/cmd/soci-snapshotter-grpc/main.go#L84
+sudo mkdir -p /mnt/lima-finch/soci-snapshotter-grpc/snapshotter/snapshots /var/lib/soci-snapshotter-grpc
+sudo mount --bind /mnt/lima-finch/soci-snapshotter-grpc /var/lib/soci-snapshotter-grpc
+# Make sure stargz and buildkit are restarted with containerd
+sudo mkdir -p /usr/local/lib/systemd/system/buildkit.service.d/
+printf '[Unit]\nPartOf=containerd.service\n' | sudo tee /usr/local/lib/systemd/system/buildkit.service.d/finch.conf
+sudo mkdir -p /usr/local/lib/systemd/system/stargz-snapshotter.service.d/
+printf '[Unit]\nPartOf=containerd.service\n\n[Service]\nKillSignal=SIGTERM\n' | sudo tee /usr/local/lib/systemd/system/stargz-snapshotter.service.d/finch.conf
+# Add a new services that syncs the filesystem before shutdown
+printf '[Unit]\nDescription=Sync containerd on shutdown\nDefaultDependencies=no\nBefore=shutdown.target reboot.target halt.target kexec.target\n\n[Service]\nType=oneshot\nExecStart=/bin/bash -c "sync /var/lib/containerd"\n\n[Install]\nWantedBy=halt.target reboot.target shutdown.target kexec.target\n' | sudo tee /usr/local/lib/systemd/system/finch-sync-on-shutdown.service
+sudo systemctl enable --now finch-sync-on-shutdown.service
+# Add a new service that cleans up lingering CNI networks on boot
+printf '[Unit]\nDescription=Delete hanging data on boot\nDefaultDependencies=no\nBefore=basic.target\n\n[Service]\nType=oneshot\nExecStart=/bin/bash -c "sudo rm /var/lib/cni/networks/bridge/**; sudo rm /var/lib/cni/results/bridge-finch-*"\n\n[Install]\nWantedBy=basic.target\n' | sudo tee /usr/local/lib/systemd/system/finch-cleanup-on-boot.service
+sudo systemctl enable --now finch-cleanup-on-boot.service
-# Make sure buildkit is restarted with containerd, so it uses the correct UUID
-sudo systemctl add-requires buildkit.service containerd.service
 sudo systemctl restart containerd.service
 env:
28 changes: 24 additions & 4 deletions finch.yaml
@@ -169,7 +169,7 @@ provision:
 # https://github.com/containerd/nerdctl/blob/cffdf87ff4d648a5344eea1406bb95ca3ad7eaa4/extras/rootless/containerd-rootless.sh#L144-L146
 # XDG_DATA_HOME & ~/.local/share: https://github.com/containerd/nerdctl/blob/cffdf87ff4d648a5344eea1406bb95ca3ad7eaa4/extras/rootless/containerd-rootless.sh#L51
-mkdir ~/.local/share/containerd
+mkdir -p ~/.local/share/containerd
 sudo mount --bind /mnt/lima-finch/containerd ~/.local/share/containerd
 # https://github.com/containerd/nerdctl/blob/main/docs/dir.md#dataroot
@@ -185,13 +185,33 @@ provision:
 sudo mount --bind /mnt/lima-finch/cni-config ~/.config/cni
 # https://github.com/containerd/nerdctl/blob/cffdf87ff4d648a5344eea1406bb95ca3ad7eaa4/extras/rootless/containerd-rootless.sh#L148-L150
-sudo mkdir -p /mnt/lima-finch/cni
+sudo mkdir -p /mnt/lima-finch/cni /var/lib/cni
 sudo mount --bind /mnt/lima-finch/cni /var/lib/cni
 mkdir -p ~/.local/share/cni
 sudo mount --bind /mnt/lima-finch/cni ~/.local/share/cni
-# Make sure buildkit is restarted with containerd, so it uses the correct UUID
-sudo systemctl add-requires buildkit.service containerd.service
+# https://github.com/containerd/stargz-snapshotter/blob/94b12086ace4119e86d2db0d6343d7c734b56671/cmd/containerd-stargz-grpc/main.go#L67C2-L67C2
+sudo mkdir -p /mnt/lima-finch/containerd-stargz-grpc/snapshotter/snapshots
+sudo mount --bind /mnt/lima-finch/containerd-stargz-grpc /var/lib/containerd-stargz-grpc
+# https://github.com/awslabs/soci-snapshotter/blob/335515f746f50c964ed48159257e1aeba04805b6/cmd/soci-snapshotter-grpc/main.go#L84
+sudo mkdir -p /mnt/lima-finch/soci-snapshotter-grpc/snapshotter/snapshots /var/lib/soci-snapshotter-grpc
+sudo mount --bind /mnt/lima-finch/soci-snapshotter-grpc /var/lib/soci-snapshotter-grpc
+# Make sure stargz and buildkit are restarted with containerd
+sudo mkdir -p /usr/local/lib/systemd/system/buildkit.service.d/
+printf '[Unit]\nPartOf=containerd.service\n' | sudo tee /usr/local/lib/systemd/system/buildkit.service.d/finch.conf
+sudo mkdir -p /usr/local/lib/systemd/system/stargz-snapshotter.service.d/
+printf '[Unit]\nPartOf=containerd.service\n\n[Service]\nKillSignal=SIGTERM\n' | sudo tee /usr/local/lib/systemd/system/stargz-snapshotter.service.d/finch.conf
+# Add a new services that syncs the filesystem before shutdown
+printf '[Unit]\nDescription=Sync containerd on shutdown\nDefaultDependencies=no\nBefore=shutdown.target reboot.target halt.target kexec.target\n\n[Service]\nType=oneshot\nExecStart=/bin/bash -c "sync /var/lib/containerd"\n\n[Install]\nWantedBy=halt.target reboot.target shutdown.target kexec.target\n' | sudo tee /usr/local/lib/systemd/system/finch-sync-on-shutdown.service
+sudo systemctl enable --now finch-sync-on-shutdown.service
+# Add a new service that cleans up lingering CNI networks on boot
+printf '[Unit]\nDescription=Delete hanging data on boot\nDefaultDependencies=no\nBefore=basic.target\n\n[Service]\nType=oneshot\nExecStart=/bin/bash -c "sudo rm /var/lib/cni/networks/bridge/**; sudo rm /var/lib/cni/results/bridge-finch-*"\n\n[Install]\nWantedBy=basic.target\n' | sudo tee /usr/local/lib/systemd/system/finch-cleanup-on-boot.service
+sudo systemctl enable --now finch-cleanup-on-boot.service
 sudo systemctl restart containerd.service
 # Probe scripts to check readiness.
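
The `finch-sync-on-shutdown.service` one-liner in the hunks above is hard to read in `printf` form. The sketch below expands it into a heredoc and checks that both forms produce the same bytes. It is only an illustration: `OUT_DIR` and the two file names are hypothetical, and the output goes to a scratch directory instead of being piped through `sudo tee`.

```shell
#!/bin/sh
# OUT_DIR is a hypothetical scratch directory; nothing under /usr/local is touched.
OUT_DIR="$(mktemp -d)"

# The one-liner form, copied from the provisioning script (minus "| sudo tee ...").
printf '[Unit]\nDescription=Sync containerd on shutdown\nDefaultDependencies=no\nBefore=shutdown.target reboot.target halt.target kexec.target\n\n[Service]\nType=oneshot\nExecStart=/bin/bash -c "sync /var/lib/containerd"\n\n[Install]\nWantedBy=halt.target reboot.target shutdown.target kexec.target\n' > "$OUT_DIR/oneliner.service"

# The same unit written as a heredoc; the quoted delimiter disables expansion.
cat > "$OUT_DIR/heredoc.service" <<'EOF'
[Unit]
Description=Sync containerd on shutdown
DefaultDependencies=no
Before=shutdown.target reboot.target halt.target kexec.target

[Service]
Type=oneshot
ExecStart=/bin/bash -c "sync /var/lib/containerd"

[Install]
WantedBy=halt.target reboot.target shutdown.target kexec.target
EOF

# cmp -s exits 0 only when the two files are byte-identical.
if cmp -s "$OUT_DIR/oneliner.service" "$OUT_DIR/heredoc.service"; then
  echo "identical"
else
  echo "different"
fi
```

The heredoc form is easier to review, while the `printf` form keeps each unit on a single line of the provisioning script; the commit uses the latter.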
3 changes: 2 additions & 1 deletion pkg/config/lima_config_applier.go
@@ -39,7 +39,8 @@ if [ ! -f /usr/local/bin/soci ]; then
 ln -s /usr/local/lib/systemd/system/soci-snapshotter.service /etc/systemd/system/multi-user.target.wants/
 restorecon -v /usr/local/lib/systemd/system/soci-snapshotter.service
 systemctl daemon-reload
-sudo systemctl add-requires soci-snapshotter.service containerd.service
+sudo mkdir -p /usr/local/lib/systemd/system/soci-snapshotter.service.d/
+printf '[Unit]\nPartOf=containerd.service\n\n[Service]\nKillSignal=SIGTERM\n' | sudo tee /usr/local/lib/systemd/system/soci-snapshotter.service.d/finch.conf
 systemctl enable --now soci-snapshotter
 fi