Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

overlord/devicestate,secboot: fix factory reset on older versions #14471

Open
wants to merge 2 commits into
base: fde-manager-features
Choose a base branch
from

Conversation

valentindavid
Copy link
Contributor

@valentindavid valentindavid commented Sep 5, 2024

Until now, we were just delete old keys on factory-reset instead of
renaming if cryptsetup was too old. But this is a bit too dangerous.
Instead we copy-delete them.

Second, we need to convert old keyslots to have a name so we can
remove them after.

@github-actions github-actions bot added the Run Nested -auto- Label automatically added in case nested tests need to be executed label Sep 5, 2024
@valentindavid valentindavid added the FDE Manager Pull requests that target FDE manager branch label Sep 10, 2024
@valentindavid valentindavid force-pushed the valentindavid/remodel-then-factory-reset branch from e820f9a to 21ed272 Compare September 11, 2024 10:11
@valentindavid valentindavid force-pushed the valentindavid/remodel-then-factory-reset branch from 21ed272 to 4b3991d Compare September 20, 2024 08:41
@valentindavid valentindavid added the Run nested The PR also runs tests inluded in nested suite label Sep 20, 2024
@valentindavid valentindavid reopened this Sep 20, 2024
@pedronis pedronis force-pushed the fde-manager-features branch from fcc5e2e to 24018ee Compare October 18, 2024 11:27
@valentindavid valentindavid force-pushed the valentindavid/remodel-then-factory-reset branch from 4b3991d to f463a03 Compare October 18, 2024 11:53
Copy link

codecov bot commented Oct 18, 2024

Codecov Report

Attention: Patch coverage is 17.77778% with 37 lines in your changes missing coverage. Please review.

Please upload report for BASE (fde-manager-features@ca5418e). Learn more about missing BASE report.

Files with missing lines Patch % Lines
secboot/secboot_sb.go 13.04% 20 Missing ⚠️
overlord/devicestate/handlers_install.go 31.25% 7 Missing and 4 partials ⚠️
secboot/secboot_dummy.go 0.00% 5 Missing ⚠️
secboot/secboot_tpm.go 0.00% 1 Missing ⚠️
Additional details and impacted files
@@                   Coverage Diff                   @@
##             fde-manager-features   #14471   +/-   ##
=======================================================
  Coverage                        ?   77.67%           
=======================================================
  Files                           ?     1168           
  Lines                           ?   155564           
  Branches                        ?        0           
=======================================================
  Hits                            ?   120830           
  Misses                          ?    27239           
  Partials                        ?     7495           
Flag Coverage Δ
unittests 77.67% <17.77%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@valentindavid valentindavid force-pushed the valentindavid/remodel-then-factory-reset branch 2 times, most recently from 3fe46ac to 29dd378 Compare November 19, 2024 11:38
@valentindavid valentindavid marked this pull request as ready for review November 19, 2024 11:38
@valentindavid
Copy link
Contributor Author

This might still need #14718 to work. We will see.

@valentindavid valentindavid force-pushed the valentindavid/remodel-then-factory-reset branch 4 times, most recently from c0efc73 to d43620a Compare November 20, 2024 09:30
@valentindavid
Copy link
Contributor Author

We are missing a piece, I have forgotten about that. We need to be able to convert LUKS2 disk without tokens to ones with tokens, just for the names, so we can rename and delete them.

@valentindavid
Copy link
Contributor Author

I have opened canonical/secboot#349 to have a way to write factory reset in a way that works.

@valentindavid valentindavid force-pushed the valentindavid/remodel-then-factory-reset branch from d43620a to bfbb339 Compare December 10, 2024 10:46
@valentindavid
Copy link
Contributor Author

This also depends on canonical/secboot#355

@valentindavid
Copy link
Contributor Author

Draft, because it needs PRs to be merged on secboot.

@valentindavid valentindavid changed the title tests: verify that factory reset after remodeling to a new snapd works overlord/devicestate,secboot: fix factory reset on older versions Jan 24, 2025
@valentindavid valentindavid marked this pull request as ready for review January 24, 2025 14:56
@valentindavid valentindavid force-pushed the valentindavid/remodel-then-factory-reset branch from 288d176 to 7c2e31f Compare January 24, 2025 15:00
Copy link
Collaborator

@pedronis pedronis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some questions/comments

@@ -348,7 +348,7 @@ func AddBootstrapKeyOnExistingDisk(node string, newKey keys.EncryptionKey) error
// Rename key slots on LUKS2 container. If the key slot does not
// exist, it is ignored. If cryptsetup does not support renaming, then
// the key slots are instead removed.
func RenameOrDeleteKeys(node string, renames map[string]string) error {
func RenameKeys(nonAtomic *NonAtomicOperationAllowedFlag, node string, renames map[string]string) error {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are there calls to RenameKeys that will not pass nonAtomic?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At this point we have only one call.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

then the nonAtomic bit here is probably overkill, we can just have a comment

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I will simplify it then.

Comment on lines 1391 to 1393
if err := secbootDeleteKeys(diskPath, toDelete); err != nil {
return fmt.Errorf("cannot delete previous keys: %w", err)
}
if err := secbootRemoveOldDiskKeys(diskPath); err != nil {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it a bit strange that we both call DeleteKeys and then call a helper that does a bit of that as well, at a minimum these need comments about what each does

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is also difficult to explain the name of the function. One remove old keys if you reset from a new version. The other one is if you reset from an old version where snapd was updated in the seed with a remodel.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added comments before the calls to explain what both are doing.

secboot/secboot_sb.go Show resolved Hide resolved
@@ -348,7 +348,7 @@ func AddBootstrapKeyOnExistingDisk(node string, newKey keys.EncryptionKey) error
// Rename key slots on LUKS2 container. If the key slot does not
// exist, it is ignored. If cryptsetup does not support renaming, then
// the key slots are instead removed.
func RenameOrDeleteKeys(node string, renames map[string]string) error {
func RenameKeys(nonAtomic *NonAtomicOperationAllowedFlag, node string, renames map[string]string) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a very weird API tbh. The call sites where this isn't passed will see a bare nil as the first argument which is even more confusing than bare true/false. I understand canonical/secboot does this, but it doesn't make it any less weird and it doesn't need to be exposed outside of the secboot package.

If this needs an argument then pass it in a struct:

type RenameKeysFlags struct {
    AllowNonAtomic bool
}

func RenameKeys(node string, renames map[string]string, flags RenameKeysFlags) error {
..

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is on purpose weird because should avoid using this mode. Chris was concerned that would get used when we do not need to.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As requested by @pedronis this was simplified.

@pedronis pedronis force-pushed the fde-manager-features branch from 7005c64 to 25f9119 Compare January 29, 2025 10:49
@valentindavid valentindavid force-pushed the valentindavid/remodel-then-factory-reset branch from 91da819 to 1e6820c Compare January 29, 2025 10:57
Copy link

github-actions bot commented Jan 29, 2025

Tue Feb 4 00:34:40 UTC 2025
The following results are from: https://github.com/canonical/snapd/actions/runs/13119332245

Failures:

Preparing:

  • openstack:debian-12-64
  • openstack:debian-sid-64
  • google-nested:ubuntu-20.04-64:tests/nested/manual/remodel-uc20-to-uc22:encrypted
  • google-nested:ubuntu-22.04-64:tests/nested/manual/recovery-system-reboot:recover
  • google-nested:ubuntu-22.04-64:tests/nested/manual/recovery-system-reboot:factory_reset
  • google-nested:ubuntu-24.04-64:tests/nested/manual/recovery-system-reboot:recover
  • google-nested:ubuntu-24.04-64:tests/nested/manual/recovery-system-reboot:factory_reset
  • google-nested:ubuntu-24.04-64:tests/nested/manual/build-with-kernel-modules-components:encrypted
  • google-nested:ubuntu-24.04-64:tests/nested/manual/core20-fault-inject-on-install-component:kernel_panic_prepare_kernel_components
  • google-nested:ubuntu-24.04-64:tests/nested/manual/kernel-modules-components:encrypted
  • google-nested:ubuntu-24.04-64:tests/nested/manual/uc20-fde-hooks:tokens
  • openstack:opensuse-15.6-64:tests/main/snapctl

Executing:

  • openstack:debian-sid-64:tests/main/microk8s-smoke:edge
  • openstack:debian-12-64:tests/main/microk8s-smoke:edge
  • google-nested:ubuntu-20.04-64:tests/nested/manual/update-snapd-seed-and-factory-reset:tpm
  • google-nested:ubuntu-20.04-64:tests/nested/manual/remodel-offline:local_assertions
  • google-nested:ubuntu-20.04-64:tests/nested/manual/remodel-offline:installed_snaps
  • google-nested:ubuntu-20.04-64:tests/nested/manual/remodel-offline:local_and_installed_snaps
  • google-nested:ubuntu-24.04-64:tests/nested/manual/remodel-min-size
  • openstack:opensuse-tumbleweed-64:tests/main/auto-refresh-retry
  • openstack:opensuse-tumbleweed-64:tests/main/auto-refresh-gating-from-snap
  • openstack:opensuse-tumbleweed-64:tests/main/auto-refresh:regular
  • openstack:opensuse-tumbleweed-64:tests/main/refresh-app-awareness-notify
  • openstack:opensuse-tumbleweed-64:tests/main/auto-refresh:parallel
  • openstack:opensuse-tumbleweed-64:tests/main/auto-refresh-pre-download:ignore
  • openstack:opensuse-tumbleweed-64:tests/main/snap-refresh-hold
  • openstack:opensuse-15.6-64:tests/main/microk8s-smoke:edge
  • openstack:opensuse-tumbleweed-64:tests/main/auto-refresh-pre-download:close
  • openstack:opensuse-tumbleweed-64:tests/main/auto-refresh-gating
  • google:ubuntu-25.04-64:tests/main/security-device-cgroups-required-or-optional
  • google:ubuntu-25.04-64:tests/main/security-device-cgroups-helper
  • google:ubuntu-25.04-64:tests/main/cgroup-devices-v2
  • google:ubuntu-25.04-64:tests/main/security-device-cgroups-serial-port
  • google:ubuntu-25.04-64:tests/main/security-device-cgroups:uinput
  • google:ubuntu-25.04-64:tests/main/security-device-cgroups-strict-enforced
  • google:ubuntu-25.04-64:tests/main/security-device-cgroups-self-manage
  • google:ubuntu-25.04-64:tests/main/security-device-cgroups:kmsg
  • google:ubuntu-16.04-64:tests/unit/go:static

Restoring:

  • openstack:debian-12-64:tests/unit/
  • openstack:debian-12-64:tests/unit/
  • openstack:opensuse-tumbleweed-64:tests/main/refresh-app-awareness-notify
  • openstack:opensuse-15.6-64:tests/main/stale-base-snap
  • openstack:opensuse-15.6-64:tests/main/
  • openstack:opensuse-15.6-64
  • openstack:opensuse-15.6-64:tests/main/snapctl
  • openstack:opensuse-15.6-64:tests/main/
  • openstack:opensuse-15.6-64
  • openstack:opensuse-tumbleweed-64:tests/main/snap-session-agent-unavailable-to-snaps
  • openstack:opensuse-tumbleweed-64:tests/main/services-after-before-install
  • openstack:opensuse-tumbleweed-64:tests/main/
  • openstack:opensuse-tumbleweed-64
  • openstack:opensuse-tumbleweed-64:tests/main/
  • openstack:opensuse-tumbleweed-64
  • google:ubuntu-25.04-64:tests/main/security-device-cgroups-strict-enforced

Copy link
Collaborator

@pedronis pedronis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, this looks good, a bunch of naming/commenting remarks.

Can you remind what happens with all the rename if the factory reset doesn't quite finish?

// ConvertOldDisk takes on disk using legacy keyslots 0, 1, 2 and add
// names to those keyslots. This is needed to convert the save disk
// during a factory reset.
func ConvertOldDisk(devicePath string) error {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/Convert/TemporaryNameOldKeys/ would probably describe better what this does

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also this should mention that it is for migrating during factory-reset and that it is a noop if not unamed legacy keyslots exists.

@@ -348,7 +348,8 @@ func AddBootstrapKeyOnExistingDisk(node string, newKey keys.EncryptionKey) error
// Rename key slots on LUKS2 container. If the key slot does not
// exist, it is ignored. If cryptsetup does not support renaming, then
// the key slots are instead removed.
func RenameOrDeleteKeys(node string, renames map[string]string) error {
// WARNING: this function is not atomic. Please avoid using it.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is not always atomic?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, it is not atomic only if cryptsetup is too old. I will amend that.


// RemoveOldDiskKeys removes key slots from an old installation that
// had names created by ConvertOldDisk.
func RemoveOldDiskKeys(devicePath string) error {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wouldn't DeleteOldKeys be more consistent with other naming here?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this will not fail if the key don't exist right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is the keys of an old disk.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DeleteOldKeys is fine, I will change to it.

overlord/devicestate/handlers_install.go Show resolved Hide resolved
return nil, fmt.Errorf("cannot rename existing keys: %w", err)
}

if err := secbootConvertOldDisk(saveNode); err != nil {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// deal as needed instead with naming unamed keyslots, they will be removed at the end of factory reset

@valentindavid valentindavid force-pushed the valentindavid/remodel-then-factory-reset branch from 5750e24 to b835a09 Compare January 29, 2025 15:37
@valentindavid
Copy link
Contributor Author

@pedronis

Can you remind what happens with all the rename if the factory reset doesn't quite finish?

For the preparation of the factory reset, if the preparation is not finished, the keys are already named or renamed.

If the factory reset clean up does not finish, it should run again. Because we have a flag that needs to be removed last.

Copy link
Collaborator

@pedronis pedronis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you

Copy link
Contributor

@bboozzoo bboozzoo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Comment on lines 352 to 353
// old, it will try to copy and delete keys instead. Please avoid
// using this function.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// old, it will try to copy and delete keys instead. Please avoid
// using this function.
// old, it will try to copy and delete keys instead. Avoid
// using this function in new code.

Comment on lines 464 to 466
// add names to those keyslots. This is needed to convert the save
// disk during a factory reset. This is a no-operation if no keyslot
// are unnamed.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// add names to those keyslots. This is needed to convert the save
// disk during a factory reset. This is a no-operation if no keyslot
// are unnamed.
// adds names to those keyslots. This is needed to convert the save
// disk during a factory reset. This is a no-operation if keyslots
// are already named.


prepare: |
# Install what is needed before using the fake store
snap install jq remarshal
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't be needed anymore, nested.sh was updated to use gojq

Suggested change
snap install jq remarshal

@valentindavid valentindavid force-pushed the valentindavid/remodel-then-factory-reset branch from b835a09 to 9a27837 Compare January 31, 2025 08:53
@chrisccoulson chrisccoulson self-requested a review January 31, 2025 11:47
Copy link
Contributor

@chrisccoulson chrisccoulson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks mostly ok - I've left a few comments, including a suggestion for TemporaryNameOldKeys.

I know we're trying to deliver a MVP for this cycle, but it feels a bit like factory-reset and reprovision are getting tangled together quite a bit. I just want to keep in mind that as per FO166, reprovision should be able to operate independently of factory reset eventually, and it will be required to do so for the post-recovery repair functionality in any case.

return nil, err
// Temporarily rename keyslots across the factory reset to
// allow to create the new ones.
if err := secbootRenameKeys(saveNode, renames); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FO166, which describes reprovisioning for FDE, suggests renaming existing keyslots early on to reprovision-XX (with XX being a numeric identifier).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can change that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure though what "persist the mapping of temporary to old key slot names" means this case. I wonder if maybe we can add a FIXME here and change the names once we have this persisting of the mapping.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could prefix with "reprovision-" but keep the name instead of using an id for now.

// disk during a factory reset. This is a no-operation if all keyslots
// are already named.
func TemporaryNameOldKeys(devicePath string) error {
if err := sb.NameLegacyLUKS2ContainerKey(devicePath, 0, "old-default-key"); err != nil && !errors.Is(err, sb.KeyslotAlreadyHasANameErr) {
Copy link
Contributor

@chrisccoulson chrisccoulson Jan 31, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As this function is making assumption that the container structure is unmodified from the original install, I think it could be improved a bit to make it safer.

If the error is secboot.KeyslotAlreadyHasANameErr, should you check that the keyslot has the expected name before ignoring the error? I think you can probably do that by listing all names with secboot.ListLUKS2ContainerUnlockKeyNames, and then creating a secboot.LUKSKeyDataReader from each name and calling the KeyslotID method to verify that no unexpected name points to each of the keyslots 0-2.

Or an alternative way might be to turn secboot.KeyslotAlreadyHasANameErr into a structure that contains the keyslot's existing name, or define the type of the error based on a string.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer doing the removal of other keys that are named out of this function. They are named, so it is part of normal factory reset.

@@ -837,7 +837,7 @@ func PCRHandleOfSealedKey(p string) (uint32, error) {
func tpmReleaseResourcesImpl(tpm *sb_tpm2.Connection, handle tpm2.Handle) error {
rc, err := tpm.CreateResourceContextFromTPM(handle)
if err != nil {
if _, ok := err.(tpm2.ResourceUnavailableError); ok {
if errors.Is(err, &tpm2.ResourceUnavailableError{}) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@valentindavid valentindavid force-pushed the valentindavid/remodel-then-factory-reset branch from 9a27837 to 9b8bfaf Compare February 3, 2025 14:36
Until now, we were just delete old keys on factory-reset instead of
renaming if cryptsetup was too old. But this is a bit too dangerous.
Instead we copy-delete them.

Second, we need to convert old keyslots to have a name so we can
remove them after.
@valentindavid valentindavid force-pushed the valentindavid/remodel-then-factory-reset branch from 9b8bfaf to d784faa Compare February 3, 2025 17:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
FDE Manager Pull requests that target FDE manager branch Run Nested -auto- Label automatically added in case nested tests need to be executed Run nested The PR also runs tests inluded in nested suite
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants