Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

udiskslinuxblock: check recursively for CryptoBackingDevice #1339

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

GovanifY
Copy link

@GovanifY GovanifY commented Jan 12, 2025

Currently, udisks only checks if the current disk is a LUK2 volume to detect its CryptoBackingDevice.
This leads to configurations where, when you use a LVM, some drives are mistakenly reported as unencrypted/unknown.

For example, see here:

└─nvme0n1p2       259:2    0  3,6T  0 part  
  └─matrix        254:0    0  3,6T  0 crypt 
    ├─matrix-swap 254:1    0   16G  0 lvm   [SWAP]
    └─matrix-root 254:2    0  3,6T  0 lvm   /nix/store

where neither matrix-swap nor matrix-root are being detected as encrypted.
This PR instead checks recursively for slaves in sysfs in the case of a device mapper device, until it either finds a root crypto device or quits.

As fwupd relies on udisks to check whether the swap is encrypted in its Host Security ID specification, this would also fix fwupd mistakenly reporting the swap as unencrypted.

This fixes #916 and fwupd/fwupd#4969 .

@GovanifY GovanifY force-pushed the govanify/recursive-crypto branch from 8c20a82 to 88945a6 Compare January 12, 2025 11:11
@GovanifY GovanifY force-pushed the govanify/recursive-crypto branch from 88945a6 to 4376a26 Compare January 12, 2025 11:21
Copy link
Member

@tbzatek tbzatek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this makes sense, thanks for your contribution!

I would still like to have @mvollmer a quick look to be 100% sure this wouldn't break anything in Cockpit.

@tbzatek
Copy link
Member

tbzatek commented Jan 13, 2025

Jenkins, ok to test.

@mvollmer
Copy link
Contributor

mvollmer commented Jan 14, 2025

Uh, this will need changes in Cockpit. We assume that only the device mapper device created by cryptsetup has CryptoBackingDevice set, and that CryptoBackingDevice always points at a block device that has a Encryption interface.

I didn't read the proposed change carefully, but I guess it will break these assumptions.

My initial reaction is that it's not a good idea to set CryptoBackingDevice recursively like this. It's a pretty big change to the API, and even violates the documentation, I would argue:

     The #org.freedesktop.UDisks2.Block object that is
     backing the device or <literal>/</literal> if unknown or if
     the block device is not the cleartext device for an encrypted
     device.

So CryptoBackingDevice "must" be / for a block device that is not the cleartext device for an encrypted device, i.e., the dm-crypt device mapper device setup by cryptsetup. In the example in the description, "matrix" is the cleartext device for the backing device nvme0n1p2 but neither "matrix-swap" or "matrix-root" are (they are likely dm-linear device mapper devices created by lvm2).

Could you introduce a new IndirectCryptoBackingDevice property and change fwupd to check that? Or more general properties that exposes the master / slave things from sysfs and fwupd walks that?

@mvollmer
Copy link
Contributor

Oh, the Cockpit tests are actually green! :-) I will do some manual testing to see if I can provoke some real breakage.

@GovanifY
Copy link
Author

GovanifY commented Jan 14, 2025

Hey,

Thanks for checking on cockpit!

Regarding the documentation, I actually understood it myself as an absolute backing device, which seems to be a shared understanding with other projects (fwupd, tails according to GitHub search, probably others). While lvm2 partitions like matrix-swap are not aware of being an encrypted device, they still very much are backed by a dm-crypt partition at the end of the day, although I'm obviously biased as the author of this PR :p

I'm not against adding an Indirect and/or slave field to the API FWIW, but I'd rather fix the API at its root if we can.

Continuing on the doc, I'd also like to note that the current CryptoBackingDevice implementation is actually wrong in a more severe way, as it does not detect non luks/bitlk/tcrypt encrypted partitions.
Plain encryption, for example, is quite popular for swap as it allows for a random key to be picked on boot.
I plan to add support for the remaining partition types in a followup PR as to lessen the review workload on this one.

@mvollmer
Copy link
Contributor

mvollmer commented Jan 14, 2025

I will do some manual testing to see if I can provoke some real breakage.

Yes, there is some. For example:

  • When unmounting a filesystem, Cockpit will also lock the encryption layer at the same time. With the change in this PR, Cockpit would try to lock "matrix" and fail (because "matrix" is kept busy by being used as a physical volume).
  • Cockpit will also try to propagate options like "read-only" from fstab to crypttab, and will get this wrong with the changes in this PR.
  • Cockpit hides the name of the dm-crypt device (which is often something like /dev/mapper/luks-1aa4cece-f70e-4acf-a870-1ca2ea98aa56) and uses the name of the backing device instead. With this change, it would call "matrix-root" "nvme0n1p2" in certain contexts.
  • etc

This can all be fixed pretty easily, so if you want to go ahead with the change, let's do it but please give us the opportunity to adapt.

@mvollmer
Copy link
Contributor

mvollmer commented Jan 14, 2025

Regarding the documentation, I actually understood it myself as an absolute backing device,

I think it comes down to what "the cleartext device" means. In my view of things, there is only one cleartext device per crypto device, and its the device mapper device created by cryptsetup. Any other interpretation seems unhelpful.

Continuing on the doc, I'd also like to note that the current CryptoBackingDevice implementation is actually wrong in a more severe way, as it does not detect non luks/bitlk/tcrypt encrypted partitions.

I think this is independent of the definition of CryptoBackingDevice. This is about the o.fd.UDisks2.Encrypted interface, which is not added to all the block devices that you wish.

Anyway, Cockpit assumes that obj.Block-CryptoBackingDevice.Encrypted-ClearTextDevice == obj (or obj.Block-CryptoBackingDevice == "/", of course) and I argue that the documentation supports that by using the words "the cleartext device of an encrypted device", and not "a device directly or indirectly backed by an encrypted device".

But, Cockpit can simply check whether the assumption is true and ignore CryptoBackingDevice for objects where it fails. Luckily we have the Encrypted-ClearTextDevice property, for some reason I thought we didn't. :-)

@mvollmer
Copy link
Contributor

Luckily we have the Encrypted-ClearTextDevice property, for some reason I thought we didn't. :-)

Aha, it was added only very recently, that's why I didn't know about it! Only six years ago! :-) Man, I am doing this for too long...

@tbzatek
Copy link
Member

tbzatek commented Jan 22, 2025

This fixes #916 and fwupd/fwupd#4969 .

FYI, #916 is about different issue and it might be a mistake actually if this pull request covers the problem there.

Anyway, I have given this change a second thought, based on Marius' input (thanks!) and by talking to other people. Something just didn't feel right to me, even though this change looked rather innocent. I believe we should be thoroughly consistent in the API and all its details.

Method calls or properties that take recursiveness in account should treated with care and consistency. Storage layering can be complex and unpredictable, more so if certain operation moves a nested device tree to a different backing device, and suddenly certain object properties down in the device tree change unpredictably.

For illustration (however impractical), if you move the LVM PV in question to a new multidisk array that has only one leg encrypted, what should the CryptoBackingDevice property report in this case? These are the questions that stem from 1:N mapping when crawling up the device tree. Consider that there might actually be more CryptoBackingDevice-s involved. As long as we now have support for LVM RAID, that encryption may not only be a LUKS device but a different devicemapper device sandwiched somewhere in LVM (or Stratis?), makes me feel that the CryptoBackingDevice property is lacking somewhat with the modern world. It was clearly designed for 1:1 Encrypted-Cleartext pairs having only LUKS in mind. This should be noted in the API docs at the very least.

Luckily we have the Encrypted-ClearTextDevice property, for some reason I thought we didn't. :-)

Aha, it was added only very recently, that's why I didn't know about it! Only six years ago! :-) Man, I am doing this for too long...

Hmmm, #551. It looks sane in the context.

So I believe we should not change this and should conserve the API at its current state, only clarifying the documentation that this is direct-descendant and LUKS only. There are existing consumers that we don't want to break.


Back to the beginning to the recursiveness thought. This is a recurring theme and we could perhaps either:

  • introduce new properties for whatever indirect or recursive
  • certain method calls may get a new extra option recursive
  • introduce a new device tree walker API, similar to the teardown functionality that is opaque (and doesn't fully work across UDisks modules at the moment)

... all of this as a general rule, applicable to more features than just an encryption.

@GovanifY
Copy link
Author

GovanifY commented Jan 22, 2025

For illustration (however impractical), if you move the LVM PV in question to a new multidisk array that has only one leg encrypted, what should the CryptoBackingDevice property report in this case? These are the questions that stem from 1:N mapping when crawling up the device tree. Consider that there might actually be more CryptoBackingDevice-s involved. As long as we now have support for LVM RAID, that encryption may not only be a LUKS device but a different devicemapper device sandwiched somewhere in LVM (or Stratis?), makes me feel that the CryptoBackingDevice property is lacking somewhat with the modern world. It was clearly designed for 1:1 Encrypted-Cleartext pairs having only LUKS in mind. This should be noted in the API docs at the very least.

I agree that the functionality is lacking and, if recursive, should allow for multiple items. Although that's a lot of work. How do you handle multi devices partition like btrfs or zfs that have multiple backing devices? You'd need to be aware of the partitioning semantics.

So I believe we should not change this and should conserve the API at its current state, only clarifying the documentation that this is direct-descendant and LUKS only. There are existing consumers that we don't want to break.

This API is not LUKS only. It currently also applies to TrueCrypt devices and BitLocker, but not plain encryption. It's rather inconsistent. Supporting the other partition types would be as simple as adding CRYPT-PLAIN and CRYPT-FVAULT2 in the UUID checks too. Clarifying the documentation about a footgun still doesn't erase the footgun!

Back to the beginning to the recursiveness thought. This is a recurring theme and we could perhaps either:

No strong thought on either of those solutions myself. As I said above, if we want to keep the API high-level we should have some understanding of the partitioning if we want an exhaustive crypto item, so that's probably a no go given the complexity, but at the same time if we only expose a tree of devices recursively, what do we bring more than sysfs?

Unsure what's the correct approach there. I'll probably let you decide as you're the domain expert heh.

Thanks for the exhaustive comment btw, I appreciate it :)

@mvollmer
Copy link
Contributor

I have given this change a second thought,

Thanks! Second thoughts are the best thoughts! :-)

@tbzatek
Copy link
Member

tbzatek commented Jan 27, 2025

I agree that the functionality is lacking and, if recursive, should allow for multiple items. Although that's a lot of work. How do you handle multi devices partition like btrfs or zfs that have multiple backing devices? You'd need to be aware of the partitioning semantics.

We don't :-) At least not yet. It's a lot of work indeed and it would require multiple iterations to make the object model right. A first sketch lives in #1232.

This API is not LUKS only. It currently also applies to TrueCrypt devices and BitLocker, but not plain encryption. It's rather inconsistent. Supporting the other partition types would be as simple as adding CRYPT-PLAIN and CRYPT-FVAULT2 in the UUID checks too. Clarifying the documentation about a footgun still doesn't erase the footgun!

Well this is already limited in the code, this will only highlight the fact in the docs. Support for the plain encryption should be added in #916.

So my new proposal is this:

  • document the existing org.freedesktop.UDisks2.Block.CryptoBackingDevice property limitations - see above - both the types and the direct-level aspects
  • add a note in udisks_linux_block_update() saying that the dm_uuid prefix match needs to be kept consistent with the API docs
  • introduce new method call org.freedesktop.UDisks2.Block.FindCryptoBackingDevices() (or similar naming, e.g. Lookup...?) that would return array of object paths that are backing this block device recursively
    • a method call because this is potentially expensive operation and -
    • the method call can be internally wired for modules round-trip for extended functionality
  • optionally, if you're interfacing UDisks through libudisks2, a convenient helper function returning array of UDisksBlockObject's can be added as well

Sounds reasonable?

At this point we're talking about dm-crypt. Separate discussion should be held about how native filesystem encryption should be presented in the UDisks object model. The new FindCryptoBackingDevice() method call has potential to suit both cases, being generally available on the org.freedesktop.UDisks2.Block interface,

@mvollmer
Copy link
Contributor

* introduce new method call `org.freedesktop.UDisks2.Block.FindCryptoBackingDevice()` (or similar naming, e.g. `Lookup...`?) that would return array of object paths that are backing this block device recursively

What about making this more generic and let it find all backing devices? I.e. FindBackingDevices(). Checking whether one of them is encrypted should be easy, and it also allows the caller to do the right thing (whatever that is) when only some of the backing devices are encrypted and others are not.

Talking about this, how precise would this be? For example, if I have a LVM2 volume group with a couple of PVs, and a LV that only uses one of the PVs, would FindBackingDevice return only that one PV (because no other devices store any bits of the LV and also that's what the kernel will tell us when looking at the device mapper tables, I guess), or would it return all PVs of the volume group? Both might be useful, but I guess the former is better since it is more precise and can not be easily computed from existing UDisks2 properties.

@tbzatek
Copy link
Member

tbzatek commented Jan 28, 2025

What about making this more generic and let it find all backing devices? I.e. FindBackingDevices().

Hmmm, good point. Specifically for the fwupd case, if I understood correctly, the intention was about reporting that the user is safe and has all data encrypted. So it needs to know that all backing devices are encrypted and report warning if one of them is not.

So this method call would effectively act as a device tree walker, reporting all devices from down-to-top and also to the width across multiple components of a particular block device layer? (We need to be careful with circular dependencies, even across several layers through a middle man).

Talking about this, how precise would this be? For example, if I have a LVM2 volume group with a couple of PVs, and a LV that only uses one of the PVs, would FindBackingDevice return only that one PV (because no other devices store any bits of the LV and also that's what the kernel will tell us when looking at the device mapper tables, I guess), or would it return all PVs of the volume group? Both might be useful, but I guess the former is better since it is more precise and can not be easily computed from existing UDisks2 properties.

I think it should be a "best effort" as I believe it's not always possible or easy to get an authoritative reply. The DM tables are authoritative enough, let's just not complicate things. Given how wide the device tree could be, I think it's not worth recursing into the nodes that are clearly unused at a given moment.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

encrypted swap is not recognized
4 participants