-
Notifications
You must be signed in to change notification settings - Fork 45
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support raw block volume #2421
Comments
I don't have any strong opinion - the implementation proposed looks acceptable to me, especially since it's a feature supported by
On this point, I think we need to adjust the Volume design document to match the other "Architecture documents" we have in the repo. We found that having a set of "KEP-style" docs was too heavy for our needs, and simply editing the content of such docs seems enough for us (rather than creating new docs to invalidate/supersede the initial design). This format would need some standardization however, to make it simpler when external contributors want to suggest a new design or modifications to an existing one. |
Refs: #2421 Signed-off-by: Sylvain Laperche <[email protected]>
Document the new possible value for fsType: none. Refs: #2421 Signed-off-by: Sylvain Laperche <[email protected]>
If `fsType` is "none", we set the volume mode to Block. Refs: #2421 Signed-off-by: Sylvain Laperche <[email protected]>
TODO support not sparse Refs: #2421 Signed-off-by: Sylvain Laperche <[email protected]>
Hum, in the end I'm no longer convinced that putting it into the Without formatting, it seems impossible to have a constant device path for a If this parameter becomes So I think I'll move back this new settings inside the Volume object itself (it will be optional with a default value that will keep the current behavior (use formatting), thus it shouldn't break the compat) Another issue: if one enter Until now if wasn't an issue because we were using the Volume UUID as FS UUID and everything worked fine: the operator can compute the path from the Volume alone. I can add an extra Salt call during the Volume provisionning (using
Edit: Hum, after some thoughts I think I'll go for a hashmap (volume, storage uuid). If we're unlucky enough to crash/restart between the time we fetch the UUID and the creation of the PV, we only have to call If this works well for |
We could add more |
In the first implementation of Volume (which never got merged, because I rewrote it according to review's feedback), we had a lot of "computed" state inside the status and I think that's not something we want.
But you would need to refresh this info at some point, because after a reboot your sparse loop device may be |
In a In case of a raw device, we could enforce this device to have a related World Wide Name, i.e., an entry pointing to it in In the GPT case, we could even enforce the use of specific GPT partition types. Now, when using
|
Thanks for the feedback. Let's put the selection between File/Block at the About the reliable device path, IIUC instead of using
And we don't support MBR/BIOS partition. (and I like the idea for SparseLoopDevice 👍 ) Is that right? As to where to store device size/device path until we create the PV, I guess the volatile hashtable is a good enough solution, right? There isn't much point in persisting those in the CR (even if it's in the status) as they are only needed between the time we fetch them and the time we create the PV (at which point they become persisted in the PV). |
Yup.
Well, the input is not necessarily
We need to validate this with TS.
Yeah, having sparse support for block volumes is a must, for test and dev at least.
Indeed. One can't know at which exact point in time we look those up anyway. |
(As an aside: even when a whole disk is to be used, I think a GPT table should be created on it, with a single partition spanning block 2048 to the end of the disk, and using a partuuid, instead of using the raw disk) |
Yup indeed, we also have this indirection to handle.
Will wait for a confirmation then.
Yeah, we could do it that way. |
For me, GPT is a good solution for that. We should not use boot drive => no need to support MBR/BIOS partition and in some case (PoC) if we need to simulate multiple drive on one drive LVM+GPT will do the trick. |
WWNs are reliable (at least that's the idea). However, indeed, since not all cases can be covered by using WWNs, maybe using an alternative that covers all cases by construction is 'better'. |
So, to sum up the feedback we got (here and in the mail thread):
|
This options allows to skip the formatting of the device. It's optional and will be set to false by default (i.e., format the device) to preserve the backward compatibility. Refs: #2421 Signed-off-by: Sylvain Laperche <[email protected]>
Refs: #2421 Signed-off-by: Sylvain Laperche <[email protected]>
This will replace call to `disk.dump`. By returning both size and path, we can keep a single Salt call for unformatted volume (for which we will need to get a reliable path from Salt). Refs: #2421 Signed-off-by: Sylvain Laperche <[email protected]>
Since we will now need more information than just the size, we replace the call to `disk.dump` by a call to our function `metalk8s_volumes.device_info`. For now, we simply plug the new call, we don't use information other than size yet. Refs: #2421 Signed-off-by: Sylvain Laperche <[email protected]>
We now call `metalk8s_volumes.device_info` instead of `disk.dump`. Unlike the later, this new function relies on the pillar content and we have a problem here because calling a Salt module doesn't refresh the pillar. That being said, the state we call before (volumes.prepared) does automatially refresh the pillar because it works fine and it does look up the volume from the pillar. But it seems this pillar isn't kept around, since calling the module right after fails. If we explicitely refresh the pillar during our state, then it's kept around and our module can reuse the fetched pillar: that's why we add a refresh_pillar call to our state. Now, in the unlikely case that we have a crash/restart after the state execution BUT before executing the module, we won't have access to a cached pillar: in such case we have a last-chance try/except that refresh the pillar from inside the module. I'm not super happy with the current approach…, but I don't see a better one: - using a state is not possible (because I want to return data) - calling `saltutil.refresh_pillar` from the operator will complicate the code a bit and won't even solve the "I have a crash/reboot/restart just before calling my module and thus no cached pillar!" situation. Refs: #2421 Signed-off-by: Sylvain Laperche <[email protected]>
Instead of persisting device information into the status (which is hackish), we now store them in a hashmap. It will be good enough for most of the cases, and in the worst case (we crash after getting the device info but before creating the PV), we can simply reschedule the Salt call to fetch the device information. Refs: #2421 Signed-off-by: Sylvain Laperche <[email protected]>
We no longer use it (since we're using a volatile cache in the operator). We still keep some compatibility with this format, in case we have volume with serialized Result. Refs: #2421 Signed-off-by: Sylvain Laperche <[email protected]>
This options allows to skip the formatting of the device. It's optional and will be set to false by default (i.e., format the device) to preserve the backward compatibility. Refs: #2421 Signed-off-by: Sylvain Laperche <[email protected]>
Refs: #2421 Signed-off-by: Sylvain Laperche <[email protected]>
We have to handle different cases here, the device path can points to: - a LVM volume: we have nothing to do (we'll use LVM UUID) - a disk partition (we make sure the partition's label is the volume UUID) - an entire disk: we create a GPT table and a single partition with the right label. Since the last case is the same as SparseLoopDeviceNoFormat, we factor out the common code into `prepare_noformat`. Refs: #2421 Signed-off-by: Sylvain Laperche <[email protected]>
Refs: #2421 Signed-off-by: Sylvain Laperche <[email protected]>
It seems that, in some cases, you may have a small delay where the partition is not visible under `/dev` after some operation (like changing the label): [root@test-master-1 centos]# ls /dev | grep vdb vdb vdb1 [root@test-master-1 centos]# parted /dev/vdb name 1 foo && ls /dev/ | grep vdb vdb [root@test-master-1 centos]# ls /dev | grep vdb vdb vdb1 Because of that, device name resolution may suffer transient failure. So let's retry a few times before giving up. Refs: #2421 Signed-off-by: Sylvain Laperche <[email protected]>
This makes Volume closer to the underlying PersistentVolume (which simplify the code and also helps the user already familiar with PersistentVolume). Refs: #2421 Signed-off-by: Sylvain Laperche <[email protected]>
Will be less confusing. Refs: #2421 Signed-off-by: Sylvain Laperche <[email protected]>
In some case, this trigger an infinite reconciliation loop… Refs: #2421 Signed-off-by: Sylvain Laperche <[email protected]>
Will be less confusing. Refs: #2421 Signed-off-by: Sylvain Laperche <[email protected]>
In some case, this trigger an infinite reconciliation loop… Refs: #2421 Signed-off-by: Sylvain Laperche <[email protected]>
This makes Volume closer to the underlying PersistentVolume (which simplify the code and also helps the user already familiar with PersistentVolume). Refs: #2421 Signed-off-by: Sylvain Laperche <[email protected]>
Will be less confusing. Refs: #2421 Signed-off-by: Sylvain Laperche <[email protected]>
In some case, this trigger an infinite reconciliation loop… Refs: #2421 Signed-off-by: Sylvain Laperche <[email protected]>
By using sgdisk we can change the partition GUID to match the volume UUID. Haven't found how to do it with parted… Refs: #2421 Signed-off-by: Sylvain Laperche <[email protected]>
In some case, this trigger an infinite reconciliation loop… Refs: #2421 Signed-off-by: Sylvain Laperche <[email protected]>
By using sgdisk we can change the partition GUID to match the volume UUID. Haven't found how to do it with parted… Refs: #2421 Signed-off-by: Sylvain Laperche <[email protected]>
In some case, this trigger an infinite reconciliation loop… Refs: #2421 Signed-off-by: Sylvain Laperche <[email protected]>
By using sgdisk we can change the partition GUID to match the volume UUID. Haven't found how to do it with parted… Refs: #2421 Signed-off-by: Sylvain Laperche <[email protected]>
In some case, this trigger an infinite reconciliation loop… Refs: #2421 Signed-off-by: Sylvain Laperche <[email protected]>
This makes Volume closer to the underlying PersistentVolume (which simplify the code and also helps the user already familiar with PersistentVolume). Refs: #2421 Signed-off-by: Sylvain Laperche <[email protected]>
Will be less confusing. Refs: #2421 Signed-off-by: Sylvain Laperche <[email protected]>
By using sgdisk we can change the partition GUID to match the volume UUID. Haven't found how to do it with parted… Refs: #2421 Signed-off-by: Sylvain Laperche <[email protected]>
In some case, this trigger an infinite reconciliation loop… Refs: #2421 Signed-off-by: Sylvain Laperche <[email protected]>
Refs: #2421 Signed-off-by: Sylvain Laperche <[email protected]>
In #2421 `format` function of `metalk8s_volumes` salt module get renamed to `prepare` so this commit reflect this change in the salt unit tests for this module
In #2421 `format` function of `metalk8s_volumes` salt module get renamed to `prepare` so this commit reflect this change in the salt unit tests for this module
In #2421 `format` function of `metalk8s_volumes` salt module get renamed to `prepare` so this commit reflect this change in the salt unit tests for this module
Component:
operator, salt
Why this is needed:
There are some specialized applications (e.g.: database, SDS system, …) that require direct access to a block device because, for example, the file system layer introduces unneeded overhead.
To address this use case, we should be able to manage Volume that will backed by
PersistentVolume
withvolumeMode
set toBlock
.What should be done:
Currently, the
volumeMode
is hardcoded toFilesystem
in the operator, this need to change.In the same vein, the formatting step is always executed by the Salt state, this also need to be changed.
Implementation proposal:
We need a way to know if we want to format the volume or not.
This could be either at the
Volume
level or at theStorageClass
level.I suggest that we control that at the
StorageClass
level, because:Volume
level, we will have to sanity check against illogical combination (aStorageClass
withext4
on a non-formattedVolume
)StorageClass
level it could apply to bothSparseLoopDevice
andRawBlockDevice
.The idea is to accept a special value "none" for the parameter
fsType
in theStorageClass
.When
fsType
is set to "none" then:PersistentVolume
are created withvolumeMode
set toBlock
We could have made the field optional instead of using a special value, but I think that it's better to be explicit here.
Test plan:
If we can add support for
SparseLoopDevice
(may be tricky because we can't store the ID in the FS in that case), then we can add tests withSparseLoopDevice
: create a volume and check that the PV has the right mode and the backing storage is not formatted.Otherwise manuel testing I guess (since CI doesn't attach "real" volume to the VM).
The text was updated successfully, but these errors were encountered: