Add support for raw (non-formatted) volume #2651

Merged on Jul 9, 2020

23 commits
dee37d0  storage-operator/doc: add comment about `deviceName` refresh (slaperche-scality, Jun 30, 2020)
77f74e4  doc/design: move volume with other architecture docs (slaperche-scality, Jun 2, 2020)
1ab6dd5  storage-operator/types: add noFormat option (slaperche-scality, Jun 9, 2020)
c22b717  doc/operation: document the `noFormat` parameter (slaperche-scality, Jun 29, 2020)
265efed  doc/design: update volume design (slaperche-scality, Jun 30, 2020)
cc6792d  storage-operator/controller: set PV mode according to the new option (slaperche-scality, Jun 9, 2020)
59efb24  salt/volume: add a new module function to retrieve size & path (slaperche-scality, Jun 17, 2020)
e1ed064  storage-operator/controller: use device_info instead of disk.dump (slaperche-scality, Jun 23, 2020)
5ad86d0  salt: add pillar refresh to volumes.prepared (slaperche-scality, Jun 23, 2020)
045c010  storage-operator/controller: use a volatile cache for device info (slaperche-scality, Jun 23, 2020)
2d2f001  storage-operator/salt: revert the Result field (slaperche-scality, Jun 23, 2020)
73a28f1  salt/volume: make sure parted is installed (slaperche-scality, Jul 2, 2020)
e08e30e  salt/volume: add support for non-formatted sparseLoopDevice (slaperche-scality, Jun 29, 2020)
5e071c9  salt/volume: s/format/prepare/ (slaperche-scality, Jun 29, 2020)
ad254f3  salt/volume: add support for non-formatted rawBlockDevice (slaperche-scality, Jun 29, 2020)
508c0e3  salt/volume: update device_info to handle noformat volume (slaperche-scality, Jun 29, 2020)
35d026c  test/volume: add test for noformat volume (slaperche-scality, Jun 29, 2020)
07dfa18  salt/volume: add retry logic to device name resolution (slaperche-scality, Jul 2, 2020)
57bc9fb  storage-operator: use corev1.PersistentVolumeMode instead of a boolean (slaperche-scality, Jul 3, 2020)
ca23e09  storage-operator/salt: use volume name for GTP label instead of UUID (slaperche-scality, Jul 3, 2020)
f0afb57  salt/volume: use sgdisk instead of parted (slaperche-scality, Jul 3, 2020)
b0eca3d  storage-operator: do not always update LastUpdateTime (slaperche-scality, Jul 3, 2020)
47ca858  salt/volume: factor out the LVM lookup logic (slaperche-scality, Jul 8, 2020)

1 change: 1 addition & 0 deletions docs/developer/architecture/index.rst
@@ -11,3 +11,4 @@ Architecture Documents
    requirements
    solutions
    ci
+   volume
@@ -1,13 +1,8 @@
-Volume Management v1.0
-======================
+Volume Management
+=================
 
-* MetalK8s-Version: 2.4
-* Replaces:
-* Superseded-By:
-
-
-Absract
--------
+Abstract
+--------
 
 To be able to run stateful services (such as Prometheus, Zenko or Hyperdrive),
 MetalK8s needs the ability to provide and manage persistent storage resources.
@@ -27,10 +22,6 @@ delete MetalK8s volumes.
 Scope
 -----
 
-The scope of this first version of Volume Management will be minimalist but
-still functionally useful.
-
-
 Goals
 ^^^^^
 
@@ -43,6 +34,7 @@ Goals
 * add support for volume deletion (one by one) in the Platform UI
 * add support for volume listing/monitoring (show status, size, …) in the
   Platform UI
+* expose raw block device (unformatted) as **Volume**
 * document how to create a volume
 * document how to create a **StorageClass** object
 * automated tests on volume workflow (creation, deletion, …)
@@ -53,7 +45,6 @@ Non-Goals
 
 * RAID support
 * LVM support
-* expose raw block device (unformated) as **Volume**
 * use an **Admission Controller** for semantic validation
 * auto-discovery of the disks
 * batch provisioning from the Platform UI
@@ -125,9 +116,9 @@ system) through the Salt API. Authentication to the Salt API will be done
 through a dedicated Salt account (with limited privileges) using credentials
 from a dedicated cluster **Service Account**.
 
-.. uml:: volume_v1.0-creation_seqdiag.uml
+.. uml:: volume-creation_seqdiag.uml
 
-.. uml:: volume_v1.0-deletion_seqdiag.uml
+.. uml:: volume-deletion_seqdiag.uml
 
 
 Implementation Details
@@ -157,6 +148,28 @@ Similarly, our **Volume** object will have the following states:
 * **Terminating**: cleanup of the backing storage in progress (e.g.
   an asynchronous Salt call is still running).
 
+Persistent block device naming
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+For the kubelet to mount the volume reliably, we need to create the
+underlying **PersistentVolume** using a persistent name for the backing storage
+device. We use different strategies according to the **Volume** type:
+
+* **sparseLoopDevice** and **rawBlockDevice** with a filesystem: during the
+  formatting, we set the filesystem UUID to the **Volume** UUID and use
+  ``/dev/disk/by-uuid/<volume-uuid>`` as device path.
+* **sparseLoopDevice** without a filesystem: we create a GUID Partition Table
+  on the sparse file and create a single partition encompassing the whole
+  device, setting the GUID of the partition to the **Volume** UUID. We can then
+  use ``/dev/disk/by-partuuid/<volume-uuid>`` as device path.
+* **rawBlockDevice** without a filesystem:
+
+  * the **rawBlockDevice** is a disk (e.g. ``/dev/sdb``): we use the same
+    strategy as above.
+  * the **rawBlockDevice** is a partition (e.g. ``/dev/sdb1``): we change the
+    partition GUID using the **Volume** UUID and use
+    ``/dev/disk/by-partuuid/<volume-uuid>`` as device path.
+  * the **rawBlockDevice** is an LVM volume: we use the existing LVM UUID and
+    use ``/dev/disk/by-id/dm-uuid-LVM-<lvm-uuid>`` as device path.
 
 Operator Reconciliation Loop
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
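The naming rules added in the hunk above reduce to a small decision function. Below is a minimal Go sketch, assuming hypothetical `backing`, `persistentPath` and `gptArgs` names (they are not the storage-operator's types or functions). The `sgdisk` options used (`--clear`, `--new`, `--partition-guid`, `--change-name`) are standard sgdisk flags, but the exact invocation used by the Salt states is not visible in this diff; the partition label follows commit ca23e09 (volume name rather than UUID).

```go
package main

import "fmt"

// backing is a hypothetical enum for this sketch; the real storage-operator
// types differ.
type backing int

const (
	sparseLoopDevice backing = iota
	rawDisk                  // rawBlockDevice on a whole disk, e.g. /dev/sdb
	rawPartition             // rawBlockDevice on a partition, e.g. /dev/sdb1
	rawLVM                   // rawBlockDevice on an LVM logical volume
)

// persistentPath returns the stable path to use for the PersistentVolume.
// formatted is true when the Volume carries a filesystem; lvmUUID is only
// read for unformatted LVM-backed volumes.
func persistentPath(b backing, formatted bool, volumeUUID, lvmUUID string) string {
	if formatted {
		// The filesystem UUID was set to the Volume UUID when formatting.
		return "/dev/disk/by-uuid/" + volumeUUID
	}
	if b == rawLVM {
		// LVM already exposes a stable UUID: reuse it.
		return "/dev/disk/by-id/dm-uuid-LVM-" + lvmUUID
	}
	// Sparse file or whole disk (new GPT with one partition) and existing
	// partition (GUID rewritten): the partition GUID is the Volume UUID.
	return "/dev/disk/by-partuuid/" + volumeUUID
}

// gptArgs builds an sgdisk command line that creates a fresh GPT holding a
// single partition spanning the device, with the Volume UUID as partition
// GUID and the Volume name as partition label (assumed invocation).
func gptArgs(device, volumeUUID, volumeName string) []string {
	return []string{
		"sgdisk",
		"--clear",     // write a new, empty GPT
		"--new=1:0:0", // partition 1, default start and end (whole device)
		"--partition-guid=1:" + volumeUUID,
		"--change-name=1:" + volumeName,
		device,
	}
}

func main() {
	uuid := "6f1c2a9e-3b7d-4e52-9a0f-1d2c3b4a5e6f" // made-up Volume UUID
	fmt.Println(persistentPath(rawPartition, false, uuid, ""))
	// /dev/disk/by-partuuid/6f1c2a9e-3b7d-4e52-9a0f-1d2c3b4a5e6f
	fmt.Println(persistentPath(sparseLoopDevice, true, uuid, ""))
	// /dev/disk/by-uuid/6f1c2a9e-3b7d-4e52-9a0f-1d2c3b4a5e6f
	fmt.Println(gptArgs("/dev/loop0", uuid, "my-volume"))
}
```

A formatted volume always resolves through its filesystem UUID, whatever the backing type. This is why the sketch checks `formatted` before looking at the device kind.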
@@ -185,7 +198,7 @@ Once pre-checks are done, there are four cases:
 4. the backing **PersistentVolume** exists: the operator will check its status
    to update the volume's status accordingly.
 
-.. uml:: volume_v1.0-main_loop_flowchart.uml
+.. uml:: volume-main_loop_flowchart.uml
 
 
 .. _volume-deployment:
@@ -202,23 +215,31 @@ deployment hasn't started, thus the operator will set a finalizer on the
 asynchronous Salt call (which gives a job ID) before rescheduling the request
 to monitor the evolution of the job.
 
-If the **Volume** object has a job ID, then the storage preparation is in
-progress and the operator will monitor it until it's over.
-If the Salt job ends with an error, the operator will move the volume into a
-failed state.
+If we do have a job ID, then a Salt job is in progress and we monitor it until
+it completes.
+If the job ended with an error, we move the volume into a failed state.
 
-Otherwise (i.e. Salt job succeeded), the operator will proceed with the
-**PersistentVolume creation** (which requires an extra Salt call, synchronous
-this time, to get the volume size), taking care of putting a finalizer on the
-**PersistentVolume** (so that its lifetime is tied to the **Volume**'s) and
-set the **Volume** as the owner of the created **PersistentVolume**.
+Otherwise, we make another asynchronous Salt call to get information (size,
+persistent path, …) on the backing storage device (the polling is done exactly
+as described above).
+
+If we successfully retrieved the storage device information, we proceed with
+the **PersistentVolume** creation, taking care of putting a finalizer on the
+**PersistentVolume** (so that its lifetime is tied to ours) and setting
+ourselves as the owner of the **PersistentVolume**.
 
 Once the **PersistentVolume** is successfully created, the operator will move
 the **Volume** to the `Available` state and reschedule the request (the next
 iteration will check the health of the **PersistentVolume** just created).
 
-.. uml:: volume_v1.0-deploy_volume_flowchart.uml
+.. uml:: volume-deploy_volume_flowchart.uml
 
+Steady state
+~~~~~~~~~~~~
+
+Once the volume is deployed, we update the ``deviceName`` status field at each
+iteration of the reconciliation loop through a synchronous Salt call. This field
+contains the name of the underlying block device (as found under ``/dev``).
 
 .. _volume-finalization:
 
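The deployment steps in the hunk above (prepare the storage, fetch the device information, create the **PersistentVolume**, mark the **Volume** `Available`) form a small state machine. The steady-state `deviceName` refresh amounts to resolving the persistent symlink to a kernel device name. Below is a rough Go sketch of both under stated assumptions: `jobState`, `deployState`, `nextDeployStep` and `deviceName` are hypothetical names. The real operator tracks progress through the Salt job ID stored on the **Volume**, and it resolves the device name on the node through a synchronous Salt call rather than locally.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// jobState is a hypothetical summary of an asynchronous Salt job.
type jobState int

const (
	noJob jobState = iota
	jobRunning
	jobFailed
	jobSucceeded
)

// deployState is an illustrative view of a Volume being deployed; the real
// Volume status fields differ.
type deployState struct {
	prepareJob    jobState // storage preparation (formatting, GPT creation, …)
	deviceInfoJob jobState // retrieval of size and persistent path
	pvCreated     bool
}

// nextDeployStep returns the action one reconciliation iteration takes.
func nextDeployStep(s deployState) string {
	switch s.prepareJob {
	case noJob:
		return "spawn prepare job, requeue"
	case jobRunning:
		return "requeue, poll prepare job"
	case jobFailed:
		return "mark Volume as failed"
	}
	// Preparation succeeded: fetch device information, polled the same way.
	switch s.deviceInfoJob {
	case noJob:
		return "spawn device info job, requeue"
	case jobRunning:
		return "requeue, poll device info job"
	case jobFailed:
		return "mark Volume as failed"
	}
	if !s.pvCreated {
		return "create PersistentVolume with finalizer and owner reference"
	}
	return "mark Volume as Available, requeue"
}

// deviceName resolves a persistent path (a symlink such as
// /dev/disk/by-partuuid/<uuid>) to the kernel device name, e.g. "sdb1".
func deviceName(persistentPath string) (string, error) {
	target, err := filepath.EvalSymlinks(persistentPath)
	if err != nil {
		return "", err
	}
	return filepath.Base(target), nil
}

func main() {
	fmt.Println(nextDeployStep(deployState{prepareJob: jobSucceeded, deviceInfoJob: jobRunning}))
	// requeue, poll device info job

	// Emulate a /dev/disk/by-partuuid/<uuid> symlink in a temporary directory.
	dir, err := os.MkdirTemp("", "devname")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)
	dev := filepath.Join(dir, "sdb1")
	link := filepath.Join(dir, "partuuid-link")
	if err := os.WriteFile(dev, nil, 0o600); err != nil {
		panic(err)
	}
	if err := os.Symlink(dev, link); err != nil {
		panic(err)
	}
	name, err := deviceName(link)
	if err != nil {
		panic(err)
	}
	fmt.Println(name) // sdb1
}
```

Because every step either requeues or moves forward, each iteration is idempotent. An iteration only needs the current job states to decide what to do next.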
@@ -241,7 +262,7 @@ becomes unused (this is done by rescheduling). Once the backing
 **PersistentVolume** becomes unused, the operator will reclaim its storage and
 remove the finalizers to let the object be deleted.
 
-.. uml:: volume_v1.0-finalize_volume_flowchart.uml
+.. uml:: volume-finalize_volume_flowchart.uml
 
 
 Volume Deletion Criteria
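The finalization described in the hunk above has three steps: requeue until the backing **PersistentVolume** is unused, reclaim the storage, then drop the finalizers. Below is a minimal Go sketch under stated assumptions. `finalizeStep`, `removeFinalizer` and the finalizer name `example.com/volume-protection` are made up for illustration. Storage reclamation is reduced to a boolean, whereas the operator performs it through Salt.

```go
package main

import "fmt"

// volumeFinalizer is a made-up finalizer name for this sketch.
const volumeFinalizer = "example.com/volume-protection"

// removeFinalizer returns finalizers without name, preserving order.
func removeFinalizer(finalizers []string, name string) []string {
	out := make([]string, 0, len(finalizers))
	for _, f := range finalizers {
		if f != name {
			out = append(out, f)
		}
	}
	return out
}

// finalizeStep sketches one iteration of Volume finalization: wait (by
// requeuing) until the backing PersistentVolume is unused, reclaim the
// storage, then drop the finalizers so the objects can be deleted.
func finalizeStep(pvInUse, storageReclaimed bool) string {
	switch {
	case pvInUse:
		return "requeue until the PersistentVolume is unused"
	case !storageReclaimed:
		return "reclaim backing storage"
	default:
		return "remove finalizers"
	}
}

func main() {
	fmt.Println(finalizeStep(true, false))
	fmt.Println(removeFinalizer([]string{"kubernetes.io/pv-protection", volumeFinalizer}, volumeFinalizer))
	// [kubernetes.io/pv-protection]
}
```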
@@ -270,7 +291,7 @@ In the end, a **Volume** can be deleted in two cases:
 - the backing **PersistentVolume** is not bound (**Available**, **Released** or
   **Failed**)
 
-.. uml:: volume_v1.0-deletion_decision_tree.uml
+.. uml:: volume-deletion_decision_tree.uml
 
 
 Documentation
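Only the second deletion case is visible in the hunk above: the backing **PersistentVolume** is not bound. Below is a small Go sketch of that phase check. `pvPhase` is a stand-in for `corev1.PersistentVolumePhase`, so the sketch has no Kubernetes dependency, and `pvAllowsVolumeDeletion` is a hypothetical name. Treating `Pending` as blocking is an assumption, since the text only lists `Available`, `Released` and `Failed`. The first deletion case lies outside this diff and is not sketched.

```go
package main

import "fmt"

// pvPhase stands in for corev1.PersistentVolumePhase; the string values
// match the Kubernetes PersistentVolume phases.
type pvPhase string

const (
	pvPending   pvPhase = "Pending"
	pvAvailable pvPhase = "Available"
	pvBound     pvPhase = "Bound"
	pvReleased  pvPhase = "Released"
	pvFailed    pvPhase = "Failed"
)

// pvAllowsVolumeDeletion reports whether the backing PersistentVolume's
// phase lets the Volume be deleted (the "not bound" case only).
func pvAllowsVolumeDeletion(phase pvPhase) bool {
	switch phase {
	case pvAvailable, pvReleased, pvFailed:
		return true
	default:
		// Bound, and conservatively Pending (assumption), block deletion.
		return false
	}
}

func main() {
	for _, p := range []pvPhase{pvPending, pvAvailable, pvBound, pvReleased, pvFailed} {
		fmt.Printf("%s: %v\n", p, pvAllowsVolumeDeletion(p))
	}
}
```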
6 changes: 0 additions & 6 deletions docs/developer/design/index.rst

This file was deleted.

1 change: 0 additions & 1 deletion docs/developer/index.rst
@@ -4,7 +4,6 @@ Developer Guide
 .. toctree::
 
    architecture/index
-   design/index
    building/index
    running/index
    development/index
@@ -24,6 +24,7 @@ This section describes how to create a **Volume** from the **CLI**.
 spec:
   nodeName: <node_name>
   storageClassName: <storageclass_name>
+  mode: "Filesystem"
   rawBlockDevice:
     devicePath: <device_path>
 
@@ -32,6 +33,8 @@ This section describes how to create a **Volume** from the **CLI**.
 - **name**: the name of your volume, must be unique
 - **nodeName**: the name of the node where the volume will be located.
 - **storageClassName**: the **StorageClass** to use
+- **mode**: describes how the volume is intended to be consumed, either
+  ``Block`` or ``Filesystem`` (defaults to ``Filesystem`` if not specified).
 - **devicePath**: path to the block device (for example, `/dev/sda1`).
 
 #. Create the **Volume**
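The new `mode` field accepts `Block` or `Filesystem` and falls back to `Filesystem` when omitted. Below is a minimal Go sketch of that defaulting rule. `effectiveMode` is a hypothetical helper: where the real operator applies the default is not shown in this diff. The accepted values mirror `corev1.PersistentVolumeMode`, which commit 57bc9fb adopts in place of a boolean. Rejecting other spellings such as `block` is an assumption based on Kubernetes enum values being case-sensitive.

```go
package main

import "fmt"

// effectiveMode is a hypothetical helper; the accepted values mirror
// corev1.PersistentVolumeMode ("Filesystem" and "Block").
func effectiveMode(mode string) (string, error) {
	switch mode {
	case "":
		// Field omitted: fall back to the documented default.
		return "Filesystem", nil
	case "Filesystem", "Block":
		return mode, nil
	default:
		return "", fmt.Errorf("invalid mode %q: expected Block or Filesystem", mode)
	}
}

func main() {
	for _, m := range []string{"", "Block", "block"} {
		got, err := effectiveMode(m)
		fmt.Printf("%q -> %q (err: %v)\n", m, got, err)
	}
}
```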