-
Notifications
You must be signed in to change notification settings - Fork 373
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Clarify size of cloned / restored volumes #452
Clarify size of cloned / restored volumes #452
Conversation
a8e3db4
to
8ed11ad
Compare
spec.md
Outdated
@@ -725,7 +725,11 @@ Plugins MAY create 3 types of volumes: | |||
|
|||
- Empty volumes. When plugin supports `CREATE_DELETE_VOLUME` OPTIONAL capability. | |||
- From an existing snapshot. When plugin supports `CREATE_DELETE_VOLUME` and `CREATE_DELETE_SNAPSHOT` OPTIONAL capabilities. | |||
SP SHOULD create a volume that looks like exact copy of the original snapshotted volume at the time the snapshot was taken. | |||
If the newly created volume is larger than the original volume and SP supports `EXPAND_VOLUME` node capability, CO SHOULD call `NodeExpandVolume` to make the extra space on the volume available to the workloads. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to clarify that SP should return the size of the original volume? Also controller expand may also be called.
spec.md
Outdated
@@ -725,7 +725,11 @@ Plugins MAY create 3 types of volumes: | |||
|
|||
- Empty volumes. When plugin supports `CREATE_DELETE_VOLUME` OPTIONAL capability. | |||
- From an existing snapshot. When plugin supports `CREATE_DELETE_VOLUME` and `CREATE_DELETE_SNAPSHOT` OPTIONAL capabilities. | |||
SP SHOULD create a volume that looks like exact copy of the original snapshotted volume at the time the snapshot was taken. | |||
If the newly created volume is larger than the original volume and SP supports `EXPAND_VOLUME` node capability, CO SHOULD call `NodeExpandVolume` to make the extra space on the volume available to the workloads. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we say "original volume" or say - VolumeContentSource
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think usage of volumeContentSource
is correct but with an additional mention that snapshot or PVC
as data source, because may be in future we may have more data sources which dont support restore
or clone
operation ?
@jsafrane but, if CSI driver respond |
No, CO requested 2 GiB and SP MUST provide at least 2GiB. But it does not say anything about size of the actual filesystem on the volume. It may be (and IMO should) still be 1GiB. For filesystem volumes, containers will see only this 1 GiB filesystem. |
My opinion is that we should modify the wording to suggest that for new/empty volumes, the SP MUST ensure the filesystem is the correct size on creation, and for cloned/restored volumes the SP SHOULD correct the size on creation if it does not implement expand, but for SPs that do implement expand they MAY return the smaller size and leverage the CO's resize behavior to get the filesystem expanded. |
How will the CO know that a node expansion is still required if the SP returns the requested size? |
It wouldn't. The SP returning the incorrect size is the only way to signal that more work is needed. We just need to clarify that it's only valid to do that if the SP also supports expand (otherwise the CO can't take corrective action) and we should discourage doing that for new/empty volumes in the strongest language we can get away with. |
Are you suggesting that if SP does not implement expand, it will return a smaller size of volume? |
spec.md
Outdated
@@ -725,7 +725,11 @@ Plugins MAY create 3 types of volumes: | |||
|
|||
- Empty volumes. When plugin supports `CREATE_DELETE_VOLUME` OPTIONAL capability. | |||
- From an existing snapshot. When plugin supports `CREATE_DELETE_VOLUME` and `CREATE_DELETE_SNAPSHOT` OPTIONAL capabilities. | |||
SP SHOULD create a volume that looks like exact copy of the original snapshotted volume at the time the snapshot was taken. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, CO requested 2 GiB and SP MUST provide at least 2GiB. But it does not say anything about size of the actual filesystem on the volume. It may be (and IMO should) still be 1GiB. For filesystem volumes, containers will see only this 1 GiB filesystem.
Isnt this conflict with the example above? We should clarify the content should be the exact copy but SP can return with a larger size of Volume, although only a portion of it might be used
I think we should keep the current behavior of SP. The behavior change seems need to be done on the CO side. CO should check if SP has expansion capability.
|
@jsafrane afacit, this is not specific or not only applicable to
IMO, we should make this clarified in terms of : how CO interpret the volume returned from SP is larger than the original volume?. I have an additional question: What about BLOCK volume modes and for volume types which dont require NodeExpansion? . All the operations are performed by the SP driver at createVolume has already performed, so above wording give a false impression saying CO will call |
ControllerExpand should never be necessary after creating a volume from a clone/snapshot, because any resizing that needs to be done on the controller can and should have been done as part of the CreateVolume call. The existing language is clear about that much at least.
For raw block volumes, there should not be node work to complete the expansion. The only case when nodes need to be involved in raw block volume expansion is when the volume is in use at the time of expansion, and that doesn't apply for newly-created volumes because they can't be in use at creation time. |
@bswartz I am/was not sure, all the storages support CLONE operation with a different size, Is that the case ? if thats how things are, we could discard that mention.
Sure, I am on same page, my request was that, above wording kind of conveying a message that ( atleast to me) CO will issue NodeExpand call, regardless of the volume mode or regardless of NodeExpansion required flag for the volume provisioned by the SP. |
My theory is that, whatever the controller was going to do in response to the ControllerExpand() call, it has all the information it needs to performs the same operations during a CreateVolume() call, so there's zero value in asking the CO to make a ControllerExpand() in addition to the CreateVolume(). The existing spec wording says this unambiguously. Personally I also think the existing spec wording implies that NodeExpandVolume() should also be unecessary, but I think it's vague enough that we can get away with changing it and that shouldn't break anyone. |
imo, above is not really the case. That said, as an example, the
👍 |
Yes the secrets could in theory be different, but CSI drivers get to define what secrets they need. If a driver needs an additional secret to expand that's not needed for create (hard to imagine how that would happen) then the CSI plugin just has to be deployed in such a way that both secrets are supplied to create. This is as easy as modifying the storage class. IMO the vast majority of drivers don't use secrets at all, or if they do, it's the same secret for all controller operations. |
Meeting Notes from CSI Meeting Oct-14-2020@bswartz - What happens if resize capability not implemented by a plugin? Silently succeeding in that case seems wrong. Should we say that the call should fail? Action item: try and implement in driver to see how painful it is. Come back to next meeting to discuss. |
Windows node plugins do have the ability to handle Slightly off-topic from the conversation above: it appears, GCE PD resizer does not handle Windows nodes (please correct me if I am incorrect). In such scenarios, does the CSI node plugin report
Control-plane is indeed Linux only. Porting and supporting the various controllers to Windows is not a near term goal - although as Michelle mentions, it should probably all just work. I think it could be a higher priority if an initiative spins up to run the master nodes on Windows but I don't think there is such a need in the near term. |
I implemented changes suggested in #452 (comment) in AWS EBS CSI driver: kubernetes-sigs/aws-ebs-csi-driver#595. Missing unit test and xfs support (only ext3/4 is implemented). In the end, it's comparing size of a device (easy to get via I noticed that some drivers vendor |
Discussed more in community meeting. See notes. Leaning towards having plugins implement this behavior but details TBD, will continue discussion at next meeting. |
8ed11ad
to
73d7751
Compare
Updated this PR. From the meeting notes "if driver accepts CreateVolume from volume or snapshot that was smaller, and you respond successfully, it better be the the right size by the time NodePublish completes" |
spec.md
Outdated
@@ -725,7 +725,13 @@ Plugins MAY create 3 types of volumes: | |||
|
|||
- Empty volumes. When plugin supports `CREATE_DELETE_VOLUME` OPTIONAL capability. | |||
- From an existing snapshot. When plugin supports `CREATE_DELETE_VOLUME` and `CREATE_DELETE_SNAPSHOT` OPTIONAL capabilities. | |||
The Plugin SHOULD create a volume that looks like exact copy of the original snapshotted volume at the time the snapshot was taken. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If feels to me like we could de-duplicate some of the language here and below, by including the requirements about resizing the filesystem in a later paragraph and referring to that twice in the list.
Also, why are we hedging with SHOULD? It feels like MUST would be appropriate here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Replaced with MUST and moved to a separate paragraph below.
@bswartz PTAL
I think this looks good. |
@@ -727,6 +727,9 @@ Plugins MAY create 3 types of volumes: | |||
- From an existing snapshot. When plugin supports `CREATE_DELETE_VOLUME` and `CREATE_DELETE_SNAPSHOT` OPTIONAL capabilities. | |||
- From an existing volume. When plugin supports cloning, and reports the OPTIONAL capabilities `CREATE_DELETE_VOLUME` and `CLONE_VOLUME`. | |||
|
|||
If CO requests a volume to be created from existing snapshot or volume and the requested size of the volume is larger than the original snapshotted (or cloned volume), the Plugin can either refuse such a call with `OUT_OF_RANGE` error or MUST provide a volume that, when presented to a workload by `NodePublish` call, has both the requested (larger) size and contains data from the snapshot (or original volume). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what if the restored file system not supported for resize at this moment, should Plugin return the same error code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
already anwered by @jdef below, yes, OUT_OF_RANGE is IMO appropriate in this case too.
out-of-range can depend on the current state of the system. so if the
current state doesn't allow for the snapshot due to capacity constraints,
but later space might be freed up, then that seems in line w/ intended use
for the error code.
or .. was there another dimension to the question (that I'm not
understanding)?
…On Mon, Feb 15, 2021 at 5:49 PM Xiang Li ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In spec.md
<#452 (comment)>
:
> @@ -727,6 +727,9 @@ Plugins MAY create 3 types of volumes:
- From an existing snapshot. When plugin supports `CREATE_DELETE_VOLUME` and `CREATE_DELETE_SNAPSHOT` OPTIONAL capabilities.
- From an existing volume. When plugin supports cloning, and reports the OPTIONAL capabilities `CREATE_DELETE_VOLUME` and `CLONE_VOLUME`.
+If CO requests a volume to be created from existing snapshot or volume and the requested size of the volume is larger than the original snapshotted (or cloned volume), the Plugin can either refuse such a call with `OUT_OF_RANGE` error or MUST provide a volume that, when presented to a workload by `NodePublish` call, has both the requested (larger) size and contains data from the snapshot (or original volume).
what if the restored file system not supported for resize at this moment,
should Plugin return the same error code?
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#452 (review)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAR5KLFBWRZQDVIH5VLG6TDS7GQHXANCNFSM4RUKHMXQ>
.
--
James DeFelice
585.241.9488 (voice)
650.649.6071 (fax)
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
spec.md
Outdated
@@ -1161,7 +1164,7 @@ The CO MUST implement the specified error recovery behavior when it encounters t | |||
| Source does not exist | 5 NOT_FOUND | Indicates that the specified source does not exist. | Caller MUST verify that the `volume_content_source` is correct, the source is accessible, and has not been deleted before retrying with exponential back off. | | |||
| Volume already exists but is incompatible | 6 ALREADY_EXISTS | Indicates that a volume corresponding to the specified volume `name` already exists but is incompatible with the specified `capacity_range`, `volume_capabilities`, `parameters`, `accessibility_requirements` or `volume_content_source`. | Caller MUST fix the arguments or use a different `name` before retrying. | | |||
| Unable to provision in `accessible_topology` | 8 RESOURCE_EXHAUSTED | Indicates that although the `accessible_topology` field is valid, a new volume can not be provisioned with the specified topology constraints. More human-readable information MAY be provided in the gRPC `status.message` field. | Caller MUST ensure that whatever is preventing volumes from being provisioned in the specified location (e.g. quota issues) is addressed before retrying with exponential backoff. | | |||
| Unsupported `capacity_range` | 11 OUT_OF_RANGE | Indicates that the capacity range is not allowed by the Plugin, for example when trying to create a volume smaller than the source snapshot. More human-readable information MAY be provided in the gRPC `status.message` field. | Caller MUST fix the capacity range before retrying. | | |||
| Unsupported `capacity_range` | 11 OUT_OF_RANGE | Indicates that the capacity range is not allowed by the Plugin, for example when trying to create a volume smaller than the source snapshot or the Plugin does not support creating volumes larger than the source snapshot. More human-readable information MAY be provided in the gRPC `status.message` field. | Caller MUST fix the capacity range before retrying. | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
source snapshot -> source snapshot or source volume
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
When cloning a volume or restoring a snapshot, the resulting volume should be exact copy of the original volume. This includes any filesystem metadata. Therefore, when the new volume is larger than the original one, the new volume may need to resize the filesystem (ie call resize2fs, xfs_growfs, ...) to make the bigger capacity available to workloads.
@@ -1161,7 +1164,7 @@ The CO MUST implement the specified error recovery behavior when it encounters t | |||
| Source does not exist | 5 NOT_FOUND | Indicates that the specified source does not exist. | Caller MUST verify that the `volume_content_source` is correct, the source is accessible, and has not been deleted before retrying with exponential back off. | | |||
| Volume already exists but is incompatible | 6 ALREADY_EXISTS | Indicates that a volume corresponding to the specified volume `name` already exists but is incompatible with the specified `capacity_range`, `volume_capabilities`, `parameters`, `accessibility_requirements` or `volume_content_source`. | Caller MUST fix the arguments or use a different `name` before retrying. | | |||
| Unable to provision in `accessible_topology` | 8 RESOURCE_EXHAUSTED | Indicates that although the `accessible_topology` field is valid, a new volume can not be provisioned with the specified topology constraints. More human-readable information MAY be provided in the gRPC `status.message` field. | Caller MUST ensure that whatever is preventing volumes from being provisioned in the specified location (e.g. quota issues) is addressed before retrying with exponential backoff. | | |||
| Unsupported `capacity_range` | 11 OUT_OF_RANGE | Indicates that the capacity range is not allowed by the Plugin, for example when trying to create a volume smaller than the source snapshot. More human-readable information MAY be provided in the gRPC `status.message` field. | Caller MUST fix the capacity range before retrying. | | |||
| Unsupported `capacity_range` | 11 OUT_OF_RANGE | Indicates that the capacity range is not allowed by the Plugin, for example when trying to create a volume smaller than the source snapshot or the Plugin does not support creating volumes larger than the source snapshot or source volume. More human-readable information MAY be provided in the gRPC `status.message` field. | Caller MUST fix the capacity range before retrying. | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: there's another instance of this: "smaller than the source snapshot"
Looks good to me. Just a nit. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
/approve
Add a few notes to
CreateVolume
about expected content of the volume after volume clone / snapshot restore.Explicitly, when cloning a volume or restoring a snapshot to a bigger volume, the new volume may need
NodeExpandVolume
call to resize the filesystem (resize2fs
,xfs_growfs
, ...) to make the bigger capacity available to workloads.Example: user clones 1GiB ext4 volume into a 2GiB volume. In CSI it looks like
CreateVolumeRequest.CapacityRange = 2GiB
. SP should clone the volume as it is, not changing the filesystem metadata. Depending on what CO wants, it may keep the new volume as it is (i.e. containers will see just 1GiB of usable space) or it may callNodeExpandVolume
to make the whole volume available to the user.