
Merge CSI/Cluster Volumes code into Master #3022

Merged
merged 44 commits into from
Aug 3, 2021

Conversation

@dperny (Collaborator) commented Jul 27, 2021

All work for support of cluster volumes until now has been on the feature-volumes branch of swarmkit, in order to keep the master branch free of half-done volumes code.

This PR merges feature-volumes into master, the first step toward finally releasing cluster volume support.

dperny and others added 30 commits June 18, 2021 08:29
Adds the protocol buffer definitions for cluster volumes and CSI
support.

Signed-off-by: Drew Erny <[email protected]>
* Adds the protocol buffer definitions for cluster volumes and CSI
support.
* Add controlapi and store support for volumes
* Add CSI library, and basic test rigging. Test rigging is necessary to
ensure that vndr pulls in all of the correct imports to the correct
locations.
* Make a substantial number of vendoring updates, in order to accommodate
a newer version of protobuf required by the CSI library.
* Adds a CSIConfig object to the ClusterSpec, which allows a user to
specify the available plugins and the location to connect to them. This
may or may not be the final API for CSI plugins, but should be adequate
for initial testing.

Signed-off-by: Drew Erny <[email protected]>
Adds code for creating CSI volumes. This includes:

* The basic Plugin object, which manages the connection to the CSI
plugin
* The basic VolumeManager object, which manages plugins and responds to
store events

This also includes lots of tests and test rigging, including fake CSI
clients.

Signed-off-by: Drew Erny <[email protected]>
Renames the github.com/docker/swarmkit/manager/volumes package to
github.com/docker/swarmkit/manager/csi, which more accurately reflects
what the purpose of that package is.

Signed-off-by: Drew Erny <[email protected]>
In the initial API commit, I forgot to add the repeated VolumeAttachment
field to the Task object. This commit fixes that oversight.

Signed-off-by: Drew Erny <[email protected]>
Adds code to keep track of node ID mappings to the CSI manager and the
csi Plugin interface. This will allow us to use the Plugin interface
solely in terms of the swarmkit node ID.
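The bookkeeping described above might look roughly like the following sketch. The type and method names here are assumptions for illustration, not swarmkit's real API: a CSI plugin reports its own node ID (via NodeGetInfo), and a per-plugin map lets everything else speak only in swarmkit node IDs.

```go
package main

import "fmt"

// plugin is an illustrative stand-in for the csi Plugin object.
type plugin struct {
	// swarmToCSI maps a swarmkit node ID to the node ID the CSI
	// plugin reported for that node.
	swarmToCSI map[string]string
}

func newPlugin() *plugin {
	return &plugin{swarmToCSI: map[string]string{}}
}

// UpdateNode records the CSI node ID for a swarmkit node.
func (p *plugin) UpdateNode(swarmID, csiID string) { p.swarmToCSI[swarmID] = csiID }

// RemoveNode forgets a node's mapping, e.g. when the node leaves the cluster.
func (p *plugin) RemoveNode(swarmID string) { delete(p.swarmToCSI, swarmID) }

// CSINodeID resolves a swarmkit node ID to the plugin's node ID.
func (p *plugin) CSINodeID(swarmID string) (string, bool) {
	id, ok := p.swarmToCSI[swarmID]
	return id, ok
}

func main() {
	p := newPlugin()
	p.UpdateNode("swarm-node-1", "csi-node-abc")
	id, ok := p.CSINodeID("swarm-node-1")
	fmt.Println(id, ok)
}
```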

Signed-off-by: Drew Erny <[email protected]>
Adds code to the csi volume Manager, which determines if a volume is
available on a given node or not.

Signed-off-by: Drew Erny <[email protected]>
Adds the volumeSet object to the scheduler. This object keeps track of
volumes available on the system.

Signed-off-by: Drew Erny <[email protected]>
Adds ginkgo tests for the integration between volumes and the scheduler.

Signed-off-by: Drew Erny <[email protected]>
Adds basic handling of Volumes in the dispatcher.

Signed-off-by: Drew Erny <[email protected]>
Publishing volumes is now a two-step process. First, the Scheduler
updates the Volume object to PENDING_PUBLISH, which indicates that the
volume should be published, but that the call hasn't verifiably
succeeded yet. Then, the CSI Manager calls the ControllerPublishVolume
RPC, and updates the volume object again to PUBLISHED, indicating that
the call has succeeded.

This makes sense because the Scheduler has knowledge of when and why a
volume is in use.

This change includes fairly substantial breaking changes to the protocol
buffers, but this is acceptable because this code has not yet been
released.
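The two-step transition above can be sketched roughly as a small state machine. Names here are illustrative (the real states are defined in swarmkit's protobufs); the point is that only the Scheduler creates the PENDING_PUBLISH entry, and only the CSI Manager promotes it to PUBLISHED after the RPC verifiably succeeds.

```go
package main

import "fmt"

// PublishState stands in for the volume's per-node publish status.
type PublishState int

const (
	PendingPublish PublishState = iota // scheduler decided the volume is needed on a node
	Published                          // ControllerPublishVolume has verifiably succeeded
)

// Volume holds a publish status per node, keyed by swarmkit node ID.
type Volume struct {
	PublishStatus map[string]PublishState
}

// schedule is the Scheduler's half of the flow: it marks the volume as
// needing publishing, because only the Scheduler knows when and why a
// volume is in use.
func schedule(v *Volume, nodeID string) {
	if v.PublishStatus == nil {
		v.PublishStatus = map[string]PublishState{}
	}
	v.PublishStatus[nodeID] = PendingPublish
}

// confirmPublish is the CSI Manager's half: called only after the
// ControllerPublishVolume RPC succeeds.
func confirmPublish(v *Volume, nodeID string) error {
	if v.PublishStatus[nodeID] != PendingPublish {
		return fmt.Errorf("node %s is not pending publish", nodeID)
	}
	v.PublishStatus[nodeID] = Published
	return nil
}

func main() {
	v := &Volume{}
	schedule(v, "node-1")
	if err := confirmPublish(v, "node-1"); err != nil {
		panic(err)
	}
	fmt.Println(v.PublishStatus["node-1"] == Published)
}
```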

Signed-off-by: Drew Erny <[email protected]>
Modifies the dispatcher to avoid sending a VolumeAssignment until the
VolumePublishStatus is PUBLISHED for the node in question. The worker
node will need to be aware of and compatible with the fact that the
VolumeAssignment may not be present on the Worker for some time after
the Task is sent down.

Signed-off-by: Drew Erny <[email protected]>
Updates the Dispatcher to handle a different Volume workflow.

1. Volumes are now assigned completely independently of Tasks. The
Scheduler decides where Volumes belong, and though the Task depends on
them, usually Volumes aren't even ready to be used when the Task is
dispatched.
2. Volumes are removed with an assignment removal action, but their
dependencies (the volume secrets) are not removed at that time, because
they may be needed to actually do the unpublish calls on the node.
3. Volume removals are always sent to the node when they happen, because
the node might have Volumes published that it does not know about (for
example, after a restart).
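The dispatch rules above reduce to two predicates, sketched here with illustrative names (not swarmkit's real API): assignments are gated on the volume being published on that node, while removals are unconditional.

```go
package main

import "fmt"

// PublishState stands in for the per-node publish status of a volume.
type PublishState int

const (
	PendingPublish PublishState = iota
	Published
)

type Volume struct {
	ID            string
	PublishStatus map[string]PublishState
}

// shouldSendAssignment: a VolumeAssignment goes to a node only once the
// volume is Published there, independently of any Task that uses it.
func shouldSendAssignment(v *Volume, nodeID string) bool {
	return v.PublishStatus[nodeID] == Published
}

// shouldSendRemoval: removals are always sent, because a node may have
// volumes published that the manager does not know about (for example,
// after a restart).
func shouldSendRemoval(*Volume, string) bool { return true }

func main() {
	v := &Volume{ID: "v1", PublishStatus: map[string]PublishState{"node-1": PendingPublish}}
	fmt.Println(shouldSendAssignment(v, "node-1")) // not yet published
	v.PublishStatus["node-1"] = Published
	fmt.Println(shouldSendAssignment(v, "node-1"))
	fmt.Println(shouldSendRemoval(v, "node-1"))
}
```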

Signed-off-by: Drew Erny <[email protected]>
Adds code to the Scheduler to manage the end-stage of a Volume's
lifecycle on a Node.

When the Scheduler runs, it checks to see if any Volumes are no longer
in use on any nodes. If so, those Volumes have their PublishStatus.State
set to PENDING_NODE_UNPUBLISH, which will signal to the rest of Swarm
that the Volume should be freed on the node.
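A minimal sketch of that pass, with a hypothetical inUse callback standing in for the store query for tasks that attach the volume (all names here are assumptions, not swarmkit's real code):

```go
package main

import "fmt"

// PublishState stands in for the per-node publish status of a volume.
type PublishState int

const (
	Published PublishState = iota
	PendingNodeUnpublish
)

type Volume struct {
	ID            string
	PublishStatus map[string]PublishState
}

// freeUnusedVolumes: for every node a volume is Published on, if no task
// there still uses it, flip the state to PendingNodeUnpublish so the rest
// of Swarm can free it on that node.
func freeUnusedVolumes(volumes []*Volume, inUse func(volumeID, nodeID string) bool) {
	for _, v := range volumes {
		for node, state := range v.PublishStatus {
			if state == Published && !inUse(v.ID, node) {
				v.PublishStatus[node] = PendingNodeUnpublish
			}
		}
	}
}

func main() {
	v := &Volume{ID: "vol-1", PublishStatus: map[string]PublishState{"node-1": Published}}
	freeUnusedVolumes([]*Volume{v}, func(_, _ string) bool { return false })
	fmt.Println(v.PublishStatus["node-1"] == PendingNodeUnpublish)
}
```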

Signed-off-by: Drew Erny <[email protected]>
Removing volumes is a tricky proposition, because it is not sufficient
to simply delete the volume in question. The correct removal steps must
be followed to cleanly remove the volume.

First, to remove a volume from a node, the manager must know
affirmatively that it is no longer in use on that node. If a volume is
still in use, then it cannot be unpublished on the controller side. To
solve this problem, a repeated string field is added to NodeDescription,
reporting all volumes active on that node.

Second, to remove a volume from Swarm, or to update it, the volume must
not be active and published anywhere. To facilitate this, volume
availability states are added to the VolumeSpec. These states, analogous
to NodeAvailability, control the usage of the volume.

Signed-off-by: Drew Erny <[email protected]>
* Updates the Scheduler to not use volumes in the Pause or Drain
availability
* Creates a VolumeEnforcer, which is like the ConstraintEnforcer, except
it rejects tasks belonging to Drained Volumes.
* Updates the store to include a new filter for Tasks by
VolumeAttachment, allowing an efficient way to locate all tasks using a
given volume.
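The availability gating and the VolumeEnforcer behavior described above can be sketched like this. The enum values and the tasksUsing callback are illustrative assumptions; the real availability states live in the VolumeSpec protobuf, and the real enforcer uses the store's tasks-by-VolumeAttachment filter.

```go
package main

import "fmt"

// VolumeAvailability mirrors the states described above, analogous to
// NodeAvailability.
type VolumeAvailability int

const (
	Active VolumeAvailability = iota // volume may be scheduled normally
	Pause                            // no new tasks get the volume; existing tasks keep it
	Drain                            // tasks using the volume are rejected
)

type Volume struct {
	ID           string
	Availability VolumeAvailability
}

// schedulable says whether the Scheduler may place new tasks on a volume.
func schedulable(v *Volume) bool { return v.Availability == Active }

// enforce mimics the VolumeEnforcer: it returns the IDs of tasks that must
// be rejected because they belong to a Drained volume. tasksUsing stands
// in for the store's tasks-by-volume-attachment filter.
func enforce(vols []*Volume, tasksUsing func(volumeID string) []string) []string {
	var rejected []string
	for _, v := range vols {
		if v.Availability == Drain {
			rejected = append(rejected, tasksUsing(v.ID)...)
		}
	}
	return rejected
}

func main() {
	vols := []*Volume{{ID: "v1", Availability: Drain}, {ID: "v2", Availability: Pause}}
	fmt.Println(schedulable(vols[1])) // paused volumes take no new tasks
	fmt.Println(enforce(vols, func(id string) []string {
		if id == "v1" {
			return []string{"task-a"}
		}
		return nil
	}))
}
```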

Signed-off-by: Drew Erny <[email protected]>
Actually consists of two changes, which have gotten blended together by
mistake.

First, adds code for removing and deleting Volumes.

Second, adds code to retry failed Volume operations. This relies on a
sort-of-priority queue, which allows us to schedule retries on a timer
and then handle them as the backoff interval elapses for each operation.

This is probably way over-engineered.
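The "sort-of-priority queue" could be sketched as a min-heap ordered by next-attempt time, with the backoff doubling on each failed retry. This is an illustrative reconstruction under those assumptions, not the actual swarmkit implementation (which lives in the volumequeue package):

```go
package main

import (
	"container/heap"
	"fmt"
	"time"
)

// retryItem is one pending volume operation and the time at which it
// should next be attempted.
type retryItem struct {
	volumeID string
	runAt    time.Time
	backoff  time.Duration
}

// retryQueue is a min-heap ordered by next-attempt time, so the operation
// whose backoff elapses first is always at the front.
type retryQueue []*retryItem

func (q retryQueue) Len() int            { return len(q) }
func (q retryQueue) Less(i, j int) bool  { return q[i].runAt.Before(q[j].runAt) }
func (q retryQueue) Swap(i, j int)       { q[i], q[j] = q[j], q[i] }
func (q *retryQueue) Push(x interface{}) { *q = append(*q, x.(*retryItem)) }
func (q *retryQueue) Pop() interface{} {
	old := *q
	it := old[len(old)-1]
	*q = old[:len(old)-1]
	return it
}

// reschedule requeues a failed operation with doubled backoff.
func reschedule(q *retryQueue, it *retryItem, now time.Time) {
	it.backoff *= 2
	it.runAt = now.Add(it.backoff)
	heap.Push(q, it)
}

func main() {
	now := time.Now()
	q := &retryQueue{}
	heap.Init(q)
	heap.Push(q, &retryItem{volumeID: "v2", runAt: now.Add(2 * time.Second), backoff: time.Second})
	heap.Push(q, &retryItem{volumeID: "v1", runAt: now.Add(1 * time.Second), backoff: time.Second})
	fmt.Println(heap.Pop(q).(*retryItem).volumeID) // earliest retry first
}
```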

Signed-off-by: Drew Erny <[email protected]>
Signed-off-by: Ameya Gawde <[email protected]>
Does two things:

1. Splits the configuration for a plugin into 2 Sockets, one for the
controller and one for the node.
2. Updates the agent to handle creating and propagating plugins. Shovels
off all responsibility for plugin management to the executor by way of a
new method on that interface.

Signed-off-by: Drew Erny <[email protected]>
* Refactors the volume queue to its own package, shared by the manager
and agent.
* Adds code to facilitate reporting when a volume is unpublished from
the agent.

Signed-off-by: Drew Erny <[email protected]>
dperny added 14 commits June 18, 2021 08:34
Further refactors and updates the agent to support the volume removal
workflow. This should complete the volume removal functionality.

Signed-off-by: Drew Erny <[email protected]>
Still needs tests written

Signed-off-by: Drew Erny <[email protected]>
Adds code to the csi plugin adapter to actually call the underlying CSI
RPCs for ControllerPublishVolume, ControllerUnpublishVolume, and
DeleteVolume.

Signed-off-by: Drew Erny <[email protected]>
The CSI manager now reads out and checks all volumes on initialization,
which occurs at start up or leadership change. This means that work
interrupted by an outage or leadership change is picked up where it was
left off.

Signed-off-by: Drew Erny <[email protected]>
Signed-off-by: Drew Erny <[email protected]>
Alters the Client method on the csi plugin object to lazy-initialize the
gRPC client when needed.
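Lazy initialization like this is commonly done with sync.Once. The sketch below is an assumption about the shape of the change, with a string standing in for the real gRPC/CSI client; the real code would dial the plugin's socket on first use.

```go
package main

import (
	"fmt"
	"sync"
)

// plugin sketches the pattern: the connection is not dialed at
// construction time, only on the first Client call.
type plugin struct {
	socket string
	once   sync.Once
	client string // stands in for a CSI controller client
	err    error
}

// Client lazily initializes the client exactly once; later calls reuse it.
func (p *plugin) Client() (string, error) {
	p.once.Do(func() {
		// In real code this is roughly a grpc.Dial of p.socket.
		p.client = "client@" + p.socket
	})
	return p.client, p.err
}

func main() {
	p := &plugin{socket: "/run/csi/plugin.sock"}
	c, _ := p.Client()
	fmt.Println(c)
	c2, _ := p.Client() // same client, no second dial
	fmt.Println(c == c2)
}
```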

Signed-off-by: Drew Erny <[email protected]>
Adds the volume access type (Mount or Block) to the VolumeSpec. This was
the last missing piece needed to fully support CSI plugins at a minimal
level. Before this, it was just hard coded to always use Mount-type
volumes, which isn't how that's supposed to work.

Signed-off-by: Drew Erny <[email protected]>
Converts the manager portion of the CSI code to use the Docker
`PluginGetter` interface, instead of the CSIConfig object.

Converts the agent to get its plugins from the PluginGetter, rather than
getting them from the manager. Also removes the dispatcher sending
updates about CSI node plugins.

Removes various CSI plugin configuration fields from the API protocol
buffers. Since we'll be using the plugingetter, these are no longer
necessary.

When publishing volumes, uses a propagated mount location, and resolves
that location when getting the volume.

Signed-off-by: Drew Erny <[email protected]>
Both the agent and manager of swarmkit's CSI components need a fake
PluginGetter object for testing. This splits that common fake into its
own object in the testutils package.

Signed-off-by: Drew Erny <[email protected]>
Removing the CSIConfig object from the cluster requires changes to other
tests.

Signed-off-by: Drew Erny <[email protected]>
[feature-volumes] Use PluginGetter instead of Cluster CSIConfig
The signature for the `agent.NewDependencyManager` function has changed
because of CSI support. Adds an empty FakePluginGetter to this
constructor so that the tests compile.

Signed-off-by: Drew Erny <[email protected]>
@decentral1se

V E R Y E X C I T I N G
