Merge CSI/Cluster Volumes code into Master #3022
Merged
Adds the protocol buffer definitions for cluster volumes and CSI support. Signed-off-by: Drew Erny <[email protected]>
* Adds the protocol buffer definitions for cluster volumes and CSI support.
* Add controlapi and store support for volumes.
* Add CSI library, and basic test rigging. Test rigging is necessary to ensure that vndr pulls in all of the correct imports to the correct locations.
* Make a substantial number of vendoring updates, in order to accommodate a newer version of protobuf required by the CSI library.
* Adds a CSIConfig object to the ClusterSpec, which allows a user to specify the available plugins and the location to connect to them. This may or may not be the final API for CSI plugins, but should be adequate for initial testing.
Signed-off-by: Drew Erny <[email protected]>
Adds code for creating CSI volumes. This includes:
* The basic Plugin object, which manages the connection to the CSI plugin
* The basic VolumeManager object, which manages plugins and responds to store events
This also includes lots of tests and test rigging, including fake CSI clients.
Signed-off-by: Drew Erny <[email protected]>
Renames the github.com/docker/swarmkit/manager/volumes package to github.com/docker/swarmkit/manager/csi, which more accurately reflects what the purpose of that package is. Signed-off-by: Drew Erny <[email protected]>
Signed-off-by: Ameya Gawde <[email protected]>
Signed-off-by: Ameya Gawde <[email protected]>
Signed-off-by: Ameya Gawde <[email protected]>
In the initial API commit, I forgot to add the repeated VolumeAttachment field to the Task object. This commit fixes that oversight. Signed-off-by: Drew Erny <[email protected]>
Adds code to keep track of node ID mappings to the CSI manager and the csi Plugin interface. This will allow us to use the Plugin interface solely in terms of the swarmkit node ID. Signed-off-by: Drew Erny <[email protected]>
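The node ID mapping described in this commit can be pictured as a small two-way lookup table. The sketch below is only an illustration under assumptions: `nodeIDMap` and its methods are hypothetical names, not the types used in swarmkit, and it omits the locking a concurrent manager would need.

```go
package main

// nodeIDMap is a hypothetical two-way mapping between swarmkit node IDs and
// the node IDs reported by a CSI plugin. It is not safe for concurrent use.
type nodeIDMap struct {
	swarmToCSI map[string]string
	csiToSwarm map[string]string
}

func newNodeIDMap() *nodeIDMap {
	return &nodeIDMap{
		swarmToCSI: make(map[string]string),
		csiToSwarm: make(map[string]string),
	}
}

// Add records the CSI node ID for a swarmkit node, replacing any older entry.
func (m *nodeIDMap) Add(swarmID, csiID string) {
	if old, ok := m.swarmToCSI[swarmID]; ok {
		delete(m.csiToSwarm, old)
	}
	m.swarmToCSI[swarmID] = csiID
	m.csiToSwarm[csiID] = swarmID
}

// Remove forgets a swarmkit node and its CSI node ID.
func (m *nodeIDMap) Remove(swarmID string) {
	if csiID, ok := m.swarmToCSI[swarmID]; ok {
		delete(m.csiToSwarm, csiID)
		delete(m.swarmToCSI, swarmID)
	}
}

// CSIID translates a swarmkit node ID into the plugin's node ID.
func (m *nodeIDMap) CSIID(swarmID string) (string, bool) {
	id, ok := m.swarmToCSI[swarmID]
	return id, ok
}

// SwarmID translates a plugin node ID back into the swarmkit node ID.
func (m *nodeIDMap) SwarmID(csiID string) (string, bool) {
	id, ok := m.csiToSwarm[csiID]
	return id, ok
}
```

With a table like this, callers can pass swarmkit node IDs everywhere and translate to the plugin's ID only at the point where a CSI RPC is made.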
Adds code to the csi volume Manager, which determines if a volume is available on a given node or not. Signed-off-by: Drew Erny <[email protected]>
Adds the volumeSet object to the scheduler. This object keeps track of volumes available on the system. Signed-off-by: Drew Erny <[email protected]>
Adds ginkgo tests for the integration between volumes and the scheduler. Signed-off-by: Drew Erny <[email protected]>
Adds basic handling in the dispatcher of Volumes Signed-off-by: Drew Erny <[email protected]>
Publishing volumes is now a two-step process. First, the Scheduler updates the Volume object to PENDING_PUBLISH, which indicates that the volume should be published, but that the call hasn't verifiably succeeded yet. Then, the CSI Manager calls the ControllerPublishVolume RPC, and updates the volume object again to PUBLISHED, indicating that the call has succeeded. This makes sense because the Scheduler has knowledge of when and why a volume is in use. This change includes fairly substantial breaking changes to the protocol buffers, but this is acceptable because this code has not yet been released. Signed-off-by: Drew Erny <[email protected]>
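A minimal sketch of the two-step publish flow described above, assuming simplified stand-in types. `PublishState`, `PublishStatus`, `Volume`, and both functions are hypothetical and do not mirror the real protocol buffer definitions; the `publish` callback stands in for the ControllerPublishVolume RPC.

```go
package main

// PublishState is a hypothetical stand-in for the publish status state.
type PublishState int

const (
	PendingPublish PublishState = iota // the scheduler wants the volume published
	Published                          // the controller publish call has succeeded
)

// PublishStatus records the state of one volume on one node.
type PublishStatus struct {
	NodeID string
	State  PublishState
}

// Volume holds the per-node publish statuses of a single volume.
type Volume struct {
	ID       string
	Statuses []*PublishStatus
}

// schedulerRequestPublish is step one: the scheduler records the intent to
// publish, without claiming the call has happened.
func schedulerRequestPublish(v *Volume, nodeID string) {
	for _, s := range v.Statuses {
		if s.NodeID == nodeID {
			return // already pending or published on this node
		}
	}
	v.Statuses = append(v.Statuses, &PublishStatus{NodeID: nodeID, State: PendingPublish})
}

// managerPublish is step two: the CSI manager makes the publish call for every
// pending status and marks it Published only when the call succeeds.
// It returns the first error encountered, leaving failed entries pending.
func managerPublish(v *Volume, publish func(volumeID, nodeID string) error) error {
	var firstErr error
	for _, s := range v.Statuses {
		if s.State != PendingPublish {
			continue
		}
		if err := publish(v.ID, s.NodeID); err != nil {
			if firstErr == nil {
				firstErr = err
			}
			continue
		}
		s.State = Published
	}
	return firstErr
}
```

Because a failed call leaves the status at `PendingPublish`, a later pass can retry it without the scheduler having to act again.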
Modifies the dispatcher to avoid sending a VolumeAssignment until the VolumePublishStatus is PUBLISHED for the node in question. The worker node will need to be aware of and compatible with the fact that the VolumeAssignment may not be present on the Worker for some time after the Task is sent down. Signed-off-by: Drew Erny <[email protected]>
Signed-off-by: Ameya Gawde <[email protected]>
Signed-off-by: Drew Erny <[email protected]>
Updates the Dispatcher to handle a different Volume workflow.
1. Volumes are now assigned completely independently of Tasks. The Scheduler decides where Volumes belong, and though the Task depends on them, usually Volumes aren't even ready to be used when the Task is dispatched.
2. Volumes are removed with an assignment removal action, but their dependencies (the volume secrets) are not removed at that time, because they may be needed to actually do the unpublish calls on the node.
3. Volume removals are always sent to the node when they happen, because the node might have Volumes published that it does not know about (for example, after a restart).
Signed-off-by: Drew Erny <[email protected]>
Adds code to the Scheduler to manage the end-stage of a Volume's lifecycle on a Node. When the Scheduler runs, it checks to see if any Volumes are no longer in use on any nodes. If so, those Volumes have their PublishStatus.State set to PENDING_NODE_UNPUBLISH, which will signal to the rest of Swarm that the Volume should be freed on the node. Signed-off-by: Drew Erny <[email protected]>
Removing volumes is a tricky proposition, because it is not sufficient to simply delete the volume in question. The correct removal steps must be followed to cleanly remove the volume. First, to remove a volume from a node, the manager must know affirmatively that it is no longer in use on that node. If a volume is still in use, then it cannot be unpublished on the controller side. To solve this problem, a repeated string field is added to NodeDescription, reporting all volumes active on that node. Second, to remove a volume from Swarm, or to update it, the volume must not be active and published anywhere. To facilitate this, volume availability states are added to the VolumeSpec. These states, analogous to NodeAvailability, control the usage of the volume. Signed-off-by: Drew Erny <[email protected]>
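The scheduler check described in this commit could look roughly like the following. This is a sketch under assumptions: the types are simplified stand-ins, `markUnusedVolumes` is a hypothetical name, and tasks are reduced to the node they run on plus the volume IDs they use.

```go
package main

// PublishState is a hypothetical stand-in for the publish status state.
type PublishState int

const (
	Published            PublishState = iota // volume is published to the node
	PendingNodeUnpublish                     // volume should be freed on the node
)

// PublishStatus records the state of one volume on one node.
type PublishStatus struct {
	NodeID string
	State  PublishState
}

// Volume holds the per-node publish statuses of a single volume.
type Volume struct {
	ID       string
	Statuses []*PublishStatus
}

// Task is reduced to the fields this sketch needs.
type Task struct {
	NodeID    string
	VolumeIDs []string
}

// markUnusedVolumes flips every Published status that no task is using on
// that node to PendingNodeUnpublish, and returns how many it changed.
func markUnusedVolumes(volumes []*Volume, tasks []Task) int {
	type key struct{ volumeID, nodeID string }
	inUse := make(map[key]bool)
	for _, t := range tasks {
		for _, id := range t.VolumeIDs {
			inUse[key{id, t.NodeID}] = true
		}
	}
	marked := 0
	for _, v := range volumes {
		for _, s := range v.Statuses {
			if s.State == Published && !inUse[key{v.ID, s.NodeID}] {
				s.State = PendingNodeUnpublish
				marked++
			}
		}
	}
	return marked
}
```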
* Updates the Scheduler to not use volumes in the Pause or Drain availability.
* Creates a VolumeEnforcer, which is like the ConstraintEnforcer, except it rejects tasks belonging to Drained Volumes.
* Updates the store to include a new filter for Tasks by VolumeAttachment, allowing an efficient way to locate all tasks using a given volume.
Signed-off-by: Drew Erny <[email protected]>
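To illustrate how the availability states divide the work, here is a hedged sketch: the scheduler only places new work on Active volumes, while an enforcer rejects tasks that use a Drained volume. The types and the names `schedulable` and `tasksToReject` are hypothetical, not the swarmkit API.

```go
package main

// Availability is a hypothetical stand-in for the volume availability states.
type Availability int

const (
	Active Availability = iota // usable by new and existing tasks
	Pause                      // no new tasks; existing tasks keep running
	Drain                      // no new tasks; existing tasks are rejected
)

// Volume is reduced to the fields this sketch needs.
type Volume struct {
	ID           string
	Availability Availability
}

// Task is reduced to its ID and the IDs of the volumes it uses.
type Task struct {
	ID        string
	VolumeIDs []string
}

// schedulable reports whether the scheduler may place new tasks on the volume.
func schedulable(v Volume) bool {
	return v.Availability == Active
}

// tasksToReject returns the IDs of tasks that use at least one Drained volume.
func tasksToReject(tasks []Task, volumes map[string]Volume) []string {
	var rejected []string
	for _, t := range tasks {
		for _, id := range t.VolumeIDs {
			if v, ok := volumes[id]; ok && v.Availability == Drain {
				rejected = append(rejected, t.ID)
				break
			}
		}
	}
	return rejected
}
```

This sketch scans every task. The store filter by VolumeAttachment mentioned above would let the real code look up only the tasks attached to a given volume.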
Signed-off-by: Drew Erny <[email protected]>
Actually consists of two changes, which have gotten blended together by mistake. First, adds code for removing and deleting Volumes. Second, adds code to retry failed Volume operations. This relies on a sort-of-priority queue, which allows us to schedule retries on a timer and then handle them as the backoff interval elapses for each operation. This is probably way over-engineered. Signed-off-by: Drew Erny <[email protected]>
Signed-off-by: Ameya Gawde <[email protected]>
Signed-off-by: Ameya Gawde <[email protected]>
Signed-off-by: Ameya Gawde <[email protected]>
Signed-off-by: Ameya Gawde <[email protected]>
Does two things:
1. Splits the configuration for a plugin into 2 Sockets, one for the controller and one for the node.
2. Updates the agent to handle creating and propagating plugins.
Shovels off all responsibility for plugin management to the executor by way of a new method on that interface.
Signed-off-by: Drew Erny <[email protected]>
Signed-off-by: Drew Erny <[email protected]>
* Refactors the volume queue to its own package, shared by the manager and agent.
* Adds code to facilitate reporting when a volume is unpublished from the agent.
Signed-off-by: Drew Erny <[email protected]>
Further refactors and updates the agent to support the volume removal workflow. This should complete the volume removal functionality. Signed-off-by: Drew Erny <[email protected]>
Still needs tests written Signed-off-by: Drew Erny <[email protected]>
Adds code to the csi plugin adapter to actually call the underlying CSI RPCs for ControllerPublishVolume, ControllerUnpublishVolume, and DeleteVolume. Signed-off-by: Drew Erny <[email protected]>
The CSI manager now reads out and checks all volumes on initialization, which occurs at start up or leadership change. This means that work interrupted by an outage or leadership change is picked up where it was left off. Signed-off-by: Drew Erny <[email protected]>
Signed-off-by: Drew Erny <[email protected]>
Alters the Client method on the csi plugin object to lazy-initialize the gRPC client when needed. Signed-off-by: Drew Erny <[email protected]>
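A lazily initialized client along these lines might be sketched as follows. To stay self-contained the example does not import gRPC. `conn` and the `dial` field are hypothetical placeholders for the gRPC connection and dial call, and `plugin` is not the actual swarmkit type.

```go
package main

import "sync"

// conn is a hypothetical placeholder for a gRPC client connection.
type conn struct{ addr string }

// plugin holds what is needed to connect, but does not connect until asked.
type plugin struct {
	mu     sync.Mutex
	addr   string
	dial   func(addr string) (*conn, error) // assumed stand-in for the gRPC dial
	client *conn
}

// Client returns the cached connection, dialing on first use. A failed dial
// leaves the cache empty, so the next call tries again.
func (p *plugin) Client() (*conn, error) {
	p.mu.Lock()
	defer p.mu.Unlock()
	if p.client != nil {
		return p.client, nil
	}
	c, err := p.dial(p.addr)
	if err != nil {
		return nil, err
	}
	p.client = c
	return c, nil
}
```

A mutex is used here rather than `sync.Once` so that a failed first dial can be retried on a later call.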
Signed-off-by: Drew Erny <[email protected]>
Adds the volume access type (Mount or Block) to the VolumeSpec. This was the last missing piece needed to fully support CSI plugins at a minimal level. Before this, it was just hard coded to always use Mount-type volumes, which isn't how that's supposed to work. Signed-off-by: Drew Erny <[email protected]>
Converts the manager portion of the CSI code to use the Docker `PluginGetter` interface, instead of the CSIConfig object. Converts the agent to get its plugins from the PluginGetter, rather than getting them from the manager. Also removes the dispatcher sending updates about CSI node plugins. Removes various CSI plugin configuration fields from the API protocol buffers. Since we'll be using the plugingetter, these are no longer necessary. When publishing volumes, uses a propagated mount location, and resolves that location when getting the volume. Signed-off-by: Drew Erny <[email protected]>
Both the agent and manager of swarmkit's CSI components need a fake PluginGetter object for testing. This splits that common fake into its own object in the testutils package. Signed-off-by: Drew Erny <[email protected]>
Removing the CSIConfig object from the cluster requires changes to other tests. Signed-off-by: Drew Erny <[email protected]>
[feature-volumes] Use PluginGetter instead of Cluster CSIConfig
The signature for the `agent.NewDependencyManager` function has changed because of CSI support. Adds an empty FakePluginGetter to this constructor so that the tests compile. Signed-off-by: Drew Erny <[email protected]>
[feature-volumes] Fix template tests
V E R Y E X C I T I N G |
All work for support of cluster volumes until now has been on the `feature-volumes` branch of swarmkit, in order to keep the `master` branch free of half-done volumes code. This PR merges `feature-volumes` into `master`, the first step toward finally releasing cluster volume support.