-
Notifications
You must be signed in to change notification settings - Fork 812
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make CreateVolume idempotent #744
Make CreateVolume idempotent #744
Conversation
Pull Request Test Coverage Report for Build 1604
💛 - Coveralls |
/assign @leakingtapan |
/assign @bertinatto @wongma7 |
pkg/driver/controller.go
Outdated
// check if a request is already in-flight because the CreateVolume API is not idempotent | ||
if ok := d.inFlight.Insert(req); !ok { | ||
msg := fmt.Sprintf("Create volume request for %s is already in progress", volName) | ||
return nil, status.Error(codes.AlreadyExists, msg) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CSI spec says use "ABORTED" for an operation in progress: https://github.com/container-storage-interface/spec/blame/master/spec.md#L470
26113ee
to
8cf86b7
Compare
@@ -201,6 +204,13 @@ func (d *controllerService) CreateVolume(ctx context.Context, req *csi.CreateVol | |||
return newCreateVolumeResponse(disk), nil | |||
} | |||
|
|||
// check if a request is already in-flight because the CreateVolume API is not idempotent | |||
if ok := d.inFlight.Insert(req); !ok { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we just use the req.Name as the key? I dont think it will ever matter in practice but spec says name is sufficient: https://github.com/container-storage-interface/spec/blob/master/spec.md#controller-service-rpc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One issue with this approach is that this inFlight
field is shared amongst the entire controller. By limiting the key to just the name we may encounter issues once other operations start depending on this field (assuming they also use only the name as the key).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good point, it would need to categorize operations within the inflight map ... lgtm in current state then.
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: chrishenzie, wongma7 The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's okay to workaround the issue through inflight check, but I won't mark the original issue as resolved. Since the driver could still crash at the middle of processing the duplicate CreateVolume request, and the restarted driver won't have the original inFlight map as it is only held in memory if nothing changed too much recently. The real fix should be let EBS to take the client token during CreateVolume, which they admitted it's an easy change.
@leakingtapan Yes this would still be an issue should the driver crash. Does the real fix you described require an EBS API change that is not currently available? I just want to make sure we're taking the best approach given the current limitations. |
yep. And the change is to add a new field to the existing request struct which is totally doable. @wongma7 do you wanna chat with EBS again for their progress on this? As I talked to them earlier, this is ready in the backend, and it just needs to be surfaced up. @chrishenzie feel free to merge ur change in at the meantime. I just don’t want to close the original issue accidentally |
I don't think I can LGTM my own PR, would you be able to? I can re-open the issue if it closes or open a separate one that links to it. |
Instead of "Fixes X", you can use a different keyword (like Addresses) so that github doesn't automatically close the issue. |
Updated description to use "Addresses" instead of "Fixes". |
/lgtm |
@AndyXiangLi I think we need a similar issue for |
@gnufied We definitely don't want multiple CreateSnapshot call in flight for same volume, I will take a look and follow up with my team regarding this! Do we have a ticket created for this? |
Is this a bug fix or adding new feature?
Addresses #165, but the full fix requires an EBS API change.
What is this PR about? / Why do we need it?
This PR works around idempotency issues with the
CreateVolume
API. It does so by using theInFlight
struct to mark a request as "in progress". If an identicalCreateVolumeRequest
is received while another is in progress, we return an error to the caller.What testing is done?
Unit tests and smoke tests on a k8s cluster on AWS.
This issue can be repeated by the following command (similar to the one in the GitHub issue, however it uses
&
to background the first call):When running against master I observed:
This created two volumes with tags
Key:CSIVolumeName, Value:chrishenzie-test
(per the two volume IDs that are returned).When running against this branch I observed:
Which created only one volume.
Open questions
What error code should be returned?Aborted
based on the CSI spec