-
Notifications
You must be signed in to change notification settings - Fork 84
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adopting Multiple CSI Driver improvements #309
Conversation
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: jacobwolfaws The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/retest Seems like there's a transient error with one of the sanity tests |
pkg/driver/internal/inflight.go
Outdated
|
||
const ( | ||
VolumeOperationAlreadyExistsErrorMsg = "An operation with the given volume %s already exists" | ||
SnapshotOperationAlreadyExistsErrorMsg = "An operation with the given snapshot %s already exists" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The FSx for Lustre CSI driver does not support snapshots yet, so this constant is unused. I think it would be best practice to remove this for now, especially because snapshots/backups may not be implemented for a while.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, fixed
flag "github.com/spf13/pflag" | ||
) | ||
|
||
func TestNodeOptions(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems odd that you're adding this test for node options before node options is implemented. I'm assuming as-is these unit tests don't pass. Ideally this would be fixed, but doing so likely isn't the hassle given how these commits are stacked
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's just an empty test, so I believe it passes. Either way I agree, I don't think it makes sense to change this given how the commits are stacked
flag "github.com/spf13/pflag" | ||
) | ||
|
||
func TestServerOptions(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here, unless I'm missing something I don't think these tests would pass. I may be mistaken though
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same response :)
pkg/driver/driver.go
Outdated
} | ||
|
||
driver.controllerService = newControllerService(&driverOptions) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason we set these here instead of in the driver declaration?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's done like this to accommodate the different modes, the alternative would be to return a driver configured differently for each case. This is the format EBS uses + the one I prefer to that option
return nil, fmt.Errorf("node providerID empty, cannot parse") | ||
} | ||
|
||
awsInstanceIDRegex := "s\\.i-[a-z0-9]+|i-[a-z0-9]+$" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: This regex retrieves the ec2 instance id of the node from the provider id, right? If so, I'm partial to specifying ec2InstanceId in the names of these variables. Do you have context into what would be required to provide support for the CSI driver in non-EKS Kubernetes environments?
@@ -12,13 +12,13 @@ | |||
# See the License for the specific language governing permissions and | |||
# limitations under the License. | |||
|
|||
VERSION=v0.9.0 | |||
VERSION?=v0.9.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Is there more context behind this change? Is there any world in which VERSION would be already set before making this declaration?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It just makes sense to move us into alignment with the EBS CSI Driver. I don't think it makes a difference for us, but for consistency's sake we should change it
@@ -34,10 +34,15 @@ spec: | |||
image: public.ecr.aws/fsx-csi-driver/aws-fsx-csi-driver:v0.9.0 | |||
imagePullPolicy: IfNotPresent | |||
args: | |||
# - --mode={all,controller,node} # specify the driver mode |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same idea as above, for my own understanding what is the default behavior when mode isn't specified? From what I can tell the switch statements are configured to throw an error. Why don't we specify the mode as controller here?
@@ -6,3 +6,7 @@ | |||
*.swp | |||
bin/ | |||
*~ | |||
|
|||
# Vendor dir | |||
vendor/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
smart, we should do this for the OpenZFS CSI driver too
c70efba
to
9a13bb1
Compare
@@ -71,10 +117,17 @@ func (d *Driver) CreateVolume(ctx context.Context, req *csi.CreateVolumeRequest) | |||
return nil, status.Error(codes.InvalidArgument, "Volume capabilities not provided") | |||
} | |||
|
|||
if !d.isValidVolumeCapabilities(volCaps) { | |||
if !isValidVolumeCapabilities(volCaps) { | |||
return nil, status.Error(codes.InvalidArgument, "Volume capabilities not supported") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest retrieving the access modes defined in the passed in volume capabilities and logging that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup that probably makes sense to do, I'll add a TODO for a future PR
) | ||
|
||
var ( | ||
volumeCaps = []csi.VolumeCapability_AccessMode{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this removed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I moved it into controller.go, similar to how EBS does it: https://github.com/kubernetes-sigs/aws-ebs-csi-driver/blob/master/pkg/driver/controller.go#L44
@@ -0,0 +1,6 @@ | |||
package driver |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This file only has one constant. Suggest moving in constants at your discretion, or removing it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We plan to add the startup taint key to constants.go in a separate PR.
Example from EBS CSI Driver: https://github.com/kubernetes-sigs/aws-ebs-csi-driver/blob/master/pkg/driver/constants.go#L172-L175
Not opposed to removing constants.go, but I feel comfortable leaving it in right now
return nil, status.Error(codes.Aborted, msg) | ||
} | ||
defer d.inFlight.Delete(volName) | ||
|
||
// create a new volume with idempotency | ||
// idempotency is handled by `CreateFileSystem` | ||
volumeParams := req.GetParameters() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: suggest using a switch statement here. Similar to:
for key, value := range req.GetParameters() {
switch strings.ToLower(key) {
case "fstype":
klog.InfoS("\"fstype\" is deprecated, please use \"csi.storage.k8s.io/fstype\" instead")
case VolumeTypeKey:
volumeType = value
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there's an opportunity to clean up code in the repo similar to this change, I'll add a TODO but think it's out of scope for this PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to Eddie's comment. We followed this implementation pattern for the OpenZFS CSI driver. I'd also recommend throwing an error on the create in the default cause to be consistent with the FSx API - if the user provides an invalid parameter in the storage class, they shouldn't expect to have a successful creation.
klog.V(5).InfoS("[Debug] Retrieving region from metadata service") | ||
metadata, err := cloud.NewMetadataService(cloud.DefaultEC2MetadataClient, cloud.DefaultKubernetesAPIClient, region) | ||
if err != nil { | ||
klog.ErrorS(err, "Could not determine region from any metadata service. The region can be manually supplied via the AWS_REGION environment variable.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
missing klog.FlushAndExit(klog.ExitFlushTimeout, 1)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is currently the same behavior as EBS: https://github.com/kubernetes-sigs/aws-ebs-csi-driver/blob/master/pkg/driver/controller.go#L88-L89 Should a similar change be made to the ebs csi driver?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
panic also works here, I don't feel strongly about it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm going to add a note about this for more research, it's possible we should just move all panics to this
csi.RegisterNodeServer(d.srv, d) | ||
default: | ||
return fmt.Errorf("unknown mode: %s", d.options.mode) | ||
} | ||
|
||
klog.Infof("Listening for connections on address: %#v", listener.Addr()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
klog.Infof
-> klog.InfoS
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this merits a separate commit/PR, because this change will need to be made in a lot of different places in the repo
@@ -104,3 +131,15 @@ func (d *Driver) Stop() { | |||
klog.Infof("Stopping server") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
klog.Infof
-> klog.InfoS
/lgtm |
Is this a bug fix or adding new feature?
new feature + QOL changes
What is this PR about? / Why do we need it?
This PR is about bring the FSx for Lustre CSI Driver closer in parity with the EBS CSI Driver: https://github.com/kubernetes-sigs/aws-ebs-csi-driver to improve the repo's best practices and allow the start up taint change to be adopted within this driver as well. These changes include:
Note: will update e2e dependencies in future PR
What testing is done?
Manual testing (built driver image and tested dynamic provisioning + mounting a file system)
sanity tests and unit tests
e2e testing (in this PR)