Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adopting Multiple CSI Driver improvements #309

Merged
merged 8 commits into from
May 16, 2023

Conversation

jacobwolfaws
Copy link
Contributor

@jacobwolfaws jacobwolfaws commented May 11, 2023

Is this a bug fix or adding new feature?
new feature + QOL changes

What is this PR about? / Why do we need it?
This PR is about bring the FSx for Lustre CSI Driver closer in parity with the EBS CSI Driver: https://github.com/kubernetes-sigs/aws-ebs-csi-driver to improve the repo's best practices and allow the start up taint change to be adopted within this driver as well. These changes include:

Note: will update e2e dependencies in future PR

What testing is done?

  • Manual testing (built driver image and tested dynamic provisioning + mounting a file system)

    • set json logging to confirms logged in json format properly
    • controller set with different mode values
    • confirmed that the driver image is built with the proper environment data
  • sanity tests and unit tests

  • e2e testing (in this PR)

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label May 11, 2023
@k8s-ci-robot k8s-ci-robot requested a review from khoang98 May 11, 2023 18:04
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jacobwolfaws

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot requested a review from olemarkus May 11, 2023 18:04
@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels May 11, 2023
@jacobwolfaws
Copy link
Contributor Author

jacobwolfaws commented May 11, 2023

/retest

Seems like there's a transient error with one of the sanity tests


const (
VolumeOperationAlreadyExistsErrorMsg = "An operation with the given volume %s already exists"
SnapshotOperationAlreadyExistsErrorMsg = "An operation with the given snapshot %s already exists"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The FSx for Lustre CSI driver does not support snapshots yet, so this constant is unused. I think it would be best practice to remove this for now, especially because snapshots/backups may not be implemented for a while.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, fixed

flag "github.com/spf13/pflag"
)

func TestNodeOptions(t *testing.T) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems odd that you're adding this test for node options before node options is implemented. I'm assuming as-is these unit tests don't pass. Ideally this would be fixed, but doing so likely isn't the hassle given how these commits are stacked

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's just an empty test, so I believe it passes. Either way I agree, I don't think it makes sense to change this given how the commits are stacked

flag "github.com/spf13/pflag"
)

func TestServerOptions(t *testing.T) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, unless I'm missing something I don't think these tests would pass. I may be mistaken though

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same response :)

}

driver.controllerService = newControllerService(&driverOptions)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason we set these here instead of in the driver declaration?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's done like this to accommodate the different modes, the alternative would be to return a driver configured differently for each case. This is the format EBS uses + the one I prefer to that option

pkg/cloud/cloud.go Show resolved Hide resolved
return nil, fmt.Errorf("node providerID empty, cannot parse")
}

awsInstanceIDRegex := "s\\.i-[a-z0-9]+|i-[a-z0-9]+$"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: This regex retrieves the ec2 instance id of the node from the provider id, right? If so, I'm partial to specifying ec2InstanceId in the names of these variables. Do you have context into what would be required to provide support for the CSI driver in non-EKS Kubernetes environments?

@@ -12,13 +12,13 @@
# See the License for the specific language governing permissions and
# limitations under the License.

VERSION=v0.9.0
VERSION?=v0.9.0

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Is there more context behind this change? Is there any world in which VERSION would be already set before making this declaration?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It just makes sense to move us into alignment with the EBS CSI Driver. I don't think it makes a difference for us, but for consistency's sake we should change it

@@ -34,10 +34,15 @@ spec:
image: public.ecr.aws/fsx-csi-driver/aws-fsx-csi-driver:v0.9.0
imagePullPolicy: IfNotPresent
args:
# - --mode={all,controller,node} # specify the driver mode

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same idea as above, for my own understanding what is the default behavior when mode isn't specified? From what I can tell the switch statements are configured to throw an error. Why don't we specify the mode as controller here?

@@ -6,3 +6,7 @@
*.swp
bin/
*~

# Vendor dir
vendor/

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

smart, we should do this for the OpenZFS CSI driver too

@jacobwolfaws jacobwolfaws force-pushed the master branch 2 times, most recently from c70efba to 9a13bb1 Compare May 15, 2023 21:33
@@ -71,10 +117,17 @@ func (d *Driver) CreateVolume(ctx context.Context, req *csi.CreateVolumeRequest)
return nil, status.Error(codes.InvalidArgument, "Volume capabilities not provided")
}

if !d.isValidVolumeCapabilities(volCaps) {
if !isValidVolumeCapabilities(volCaps) {
return nil, status.Error(codes.InvalidArgument, "Volume capabilities not supported")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest retrieving the access modes defined in the passed in volume capabilities and logging that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup that probably makes sense to do, I'll add a TODO for a future PR

)

var (
volumeCaps = []csi.VolumeCapability_AccessMode{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this removed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -0,0 +1,6 @@
package driver
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file only has one constant. Suggest moving in constants at your discretion, or removing it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We plan to add the startup taint key to constants.go in a separate PR.
Example from EBS CSI Driver: https://github.com/kubernetes-sigs/aws-ebs-csi-driver/blob/master/pkg/driver/constants.go#L172-L175

Not opposed to removing constants.go, but I feel comfortable leaving it in right now

return nil, status.Error(codes.Aborted, msg)
}
defer d.inFlight.Delete(volName)

// create a new volume with idempotency
// idempotency is handled by `CreateFileSystem`
volumeParams := req.GetParameters()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: suggest using a switch statement here. Similar to:

for key, value := range req.GetParameters() {
	switch strings.ToLower(key) {
	case "fstype":
		klog.InfoS("\"fstype\" is deprecated, please use \"csi.storage.k8s.io/fstype\" instead")
	case VolumeTypeKey:
		volumeType = value

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there's an opportunity to clean up code in the repo similar to this change, I'll add a TODO but think it's out of scope for this PR

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to Eddie's comment. We followed this implementation pattern for the OpenZFS CSI driver. I'd also recommend throwing an error on the create in the default cause to be consistent with the FSx API - if the user provides an invalid parameter in the storage class, they shouldn't expect to have a successful creation.

klog.V(5).InfoS("[Debug] Retrieving region from metadata service")
metadata, err := cloud.NewMetadataService(cloud.DefaultEC2MetadataClient, cloud.DefaultKubernetesAPIClient, region)
if err != nil {
klog.ErrorS(err, "Could not determine region from any metadata service. The region can be manually supplied via the AWS_REGION environment variable.")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing klog.FlushAndExit(klog.ExitFlushTimeout, 1)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is currently the same behavior as EBS: https://github.com/kubernetes-sigs/aws-ebs-csi-driver/blob/master/pkg/driver/controller.go#L88-L89 Should a similar change be made to the ebs csi driver?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

panic also works here, I don't feel strongly about it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to add a note about this for more research, it's possible we should just move all panics to this

csi.RegisterNodeServer(d.srv, d)
default:
return fmt.Errorf("unknown mode: %s", d.options.mode)
}

klog.Infof("Listening for connections on address: %#v", listener.Addr())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

klog.Infof -> klog.InfoS

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reason: https://github.com/kubernetes/community/blob/master/contributors/devel/sig-instrumentation/migration-to-structured-logging.md#change-log-functions

I think this merits a separate commit/PR, because this change will need to be made in a lot of different places in the repo

@@ -104,3 +131,15 @@ func (d *Driver) Stop() {
klog.Infof("Stopping server")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

klog.Infof -> klog.InfoS

@khoang98
Copy link
Contributor

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 16, 2023
@k8s-ci-robot k8s-ci-robot merged commit 20d5b32 into kubernetes-sigs:master May 16, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants