Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add windows driver installation support #510

Closed
wants to merge 1 commit into from

Conversation

jingxu97
Copy link
Contributor

This PR adds windows driver support. It adds a windows base dir to
install base yaml files. It also adds a windows alpha kustomization
file. To install driver for windows, first set env
GCE_PD_DRIVER_VERSION=windows/alpha and run
deploy/kubernetes/deploy-driver-win.sh script.

What type of PR is this?

Uncomment only one /kind <> line, hit enter to put that in a new line, and remove leading whitespaces from that line:

/kind api-change
/kind bug
/kind cleanup
/kind design
/kind documentation
/kind failing-test
/kind feature
/kind flake

What this PR does / why we need it:

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?:


@k8s-ci-robot
Copy link
Contributor

@jingxu97: Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected, please follow our release note process to remove it.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels May 19, 2020
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: jingxu97
To complete the pull request process, please assign msau42
You can assign the PR to them by writing /assign @msau42 in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot requested review from saad-ali and verult May 19, 2020 18:29
@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label May 19, 2020
@jingxu97
Copy link
Contributor Author

updated with the same base. This way is kind of tricky with using $patch: delete.
@msau42 PTAL

deploy/kubernetes/overlays/windows/alpha/gcepd.yaml Outdated Show resolved Hide resolved
args:
- --v=5
- --csi-address=unix://C:\\csi\\csi.sock
- --kubelet-registration-path=C:\\var\\lib\\kubelet\\plugins\\pd.csi.storage.gke.io\\csi.sock
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The linux paths don't need to be removed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is for windows path, I think.

deploy/kubernetes/overlays/windows/alpha/gcepd.yaml Outdated Show resolved Hide resolved
@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels May 26, 2020
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels May 26, 2020
@@ -15,21 +15,14 @@ spec:
# Host network must be used for interaction with Workload Identity in GKE
# since it replaces GCE Metadata Server with GKE Metadata Server. Remove
# this requirement when issue is resolved and before any exposure of
# metrics ports.
hostNetwork: true
# metrics ports. But hostNetwork is not working for Windodws, might be an issue
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: windows

Do we support workload identity on Windows? If so, then not being on host network may be problematic.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed. we might use a different way to workaround it. Currently we don't need to set it for windows.

deploy/kubernetes/base/node.yaml Outdated Show resolved Hide resolved
deploy/kubernetes/base/node.yaml Show resolved Hide resolved
deploy/kubernetes/deploy-driver.sh Outdated Show resolved Hide resolved
deploy/kubernetes/overlays/linux/base/noderegistrar.yaml Outdated Show resolved Hide resolved
deploy/kubernetes/overlays/linux/base/volumes.yaml Outdated Show resolved Hide resolved
deploy/kubernetes/overlays/linux/dev/WARNING.md Outdated Show resolved Hide resolved
deploy/kubernetes/overlays/linux/alpha/WARNING.md Outdated Show resolved Hide resolved
@jingxu97 jingxu97 force-pushed the May/windows branch 4 times, most recently from 407013c to 2f223d3 Compare May 27, 2020 19:45
@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels May 27, 2020
@jingxu97
Copy link
Contributor Author

/test pull-gcp-compute-persistent-disk-csi-driver-e2e

@jingxu97
Copy link
Contributor Author

comments are addressed. PTAL

@jingxu97
Copy link
Contributor Author

/test pull-gcp-compute-persistent-disk-csi-driver-e2e

- name: sys
mountPath: /sys
nodeSelector:
kubernetes.io/os: linux
volumes:
- name: registration-dir
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should these volumes be removed too since they are linux specific?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, windows used it too, I think.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it will not override. just merging the two parts. But I think I need to remove lifecycle part similar to linux.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the purpose of keeping the volumes here, if this base spec doesn't have the corresponding volumeMounts? It may be easier to read if we keep both volumeMounts and volumes together in one file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now I look at the node.yaml again, the most part is about volumes and volumemounts, which are quite different from linux and windows. I think it is more clean just deploy node.yaml in overlay layer, and remove it from base layer.

deploy/kubernetes/overlays/windows/dev/drivertest.yaml Outdated Show resolved Hide resolved
deploy/kubernetes/overlays/windows/dev/tmp.yaml Outdated Show resolved Hide resolved
@k8s-ci-robot k8s-ci-robot removed the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label May 28, 2020
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels May 28, 2020
@verult
Copy link
Contributor

verult commented Jun 2, 2020

/lgtm

Consider adding a release note for the new OS env var in deployment script.

And a nit: maybe change "base" overlay name to something like "common". It could be confusing which "base" is referred to in other overlays.

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 2, 2020
- name: sys
mountPath: /sys
nodeSelector:
kubernetes.io/os: linux
volumes:
- name: registration-dir
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the purpose of keeping the volumes here, if this base spec doesn't have the corresponding volumeMounts? It may be easier to read if we keep both volumeMounts and volumes together in one file.

source "${PKGDIR}/deploy/common.sh"

ensure_kustomize

${KUSTOMIZE_PATH} build ${PKGDIR}/deploy/kubernetes/overlays/${DEPLOY_VERSION} | ${KUBECTL} delete -v="${VERBOSITY}" --ignore-not-found -f -
${KUSTOMIZE_PATH} build ${PKGDIR}/deploy/kubernetes/overlays/${OS}/${DEPLOY_VERSION} | ${KUBECTL} delete -v="${VERBOSITY}" --ignore-not-found -f -
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the DaemonSet has an OS node selector, what do you think of deploying both Linux + Windows daemonsets in one overlay? Then this easily supports mixed OS nodes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean by deploying in one overlay?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean actually have 2 DaemonSets in the same overlay (and not have os-specific overlays), one for linux and one for windows. Then when you run "deploy-driver.sh", you actually deploy both linux + windows daemonsets.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It means we need 2 DaemonSets in the base, and if we end up changing something, we'll have to remember to update both DaemonSets.

But this way, we can support clusters that have mixed windows + linux nodes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I prefer os-specific overlay for clear structure. Not only daemonset is different, also have different other things such as pod security policy.
For cluster with mixed windows and linux, with current script, user can set os twice and deploy it twice. Or I can update script to accept multiple OSes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if running the script twice will work. The script does things like delete/create the controller secret which should only be done once.

If you are planning to remove the node.yaml from the base anyway, and not have it share anything, then it may be simple to just have a node_linux.yaml and node_windows.yaml

Copy link
Contributor Author

@jingxu97 jingxu97 Jun 4, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated with changes similar to what you mentioned. The deployment env can take both os and version information, and also allow you to set multiple values. I tested the script and it can install drivers for a mixed cluster.
Some common objects might get applied/deleted twice in this case, but it is ok. Apply second time just give a message saying ... unchanged. For deletion, we have --ignore-not-found setting, so it won't give error messages.

@k8s-ci-robot
Copy link
Contributor

New changes are detected. LGTM label has been removed.

@k8s-ci-robot k8s-ci-robot removed lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Jun 2, 2020
@jingxu97 jingxu97 force-pushed the May/windows branch 2 times, most recently from 5099fda to a9123ee Compare June 3, 2020 02:27
subjects:
- kind: ServiceAccount
name: csi-gce-pd-controller-sa

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added this binding, because without it, when pod security policy admission controller is enabled, service account csi-gce-pd-controller-sa cannot create pod at all.

@jingxu97
Copy link
Contributor Author

jingxu97 commented Jun 3, 2020

I updated the PR. PTAL. Thanks!

@msau42
Copy link
Contributor

msau42 commented Jun 3, 2020

cc @mattcary

@jingxu97 jingxu97 force-pushed the May/windows branch 2 times, most recently from 9bb739c to 70e2c92 Compare June 4, 2020 00:29
This PR adds windows driver support. It adds a windows base dir to
install base yaml files. It also adds a windows alpha kustomization
file. To install driver for windows, first set env
NODE_OS=windows and GCE_PD_DRIVER_VERSION=alpha and run
deploy/kubernetes/deploy-driver.sh script.

This PR also reorgnize the dir structure for linux version. Now under
overlay, we have a linux and a windows dir. Under each of them, we have
alpha, stable, etc. Currently windows only has alpha version.
# See "special case". This will tolerate everything. Node component should
# be scheduled on all nodes.
tolerations:
- operator: Exists
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This means that the controller may not get rescheduled if the Node becomes unhealthy. Is that what we want?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point, will remove it. I added it because I see node has it. And my test setup has linux tainted.

@@ -0,0 +1,8 @@
kind: DaemonSet
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this file can be removed now right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right, fix it

@@ -0,0 +1,13 @@
kind: DaemonSet
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this file can be removed now right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fix it.

@@ -0,0 +1,17 @@
kind: DaemonSet
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this file can be removed now right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fix it.

@@ -0,0 +1,36 @@
apiVersion: kustomize.config.k8s.io/v1beta1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you going to need multiple overlays for windows too (ie staging, stable, dev)?

The challenge I see with this structure is that we now have 2 places that we need to keep in sync when we update overlays. For example, if we make a update to staging, then we need to update both windows and linux staging overlays.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So with one overlay, we have both windows and linux together. What if user need to deploy with mixed OS and versions, e.g, linux stable, and windows dev etc? Or just because some versions are only available in linux, not windows?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we'll still need to have different overlays for dev/alpha/staging/stable and we can control the versions used in each. For example, the dev overlay could use stable versions for linux, and unstable versions for windows. And in terms of development flow, we can add Windows first to the dev overlay, then once we're ready, add it to staging, and then when we're ready to release, add it to stable.

Copy link
Contributor Author

@jingxu97 jingxu97 Jun 4, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I miss something here, so you think user will not need a mixed version for windows and linux. If they want to deploy stable version, it will be stable version for both linux and windows? But what if windows don't have stable version yet, user must deploy dev version for both linux and windows, instead of one stable version and one dev version?

Copy link
Contributor Author

@jingxu97 jingxu97 Jun 4, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also in the case user already deploy a Linux driver, and want to add a windows driver (might be a different version from Linux), with one overlay, it will try to deploy both linux and windows. Some setup might be messed up with different versions of deployment?

In the current layout, we have common things for both linux and windows in the base, and different things in overlay. So new changes, either common to linux and windows, which will go to base, or different from linux and windows, and we need to update them wherever applicable. So I don't see there is more work needed compared to one overlay which contains both linux and windows. Also there might be cases that some setup is only applicable for linux, and we might accidentally break windows setup if we don't pay attention to the windows part?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have been thinking the structure again last night :), and realized this is more like a multi-stage process. So instead of base/overlay structure, I propose to use the following as multiple stages and use them as bases. I am working on this structure now and will update. Please let me know what you think about this structure.

  • cluster_controller_setup : includes cluster setup and controller yaml file

  • node_setup:
    * Linux: node and other related yaml files for Linux
    * Windows: node and other related yaml files for Windows

  • image_update
    * alpha:
    *. stable:
    *. staging:
    * dev:
    ...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not quite sure how this would work out but am curious to see. One thing to note is that alpha/staging/stable is not necessarily just different versions. Different features may be added depending on their maturity.

what if windows don't have stable version yet, user must deploy dev version for both linux and windows, instead of one stable version and one dev version?

In terms of looking at the driver itself, I think we should consider both Linux + Windows parts of the driver to be part of one complete driver instead of 2 independent drivers. So considering that Windows is still alpha right now, I think it is fine if it uses the Linux alpha versions as well (which currently don't have any difference from stable). I don't think a user will want to have a mixed Linux + Windows cluster where the Linux part is stable and the Windows part is alpha. I would expect a user to have 2 separate clusters: a Linux-only cluster used for production, and an experimental Windows-only cluster.

Also in the case user already deploy a Linux driver, and want to add a windows driver (might be a different version from Linux), with one overlay, it will try to deploy both linux and windows. Some setup might be messed up with different versions of deployment?

I agree with one deployment and one place to define all the versions, it's easier to make sure the versions are in-sync with each other.

name: csi-gce-pd-node-psp
spec:
allowedHostPaths:
- pathPrefix: \var\lib\kubelet
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is going to replace the current PSP with these rules, instead of add, which is not desirable if we're going to be running both linux + windows.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point, will try to make a merge here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually, I think it is better to have separate policy for linux and windows instead of merging

@k8s-ci-robot
Copy link
Contributor

@jingxu97: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 5, 2020
@jingxu97
Copy link
Contributor Author

opened #520 to replace this one.

@jingxu97 jingxu97 closed this Jun 11, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants