-
Notifications
You must be signed in to change notification settings - Fork 150
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add windows driver installation support #510
Conversation
@jingxu97: Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected, please follow our release note process to remove it. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: jingxu97 The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
updated with the same base. This way is kind of tricky with using $patch: delete. |
args: | ||
- --v=5 | ||
- --csi-address=unix://C:\\csi\\csi.sock | ||
- --kubelet-registration-path=C:\\var\\lib\\kubelet\\plugins\\pd.csi.storage.gke.io\\csi.sock |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The linux paths don't need to be removed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is for windows path, I think.
deploy/kubernetes/base/node.yaml
Outdated
@@ -15,21 +15,14 @@ spec: | |||
# Host network must be used for interaction with Workload Identity in GKE | |||
# since it replaces GCE Metadata Server with GKE Metadata Server. Remove | |||
# this requirement when issue is resolved and before any exposure of | |||
# metrics ports. | |||
hostNetwork: true | |||
# metrics ports. But hostNetwork is not working for Windodws, might be an issue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo: windows
Do we support workload identity on Windows? If so, then not being on host network may be problematic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed. we might use a different way to workaround it. Currently we don't need to set it for windows.
407013c
to
2f223d3
Compare
/test pull-gcp-compute-persistent-disk-csi-driver-e2e |
comments are addressed. PTAL |
/test pull-gcp-compute-persistent-disk-csi-driver-e2e |
- name: sys | ||
mountPath: /sys | ||
nodeSelector: | ||
kubernetes.io/os: linux | ||
volumes: | ||
- name: registration-dir |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should these volumes be removed too since they are linux specific?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no, windows used it too, I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Windows seems to be overriding them? https://github.com/kubernetes-sigs/gcp-compute-persistent-disk-csi-driver/pull/510/files#diff-c6cce7581a0b791e1a6268807fdf0a3fR38
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it will not override. just merging the two parts. But I think I need to remove lifecycle part similar to linux.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the purpose of keeping the volumes here, if this base spec doesn't have the corresponding volumeMounts? It may be easier to read if we keep both volumeMounts and volumes together in one file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now I look at the node.yaml again, the most part is about volumes and volumemounts, which are quite different from linux and windows. I think it is more clean just deploy node.yaml in overlay layer, and remove it from base layer.
/lgtm Consider adding a release note for the new And a nit: maybe change "base" overlay name to something like "common". It could be confusing which "base" is referred to in other overlays. |
- name: sys | ||
mountPath: /sys | ||
nodeSelector: | ||
kubernetes.io/os: linux | ||
volumes: | ||
- name: registration-dir |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the purpose of keeping the volumes here, if this base spec doesn't have the corresponding volumeMounts? It may be easier to read if we keep both volumeMounts and volumes together in one file.
deploy/kubernetes/delete-driver.sh
Outdated
source "${PKGDIR}/deploy/common.sh" | ||
|
||
ensure_kustomize | ||
|
||
${KUSTOMIZE_PATH} build ${PKGDIR}/deploy/kubernetes/overlays/${DEPLOY_VERSION} | ${KUBECTL} delete -v="${VERBOSITY}" --ignore-not-found -f - | ||
${KUSTOMIZE_PATH} build ${PKGDIR}/deploy/kubernetes/overlays/${OS}/${DEPLOY_VERSION} | ${KUBECTL} delete -v="${VERBOSITY}" --ignore-not-found -f - |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the DaemonSet has an OS node selector, what do you think of deploying both Linux + Windows daemonsets in one overlay? Then this easily supports mixed OS nodes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you mean by deploying in one overlay?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean actually have 2 DaemonSets in the same overlay (and not have os-specific overlays), one for linux and one for windows. Then when you run "deploy-driver.sh", you actually deploy both linux + windows daemonsets.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It means we need 2 DaemonSets in the base, and if we end up changing something, we'll have to remember to update both DaemonSets.
But this way, we can support clusters that have mixed windows + linux nodes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I prefer os-specific overlay for clear structure. Not only daemonset is different, also have different other things such as pod security policy.
For cluster with mixed windows and linux, with current script, user can set os twice and deploy it twice. Or I can update script to accept multiple OSes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if running the script twice will work. The script does things like delete/create the controller secret which should only be done once.
If you are planning to remove the node.yaml from the base anyway, and not have it share anything, then it may be simple to just have a node_linux.yaml and node_windows.yaml
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated with changes similar to what you mentioned. The deployment env can take both os and version information, and also allow you to set multiple values. I tested the script and it can install drivers for a mixed cluster.
Some common objects might get applied/deleted twice in this case, but it is ok. Apply second time just give a message saying ... unchanged. For deletion, we have --ignore-not-found setting, so it won't give error messages.
New changes are detected. LGTM label has been removed. |
5099fda
to
a9123ee
Compare
subjects: | ||
- kind: ServiceAccount | ||
name: csi-gce-pd-controller-sa | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added this binding, because without it, when pod security policy admission controller is enabled, service account csi-gce-pd-controller-sa cannot create pod at all.
I updated the PR. PTAL. Thanks! |
cc @mattcary |
9bb739c
to
70e2c92
Compare
This PR adds windows driver support. It adds a windows base dir to install base yaml files. It also adds a windows alpha kustomization file. To install driver for windows, first set env NODE_OS=windows and GCE_PD_DRIVER_VERSION=alpha and run deploy/kubernetes/deploy-driver.sh script. This PR also reorgnize the dir structure for linux version. Now under overlay, we have a linux and a windows dir. Under each of them, we have alpha, stable, etc. Currently windows only has alpha version.
# See "special case". This will tolerate everything. Node component should | ||
# be scheduled on all nodes. | ||
tolerations: | ||
- operator: Exists |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This means that the controller may not get rescheduled if the Node becomes unhealthy. Is that what we want?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good point, will remove it. I added it because I see node has it. And my test setup has linux tainted.
@@ -0,0 +1,8 @@ | |||
kind: DaemonSet |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this file can be removed now right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right, fix it
@@ -0,0 +1,13 @@ | |||
kind: DaemonSet |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this file can be removed now right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fix it.
@@ -0,0 +1,17 @@ | |||
kind: DaemonSet |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this file can be removed now right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fix it.
@@ -0,0 +1,36 @@ | |||
apiVersion: kustomize.config.k8s.io/v1beta1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you going to need multiple overlays for windows too (ie staging, stable, dev)?
The challenge I see with this structure is that we now have 2 places that we need to keep in sync when we update overlays. For example, if we make a update to staging, then we need to update both windows and linux staging overlays.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So with one overlay, we have both windows and linux together. What if user need to deploy with mixed OS and versions, e.g, linux stable, and windows dev etc? Or just because some versions are only available in linux, not windows?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we'll still need to have different overlays for dev/alpha/staging/stable and we can control the versions used in each. For example, the dev overlay could use stable versions for linux, and unstable versions for windows. And in terms of development flow, we can add Windows first to the dev overlay, then once we're ready, add it to staging, and then when we're ready to release, add it to stable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe I miss something here, so you think user will not need a mixed version for windows and linux. If they want to deploy stable version, it will be stable version for both linux and windows? But what if windows don't have stable version yet, user must deploy dev version for both linux and windows, instead of one stable version and one dev version?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also in the case user already deploy a Linux driver, and want to add a windows driver (might be a different version from Linux), with one overlay, it will try to deploy both linux and windows. Some setup might be messed up with different versions of deployment?
In the current layout, we have common things for both linux and windows in the base, and different things in overlay. So new changes, either common to linux and windows, which will go to base, or different from linux and windows, and we need to update them wherever applicable. So I don't see there is more work needed compared to one overlay which contains both linux and windows. Also there might be cases that some setup is only applicable for linux, and we might accidentally break windows setup if we don't pay attention to the windows part?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have been thinking the structure again last night :), and realized this is more like a multi-stage process. So instead of base/overlay structure, I propose to use the following as multiple stages and use them as bases. I am working on this structure now and will update. Please let me know what you think about this structure.
-
cluster_controller_setup : includes cluster setup and controller yaml file
-
node_setup:
* Linux: node and other related yaml files for Linux
* Windows: node and other related yaml files for Windows -
image_update
* alpha:
*. stable:
*. staging:
* dev:
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not quite sure how this would work out but am curious to see. One thing to note is that alpha/staging/stable is not necessarily just different versions. Different features may be added depending on their maturity.
what if windows don't have stable version yet, user must deploy dev version for both linux and windows, instead of one stable version and one dev version?
In terms of looking at the driver itself, I think we should consider both Linux + Windows parts of the driver to be part of one complete driver instead of 2 independent drivers. So considering that Windows is still alpha right now, I think it is fine if it uses the Linux alpha versions as well (which currently don't have any difference from stable). I don't think a user will want to have a mixed Linux + Windows cluster where the Linux part is stable and the Windows part is alpha. I would expect a user to have 2 separate clusters: a Linux-only cluster used for production, and an experimental Windows-only cluster.
Also in the case user already deploy a Linux driver, and want to add a windows driver (might be a different version from Linux), with one overlay, it will try to deploy both linux and windows. Some setup might be messed up with different versions of deployment?
I agree with one deployment and one place to define all the versions, it's easier to make sure the versions are in-sync with each other.
name: csi-gce-pd-node-psp | ||
spec: | ||
allowedHostPaths: | ||
- pathPrefix: \var\lib\kubelet |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is going to replace the current PSP with these rules, instead of add, which is not desirable if we're going to be running both linux + windows.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good point, will try to make a merge here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually, I think it is better to have separate policy for linux and windows instead of merging
@jingxu97: PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
opened #520 to replace this one. |
This PR adds windows driver support. It adds a windows base dir to
install base yaml files. It also adds a windows alpha kustomization
file. To install driver for windows, first set env
GCE_PD_DRIVER_VERSION=windows/alpha and run
deploy/kubernetes/deploy-driver-win.sh script.
What type of PR is this?
What this PR does / why we need it:
Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
Does this PR introduce a user-facing change?: