
WIP: Checking fix for dynamic provisioning tests: rebase 1.13.4 #22446

Closed
wants to merge 2 commits

Conversation

wongma7
Contributor

@wongma7 wongma7 commented Apr 1, 2019

No description provided.

@openshift-ci-robot openshift-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Apr 1, 2019
@wongma7 wongma7 force-pushed the rebase-1.13.4-01 branch from fbf6a3b to c6214ec Compare April 2, 2019 15:36
@openshift-ci-robot openshift-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Apr 2, 2019
@openshift-ci-robot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: wongma7
To fully approve this pull request, please assign additional approvers.
We suggest the following additional approver: smarterclayton

If they are not already assigned, you can assign the PR to them by writing /assign @smarterclayton in a comment when ready.

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@wongma7
Contributor Author

wongma7 commented Apr 2, 2019

Well, now the drivers run fine, but of course the pods consuming the csi hostpath driver also run into permission errors... I believe the driver creates 777 directories in /tmp, but it seems ordinary pods can't read or write there, and this time I don't see SELinux denials in workers-journal. Here are the three functions in which we create pods to try to consume the volumes (see the sketch after this list):

runInPodWithVolume
TestVolumeClient
CreateSecPodWithNodeName
@openshift/storage
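Roughly, each of those helpers creates a client pod along these lines (a generic sketch, not the framework's exact template; the pod name, image, command, and claim name are all assumptions):

    # hypothetical client pod that mounts a dynamically provisioned PVC
    apiVersion: v1
    kind: Pod
    metadata:
      name: volume-client            # placeholder name
    spec:
      containers:
        - name: test
          image: busybox
          command: ["sh", "-c", "echo hello > /mnt/test/out && sleep 3600"]
          volumeMounts:
            - name: vol
              mountPath: /mnt/test
      volumes:
        - name: vol
          persistentVolumeClaim:
            claimName: <pvc-name>    # placeholder claim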

@jsafrane
Contributor

jsafrane commented Apr 3, 2019

It must be SELinux. Kubelet + CSI driver mounts a volume into /var/lib/kubelet/pods/5a8f8521-561a-11e9-bc56-06799de127b8/volumes/kubernetes.io~csi/pvc-552ab890-561a-11e9-bc56-06799de127b8/mount:

$ ls -Za /var/lib/kubelet/pods/5a8f8521-561a-11e9-bc56-06799de127b8/volumes/kubernetes.io~csi/pvc-552ab890-561a-11e9-bc56-06799de127b8/mount

system_u:object_r:container_file_t:s0:c217,c625 .

But then the client pod runs as system_u:system_r:container_t:s0:c0,c1.

@jsafrane
Contributor

jsafrane commented Apr 3, 2019

The client pod was started with

    securityContext:
      fsGroup: 1000
      seLinuxOptions:
        level: s0:c0,c1

@jsafrane
Contributor

jsafrane commented Apr 3, 2019

As a quick workaround / hack, we can use tmpfs as hostpath driver "persistent" directory:
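The snippet itself didn't survive here, but a minimal sketch of the idea would be backing the plugin's data directory with an in-memory emptyDir (the volume name, container name, and the /tmp path are assumptions):

    # hypothetical fragment of the hostpath plugin pod spec:
    # back /tmp (where the plugin creates volumes) with tmpfs
    volumes:
      - name: plugin-data
        emptyDir:
          medium: Memory             # tmpfs-backed
    containers:
      - name: hostpath
        volumeMounts:
          - name: plugin-data
            mountPath: /tmp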

SELinux should not block any container access to tmpfs. Still we should fix it somehow upstream.

@jsafrane
Contributor

jsafrane commented Apr 3, 2019

Scratch the previous comment; I accidentally ran with SELinux permissive. tmpfs does not help at all; it's labeled with var_lib_t.

@wongma7
Contributor Author

wongma7 commented Apr 3, 2019

@jsafrane Summary: the solution would be to make the hostpath csi plugin create PVs on the host (hence hostpath!), not inside the container. It is very confusing because I cannot seem to figure out a stable image version + yaml combination yet. edit: mounting /var/lib/csi-hostpath-data to /tmp works (see the sketch below); hostpath plugin v1.0.1 is hardcoded to use the container's /tmp directory.
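A sketch of the mount described in that edit (the container and volume names are assumptions):

    # hypothetical fragment: expose a host directory at the container
    # path the v1.0.1 plugin is hardcoded to use (/tmp)
    volumes:
      - name: csi-data-dir
        hostPath:
          path: /var/lib/csi-hostpath-data
          type: DirectoryOrCreate
    containers:
      - name: hostpath
        volumeMounts:
          - name: csi-data-dir
            mountPath: /tmp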

The plugin is creating directories in /tmp in the container, and /tmp inherits the labels of the container fs; I think this is expected/okay.

[root@ip-10-0-167-116 ~]# ls -aZ /var/lib/containers/storage/overlay/4c9ea2a4d1ec7570587d79a939cee099caf5dca495ebc734556d243401367873/merged/
system_u:object_r:container_file_t:s0:c22,c662 .

I'm guessing the container runtime randomly chooses non-overlapping categories like c22,c662 to secure container filesystems from each other. If the pvc directory has c22,c662 then giving the pod c22,c662 will let it access it.
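One way to check that on the node (the pod identifiers below are placeholders):

    # compare the process label of the client pod with the label on the
    # volume directory; access works when the MCS categories match
    ps -eZ | grep <client-pod-command>
    ls -Za /var/lib/kubelet/pods/<pod-uid>/volumes/kubernetes.io~csi/<pv-name>/mount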

But the pvc directory won't have c22,c662 in the first place if the plugin creates it on the host. So why not do that? Because then we need a version of the plugin with kubernetes-csi/csi-driver-host-path#20, which upstream is not using yet.

Also, even if we get a new hostpath plugin release containing kubernetes-csi/csi-driver-host-path#20, I think the yaml is a bit broken, because the node registrar container is mounting it rather than the plugin container: https://github.com/kubernetes-csi/csi-driver-host-path/blob/master/deploy/master/hostpath/csi-hostpath-plugin.yaml#L62

Will work on the above tomorrow

@openshift-ci-robot openshift-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 5, 2019
@wongma7
Contributor Author

wongma7 commented Apr 5, 2019

/test e2e-aws

2 similar comments
@wongma7
Contributor Author

wongma7 commented Apr 5, 2019

/test e2e-aws

@wongma7
Contributor Author

wongma7 commented Apr 8, 2019

/test e2e-aws

@wongma7 wongma7 force-pushed the rebase-1.13.4-01 branch from 078266d to 753b497 Compare April 8, 2019 16:28
@wongma7
Contributor Author

wongma7 commented Apr 8, 2019

/test e2e-aws

@wongma7 wongma7 force-pushed the rebase-1.13.4-01 branch from 753b497 to ea70b6c Compare April 8, 2019 17:03
@openshift-ci-robot openshift-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 8, 2019
@wongma7 wongma7 force-pushed the rebase-1.13.4-01 branch from ea70b6c to 3451d1d Compare April 8, 2019 17:05
@wongma7
Contributor Author

wongma7 commented Apr 9, 2019

/test e2e-aws

2 similar comments
@wongma7
Contributor Author

wongma7 commented Apr 9, 2019

/test e2e-aws

@wongma7
Contributor Author

wongma7 commented Apr 10, 2019

/test e2e-aws

@bertinatto
Member

I've been running CSI tests in my local env and this is what I see when I describe the pods:

  1. Attacher:
  Warning  FailedCreate  31s (x15 over 113s)  daemonset-controller  Error creating: pods "csi-hostpathplugin-" is forbidden: unable to validate against any security context constraint: [provider restricted: .spec.securityContext.hostNetwork: Invalid value: true: Host network is not allowed to be used spec.volumes[0]: Invalid value: "hostPath": hostPath volumes are not allowed to be used spec.volumes[1]: Invalid value: "hostPath": hostPath volumes are not allowed to be used spec.volumes[2]: Invalid value: "hostPath": hostPath volumes are not allowed to be used spec.containers[0].securityContext.privileged: Invalid value: true: Privileged containers are not allowed spec.containers[0].securityContext.hostNetwork: Invalid value: true: Host network is not allowed to be used spec.containers[1].securityContext.privileged: Invalid value: true: Privileged containers are not allowed spec.containers[1].securityContext.hostNetwork: Invalid value: true: Host network is not allowed to be used]
  2. Provisioner:
  Warning  FailedCreate  16s (x20 over 3m1s)  statefulset-controller  create Pod csi-hostpath-provisioner-0 in StatefulSet csi-hostpath-provisioner failed error: pods "csi-hostpath-provisioner-0" is forbidden: unable to validate against any security context constraint: [spec.volumes[0]: Invalid value: "hostPath": hostPath volumes are not allowed to be used spec.containers[0].securityContext.privileged: Invalid value: true: Privileged containers are not allowed]
  3. Driver:
  Warning  FailedCreate  2s (x16 over 2m46s)  daemonset-controller  Error creating: pods "csi-hostpathplugin-" is forbidden: unable to validate against any security context constraint: [provider restricted: .spec.securityContext.hostNetwork: Invalid value: true: Host network is not allowed to be used spec.volumes[0]: Invalid value: "hostPath": hostPath volumes are not allowed to be used spec.volumes[1]: Invalid value: "hostPath": hostPath volumes are not allowed to be used spec.volumes[2]: Invalid value: "hostPath": hostPath volumes are not allowed to be used spec.containers[0].securityContext.privileged: Invalid value: true: Privileged containers are not allowed spec.containers[0].securityContext.hostNetwork: Invalid value: true: Host network is not allowed to be used spec.containers[1].securityContext.privileged: Invalid value: true: Privileged containers are not allowed spec.containers[1].securityContext.hostNetwork: Invalid value: true: Host network is not allowed to be used]

Apparently the ServiceAccount used in the test doesn't have the privileged SCC. Once I grant it (oc adm policy add-scc-to-user privileged -z csi-attacher), the pods are created normally.

I'm currently looking into how I can do that in the test.
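For reference, granting it by hand for both sidecar service accounts would look something like this (the namespace is a placeholder):

    # give the privileged SCC to the service accounts the CSI sidecars use
    oc adm policy add-scc-to-user privileged -z csi-attacher -n <test-namespace>
    oc adm policy add-scc-to-user privileged -z csi-provisioner -n <test-namespace>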

@wongma7
Contributor Author

wongma7 commented Apr 11, 2019

The test is supposed to create a PSP that grants access to privileged and all that: https://github.com/kubernetes/kubernetes/blob/b0aee7fa3c834a47c5c0245e4074678df9387baa/test/e2e/framework/psp_util.go. Not sure how it works in relation to SCCs. Some tests are passing now, some flaking, some failing, and at least I don't see SELinux denials anymore.
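For context, a privileged PSP of that sort is along these lines (a generic sketch, not copied from psp_util.go; the name is an assumption):

    apiVersion: policy/v1beta1
    kind: PodSecurityPolicy
    metadata:
      name: e2e-test-privileged-psp    # hypothetical name
    spec:
      privileged: true                 # allow privileged containers
      allowPrivilegeEscalation: true
      hostNetwork: true
      hostPID: true
      hostIPC: true
      volumes: ["*"]                   # allow all volume types, incl. hostPath
      runAsUser:
        rule: RunAsAny
      seLinux:
        rule: RunAsAny
      supplementalGroups:
        rule: RunAsAny
      fsGroup:
        rule: RunAsAny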

@wongma7
Contributor Author

wongma7 commented Apr 11, 2019

All the failures say something like

   hostpath: WAITING: CreateContainerError - Manifest does not match provided manifest digest sha256:0aa496f3e7ff7240abbf306e4244a75c5e59cbf2e4dbc246a6db2ca1bc67c6b1

I don't know what it means, but since the plugin container can't start, no socket gets created and nobody can connect.

@wongma7
Contributor Author

wongma7 commented Apr 11, 2019

Registry bug, maybe? kubelet says the image was successfully pulled, but then the container can't start?

sha256:0aa496f3e7ff7240abbf306e4244a75c5e59cbf2e4dbc246a6db2ca1bc67c6b1 is quay.io/k8scsi/hostpathplugin:v0.4.1
sha256:f755dd34ac4b928be4fc21593094c0f67f8d00b7ab846c7e6282575fddf86ced is quay.io/k8scsi/hostpathplugin:v1.0.0
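One way to double-check what the registry actually serves for those tags (assuming skopeo is available on the host):

    # print the manifest digest the registry returns for each tag
    skopeo inspect docker://quay.io/k8scsi/hostpathplugin:v0.4.1 | grep -i digest
    skopeo inspect docker://quay.io/k8scsi/hostpathplugin:v1.0.0 | grep -i digest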

openshift-tests [sig-storage] CSI Volumes [Driver: csi-hostpath-v0] [Testpattern: Dynamic PV (default fs)] subPath should support existing single file [Suite:openshift/conformance/parallel] [Suite:k8s] 5m33s

ip-10-0-170-145.ec2.internal

Apr 10 16:47:00.615: INFO: At 2019-04-10 16:42:02 +0000 UTC - event for csi-hostpathplugin-698xl: {kubelet ip-10-0-170-145.ec2.internal} Failed: Error: Manifest does not match provided manifest digest sha256:0aa496f3e7ff7240abbf306e4244a75c5e59cbf2e4dbc246a6db2ca1bc67c6b1

openshift-tests [sig-storage] CSI Volumes [Driver: csi-hostpath-v0] [Testpattern: Dynamic PV (default fs)] volumes should allow exec of files on the volume [Suite:openshift/conformance/parallel] [Suite:k8s] 5m38s

ip-10-0-170-145.ec2.internal

Apr 10 16:43:29.926: INFO: At 2019-04-10 16:38:29 +0000 UTC - event for csi-hostpathplugin-zpqpz: {kubelet ip-10-0-170-145.ec2.internal} Failed: Error: Manifest does not match provided manifest digest sha256:0aa496f3e7ff7240abbf306e4244a75c5e59cbf2e4dbc246a6db2ca1bc67c6b1

openshift-tests [sig-storage] CSI Volumes [Driver: csi-hostpath] [Testpattern: Dynamic PV (default fs)] subPath should support existing single file [Suite:openshift/conformance/parallel] [Suite:k8s] 5m31s

ip-10-0-149-210.ec2.internal

Apr 10 16:51:54.742: INFO: At 2019-04-10 16:46:56 +0000 UTC - event for csi-hostpathplugin-wlq9w: {kubelet ip-10-0-149-210.ec2.internal} Failed: Error: Manifest does not match provided manifest digest sha256:f755dd34ac4b928be4fc21593094c0f67f8d00b7ab846c7e6282575fddf86ced

openshift-tests [sig-storage] CSI Volumes [Driver: csi-hostpath] [Testpattern: Dynamic PV (default fs)] subPath should support non-existent path [Suite:openshift/conformance/parallel] [Suite:k8s] 5m37s

ip-10-0-149-210.ec2.internal

Apr 10 16:37:51.341: INFO: At 2019-04-10 16:32:50 +0000 UTC - event for csi-hostpathplugin-czn5g: {kubelet ip-10-0-149-210.ec2.internal} Failed: Error: Manifest does not match provided manifest digest sha256:f755dd34ac4b928be4fc21593094c0f67f8d00b7ab846c7e6282575fddf86ced

openshift-tests [sig-storage] CSI Volumes [Driver: csi-hostpath] [Testpattern: Dynamic PV (default fs)] subPath should support readOnly file specified in the volumeMount [Suite:openshift/conformance/parallel] [Suite:k8s] 5m33s

ip-10-0-149-210.ec2.internal

Apr 10 16:48:11.684: INFO: At 2019-04-10 16:43:11 +0000 UTC - event for csi-hostpathplugin-kcqfd: {kubelet ip-10-0-149-210.ec2.internal} Failed: Error: Manifest does not match provided manifest digest sha256:f755dd34ac4b928be4fc21593094c0f67f8d00b7ab846c7e6282575fddf86ced

@wongma7
Contributor Author

wongma7 commented Apr 11, 2019

Apr 10 16:32:50 ip-10-0-149-210 hyperkube[1020]: E0410 16:32:50.786929 1020 remote_runtime.go:191] CreateContainer in sandbox "c17e7224dd87686083ea0a71bd6f7c6eeb984c73cf5c6c511c5bbf07c46001ca" from runtime service failed: rpc error: code = Unknown desc = Manifest does not match provided manifest digest sha256:f755dd34ac4b928be4fc21593094c0f67f8d00b7ab846c7e6282575fddf86ced

@wongma7
Contributor Author

wongma7 commented Apr 11, 2019

e2e tests pulling from quay:

framework/test_context.go
315: flag.StringVar(&TestContext.CSIImageRegistry, "csiImageRegistry", "quay.io/k8scsi", "overrides the default repository used for hostpathplugin/csi-attacher/csi-provisioner/driver-registrar images")

storage/persistent_volumes-local.go
149: provisionerImageName = "quay.io/external_storage/local-volume-provisioner:v2.1.0"

apimachinery/aggregator.go
182: etcdImage := "quay.io/coreos/etcd:v3.2.24"

storage/utils/utils.go
341: Image: "quay.io/kubernetes_incubator/nfs-provisioner:v2.2.0-k8s1.12",

testing-manifests/storage-csi/hostpath/hostpath/csi-hostpathplugin.yaml
36: image: quay.io/k8scsi/hostpathplugin:v1.0.0

testing-manifests/storage-csi/hostpath/hostpath-v0/csi-hostpathplugin.yaml
36: image: quay.io/k8scsi/hostpathplugin:v0.4.1
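(That listing looks like grep output; something along these lines would reproduce it, the exact command being an assumption:)

    # find hardcoded image references in the e2e tree
    grep -rn "quay.io" test/e2e/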

@wongma7
Contributor Author

wongma7 commented Apr 11, 2019

looks very similar to https://bugzilla.redhat.com/show_bug.cgi?id=1669096

@wongma7
Contributor Author

wongma7 commented Apr 11, 2019

/test e2e-aws

@wongma7
Contributor Author

wongma7 commented Apr 11, 2019

Confirmed it's a known issue; it was fixed but the fix didn't show up somehow. Sorry for the spam: https://bugzilla.redhat.com/show_bug.cgi?id=1698253

@openshift-ci-robot

@wongma7: The following tests failed, say /retest to rerun them all:

Test name | Commit | Rerun command
--- | --- | ---
ci/prow/e2e-aws-image-registry | fbf6a3b | /test e2e-aws-image-registry
ci/prow/artifacts | fbf6a3b | /test artifacts
ci/prow/e2e-aws-builds | 3451d1d | /test e2e-aws-builds
ci/prow/e2e-aws-serial | 3451d1d | /test e2e-aws-serial
ci/prow/images | 3451d1d | /test images
ci/prow/verify | 3451d1d | /test verify

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@bertinatto
Member

> The test is supposed to create a PSP that grants access to privileged and all that: https://github.com/kubernetes/kubernetes/blob/b0aee7fa3c834a47c5c0245e4074678df9387baa/test/e2e/framework/psp_util.go. Not sure how it works in relation to SCCs. Some tests are passing now, some flaking, some failing, and at least I don't see SELinux denials anymore.

Yes, but it seems like it's not working with OpenShift. The test is adding the privileged SCC to the SAs named csi-[provisioner | attacher]. When I do that manually (with oc adm's -z option), the SA name pattern is different:

$ oc get scc/privileged -o yaml
(...)
users:
- system:serviceaccount:e2e-tests-csi-volumes-kl2b4:csi-attacher

And the pod is scheduled correctly.

However, it seems like this is not the problem you're facing here.

@openshift-ci-robot

@wongma7: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot openshift-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 13, 2019
@gnufied
Member

gnufied commented May 23, 2019

@bertinatto how are you running the tests? The OpenShift e2e harness automatically makes a privileged SCC available to the service account that is used for running e2es.

@bertinatto
Member

bertinatto commented May 24, 2019

@gnufied, I think I was running k8s tests (from the release-1.13 branch) against OpenShift, but apparently I should've used the openshift-tests binary...

@openshift-bot
Contributor

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci-robot openshift-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 22, 2019
@openshift-bot
Contributor

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci-robot openshift-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Sep 21, 2019
@openshift-bot
Contributor

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

@openshift-ci-robot

@openshift-bot: Closed this PR.

In response to this:

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
