Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prow testing #9

Merged
merged 11 commits into from
Apr 3, 2019
Merged

Prow testing #9

merged 11 commits into from
Apr 3, 2019

Conversation

pohly
Copy link
Contributor

@pohly pohly commented Mar 15, 2019

This contains some enabling work (like importing verify-shellcheck.sh from Kubernetes) and the actual Prow test script.

Image pushing is not implemented yet, only testing.

/cc @msau42

pohly added 2 commits March 15, 2019 11:08
In repos that have a test/e2e, that test suite should be run
separately because it depends on a running cluster.
This is an unmodified copy of kubernetes/hack/verify-shellcheck.sh
revision d5a3db003916b1d33b503ccd2e4897e094d8af77.
@k8s-ci-robot k8s-ci-robot requested a review from msau42 March 15, 2019 15:50
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: pohly

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. approved Indicates a PR has been approved by an approver from all required OWNERS files. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Mar 15, 2019
@pohly
Copy link
Contributor Author

pohly commented Mar 15, 2019

Some PRs which are getting tested by this:
kubernetes-csi/csi-driver-host-path#16 ("normal" E2E testing)
kubernetes-csi/cluster-driver-registrar#41 (custom E2E test suite)

That those currently fail is normal, because release-tools had to be modified to test the new prow.sh. That breaks the "test-subtree" check. It will be fixed by re-importing csi-release-tools once this PR here is merged.

My preference is to start the review and merging now, then later add image publishing. For that we are still blocked by kubernetes/k8s.io#158.

/test pull-sig-storage-csi-release-tools-kubernetes-1-13

prow.sh Outdated

# Go version to use. If the pre-installed Go is missing or a different
# version, this version here will get installed.
configvar CSI_PROW_GO_VERSION 1.11.4 "Go version"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is this used to build? kind, and anything else? Would be good to mention in the comment what it's used for

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Primarily it is for building the component itself. But it also gets used to build kind (only if there is no pre-built binary available, so this should become the exception), the E2E test suite, and Kubernetes (when testing against a floating version like master or "latest release-1.13").

Hmm, this can indeed become problematic. My main concern were "go fmt" checks in the component itself, because those are known to be dependent on very specific Go versions. But Kubernetes building will soon fail for Go < 1.12. I'll change that so that each usage of Go can potentially have its own version.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we not use the go version from the common makefile to build the component and check go fmt?

Is there a way we can dynamically find the go version needed to build each component so we don't have to hardcode it in our test scripts?


# Ginkgo runs the E2E test in parallel. The default is based on the number
# of CPUs, but typically this can be set to something higher in the job.
configvar CSI_PROW_GINKO_PARALLEL "-p" "Ginko parallelism parameter(s)"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how do you set serial tests?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those can be part of the alpha test run, which runs sequentially.

I had considered having parameters for parallel, serial, and alpha, but then decided against it because the only difference between serial and alpha would have been the naming of the parameters.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should have 4 jobs:

  • parallel
  • serial
  • parallel alpha
  • serial alpha

It will make it easier to rule out if a test failure is due to an alpha feature or not.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I'll add the ability to control which of these run.

But for now let's run them all inside a single job until a) Prow can have such a high number of jobs (right now I think it is still limited by the ConfigMap size) and b) we really know which of these are needed.

For example, there are no serial alpha jobs.

We have two reconstruction tests (one of which is skipped for hostpath but actually it only needs to be skipped for in-tree and not csi): https://k8s-testgrid.appspot.com/sig-storage-kubernetes#gce-serial&include-filter-by-regex=csi&include-filter-by-regex=CSI.*Volumes

No test is explicitly marked up as "Serial", and the "Disruptive" test cannot be run inside a kind cluster ("No external address for pod pod-subpath-test-csi-hostpath-dynamicpv-4lsw on node csi-prow-control-plane") - or least that is my initial impression. Need to check this error in more detail.

prow.sh Outdated
#
# TODO (?): support also an E2E test suite in the current repository.
# TODO: upstream Kubernetes
configvar CSI_PROW_E2E_REPO https://github.com/pohly/kubernetes "E2E repo"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what else needs to be done to be able to use upstream?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing anymore, I'll change this to upstream.

prow.sh Outdated
# TODO (?): support also an E2E test suite in the current repository.
# TODO: upstream Kubernetes
configvar CSI_PROW_E2E_REPO https://github.com/pohly/kubernetes "E2E repo"
configvar CSI_PROW_E2E_VERSION storage-external-snapshot "E2E version"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is this, the branch name?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Branch name, revision or tag. Basically anything that git can check out.

prow.sh Outdated
# separated by underscore.
configvar CSI_PROW_E2E_PARALLEL_FOCUS 'External.Storage.*csi-hostpath' "parallel E2E focus"
# Raw block was an alpha feature in 1.13.
configvar CSI_PROW_E2E_PARALLEL_SKIP_1_13 '|\[Testpattern: Dynamic PV .block volmode.\]' "additional parallel test skip for Kubernetes 1.13"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you pull the e2e binary from the 1.13 branch, then you should be able to skip just on the [Feature] tag?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True.

But this approach has the potential advantage that new tests can be developed in Kubernetes master and then be applied also when testing against older releases - if it makes sense and the tests pass there, of course.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have wasted many debugging hours due to accidentally running new tests against older releases that I don't really want to support that here. We should be getting out of the habit of adding tests post-release.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see what you mean. But the definition of which tests to run is fixed and controlled by the component under testing. So if a PR with a test configuration change shows failures, then that should send a strong message that perhaps that test configuration change is the root cause. It will also be easy to verify whether those failures also occur in other PRs or master.

Regarding "adding tests post-release": consider that a bug is found in a sidecar. We fix that bug and add a test to the E2E suite in master which exposes that bug. Now we backport that sidecar bug fix into an older release which targets a stable Kubernetes release, like 1.13. How do we ensure that the backport actually works if we are limited to running the E2E test suite from 1.13 which doesn't have the new test?

If the bug can be exposed by a unit test in the component or if the component had its own E2E test suite (like the one I'm adding to cluster-driver-registrar in kubernetes-csi/cluster-driver-registrar#41), then we could backport the test together with the fix in a single PR. But that's a lot of ifs, and I am not sure what the Kubernetes release team thinks about backporting tests. Even if they allow it, there is a cyclic dependency: we first have to backport the fix without automatically checking it, do a bugfix release, backport the Kubernetes test such that it uses the new bugfix release, then update the component to use the backported fix. Most likely we'd simply skip testing the fix, or only do it once manually.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still don't understand why specific sidecars would run different K8s e2e tests from each other. Why can't they just all run "all K8s CSI tests"? A single sidecar is just one piece in the entire driver and plays one part in the entire K8s volume user experience. Now, we could potentially try to optimize and say things like "well, external-provisioner only deals with dynamic provisioning, so we only need to run dynamic provisioning tests", but there could end up being functionality that the external-provisioner does that will also impact other functionality not strictly related to dynamic provisioning. If we filter out specific tests now, then we're not getting full coverage of CSI, and we can easily miss new use cases that get added in the future.

Regarding the scenario of making a bug fix, I think it is reasonable to cherry-pick the new test to 1.13. There's no great solution to solve the chicken-and-egg problem except to make the fix in the sidecar first, cut an rc, and use the rc in the K8s test.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you want to add tests that don't test the entire driver and only pieces of it (ie just the one sidecar + driver container), then that's more of an integration test rather than a K8s e2e test.

Even if that case, I would not want to encode sidecar-specific test details into a common repository. It should come from configuration stored per-repo.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still don't understand why specific sidecars would run different K8s e2e tests from each other.

They don't. That's why they'll all share the same config.

But from a design perspective, the configuration is owned by the component, not by the CI. The CI just provides the runtime environment and enough information for the component to identify key differences between different jobs, like Kubernetes version.

The advantage of this approach is that changes to the configuration go through the same process as code changes, i.e. PR review and testing. Changes on the CI side are minimal and hopefully low risk. For example, switching to a new container image can't break Go compilation because the component itself decides whether it uses the pre-installed Go version.

If you want to add tests that don't test the entire driver and only pieces of it (ie just the one sidecar + driver container), then that's more of an integration test rather than a K8s e2e test.

Even if that case, I would not want to encode sidecar-specific test details into a common repository. It should come from configuration stored per-repo.

Fully agreed. The actual config in the repo will then have both, the common K8S E2E tests and some additional per-component E2E tests.

prow.sh Outdated
configvar CSI_PROW_E2E_PARALLEL_FOCUS 'External.Storage.*csi-hostpath' "parallel E2E focus"
# Raw block was an alpha feature in 1.13.
configvar CSI_PROW_E2E_PARALLEL_SKIP_1_13 '|\[Testpattern: Dynamic PV .block volmode.\]' "additional parallel test skip for Kubernetes 1.13"
configvar CSI_PROW_E2E_PARALLEL_SKIP "\\[Feature:|\\[Disruptive\\]$(eval echo "\${CSI_PROW_E2E_PARALLEL_SKIP_${csi_prow_kubernetes_version_suffix}}")" "parallel E2E skip"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No [Serial]?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahem, good catch. Will fix.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've introduced the distinction between serial/parallel and stable/alpha and made it possible to run any of these combinations in a job (from one per job to all four).

configvar CSI_PROW_E2E_PARALLEL_SKIP_1_13 '|\[Testpattern: Dynamic PV .block volmode.\]' "additional parallel test skip for Kubernetes 1.13"
configvar CSI_PROW_E2E_PARALLEL_SKIP "\\[Feature:|\\[Disruptive\\]$(eval echo "\${CSI_PROW_E2E_PARALLEL_SKIP_${csi_prow_kubernetes_version_suffix}}")" "parallel E2E skip"

# After the parallel E2E test without alpha features, a test cluster
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of running in the same job, would it be simpler to have 1 job for non-alpha, and 1 job for alpha? Then they can also run in parallel, and it's easier to add more jobs in the future, like serial.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be supported by allowing CSI_PROW_E2E_PARALLEL_FOCUS to be none. Then a Prow job which is only for alpha features can disable the parallel tests and vice versa.

Will add that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me add that I am not sure whether it is desirable to run these tests in parallel in different jobs.

By parallelizing, the time to completion of all tests goes down. But it wastes resources in the test cluster because the job and test cluster setup needs to be done multiple times. It's a tradeoff.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is fine to have more jobs run in parallel. That's why prow has boskos for project lending. Some of these serial tests takes hours.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not for the storage test suite. I don't think we actually have test marked up as "Serial" at the moment.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have two reconstruction tests (one of which is skipped for hostpath but actually it only needs to be skipped for in-tree and not csi): https://k8s-testgrid.appspot.com/sig-storage-kubernetes#gce-serial&include-filter-by-regex=csi&include-filter-by-regex=CSI.*Volumes

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's a job where all four combinations of stable/alpha and serial/parallel were active: https://prow.k8s.io/view/gcs/kubernetes-jenkins/pr-logs/pull/kubernetes-csi_csi-driver-host-path/16/pull-sig-storage-csi-driver-host-path-kubernetes-1-13/8

Most of the tests are parallel, one is serial (and fails because of the missing external pod IP), two are parallel+alpha, none are serial+alpha.

That empty test run for serial+alpha is cheap in this case, just a few seconds.

I'm now doing some post-processing of the JUnit XML files to filter out the vast numbers of unrelated tests. That makes loading in Spyglass quite noticably faster. There are also no duplicate entries for "skipped tests", so the 77/163 Tests Passed! and 84/163 Tests Skipped. should be accurate.

# kubernetes-csi components must be updated, either by disabling
# the failing test for "latest" or by updating the test and not running
# it anymore for older releases.
configvar CSI_PROW_E2E_ALPHA_GATES_1_13 'VolumeSnapshotDataSource=true,BlockVolume=true,CSIBlockVolume=true' "alpha feature gates for Kubernetes 1.13"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For regular CI Jobs, all of these config params are set in the .yaml in test-infra job config. It seems easier to extend/add more jobs that way.

For example, if we want to add a 1.14 job in the future, then we would need to modify the config here, and then sync it to all the sidecars.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we start adding too many settings in the test-infra .yaml files, then we end up with a tight coupling between Prow job settings and what is getting tested. We may need different Prow jobs settings for different components. I think this will become complicated and fragile quickly.

The overall design is that the prow job just communicates the intent ("test against Kubernetes 1.14") and the component itself defines what that means.

Here's how testing against 1.14 can be rolled out:

  • define the new Prow job without running it automatically
  • update the components, verify that they pass the new Prow job by triggering it manually
  • once all components are done, activate the Prow job

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need to have custom focuses/skips for K8s e2es per sidecar. The goal is to have uniform testing across all our sidecars and be able to test all the interaction together.

Right now with this current approach, to add a 1.14 job, we need to:

  1. Add a new optional prow job in test-infra
  2. Update the config here
  3. Update the release-tools in every single sidecar
  4. Test the job in all the sidecars
  5. Enable the prow job by default in test-infra

If we define the parameters in the prow config, then the number of prs and steps are reduced:

  1. Add the new optional prow jobs to test-infra
  2. Test the jobs in all the sidecars (or just enable the CI jobs immediately and see which sidecars fail)
  3. Enable the prow job by default

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In order to test the new PR job, you first need to have a pending PR in all the sidecars. For the less active repos there might not be one, so one would first have to create a fake one. Or can Prow run the test against an already merged PR? No, doesn't look like it does although it theoretically could (kubernetes-csi/csi-driver-host-path#26 (comment)).

That's still a lot of manual work. In my proposal, creating the PR can be automated (#7). My colleague @avalluri is currently looking into that.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can enable CI jobs on the sidecars. In the ideal case they should pass because we're already testing against canary. In the worst case, we'll know if something fails and we can make fixes in the appropriate repos. Then once it passes, we can enable pull jobs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed on Monday in the CSI standup meeting, we can defer this decision until later, because all settings inside csi-release-tools are just defaults. If it turns out to be desirable to have all settings in test-infra, then that can be done by setting env variables in the Prow job spec.

In the meantime I'll continue with the current approach, because it is more suitable for rapid prototyping (no need to change the job spec to try out something new).

@@ -0,0 +1,839 @@
#!/usr/bin/env bash
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is quite a lot of boilerplate we're copying over. Do we really need all this? Shouldn't kubetest be able to handle cluster bringup?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've asked about kubetest, and the answers weren't encouraging. Using it outside of the kubernetes/kubernetes repo is untested. It's also a legacy project which will get replaced with a kubetest2 rewrite. I'd rather not rely on it.

Regarding util.sh, that's actually not needed for cluster bringup. It's for verify-shellcheck.sh.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How are we going to keep this up to date if kubernetes changes it? Could we potentially vendor this in instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One cannot vendor shell scripts, and even if we could, the version from Kubernetes assumes that it runs in k/k, so a verbatim copy wouldn't work for us. In practice I expect that we'll just keep using the current copy unmodified until something breaks and once that happens, we can see if it is something that has been fixed upstream and manually apply the fix.

@pohly pohly force-pushed the prow branch 2 times, most recently from 8d52ba1 to f0d453e Compare March 26, 2019 15:06
@pohly
Copy link
Contributor Author

pohly commented Mar 26, 2019

@msau42 I pushed some additional commits, please have another look. Before merging I intend to squash.

@pohly
Copy link
Contributor Author

pohly commented Mar 28, 2019

I started to break out some pieces into PRs that can be merged already now:

@pohly pohly force-pushed the prow branch 6 times, most recently from b76fde3 to 137e5c3 Compare March 28, 2019 16:01
# When running on a multi-node cluster, we need to figure out where the
# hostpath driver was deployed and set ClientNodeName accordingly.

# The content of this file depends on both what the E2E suite expects and
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this go in a subdirectory with the per version hostpath deployment files?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really, because it's content also depends on the E2E suite, and the example deployment files know nothing about that.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if the hostpath driver keeps multiple versions of its test config depending on the e2e test version? I want to move as much driver-specific, or job-specific configuration out of this script.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The E2E test suite could be a certain revision instead of a tagged version, so the mechanism for picking a certain file from the csi-driver-host-path repo or a script inside that repo which outputs the content could become quite complicated.

Keeping it here has the advantage that the test driver config can be updated in lockstep with updating the CSI_PROW_E2E variables.

I would prefer to defer this "move configuration out of this script".

prow.sh Outdated
# limitations under the License.


# This script runs inside a Prow job. It runs unit tests ("make test")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should have separate jobs for unit tests vs e2e tests, and also a separate job for alpha tests, similar to how k/k separates them.

Looking at the example test output here: https://prow.k8s.io/view/gcs/kubernetes-jenkins/pr-logs/pull/kubernetes-csi_csi-driver-host-path/16/pull-sig-storage-csi-driver-host-path-kubernetes-master/1111649510662082560

  • Mixing unit tests with e2e tests adds cognitive overhead to Kubernetes developers used to the way k/k separates tests
  • Having both master and 1.13 jobs run unit tests doesn't make sense since the unit tests aren't actually running against a specific Kubernetes release.
  • We are always adding more tests, including stress tests that may take longer. Separating them into separate jobs allows someone to get quick feedback on the unit tests without having to wait for the e2es to complete.
  • Alpha features have no backwards compatibility guarantees and will be likely to break when we move alpha features to beta. We have a chicken and egg problem with updating the alpha test in k/k and releasing sidecars with the fix. For this reason, alpha tests should be an optional job and not be PR blocking.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain what is the prow limit on the number of jobs? As an example, I see 20 jobs defined here. I don't think we will need 20 jobs in a single repo for our sidecars.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should have separate jobs for unit tests vs e2e tests, and also a separate job for alpha tests, similar to how k/k separates them.

Agreed. This is something that can be decided in test-infra by setting CSI_PROW_TESTS. I can change the wording here to make that clearer.

I don't agree with one argument though:

We have a chicken and egg problem with updating the alpha test in k/k and releasing sidecars with the fix. For this reason, alpha tests should be an optional job and not be PR blocking.

This is not quite correct. The way how the CI system is currently set up ensures that the individual component is under full control of what tests run. When making a code change that breaks testing of an alpha feature with an older Kubernetes, the component can disable that test for that Kubernetes version as part of the PR.

This only becomes a problem when separating configuration from code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain what is the prow limit on the number of jobs?

I have no idea whether there is such a limit.

prow.sh Outdated
# If the pre-installed Go is missing or a different
# version, the required version here will get installed
# from https://golang.org/dl/.
configvar CSI_PROW_GO_VERSION_BUILD 1.11.4 "Go version for building the component" # depends on component's source code
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice if this and the go version in travis could be defined from the same source so we can update it in one place

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could grep it out of the travis.yml in the shell script.

# Kubernetes version to test against. This must be a version number
# (like 1.13.3) for which there is a pre-built kind image (see
# https://hub.docker.com/r/kindest/node/tags), "latest" (builds
# Kubernetes from the master branch) or "release-x.yy" (builds
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we ever need to build Kubernetes from a release branch?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, for example the ci-kubernetes-csi-1-14-on-kubernetes-1-13 job does that. Whether we want to cover that case is debatable. It can detect regressions in Kubernetes that are not detected in Kubernetes itself because the E2E testing there only uses a single deployment.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then we may need to change the GO version to match the Kubernetes supported go version for that release. There have been bugs in the past caused by using new go versions to build older Kubernetes releases (because go broke backwards compatibility)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay.

prow.sh Outdated
}

# Which tests are alpha depends on the Kubernetes version. The same
# E2E test suite is used for all Kubernetes versions, including older
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should not make this assumption. This is going to break very quickly any time we move a feature from alpha to beta, and wastes many hours of developer time trying to debug what's wrong.

For beta and GA features, the e2e test version that is used must be <= the K8s version.
For alpha features, the e2e test version must == K8s version

We may need to special case 1.13 because the test framework was not completed until 1.14, but this should not be the norm. The other option is to leave 1.13->1.14 version skew testing to in-tree k/k, and just start the kubernetes-csi CI starting from 1.14.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't agree, but so be it. It's not hard to make the CSI_PROW_E2E settings depend on the Kubernetes version, I just don't have time for that right now. Can we defer this?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we don't have time to do this, then I prefer we leave out skew testing for now, and have 1.14 against 1.14, and master against master, where the e2e test version is the same as the kubernetes version. Let's take incremental steps instead of trying to tackle every single scenario at once.

The way that is right now is going to require considerable effort next release to update the versions used in the tests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll do it tomorrow. We are almost there, I don't want to go back.

pohly added 3 commits April 2, 2019 09:00
These are the modifications that were necessary to call this outside
of Kubernetes. The support for excluding files from checking gets
removed to simplify the script. It shouldn't be needed, because
linting can be enabled after fixing whatever scripts might fail the
check.
By default this only tests the scripts inside the "release-tools"
directory, which is useful when making experimental changes to them in
a component that uses csi-release-tools. But a component can also
enable checking for other directories.
This enables testing of other repos and of this repo itself inside
Prow. Currently supported is unit testing ("make test") and E2E
testing (either via a local test suite or the Kubernetes E2E test
suite applied to the hostpath driver example deployment).

The script passes shellcheck and uses Prow to verify that for future
PRs.
pohly added 4 commits April 2, 2019 20:53
The travis.yml is now the only place where the Go version for the
component itself needs to be configured.
Using the same (recent) Go version for all Kubernetes versions can
break for older versions when there are incompatible changes in Go. To
avoid that, we use exactly the minimum version of Go required for each
Kubernetes version. This is based on the assumption that this
combination was tested successfully.

When building the E2E suite from Kubernetes (the default) we do the
same, but still allow building it from elsewhere.

We allow the Go version to be empty when it doesn't matter.
While switching back and forth between release-1.13 and release-1.14
locally, there was the problem that the local kind image kept using
the wrong kubelet binary despite rebuilding from source. The problem
went away after cleaning the Bazel cache. Exact root cause unknown,
but perhaps using unique tags and properly cleaning the repo helps.

If not, then the unique tags at least will show what the problem is.
pohly added 2 commits April 3, 2019 13:54
Instead of always using the latest E2E tests for all Kubernetes
versions, the plan now is to use the tests that match the Kubernetes
version. However, for 1.13 we have to make an exception because the
suite for that version did not support the necessary
--storage.testdriver parameter.
The temporary fork was merged, we can use the upstream repo again.
@pohly
Copy link
Contributor Author

pohly commented Apr 3, 2019

@msau42 I've pushed some updates:

  • d87eccb prow.sh: switch back to upstream csi-driver-host-path
  • 6602d38 prow.sh: different E2E suite depending on Kubernetes version
  • 741319b prow.sh: improve building Kubernetes from source
  • 29545bb prow.sh: take Go version from Kubernetes source
  • 429581c prow.sh: pull Go version from travis.yml
  • 0a0fd49 prow.sh: comment clarification

I think that addresses your review comments.

@msau42
Copy link
Collaborator

msau42 commented Apr 3, 2019

/lgtm
Thanks this looks great!

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 3, 2019
@k8s-ci-robot k8s-ci-robot merged commit 95ae9de into kubernetes-csi:master Apr 3, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants