Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore(tests): Enable Kubernetes tests #1970

Merged
merged 43 commits into from
Mar 29, 2020
Merged

chore(tests): Enable Kubernetes tests #1970

merged 43 commits into from
Mar 29, 2020

Conversation

ktff
Copy link
Contributor

@ktff ktff commented Mar 3, 2020

Closes #1635.

  • Open an issue to extend TLS configuration to accept certificate as Vec. Necessary to enabletransforms::kubernetes::watch_client::kube_tests::watch_pod test.

  • Tests on CI are exhausting disk space. Something needs to be deleted.

@ktff ktff added the domain: tests Anything related to Vector's internal tests label Mar 3, 2020
@binarylogic binarylogic requested a review from a user March 3, 2020 14:39
@binarylogic binarylogic assigned ghost Mar 3, 2020
@binarylogic
Copy link
Contributor

Thanks @ktff, this looks great. We recently moved all of our build/test steps into this docker-compose.yml file. The idea is to simplify running tests locally.

@a-rodin should we move this to the docker-compose.yml file? Would you mind reviewing this to make sure it's in line with our future transition of other jobs.

steps:
- uses: actions/checkout@v2
- name: Build
run: cargo build --release --no-default-features --features "sources-kubernetes transforms-kubernetes sinks-console"
Copy link
Contributor

@binarylogic binarylogic Mar 3, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice if we could build Vector once for all of our tests, if possible.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think in general we should have a check job that just runs check which should be fast and good for quick feedback.

Then one test job that tests everything, hopefully we won't run into noise issues with running kube and all our docker containers together. This then would allow us to build vector once in test mode and cache those build artifacts.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice if we could build Vector once for all of our tests, if possible.

Yep, sharing the binaries could really speed up the test harness execution. We probably should address this as a part of our Github Actions switch. We'll have an opportunity to apply some grand design to our CI flows.

I think in general we should have a check job that just runs check which should be fast and good for quick feedback.

I have some ideas on how to organize the jobs/steps/actions/workflows to make them much more compact and composable that what we have at Circle CI. And more maintainable, and efficient too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, sharing the binaries could really speed up the test harness execution. We probably should address this as a part of our Github Actions switch.

I'll see if it can be achieved here, because it would half the execution time from 40min to 20min, and half the total work.

@LucioFranco
Copy link
Contributor

This config is much nicer than circles! Looking good!

@ghost
Copy link

ghost commented Mar 4, 2020

@ktff This looks great! However, during the migration to GitHub Actions we try to keep it possible to run all CI workflows locally. The approach is to add jobs as services to the docker-compose.yml file and set dependencies between them as targets in tests/Makefile. See #1647 and #1663 for more details on that.

Keeping entire workflow in docker-compose.yml file also has an additional benefit of making it easy to multiplex jobs sharing some intermediate artifacts into a single CI jobs, thus making it possible to avoid full recompilation of Vector for different checks.

@ktff
Copy link
Contributor Author

ktff commented Mar 4, 2020

@a-rodin Ok, I'll migrate the configuration to the docker-compose.yml.

@ktff ktff marked this pull request as ready for review March 23, 2020 10:07
ktff added 21 commits March 23, 2020 11:11
Signed-off-by: Kruno Tomola Fabro <[email protected]>
Signed-off-by: Kruno Tomola Fabro <[email protected]>
Signed-off-by: Kruno Tomola Fabro <[email protected]>
Signed-off-by: Kruno Tomola Fabro <[email protected]>
Signed-off-by: Kruno Tomola Fabro <[email protected]>
Signed-off-by: Kruno Tomola Fabro <[email protected]>
Signed-off-by: Kruno Tomola Fabro <[email protected]>
Signed-off-by: Kruno Tomola Fabro <[email protected]>
Signed-off-by: Kruno Tomola Fabro <[email protected]>
Signed-off-by: Kruno Tomola Fabro <[email protected]>
Signed-off-by: Kruno Tomola Fabro <[email protected]>
Signed-off-by: Kruno Tomola Fabro <[email protected]>
Signed-off-by: Kruno Tomola Fabro <[email protected]>
Signed-off-by: Kruno Tomola Fabro <[email protected]>
Signed-off-by: Kruno Tomola Fabro <[email protected]>
Signed-off-by: Kruno Tomola Fabro <[email protected]>
Signed-off-by: Kruno Tomola Fabro <[email protected]>
Signed-off-by: Kruno Tomola Fabro <[email protected]>
Signed-off-by: Kruno Tomola Fabro <[email protected]>
Signed-off-by: Kruno Tomola Fabro <[email protected]>
@ktff ktff force-pushed the enable_kube_tests branch from 90a904f to a0a8c71 Compare March 23, 2020 10:12
Signed-off-by: Kruno Tomola Fabro <[email protected]>
@ktff
Copy link
Contributor Author

ktff commented Mar 25, 2020

@a-rodin What do you think about having two ways of running the Kubernetes tests?

One in docker_compose and the other with github actions. It will be a bit more to maintain, but both can be optimized to run in there own respective cases. Otherwise, I'll need to butcher how it's being tested in docker-compose so it's runnable on github action. The main problem being that the disk is being exhausted, and the second that it's taking ~40min to build everything in sequence.

@binarylogic
Copy link
Contributor

Just to throw it out there. Would custom GH runners solve this? It's something we've been discussing.

@LucioFranco
Copy link
Contributor

Can we spin up a dedicated server that holds a kube cluster and run all our tests with both docker kubernetes feature flags? As long as we have a valid config file and have network access to the remote machine that is running kube it should work just fine?

@ktff
Copy link
Contributor Author

ktff commented Mar 27, 2020

Both cases would solve it, but with higher maintainance cost, and slower, than ~60 lines that are dedicated to running the tests on github actions.

As there is probably no significantly better option, I'll just add the separate config for github actions.

ktff added 2 commits March 27, 2020 14:54
Signed-off-by: Kruno Tomola Fabro <[email protected]>
@ktff ktff changed the title chore(testing): Enable Kubernetes tests chore(tests): Enable Kubernetes tests Mar 27, 2020
@ghost
Copy link

ghost commented Mar 27, 2020

@ktff I think that the approach with having different configs for docker-compose.yml and GH Actions is reasonable if there is no easy way to run everything from docker-compose.yml.

My point was that it is desirable to be able to easily test every feature of Vector locally, including the Kuberenetes-related ones. Once it is possible, I think it is fine to have the duplication if it is not feasible to avoid it.

@LucioFranco
Copy link
Contributor

I just want to note that we moved away from using kind because they were slow and we were hitting 500s. This is why I suggest that we don't include it via docker and tell users to use minikube. I think its reasonable for the more complex sinks to have a bit more of a harder way to run the tests. Even though running minikube is pretty easy. Not blocking this PR but just wanted to add that note.

@ktff
Copy link
Contributor Author

ktff commented Mar 27, 2020

Minikube was the first choice, and it still is on GA.

Kind indeed is slow, so its used with only 2 test threads to decrease the load.

@LucioFranco I do like the idea of users externally providing Kubernetes cluster. If kind again becomes a problem, this is probably what we should do.


Minikube unfortunately isn't reliably usable inside docker with none drive as there are various bugs with it that popup on different machines/OS. But once none driver becomes more stable we should revisit this.

@LucioFranco
Copy link
Contributor

Minikube unfortunately isn't reliably usable inside docker with none drive as there are various bugs with it that popup on different machines/OS. But once none driver becomes more stable we should revisit this.

I am still not following why we need it to run inside docker, users can setup virtualbox or kvm and for CI we should just spin up a small instance on aws that has a kube cluster. As long as we namespace things correctly multiple tests should be able to be run against that cluster. This should remove the need for docker?

@ktff
Copy link
Contributor Author

ktff commented Mar 27, 2020

I agree with you.

I mentioned that paragraf as a note to remember that minikube isn't used in the scripts only because none driver currently isn't stable. I'll add a line to remove the ambiguity.

Signed-off-by: Kruno Tomola Fabro <[email protected]>
@ktff
Copy link
Contributor Author

ktff commented Mar 27, 2020

To clarify things:

  • On github actions we are using minikube, and the testing on 3 versions of Kubernetes is done in ~22 min, out of which ~18 min is building. For:

    • v1.17.2
    • v1.14.10
    • v1.13.2 (There is as misspelling in the config this should have been v1.13.12)
  • Local testing is done with kind and it is sharing build artifacts with the rest o docker-compose infrastructure so most of the time it's just a vector build x 2 and then testing one Kubernetes version at a time. On my local machine it's ~9min. Most of that time goes into creating and destroying the cluster three times. For:

    • v1.17.0
    • v1.14.10
    • v1.13.12

Also, for local testing, I'll reduce testing threads to 1 as it seems that it is faster with kind. Kind really doesn't like multiple tests runing at once. With one thread local testing is done in ~6min (sample size: 1).

Signed-off-by: Kruno Tomola Fabro <[email protected]>
@ktff
Copy link
Contributor Author

ktff commented Mar 27, 2020

I'll gather some statistics on reliability of kind, if it's locally to flaky, then as @LucioFranco suggested we should enable the optional use of user provided cluster, in a follow up issue.

@ktff
Copy link
Contributor Author

ktff commented Mar 29, 2020

Regarding locally running tests, they were run 100 times, out of which 72 timed out on kind, others ended succesfully.

To alleviate pressure on kind:

  • tests are run only on Kubernetes v1.17.0
  • timeout was increased from 60s to 120s

as a result, all subsequent 30 test runs were succesful.

This should be sufficente for now, but I'll open an issue to enable usage of user provided cluster.


Regarding CI running tests, there is a bug on how we exclude our own logs from collection on Kubernetes v1.13 and lower. It was introduced with #1501. This is causing a log feedback loop and making the GA runners unresponsive. I'll disable the testing on v1.13 for now and open a separate issue for the bug.

Ironically, this manifested on minikube and not on kind because kind was slow enough for tests to end before the feedback got out of hand.

Signed-off-by: Kruno Tomola Fabro <[email protected]>
@binarylogic
Copy link
Contributor

Nice, thank you for digging into this and solving it.

@binarylogic
Copy link
Contributor

Regarding the bug: could you open an issue to represent fixing that? It sounds pretty important.

@ktff
Copy link
Contributor Author

ktff commented Mar 29, 2020

It is, I'm currently writing it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
domain: tests Anything related to Vector's internal tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Re-enable kubernetes tests in CI
4 participants