-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix Integration tests #2425
Fix Integration tests #2425
Conversation
- Adapted error comparison according to linter recommendation - Disabled noctx linting for http request where canceling makes no sense - Disabled nilerror linting where nil error is returned on purpose - Disabled makezero linter where slice is explicitly deepcopied
Previously the regex only allowed the copyright notice to contain the years 2018,2019,2020,2021, or 2022. This commit widens to regex to 20\d\d allowing any year in the range [2000-2099]
The existing setup for minikube is very complicated, replicating most of the setup steps for a full kubernetes cluster in an only partially supported minikube configuration (driver=none). Furthermore the existing setup has been broken for sometime, likely, at least in part due to the changes to CNI and CRI in recent kubernetes versions. Since what we actually need is only a running Kubernetes cluster on the node and access to a registry on localhost:5000, we can switch the extremely complicated minikube setup for a lightweight cluster using k3s. Minikube came with a default addon for running a registry on every node, but the same is not the case for k3s, instead we make use of the package helm controller and its HelmChart CR to deploy twuni/docker-registry.helm and expose it on localhost using the integrated LoadBalancer controller.
@imjasonh Some more Notes based on my trek through the integration tests:
|
@BronzeDeer Wow. Thanks for sticking with this, and especially for your thorough analysis!! I totally agree that we'd probably get better results by moving some tests to unit tests, or at least faster-running integration tests. Avoiding rebuilding the image every time definitely seems like an improvement. Making kaniko less painful to develop will pay dividends when/if we want to make improvements to it in the future. |
@imjasonh Right now even getting back to green tests will be a huge boon, enabling future work. ON that note, since we are blocked by the container-diff PR, should we consider a plan b in case the maintainers over there don't react soon? Here's my current understanding of the role that container-diff plays in the test workflow: The integration tests attempt to verify that kaniko creates "semantically equivalent" images to those build by docker. This means that ignoring outside variability (non-pinned package versions, anything involving RNGs, timestamps) the content (and metadata?) of the layers should be the same, even if the images are not bit-for-bit equivalent. Since we do not currently make the effort to control for these sources of variability (and docker afaik doesn't offer any timestamp-agnostic builds) we cannot currenlty go for the simpler bit-for-bit comparison using container hashes, and instead need the more sophisticated logic offered by container-diff. In the short term, kaniko could maintain a separate patched version of container-diff as needed, but If container-diff becomes fully unmaintained, I see a long-term alternative: We focus on creating reproducible test cases and rely fully on simple hash comparisons, also realizing many of the other benefits allong the way. This will require a complete refactor of all integration tests, understanding what the original bug was focussing and recreating the test case in a more low level, more unit test like fashion or creating a new dockerfile with fully pinned FROM images and apt/apk etc packages (although if we do understand the root cause of the problems, installing loads of packages will likely not be necessary). Finally we would have to create a small tool that could blank the timestamps in the layers after the fact, creating a canonical version from a tar dump. Once that is accomplished you wouldn't even need any remote registry and could simply cache targeted image digests. Adding new test cases would be as simple as calling a helper script to build using docker, canonicalizing timestamps and then saving the target digest. Integration tests would run kaniko, output to tar, then run the canonicalize tool and read out the digest to compare. Besides the obvious chore work of converting all existing test cases, this would put a load on the maintainers as well to properly screen new test cases for sources of variability. (Although I imagine existing dockerfile linters could be adopted to catch most common cases automatically. I think this could be a viable long-term goal, no matter whether we restore container-diff functionality in the short-term or not. If we get back to all green test we could institute a policy of requiring all new integration tests to follow the new approach and then slowly migrate over existing tests, over attempting a big bang change. As for short-term fixing the falling test, we could either pin an older (pre-OCI) ubuntu images as base in the dockerfile, or build and maintain or vendor a fixed container-diff. If we go with the older ubuntu image, we would need to prioritize a long-term fix since any tag rolling to OCI will create further sporadic test failures at random points in the future |
We can disable any integration tests that are affected by this in the meantime. I don't know if anybody is still maintaining container-diff, unfortunately. Let's just add a |
Pinning to an older Thank you again for your hard work and tenacity on this. This is likely going to end up being pretty thankless work in general, but I can give you all the thanks I have. Keep it up and we'll find a way to make you a maintainer* *this is a threat 😆 |
I added the older ubuntu pin. If the tests now pass and we can merge, I'll rebase the original reproducibility fix on top of it and we can get that out of the way as well |
Belay that, apparently the older ubuntu images have some execve problem with their shells. I'll look into it and pushed a fixed verson soon |
The dockerfile for the regression test connected to issue 684 used a rolling tag as base image, making it flaky and fail since it was introduced. This commit pins the base image to the digest of bionic-20200219, which, based on the date of the commit that introduced to the dockerfile would be the most newest ubuntu build and likely what the "rolling" tag resolved to back then. Since this also an image from the pre-oci days of ubuntu, this circumvents a bug in container-diff as well (GoogleContainerTools/container-diff#389)
Pushed a fix, all tests green on my fork now |
The existing setup for minikube is very complicated, replicating most of
the setup steps for a full kubernetes cluster in an only partially
supported minikube configuration (driver=none). Furthermore the existing
setup has been broken for sometime, likely, at least in part due to the
changes to CNI and CRI in recent kubernetes versions.
Since what we actually need is only a running Kubernetes cluster on the
node and access to a registry on localhost:5000, we can switch the
extremely complicated minikube setup for a lightweight cluster using
k3s. Minikube came with a default addon for running a registry on every
node, but the same is not the case for k3s, instead we make use of the
packaged helm controller and its HelmChart CR to deploy twuni/docker-registry.helm
and expose it on localhost using the integrated LoadBalancer controller.
This pull request also includes another commit with a small change to the boilerplate checking script.
The current implementation only allows for copyright notices from the years 2018-2022, and would therefore
reject the new 2023 notice on top of the newly introduced k3s setup script. The new commit widens the range of accepted copyright years to 2000-2099.
There was a single failing test in TestRun (issue_684), which was caused by container-diff not supporting OCI indices/images. The underlying dockerfile did not pin the container hash of the from image and in the meantime ubuntu moved to using OCI manifests. while a fix for the issue has been submitted, it is unclear whether container-diff is still maintained, instead, this PR also includes a small commit pinning the base image of the test back to what it likely was back when the dockerfile was submitted.
Based on #2400, which should be merged first
Submitter Checklist
These are the criteria that every PR should meet, please check them off as you
review them:
See the contribution guide for more details.
Reviewer Notes