Add Compliance Test Suite to automated build #353
Comments
I presume this is still a requirement? @mandy-chessell @cmgrote @grahamwallis
Yes please
There's an initial version of this already part of our Helm charts for other metadata repositories... Just set the following in the
(I think this is probably our best option, since it will require such an external repository to first exist -- probably not something that will ever be part of our automated Jenkins builds, particularly for proprietary / licensed repositories?)
In addition to running the CTS, the results should be shared in some way - for example through some kind of build artifact - so that a consuming organisation could refer back to the CTS results for a shipped release, and developers can see CTS results from each build run. These could then be linked to from the release notes/top-level readme.
For 1.2 I plan to execute the CTS and post the results at, or with a link from, the GitHub releases page. Lessons learnt will be used to help refine the requirements for 1.3, where automation will be targeted.
CTS is now looking good in 1.2, but for this release the run is done semi-manually (i.e. via notebook). Automation will follow in a subsequent release.
I am starting to prototype some CI/CD definitions for automated helm chart deployment. I'll link the necessary PRs for the CI/CD definitions here. Some of the changes are in base Egeria, others are done directly through Azure Pipelines.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 20 days if no further activity occurs. Thank you for your contributions. |
The charts worked well for release testing on graph & inmem (after being updated for inmem & openshift). The next step would be to look at running automatically within a CI/CD workflow, probably triggered off a schedule. We could take something like the KinD workflow from https://github.com/odpi/egeria-database-connectors/blob/main/.github/workflows/connector-test.yml - obviously there's more to do. Also referenced lately during the 3.7 release: #6341
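For illustration, the skeleton of such a scheduled KinD job might look something like the sketch below - loosely modelled on the connector-test workflow linked above; the cron schedule, chart path and release name are placeholders, not the real chart interface.

```yaml
# Sketch only -- schedule, chart path and release name are placeholders
name: CTS (scheduled)
on:
  schedule:
    - cron: '0 1 * * *'    # once daily
  workflow_dispatch: {}    # also allow manual runs
jobs:
  cts:
    runs-on: ubuntu-latest
    timeout-minutes: 360   # GitHub-hosted jobs are killed after 6 hours
    steps:
      - uses: actions/checkout@v3
      - name: Create KinD cluster
        uses: helm/kind-action@v1
      - name: Deploy CTS chart
        run: |
          # placeholder chart/release names -- substitute the real odpi charts
          helm install cts ./charts/egeria-cts --wait --timeout 10m
```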
A few simple things we could check (of the sort sketched below):
Better would be to properly map the tests to test suites, i.e. much more fine grained, but this is likely substantial development. CTS failures can take a while to investigate, so automation could pick up issues a lot quicker - for example by running daily. One concern is whether the 7GB agent running KinD would have enough resource to complete the tests.
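For instance, a first-pass check could just confirm the results were produced and the log is clean - the filenames below are purely illustrative.

```yaml
# Illustrative only -- the results/log filenames are placeholders
- name: Simple sanity checks
  run: |
    test -s cts-results.json                     # results file exists and is non-empty
    if grep -qi "exception" platform.log; then   # any obvious exceptions in the platform log?
      echo "exceptions found in platform log"
      exit 1
    fi
```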
An initial version of this is now being tested at https://github.com/planetf1/cts. The GitHub action will:
Caveats
After 4.5 hours the CTS is still running, even at the minimum size (5 vs 1), and even in-memory (graph takes longer). I set the job timeout to 5 hours (the max for GitHub is 6, then the job gets killed). We have 2 CPUs and 7GB RAM, but it may be that we cannot 'fit' the CTS into this resource. If not, we need additional infrastructure - one of:
Or we figure out how to speed up Egeria/CTS significantly. I'll check the config further & try and debug via ssh just in case of any errors, and extend the timeout closer to 6h.
All worth looking at -- when I run locally these days it's against 20GB memory and 3 cores, and at a size of 2. I think it finishes within 3 hours or less (for XTDB). So my first hunch would be that 7GB and 2 cores is probably too small (7GB maybe the main culprit -- could it just be hitting a non-stop swapping scenario?).
I usually run on a 3-6 x 16GB cluster... though often multiple instances in parallel (all the charts). I have run locally in around 6-8GB, but indeed this config may sadly be too small. I'm going to take a closer look if I can get an ssh session set up.
Two projects to set up GitHub Actions runners on a k8s cluster: https://github.com/evryfs/github-actions-runner-operator and actions-runner-controller. The latter is being taken over by GitHub for native support: actions/actions-runner-controller#2072
Investigated external runners -- but hit issues with KinD; commented in the actions-runner-controller discussion. Reverted to debugging on GitHub runners. The following fragment assisted with debugging (see https://github.com/lhotari/action-upterm for more info):
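Something along these lines - it opens an upterm ssh session into the runner when the job fails so the cluster state can be inspected; check the exact inputs against the action-upterm README.

```yaml
# Opens an ssh (upterm) session into the runner on failure for interactive debugging;
# verify the inputs against the action's README
- name: Debug over SSH on failure
  if: ${{ failure() }}
  uses: lhotari/action-upterm@v1
  with:
    limit-access-to-actor: true   # restrict the session to the workflow actor
```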
The issue turned out to be that the strimzi operator pod was not starting due to failing to meet CPU constraints. This defaulted to '500m' (0.500 CPU units), which should have been OK; however, even '1m' failed to schedule. This looks like a KinD issue, but overriding the minimum CPU to '0m' allowed the pods to schedule. This was needed for our own pods too. Added additional checks. For example:
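Roughly along these lines - the label selector and sleep interval are illustrative, not the exact values used.

```yaml
# Loop until at least one pod matches the selector; selector and interval are placeholders
- name: Wait for the platform pod to exist
  run: |
    until kubectl get pod --selector=app.kubernetes.io/name=egeria --no-headers 2>/dev/null | grep -q .; do
      echo "waiting for pod to appear..."
      sleep 10
    done
```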
This fragment simply loops until a pod matching the expression exists. Then we can do:
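For example (again, the selector and timeout are illustrative):

```yaml
# Wait for readiness; selector and timeout are placeholders
- name: Wait for the platform pod to be ready
  run: |
    kubectl wait pod --selector=app.kubernetes.io/name=egeria \
      --for=condition=Ready --timeout=300s
```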
This will immediately return if no pod matching the expression exists, which is why the check above is needed first. None of these checks help with running the CTS as such; rather they help report the current stage in the GitHub Actions job log. If the CTS works we can revisit better approaches, custom actions etc.
SUCCESSFUL test run -> https://github.com/planetf1/cts/actions/runs/3708502295 - i.e. the tasks completed successfully and the results are attached to the job. Will elaborate the job to do some basic checks of the results.
Example output I'm experimenting with. This is based on positive/negative evidence counts in the detailed CTS results, i.e.:
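A minimal sketch of that kind of check - the results filename and the marker string grepped for are assumptions about the detailed CTS output, not a documented schema.

```yaml
# Count pieces of negative evidence in the detailed results and fail the step if any;
# filename and marker string are placeholders, not the documented CTS schema
- name: Check CTS evidence counts
  run: |
    FAILED=$(grep -c 'UNSUCCESSFUL' cts-detailed-results.json || true)
    echo "negative evidence count: ${FAILED}"
    [ "${FAILED}" -eq 0 ]    # simple pass/fail on any failed assertion
```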
This returns a simple pass/fail - based on whether any assertions have failed. There are many other interpretations we could make of the data: format the evidence, check for other exceptions in the log, and so on.
Added checks into latest pipeline.
I have reverted the doubling of the retry count used during the CTS after seeing run-times on the CTS automation exceed 6 hours. Analysis of the CTS execution is needed, but perhaps we were hitting many more of these time limits than I'd expected, even during successful execution. See #7314 -- need to test it through CI/CD to get an exact comparison.
I'm proposing to move my repo under odpi. Whilst no doubt we can make improvements and refactor, it's a starting point, and moving it will make it easier for others to use it, review test results, improve the CTS, and improve our test infrastructure.
Having backed off the timer increase, the CTS is now running in 4-4.5 hours. Will leave it like this.
The development work for this is complete.
I suggest we add running the compliance test suite against the in-memory repository as part of our once-daily build.
At a later point in time we can consider if this could be run against other metadata repositories.