
Add Compliance Test Suite to automated build #353

Closed
planetf1 opened this issue Nov 6, 2018 · 28 comments
Labels
build-improvement Build improvements - maven, gradle, GitHub actions conformance-testing Egeria conformance testing consumability Makes the software easier to use or understand. Includes docs, error messages, logs, API definitions pinned Keep open (do not time out) testing testing - including automation

Comments

@planetf1
Member

planetf1 commented Nov 6, 2018

I suggest we add running the compliance test suite against the in-memory repository as part of
our once-daily build

At a later point in time we can consider if this could be run against other metadata repositories.

@planetf1 planetf1 added enhancement New feature or request consumability Makes the software easier to use or understand. Includes docs, error messages, logs, API definitions labels Nov 6, 2018
@planetf1
Member Author

I presume this is still a requirement @mandy-chessell @cmgrote @grahamwallis

@planetf1 planetf1 self-assigned this Jul 19, 2019
@mandy-chessell
Contributor

Yes please

@cmgrote
Member

cmgrote commented Jul 23, 2019

At a later point in time we can consider if this could be run against other metadata repositories.

There's an initial version of this already as part of our Helm charts for other metadata repositories... Just set the following in the values.yaml of the vdc chart (by default it's set to false), and it should create and configure a CTS instance for each repository being deployed as part of the chart:

# Egeria Conformance Test Suite - sets up to run against all Egeria repositories (if enabled=true)
cts:
  enabled: true

(I think this is probably our best option, since it will require such an external repository to first exist -- probably not something that will ever be part of our automated Jenkins builds, particularly for proprietary / licensed repositories?)

@planetf1 planetf1 added the cicd label Oct 18, 2019
@mandy-chessell mandy-chessell added this to the 2019.11 (1.2) milestone Oct 25, 2019
@planetf1 planetf1 added the testing testing - including automation label Nov 1, 2019
@planetf1 planetf1 added the build-improvement Build improvements - maven, gradle, GitHub actions label Nov 11, 2019
@planetf1
Member Author

In addition to running the CTS, the results should be shared in some way - for example through some kind of build artifact - so that a consuming organisation could refer back to the CTS results for a shipped release, and developers can see CTS results from each build run.

These could then be linked to from release notes/top level readme

@planetf1 planetf1 added functionality-call To discuss and agree during weekly functionality call(s) and removed enhancement New feature or request labels Nov 11, 2019
@planetf1
Member Author

For 1.2 I plan to execute the CTS & post the results at, or with a link from, the GitHub releases page. Lessons learned will be used to help refine the requirements for 1.3, where automation will be targeted.

@planetf1 planetf1 modified the milestones: 2019.11 (1.2), 2019.12 (1.3) Nov 29, 2019
@planetf1
Member Author

CTS is now looking good in 1.2, but for this release the run is done semi-manually (i.e. via a notebook). Automation will follow in a subsequent release.

@planetf1
Member Author

I am starting to prototype some CI/CD definitions for automated Helm chart deployment.
Initially this targets Azure with a very limited subscription as a POC, and with a basic notebook deployment (fewer moving parts), but I will a) move to a fuller subscription and b) add the CTS once some initial proof points are complete.

I'll link the necessary PRs for the CI/CD definitions here. Some of the changes are in base egeria, others are done directly through azure pipelines.

planetf1 referenced this issue in planetf1/egeria Dec 16, 2019
planetf1 referenced this issue in planetf1/egeria Dec 16, 2019
planetf1 referenced this issue in planetf1/egeria Dec 17, 2019
@planetf1 planetf1 modified the milestones: 2019.12 (1.3), 2020.01 (1.4) Dec 20, 2019
@planetf1 planetf1 modified the milestones: 2020.01 (1.4), 2020.02 (1.5) Jan 23, 2020
@cmgrote cmgrote removed the functionality-call To discuss and agree during weekly functionality call(s) label Feb 10, 2020
@github-actions

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 20 days if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the no-issue-activity Issues automatically marked as stale because they have not had recent activity. label Apr 24, 2021
@planetf1 planetf1 removed the no-issue-activity Issues automatically marked as stale because they have not had recent activity. label Apr 28, 2021
@github-actions

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 20 days if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the no-issue-activity Issues automatically marked as stale because they have not had recent activity. label Jun 29, 2021
@mandy-chessell mandy-chessell removed the no-issue-activity Issues automatically marked as stale because they have not had recent activity. label Jun 30, 2021
@planetf1 planetf1 removed the cicd label Aug 3, 2021
@github-actions

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 20 days if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the no-issue-activity Issues automatically marked as stale because they have not had recent activity. label Oct 20, 2021
@planetf1 planetf1 added pinned Keep open (do not time out) and removed no-issue-activity Issues automatically marked as stale because they have not had recent activity. labels Oct 27, 2021
@planetf1
Member Author

planetf1 commented Apr 1, 2022

The charts worked well for release testing on graph & inmem (after updates for inmem & OpenShift).

Next step would be to look at running this automatically within a CI/CD workflow, probably triggered off a schedule.

We could take something like the KinD workflow from https://github.com/odpi/egeria-database-connectors/blob/main/.github/workflows/connector-test.yml - obviously there's more to do.

Also referenced lately during the 3.7 release #6341

@planetf1
Member Author

A few simple things we could check

  • number of tests failed is 0
  • number of tests successful is > ???? (current number? just big? could be checked in)
  • no exceptions in audit log
  • profile results compare to baseline (which could be checked in)
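
The first two checks above could be sketched as a tiny gate. A minimal sketch in Python, assuming the assertion counts have already been extracted from the CTS results (the function name `cts_gate` and the baseline threshold are hypothetical, not part of the CTS):

```python
def cts_gate(failed: int, successful: int, min_successful: int) -> bool:
    """Simple pass/fail gate for a CTS run: zero failed assertions and
    at least a baseline count of successful ones (the baseline could be
    checked in alongside the workflow)."""
    return failed == 0 and successful >= min_successful

# Example using the counts from a later run in this thread (246942 / 32)
print("PASS" if cts_gate(failed=32, successful=246942, min_successful=200000) else "FAIL")
```

In CI the boolean would be mapped to an exit code so the job fails when the gate does.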

Better would be to properly map the tests to test suites, i.e. much more fine-grained, but this likely requires substantial development.

CTS failures can take a while to investigate, so automation could pick up issues a lot quicker - for example by running daily.

One concern is whether the 7GB agent running KinD would have enough resource to complete the tests

@planetf1
Member Author

planetf1 commented Dec 14, 2022

An initial version of this is now being tested at https://github.com/planetf1/cts
Still debugging, but expected:

The github action will

  • run 2 parallel jobs - for inmem & graph
  • Install k8s within a container (KinD)
  • setup & install our cts chart
  • wait for results
  • capture and post the results as an attachment

Caveats

  • Manual trigger only (for testing)
  • 'name' of job is based on connector which is a long name - needs parsing to something simpler
  • Personal repo - just to get started quickly (need to discuss in team where this belongs)
  • Need to consider scheduling - daily? Triggers? dependencies?
  • How to report / ensure results looked at - slack?
  • Need some simple analysis of the results for pass/fail (ie # tests, exceptions etc) (maybe split out from test)
  • Hardcoded to 3.14-SNAPSHOT (may benefit from a tag for latest release)

cc: @cmgrote @mandy-chessell @lpalashevski

@planetf1 planetf1 mentioned this issue Dec 14, 2022
1 task
@planetf1
Member Author

planetf1 commented Dec 14, 2022

After 4.5 hours, the CTS is still running - even at the minimum size (5 vs 1), and even for the in-memory repository (graph takes longer)
-> https://github.com/planetf1/cts/actions/runs/3695246683/jobs/6257391400

I set the job timeout to 5 hours (max for github is 6 - then job gets killed)

We have 2 CPUs and 7GB RAM, but it may be that we cannot 'fit' the CTS into this resource.

If not we need additional infrastructure - one of

  • An enterprise github account (can use larger github hosted runners)
  • External runners (need to deploy on our own infrastructure, then install the GitHub client code). Could be k8s but needs resource/funding
  • Skip GitHub Actions and use external resources directly - as above

Or we figure out how to speed up Egeria/CTS significantly.

I'll check the config further & try to debug via SSH just in case of any errors, and extend the timeout closer to 6h.

@cmgrote
Member

cmgrote commented Dec 14, 2022

All worth looking at -- when I run locally these days it's against 20GB memory and 3 cores, and at a size of 2. I think it finishes within 3 hours or less (for XTDB).

So my first hunch would be that 7GB and 2 cores is probably too small (7GB may be the main culprit -- could it just be hitting a non-stop swapping scenario?)

@planetf1
Member Author

I usually run on a 3-6 x 16GB cluster ... though often with multiple instances in parallel (all the charts).

I have run locally in around 6-8GB, but indeed this config may sadly be too small.

I'm going to take a closer look if I can get an SSH session set up.

@planetf1
Member Author

planetf1 commented Dec 15, 2022

Two projects to set up GitHub Actions runners on a k8s cluster:

https://github.com/evryfs/github-actions-runner-operator
https://github.com/actions-runner-controller/actions-runner-controller

The latter is being taken over by GitHub for native support: actions/actions-runner-controller#2072

@planetf1
Member Author

Investigated external runners -- but hit issues with KinD. Commented in the actions-runner-controller discussion.

Reverted to debugging github runners. The following fragment assisted with debugging (see https://github.com/lhotari/action-upterm for more info):

      # === debug
      - name: Setup upterm session
        uses: lhotari/action-upterm@v1
        with:
          ## limits ssh access and adds the ssh public key for the user which triggered the workflow
          limit-access-to-actor: true

The issue turned out to be that the strimzi operator pod was not starting due to failing to meet CPU constraints. This defaulted to '500m' (0.500 CPU units), which should have been OK. However, even '1m' failed to schedule. This looks like a KinD issue, but overriding the minimum CPU to '0m' allowed the pods to schedule. This was needed for our own pods too.

Added additional checks. For example:

          until kubectl get pod -l app.kubernetes.io/name=kafka -o go-template='{{.items | len}}' | grep -qxF 1; do
            echo "Waiting for pod"
            sleep 1
          done

This fragment simply loops until the pod matching expression exists. (kubectl rollout status may also be useful)

Then we can do

          kubectl wait pods --selector=app.kubernetes.io/name=kafka --for condition=Ready --timeout=10m

This will immediately return if the pod matching expression doesn't exist, which is why the above check is needed first.

None of these checks help run the CTS as such; rather, they help report the current stage in the GitHub Actions job log.

If CTS works we can revisit better approaches, custom actions etc.

@planetf1
Member Author

SUCCESSFUL test run -> https://github.com/planetf1/cts/actions/runs/3708502295 - i.e. tasks completed as successful.

Results are attached to the job.

Will elaborate the job to do some basic checks of the results.

@planetf1
Member Author

planetf1 commented Dec 16, 2022

Example output I'm experimenting with

This is based on the positive/negative evidence counts in the detailed CTS results, i.e.:

➜  graph ./cts-analyze.py
              Metadata sharing MANDATORY_PROFILE   CONFORMANT_FULL_SUPPORT [  71657 /      0 ]
              Reference copies  OPTIONAL_PROFILE            NOT_CONFORMANT [   8496 /     32 ]
          Metadata maintenance  OPTIONAL_PROFILE   CONFORMANT_FULL_SUPPORT [  14126 /      0 ]
                 Dynamic types  OPTIONAL_PROFILE            UNKNOWN_STATUS [      0 /      0 ]
                 Graph queries  OPTIONAL_PROFILE   CONFORMANT_FULL_SUPPORT [    528 /      0 ]
             Historical search  OPTIONAL_PROFILE     CONFORMANT_NO_SUPPORT [    530 /      0 ]
                Entity proxies  OPTIONAL_PROFILE   CONFORMANT_FULL_SUPPORT [   2759 /      0 ]
       Soft-delete and restore  OPTIONAL_PROFILE   CONFORMANT_FULL_SUPPORT [   2592 /      0 ]
                Undo an update  OPTIONAL_PROFILE     CONFORMANT_NO_SUPPORT [    406 /      0 ]
           Reidentify instance  OPTIONAL_PROFILE   CONFORMANT_FULL_SUPPORT [   2650 /      0 ]
               Retype instance  OPTIONAL_PROFILE   CONFORMANT_FULL_SUPPORT [  16365 /      0 ]
               Rehome instance  OPTIONAL_PROFILE   CONFORMANT_FULL_SUPPORT [   1590 /      0 ]
                 Entity search  OPTIONAL_PROFILE   CONFORMANT_FULL_SUPPORT [  62878 /      0 ]
           Relationship search  OPTIONAL_PROFILE   CONFORMANT_FULL_SUPPORT [   8253 /      0 ]
        Entity advanced search  OPTIONAL_PROFILE   CONFORMANT_FULL_SUPPORT [  44800 /      0 ]
  Relationship advanced search  OPTIONAL_PROFILE   CONFORMANT_FULL_SUPPORT [   9312 /      0 ]

FAIL [246942/32]
➜  graph echo $?         
1

This returns a simple pass/fail - based on whether any assertions have failed.
It does not (yet?) compare to a baseline.

There are many other interpretations we could apply to the data - formatting the evidence, checking for other exceptions in the log, and so on.
Having experimented, a refactoring could make this a lot neater.
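
A stripped-down sketch of that pass/fail logic (the real cts-analyze.py parses the detailed CTS results; the profile tuples below are illustrative, taken from the output above):

```python
# (profile name, requirement, status, positive evidence, negative evidence)
profiles = [
    ("Metadata sharing", "MANDATORY_PROFILE", "CONFORMANT_FULL_SUPPORT", 71657, 0),
    ("Reference copies", "OPTIONAL_PROFILE", "NOT_CONFORMANT", 8496, 32),
]

def analyze(profiles) -> int:
    """Print a summary line and return a shell-style exit code:
    0 if no assertions failed across any profile, 1 otherwise."""
    positive = sum(p[3] for p in profiles)
    negative = sum(p[4] for p in profiles)
    print(f"{'PASS' if negative == 0 else 'FAIL'} [{positive}/{negative}]")
    return 0 if negative == 0 else 1

rc = analyze(profiles)
```

The returned code maps directly onto the `echo $?` check shown above; a baseline comparison would slot in alongside the `negative == 0` test.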

@planetf1
Member Author

Added checks into the latest pipeline.
Set the default container to 'latest'.
Added a daily schedule.

@planetf1
Member Author

I have reverted the doubling of the retry count used during the CTS after seeing run-times on the CTS automation exceed 6 hours. Analysis of the CTS execution is needed, but perhaps we were hitting many more of these time limits than I'd expected, even during successful execution.

See #7314 -- need to test it through ci/cd to get an exact comparison

@planetf1
Member Author

I'm proposing to move my repo under odpi. Whilst we can no doubt make improvements and refactor, it's a starting point, and moving it will make it easier for others to use it, review test results, improve the CTS, and improve our test infrastructure.

@planetf1
Member Author

Having backed off the timer increase, the CTS is now running in 4-4.5 hours. Will leave it like this.

@planetf1 planetf1 removed their assignment May 15, 2023
@mandy-chessell
Contributor

The development work for this is complete.
