Proposal 002 - Run the CNCF project benchmark tests as part of the automated pipeline

This is step 2 of the automated pipeline to evaluate the carbon emissions of a CNCF project: run benchmarking tests of the project (probably created for this purpose). See also step 1: Trigger and Deploy.

  • Tracking issue: #83
  • Implementation issue: #86

Authors

  • @locomundo
  • @nikimanoledaki
  • @AntonioDiTuri
  • @rossf7

Status

Approved

Table of Contents

  • Summary
  • Motivation
  • Goals
  • Non-Goals
  • Proposal
  • Risks and Mitigations
  • Design Details
  • Drawbacks
  • Alternatives
  • Infrastructure Needed

Summary

Motivation

This proposal is part of the pipeline automation of the Green Reviews tooling for Falco (and new CNCF projects in the future). Currently, we use Flux to watch the upstream Falco repository and run the benchmark workflow (see definition below) constantly. For example, this benchmark job is set up as a Kubernetes Deployment that runs an endless loop of stress-ng, which applies stress to the kernel. Instead, this proposal aims to define how to deploy the benchmark workflows only when they are needed.

In addition, automating the way we run benchmark workflows in this pipeline will make the process easier and faster. It will enable both the WG Green Reviews and CNCF project maintainers to come up with new benchmark jobs and run them to get feedback faster.

Goals

  • Describe the actions to take immediately after the trigger and deployment of the CNCF project defined in Proposal 1.
  • Describe how the pipeline should fetch the benchmarks, either from this repository (cncf-tags/green-reviews-tooling) or from an upstream repository (e.g. Falco's falcosecurity/cncf-green-review-testing).
  • Describe how the pipeline should run the benchmarks through GitHub Actions for a specific project, e.g. Falco.
  • Communicate to CNCF projects interested in a Green Review the structure they need to comply with when creating a new benchmark job.
  • Provide modularity for the benchmark tests.

Non-Goals

  • Defining or designing the content of benchmark tests themselves, or assigning responsibility for who should write them.

Proposal

User Stories

CNCF project maintainer selects the right benchmark for their project

Since different CNCF projects need different benchmarks to produce the right metrics, as a project maintainer, I would like to select benchmarks that reproduce a Kubernetes context that is as realistic as possible.

CNCF project maintainer creates a new benchmark for their project

As a project maintainer, if the available benchmarks are not enough to reproduce a realistic context, I would like to create and run my own benchmark.

Green Reviews maintainer helps to create a new benchmark test for a specific CNCF project

As a Green Reviews maintainer, I can help CNCF project maintainers define the functional unit of a project so that the project maintainers can create a benchmark test.

CNCF Project maintainer modifies or removes a benchmark test

As a project maintainer, I can edit or remove a benchmark test, either directly if it lives in a repository owned by the CNCF project itself, or by making a pull request with the changes if it lives in the Green Reviews repository.

Risks and Mitigations

As with every design document, there are multiple risks:

  • Extensibility: At the moment, Falco is the first and only project that has requested a Green Review (a very appreciated guinea pig 🙂). When other CNCF projects request Green Reviews, we will learn more and adapt the project as needed.

  • Scalability: Green Reviews contributors should empower and encourage CNCF project maintainers to create benchmark jobs. The right collaboration will enable Green Reviews maintainers to scale to multiple projects (because they will not need to understand the deployment details of every project) while producing higher-quality metrics (because each project is set up by its own experts).

  • Validity: This point is less trivial and partly conflicts with the one above, but it is worth mentioning. If every project defines its own benchmarks, how will it be possible to compare results across projects? This needs deeper investigation and will be discussed in a separate proposal.

Design Details

Definitions

There are different components defined here and shown in the following diagram.

---
title: Proposal 002 Run
---
stateDiagram-v2

    getLatestReleases: GetLatestReleases()
    projDispatch: DispatchProjects()
    k8sCluster: Equinix K8s Cluster (k3s)

    state "GH Workflow Falco" as falcoPipeline {
        falcoInstallManifests: DeployFalco()
        falcoDestroyManifests: UninstallFalco()
        falcoStartBenchmarking: DeployBenchmarking()
        falcoWaitBenchmarking: WaitBenchmarkingDuration()
        falcoEndBenchmarking: StopBenchmarking()

        falcoInstallManifests --> falcoStartBenchmarking: Start Synthetic Workload
        falcoStartBenchmarking --> falcoWaitBenchmarking: Wait duration of benchmark
        falcoWaitBenchmarking --> falcoEndBenchmarking: Destroy benchmarking resources
        falcoEndBenchmarking --> falcoDestroyManifests: Uninstall Falco
    }
    state "GH Workflow Project [1:N]" as projNPipeline {
        projNInstallManifests: DeployProject()
        projNDestroyManifests: UninstallProject()
        projNStartBenchmarking: DeployBenchmarking()
        projNWaitBenchmarking: WaitBenchmarkingDuration()
        projNEndBenchmarking: StopBenchmarking()

        projNInstallManifests --> projNStartBenchmarking: Start Synthetic Workload
        projNStartBenchmarking --> projNWaitBenchmarking: Wait duration of benchmark
        projNWaitBenchmarking --> projNEndBenchmarking: Destroy benchmarking resources
        projNEndBenchmarking --> projNDestroyManifests: Uninstall Project
    }

    state "(Github) CNCF Projects" as cncfProjs {
        falco: falcosecurity/falco
        project_[2]
        project_[N]
    }

    [*] --> getLatestReleases: Trigger Cron @daily
    getLatestReleases --> projDispatch: DetailOfProjects

    getLatestReleases --> cncfProjs: GET /releases/latest
    cncfProjs --> getLatestReleases: [{"tag"="x.y.z"},...]

    projDispatch --> falcoPipeline: POST /workflows/dispatch
    projDispatch --> projNPipeline: POST /workflows/dispatch


    falcoPipeline --> k8sCluster
    projNPipeline --> k8sCluster
    %% k8sCluster --> falcoPipeline
    %% k8sCluster --> projNPipeline
    state join_state <<join>>
    falcoPipeline --> join_state
    projNPipeline --> join_state

Let's recap some of the components defined in Proposal 1:

  1. Green Reviews pipeline: the Continuous Integration pipeline which deploys a CNCF project to a test cluster, runs a set of benchmarks while measuring carbon emissions and stores the results. It is implemented by the workflows listed below.
  2. Cron workflow: the initial GitHub Actions workflow (described in Proposal 1), which dispatches a project workflow (see the next definition and the dispatch sketch after this list), as well as a delete workflow to clean up the resources created by the project workflow.
  3. Project workflow: the project workflow is dispatched by the cron workflow. A project workflow can be, for example, a Falco workflow. It deploys the project and runs the benchmarks (see below), and can be dispatched more than once if there are multiple project variants/setups. A project workflow is itself just another GitHub Actions workflow and contains a list of GitHub Actions jobs.
  4. Delete/cleanup workflow: this workflow makes sure that the resources created by the project workflow are deleted so the environment returns to its initial state.
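
For illustration only, the cron workflow could dispatch a project workflow with the GitHub CLI. This is a minimal sketch, not the actual implementation; the workflow file name, input names, and config value are assumptions.

      - name: Dispatch the Falco project workflow
        run: |
          # Pass the latest release tag and a config variant as workflow_dispatch inputs
          gh workflow run falco.yaml \
            -f version="$VERSION" \
            -f config=ebpf
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          VERSION: x.y.z  # latest release tag returned by GetLatestReleases()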

This proposal adds the following components:

  1. [new] Benchmark job: a GitHub Actions job that applies the benchmark manifest using kubectl apply -f, waits for the duration of the benchmark, and deletes the manifest resources with kubectl delete -f.
  2. [new] Benchmark manifest: a YAML file with the Kubernetes resources, such as Deployments, that deploy the benchmarking workload.

The manifest URL and benchmarking duration are configured via projects.json:

{
    "projects": [
        {
            "name": "falco",
            "organization": "falcosecurity",
            "benchmark": {
                "k8s_manifest_url": "https://raw.githubusercontent.com/falcosecurity/cncf-green-review-testing/e93136094735c1a52cbbef3d7e362839f26f4944/benchmark-tests/falco-benchmark-tests.yaml",
                "duration_mins": 15
            },
            "configs": [
                "ebpf",
                "modern-ebpf",
                "kmod"
            ]
        }
    ]
}
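
For illustration, a workflow step could read these values with jq before the benchmark job runs. This is only a sketch; it assumes the file is available as projects.json in the checkout, and the step id and output names are hypothetical.

      - name: Read benchmark config
        id: benchmark_config
        run: |
          # Extract the manifest URL and duration for the given project
          MANIFEST_URL=$(jq -r '.projects[] | select(.name == "falco") | .benchmark.k8s_manifest_url' projects.json)
          DURATION_MINS=$(jq -r '.projects[] | select(.name == "falco") | .benchmark.duration_mins' projects.json)
          echo "manifest_url=$MANIFEST_URL" >> "$GITHUB_OUTPUT"
          echo "duration_mins=$DURATION_MINS" >> "$GITHUB_OUTPUT"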

Benchmark job

The benchmark job applies the manifest using kubectl. In the case of Falco, the functional unit test is time-bound and scoped to 15 minutes. Therefore, we deploy this test, wait for 15 minutes, then delete the manifest to end the loop. The test steps depend on the functional unit of each CNCF project. The wait duration is configurable via the duration_mins field in projects.json.

The benchmark job is also responsible for deleting the manifests either after the wait duration or sooner if an error has occurred.
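
A minimal sketch of what such a job could look like in a GitHub Actions workflow is shown below. The values are hardcoded for readability (in practice they would come from projects.json), the job layout is an assumption rather than the actual implementation, and the runner is assumed to already be authenticated with the cluster (see Authentication).

  benchmark:
    runs-on: ubuntu-latest
    env:
      MANIFEST_URL: https://raw.githubusercontent.com/falcosecurity/cncf-green-review-testing/e93136094735c1a52cbbef3d7e362839f26f4944/benchmark-tests/falco-benchmark-tests.yaml
      DURATION_MINS: 15
    steps:
      - name: Apply benchmark manifest
        run: kubectl apply -f "$MANIFEST_URL"
      - name: Wait for the duration of the benchmark
        run: sleep $((DURATION_MINS * 60))
      - name: Delete benchmark resources
        if: always()  # clean up even if an earlier step failed
        run: kubectl delete -f "$MANIFEST_URL"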

Benchmark manifest

At a bare minimum, the benchmark manifest must contain the Kubernetes resources for what should run in the cluster and specify which namespace should be used. For example, the Falco project maintainers have identified that one way to test the Falco project is through a test that runs stress-ng for a given period of time. The steps are contained in a Deployment manifest which is applied directly to the community cluster using kubectl.
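
As a rough sketch of what such a manifest could contain (the resource names, namespace, container image, and stress-ng arguments below are illustrative assumptions, not Falco's actual test definition):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: stress-ng-benchmark       # illustrative name
  namespace: benchmark-tests      # the namespace the benchmark should run in
spec:
  replicas: 1
  selector:
    matchLabels:
      app: stress-ng-benchmark
  template:
    metadata:
      labels:
        app: stress-ng-benchmark
    spec:
      containers:
        - name: stress-ng
          image: alexeiled/stress-ng:latest  # assumed stress-ng container image
          command: ["/bin/sh", "-c"]
          # endless loop of stress-ng; the real test defines its own parameters
          args: ["while true; do stress-ng --cpu 1 --timeout 60s; done"]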

Below are two use cases: the benchmark manifests may be defined in the Green Reviews repository or in a separate repository.

Use Case 1: Benchmark manifest is defined in the same repository (preferred)

Hosting the manifests in the Green Reviews repository is preferred for both simplicity and security. This is also preferred for generic benchmarks that can apply to multiple CNCF projects.

Use Case 2: Benchmark manifest is defined in a different repository

We want to accommodate different methods of setting up the tests depending on the CNCF project. Given this, the benchmark manifest could be defined in a different repository. In this case, the k8s_manifest_url would be, for example, https://raw.githubusercontent.com/falcosecurity/cncf-green-review-testing/e93136094735c1a52cbbef3d7e362839f26f4944/benchmark-tests/falco-benchmark-tests.yaml.

Applying manifests from a different repository not controlled by Green Reviews is a potential security risk. See next section.

Versioning / Security

Manifest URLs in projects.json are pinned to a Git commit SHA rather than a branch such as main. This mitigates the risk that a malicious workload could be included in the benchmark manifest and ensures that any changes to the manifests are reviewed by one of the Green Reviews maintainers.

Authentication

Before the benchmark job is called, we assume that the workflow already contains a secret with a kubeconfig to authenticate with the test cluster and that Falco has already been deployed to it. The pipeline is required to authenticate with the Kubernetes cluster before running the benchmark job.
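
For example, a step could make the kubeconfig from that secret available to kubectl. This is a sketch only, assuming a secret named KUBECONFIG; the actual secret name and setup may differ.

      - name: Authenticate with the test cluster
        run: |
          # Write the kubeconfig from the secret and point kubectl at it
          echo "$KUBECONFIG_DATA" > "$RUNNER_TEMP/kubeconfig"
          echo "KUBECONFIG=$RUNNER_TEMP/kubeconfig" >> "$GITHUB_ENV"
        env:
          KUBECONFIG_DATA: ${{ secrets.KUBECONFIG }}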

Drawbacks (Optional)

Alternatives

Here is a list of the alternatives we considered:

  • Calling benchmarks as reusable GitHub Actions workflows: this was originally selected, but calling workflows with the uses directive does not support parameterized values.

  • Mapping between benchmark manifests and CNCF projects: we decided on a 1:1 relationship (every project has exactly one benchmark manifest), again for simplicity. We could add support for 1:many in the future.

Infrastructure Needed (Optional)