Improve package building and testing. #3753

Closed
niedbalski opened this issue Jul 9, 2021 · 19 comments · Fixed by #4432

Comments

@niedbalski
Collaborator

niedbalski commented Jul 9, 2021

Problem Description

The current workflow for building packages is mostly manual. We have some automated testing in place, namely this workflow [0].
Publication isn't automated, and we don't have a staging repository to test installs and upgrades before they reach the release bucket.

Proposed solution

  1. Create a workflow based on [0] that builds the packages for all the supported distributions and architectures.
  2. The workflow should publish the package artifacts for each tagged release to a staging repository in S3.
  3. Workflow 1) triggers a workflow that runs a series of verification tests on top of the staging repository for all the supported
    distributions and architectures (see the sketch after this list). Sanity testing should include:
  • Package contents (sanity)
  • Install works (no dpkg/rpm failures)
  • Install version N-1 -> upgrade -> process keeps running
  4. If workflow 3) succeeds, an automated promotion should move the staged packages into the releases repository.

[0] https://github.com/fluent/fluent-bit/blob/master/.github/workflows/build-release.yaml
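
As a rough sketch of what the verification in step 3) could look like for a Debian/Ubuntu target (the staging repository URL, suite, and package/service names below are placeholders; an RPM target would mirror this with yum/dnf):

```bash
#!/usr/bin/env bash
# Sketch of the staging sanity checks: install N-1, inspect the staged package,
# upgrade from staging and confirm the service keeps running.
# STAGING_REPO, the suite and the package/service name are all placeholders.
set -euo pipefail

STAGING_REPO="https://example-staging-bucket.s3.amazonaws.com/ubuntu/focal"  # hypothetical
PKG="fluent-bit"                                                             # placeholder name
PREVIOUS_VERSION="${1:?usage: $0 <previously-released-version>}"

# Install the previous (N-1) release from the existing official repository.
apt-get update
apt-get install -y "${PKG}=${PREVIOUS_VERSION}"
systemctl start "${PKG}"
systemctl is-active "${PKG}"

# Point apt at the staging repository.
echo "deb ${STAGING_REPO} focal main" > /etc/apt/sources.list.d/fluent-bit-staging.list
apt-get update

# Package contents (sanity): inspect the staged .deb before installing it.
apt-get download "${PKG}"
dpkg-deb --contents "${PKG}"_*.deb

# Upgrade to the staged build and check the service keeps running.
apt-get install -y --only-upgrade "${PKG}"
systemctl is-active "${PKG}"
```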

Known Limitations

  • No known limitations
@github-actions
Contributor

github-actions bot commented Aug 9, 2021

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

@github-actions github-actions bot added the Stale label Aug 9, 2021
@github-actions
Contributor

This issue was closed because it has been stalled for 5 days with no activity.

@patrick-stephens
Contributor

Got a limited POC going now that does all of this in a single repo with an action: it pushes packages to S3 and images to GHCR. These are all staging artifacts; the next stage is to test and "bless" them, i.e. release.

@patrick-stephens
Contributor

Further discussion with @niedbalski has clarified a few things:

  • We want to use an S3 bucket for staging and releases
  • The staging one can just contain the builds by version; the release one eventually needs to provide a full repository including old versions.

@edsiper
Member

edsiper commented Nov 30, 2021

@niedbalski @patrick-stephens

S3 should not be used for releases. Many users and customers have restricted access to S3 buckets and have whitelisted the fluentbit domains so they can mirror the repos locally. We should continue using the native repos.
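
For reference, the kind of native repository definition that users currently whitelist and mirror looks roughly like the following; the suite path and package name are illustrative and may differ per distribution:

```bash
# Illustrative only: existing apt repo definition under packages.fluentbit.io.
# Check the official install docs for the exact key URL, suite and package name.
curl -fsSL https://packages.fluentbit.io/fluentbit.key \
  | gpg --dearmor -o /usr/share/keyrings/fluentbit.gpg
echo "deb [signed-by=/usr/share/keyrings/fluentbit.gpg] https://packages.fluentbit.io/ubuntu/focal focal main" \
  > /etc/apt/sources.list.d/fluentbit.list
apt-get update && apt-get install -y td-agent-bit
```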

@niedbalski
Collaborator Author

@edsiper @patrick-stephens

S3 can handle custom domains, so the domain mapping shouldn't change: any existing whitelist entries for packages.fluentbit.io and apt.fluentbit.io should remain valid. In fact, we are aiming for the release bucket to keep the exact same layout/structure without changes.

Enabling S3 has many benefits for us, including CDN support, replication, backups, simplified releases, etc.
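
A minimal sketch of what the staging-to-release promotion could look like under that model, assuming hypothetical bucket names and a CDN in front of the release bucket:

```bash
# Hypothetical bucket names and distribution ID; the key point is that the
# release bucket keeps the exact layout currently served from
# packages.fluentbit.io / apt.fluentbit.io, so whitelists keep working.
STAGING_BUCKET="s3://fluentbit-staging"
RELEASE_BUCKET="s3://fluentbit-releases"
VERSION="1.8.11"   # example tag being promoted

# Copy the staged artifacts for this version into the release bucket unchanged.
aws s3 sync "${STAGING_BUCKET}/${VERSION}/" "${RELEASE_BUCKET}/" --no-progress

# If a CDN fronts the bucket, refresh it so repository metadata is picked up.
aws cloudfront create-invalidation --distribution-id EXAMPLEID --paths "/*"
```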

@patrick-stephens
Contributor

The current plan therefore is to use a parallel workflow: we maintain the current process but also start producing the S3 bucket for releases, so we can evaluate it. We also need to ensure build times are kept low, possibly by using a self-hosted runner.

@niedbalski
Collaborator Author

@patrick-stephens

Here is my take on testing on top of staging:

  • Should be able to run locally with act + kind.
  1. Images
  • Docker smoke test on 2 archs (amd64/arm64) with multi-arch images (see the sketch after this list).
    • Docker run with a blessed configuration (this can simply be a golden config with a [SERVICE] section listening on port 2020 or similar)
      • Check that the container starts
      • Check that port 2020 is reachable
  • Kubernetes smoke test (kind or k3s)
    • A daemonset or a deployment of fluent-bit (using the Helm chart)
      • Check that the pods reach Ready status
  2. Packages
  • Run through a set of auto-tests: https://github.com/ruilapa/fluentbit-packages-test
    • New/fresh install (all supported distributions)
      • Package gets installed
      • Service gets started (with a default golden config)
    • Upgraded install from N-1 (all supported distributions)
      • Package gets upgraded
      • Service keeps running
    • Removal/reinstall
      • Package gets removed
      • Package gets installed again
      • Service keeps running
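
A rough shape of the Docker smoke test above, assuming the golden config is a minimal [SERVICE] section enabling the built-in HTTP server on port 2020 plus a dummy input and stdout output (image tag is illustrative):

```bash
#!/usr/bin/env bash
# Docker smoke test sketch: start the image with a minimal "golden" config,
# then check the container is running and port 2020 answers.
set -euo pipefail

cat > fluent-bit.conf <<'EOF'
[SERVICE]
    HTTP_Server  On
    HTTP_Listen  0.0.0.0
    HTTP_Port    2020

[INPUT]
    Name   dummy

[OUTPUT]
    Name   stdout
    Match  *
EOF

# Repeat per architecture, e.g. add --platform linux/arm64 when testing the
# multi-arch image under emulation.
docker run -d --name fb-smoke -p 2020:2020 \
  -v "$PWD/fluent-bit.conf:/fluent-bit/etc/fluent-bit.conf" \
  fluent/fluent-bit:latest

sleep 5
docker ps --filter name=fb-smoke --filter status=running | grep -q fb-smoke   # container starts
curl -sf http://127.0.0.1:2020/ > /dev/null                                   # port 2020 reachable
# docker rm -f fb-smoke   # clean up when done
```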

@patrick-stephens
Contributor

patrick-stephens commented Dec 2, 2021

Agreed. For the golden config I'll add a dummy input & stdout output to exercise the pipeline a bit. This is what I've done previously, and then you can easily check for the expected output too. Eventually we can evolve this to do more if we want.

In fact, the default config might be fine - it's a shame that the HTTP server is not enabled by default (I know people get tripped up on the Helm chart healthchecks by this). It already does CPU input and stdout output.
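
Building on the smoke test sketch above, checking for the expected output of a dummy input + stdout output could be as simple as grepping the container logs (exact stdout formatting may vary between versions):

```bash
# The dummy input emits a fixed record that the stdout output prints, so the
# "expected output" check can be a log grep against the running smoke-test container.
sleep 5
docker logs fb-smoke 2>&1 | grep -q 'dummy'
```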

@patrick-stephens
Contributor

Staging build is almost there now, just resolving some GPG signing issues, but it should present an S3 bucket with all the repos set up correctly. Container images are built, scanned (Trivy + Dockle) and signed (Cosign) before staging to ghcr.io.

Container testing as per the above is in place - verify each architecture image locally, then use the Helm chart to verify a K8S deployment (whatever the default is in KIND when run). Package verification is in progress using kitchen-dokken: OS-based images for each target have the package installed, and then we verify the service is running.
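
The Helm-on-KIND part of that verification can be sketched roughly as follows, using the upstream fluent Helm chart; the image override values and timeout are illustrative:

```bash
#!/usr/bin/env bash
# Sketch: deploy the chart into a throwaway KIND cluster and wait for the
# daemonset to become ready. STAGING_TAG and the image repository are placeholders.
set -euo pipefail

kind create cluster --name fb-verify

helm repo add fluent https://fluent.github.io/helm-charts
helm repo update
helm upgrade --install fluent-bit fluent/fluent-bit \
  --set image.repository=ghcr.io/fluent/fluent-bit \
  --set image.tag="${STAGING_TAG:?set STAGING_TAG to the staged image tag}"

kubectl rollout status daemonset/fluent-bit --timeout=120s   # pods reach Ready

kind delete cluster --name fb-verify
```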

@patrick-stephens
Contributor

patrick-stephens commented Dec 9, 2021

We will also look to trigger downstream integration and soak tests in staging to verify more things. @niedbalski
I'll add workflow_call and workflow_dispatch to https://github.com/calyptia/fluent-bit-ci/blob/main/.github/workflows/main-gcp.yaml
We then need to set up the soak test to provide some level of automatic verification, plus manual approval for release.

We should also incorporate the suggestions here: #4389

@niedbalski
Collaborator Author

In regards to integration testing:

  1. The staging build workflow will kick off an external run on [0] using the new workflow_call (reusable workflow) semantics; see the sketch after this list.
  2. The workflow [0] will kick off a new set of integration tests based on the staging images provided
    via a workflow parameter.

[0] https://github.com/calyptia/fluent-bit-ci/blob/main/.github/workflows/main-gcp.yaml#L7
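
With workflow_dispatch in place on [0], the staging build could trigger that run via the gh CLI roughly like this; the input name is a placeholder for whatever parameter the workflow ends up exposing:

```bash
# Trigger the downstream integration tests against the staged images.
# "fluent-bit-image" is a hypothetical input name; use whatever inputs the
# workflow_dispatch trigger in main-gcp.yaml actually defines.
gh workflow run main-gcp.yaml \
  --repo calyptia/fluent-bit-ci \
  --ref main \
  -f fluent-bit-image=ghcr.io/fluent/fluent-bit:latest
```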

@niedbalski
Collaborator Author

niedbalski commented Dec 13, 2021

@patrick-stephens As a reference for the build/release to staging workflows.

For 4, that is covered by the private mirror due to the security concerns.

@niedbalski niedbalski linked a pull request Dec 13, 2021 that will close this issue
niedbalski pushed a commit that referenced this issue Dec 13, 2021
* Addresses #3753

New workflows added to automate the build and test of releases using the new staging environment.
No changes made to current process to ensure we can keep using it.

Build & test of packaging
Packages built to staging in S3 bucket: https://fluentbit-staging.s3.amazonaws.com
We then verify the packages using kitchen-dokken to spin up OS images as containers, install the relevant RPM/Deb, and check the service is then running properly. This verifies that the packaging process is correct.

Containers are built to the GitHub Container Registry, ghcr.io, using multi-arch manifests.
Container tests then verify that each architecture runs locally, as well as a simple Helm deployment on KIND.

All package and container build definitions have been brought into the repo from external sources - containers were already in this repo and packages were not, so the two are now handled identically; having them together also makes them a lot easier to manage and use.

Security
Trivy and Dockle scanning added - current failures are ignored, so these should be reviewed and addressed as needed.
Hadolint and Shellcheck really should be used too, but that can be a separate PR.

Cosign signing of container images if a key is provided, also using the experimental keyless option.
GPG signing of binary packages as normal.

Additional work
Initial promotion from staging to release provided using a new release environment for approval - this needs creating.
Initial multi-arch container image definition and workflow also added.

Follow-up PRs will improve testing, build on self-hosted runners and cover the promotion-to-release process. Trying to prevent a big bang and reduce review overhead.

Infra updates
Create release and staging environments.
Create the following secrets:

AWS_S3_BUCKET_STAGING
AWS_S3_BUCKET_RELEASE
AWS_ACCESS_KEY_ID
AWS_SECRET_ACCESS_KEY
COSIGN_PRIVATE_KEY
COSIGN_PASSWORD - optional if the private key does not require one
COSIGN_PUBLIC_KEY
FLUENTBITIO_HOST
FLUENTBITIO_USERNAME
FLUENTBITIO_SSHKEY
GPG_PRIVATE_KEY
We can actually start breaking these secrets up into the two environments.

Signed-off-by: Patrick Stephens <[email protected]>
@niedbalski niedbalski reopened this Dec 24, 2021
@patrick-stephens
Contributor

Need to add resilience and performance testing: #4390

  • Performance testing
  • Resilience testing

@patrick-stephens
Contributor

Need to support package downgrade as well, i.e. official --> staging --> official, and it stays working. More distributions should be tested too (see the sketch below).

  • All supported distros.
  • Package downgrade test.
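
A downgrade check along those lines could look like this on apt-based distributions (versions and package/service names are placeholders; yum/dnf has an equivalent downgrade path):

```bash
#!/usr/bin/env bash
# official -> staging -> back to official, with the service still active at the end.
set -euo pipefail

PKG="fluent-bit"            # placeholder package/service name
OFFICIAL_VERSION="1.8.11"   # placeholder: currently released version

apt-get update
apt-get install -y "${PKG}=${OFFICIAL_VERSION}"                      # official release
apt-get install -y --only-upgrade "${PKG}"                           # staged build (staging repo already configured)
apt-get install -y --allow-downgrades "${PKG}=${OFFICIAL_VERSION}"   # downgrade back to official
systemctl is-active "${PKG}"                                         # service still running
```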

@patrick-stephens
Contributor

patrick-stephens commented Jan 7, 2022

Working on adding the release promotion job now (a rough sketch follows below):

  • signs with release GPG key
  • copies to current packaging server
  • copies to S3 release bucket
  • skopeo sync to DockerHub (and sign hopefully)

#4566
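
A rough sketch of those promotion steps; hostnames, bucket names, key IDs and image tags below are all placeholders rather than the real infrastructure:

```bash
#!/usr/bin/env bash
# Promotion sketch: re-sign packages with the release GPG key, copy them to the
# packaging server and the S3 release bucket, then mirror and sign the images.
set -euo pipefail

VERSION="${1:?usage: $0 <version>}"

# Re-sign the staged RPMs with the release GPG key (key name is a placeholder).
for pkg in staging/"${VERSION}"/*.rpm; do
  rpm --define "_gpg_name releases@example.org" --addsign "${pkg}"
done

# Copy to the current packaging server and to the S3 release bucket.
rsync -av staging/"${VERSION}"/ "deploy@packages.example.org:/var/www/releases/${VERSION}/"
aws s3 sync staging/"${VERSION}"/ "s3://fluentbit-release-example/${VERSION}/"

# Mirror the container image from GHCR to Docker Hub, then sign it with Cosign.
skopeo sync --src docker --dest docker \
  "ghcr.io/fluent/fluent-bit:${VERSION}" "docker.io/fluent"
cosign sign --key cosign.key "docker.io/fluent/fluent-bit:${VERSION}"
```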

@patrick-stephens
Contributor

patrick-stephens commented Jan 13, 2022

Packages (RPM + Deb) look OK now; working on the container release next.

  • container promotion job
  • infra set up (secrets, etc.)

0Delta pushed a commit to 0Delta/fluent-bit that referenced this issue Jan 20, 2022
* Addresses fluent#3753 (same commit message as above)
@github-actions
Contributor

github-actions bot commented May 9, 2022

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days. Maintainers can add the exempt-stale label.

@edsiper
Member

edsiper commented Aug 16, 2024

is this ok to close?
