
VAULT-33074: add github sub-command to pipeline #29403

Merged
merged 3 commits into from
Jan 31, 2025


@ryancragun (Collaborator) commented Jan 23, 2025

Description

Investigating test workflow failures is a common task for engineers on the
sustaining rotation. It often requires quite a bit of manual labor:
inspecting every failed or cancelled workflow in the GitHub UI on a
per-repo/branch/workflow basis and performing root cause analysis.

As we work to improve our pipeline discoverability and observability, this PR adds a new github
sub-command to the pipeline utility that queries for such workflows
and returns either machine-readable or human-readable summaries in a single
place. Eventually we plan to automatically send a summary of this data to
an OTEL collector, but for now sustaining engineers can
use it to query for workflows by a wide range of criteria.
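To make "machine-readable or human-readable summaries" concrete, here is a minimal Go sketch that is not the pipeline utility's code. It decodes a page of workflow runs using a few field names from the GitHub REST API's workflow run object (`workflow_runs`, `status`, `conclusion`, `head_branch`) and tallies runs by conclusion. The `summarize` helper and the "pending" bucket for unfinished runs are assumptions for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// workflowRun mirrors a few fields of the GitHub REST API workflow run
// object; the real object carries many more.
type workflowRun struct {
	ID         int64  `json:"id"`
	Name       string `json:"name"`
	HeadBranch string `json:"head_branch"`
	Status     string `json:"status"`
	Conclusion string `json:"conclusion"`
}

// runList mirrors the envelope returned by the "list workflow runs" endpoints.
type runList struct {
	TotalCount   int           `json:"total_count"`
	WorkflowRuns []workflowRun `json:"workflow_runs"`
}

// summarize (hypothetical) counts runs by conclusion. A null conclusion
// decodes to "" and is counted as "pending", since the run has not finished.
func summarize(body []byte) (map[string]int, error) {
	var list runList
	if err := json.Unmarshal(body, &list); err != nil {
		return nil, err
	}
	counts := map[string]int{}
	for _, r := range list.WorkflowRuns {
		c := r.Conclusion
		if c == "" {
			c = "pending"
		}
		counts[c]++
	}
	return counts, nil
}

func main() {
	body := []byte(`{"total_count":3,"workflow_runs":[
		{"id":1,"name":"build","head_branch":"main","status":"completed","conclusion":"failure"},
		{"id":2,"name":"build","head_branch":"main","status":"completed","conclusion":"cancelled"},
		{"id":3,"name":"build","head_branch":"main","status":"completed","conclusion":"failure"}]}`)
	counts, err := summarize(body)
	if err != nil {
		panic(err)
	}

	// Machine-readable: encoding/json writes map keys in sorted order.
	out, _ := json.Marshal(counts)
	fmt.Println(string(out))

	// Human-readable: one aligned line per conclusion, sorted for stable output.
	keys := make([]string, 0, len(counts))
	for k := range counts {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Printf("%-10s %d\n", k, counts[k])
	}
}
```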

A common pattern for investigating build/enos test failure workflows would be:

```shell
export GITHUB_TOKEN="YOUR_TOKEN"
cd tools/pipeline
go run ./... github list-workflow-runs -o hashicorp -r vault -d '2025-01-13..2025-01-23' --branch main --status failure build
```

This lists build workflow runs in the hashicorp/vault repo for the
main branch whose status or conclusion is failure, within the date
range 2025-01-13..2025-01-23.
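The `-d` value uses the `start..end` date-range form from GitHub's search syntax, which the REST API's workflow-run `created` filter also accepts. Assuming the utility validates that argument locally before calling the API (this PR does not show whether it does), a hypothetical `parseDateRange` could look like the sketch below. It handles only the closed `YYYY-MM-DD..YYYY-MM-DD` form, not open-ended forms such as `>=2025-01-13`.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// dayLayout is Go's reference layout for YYYY-MM-DD.
const dayLayout = "2006-01-02"

// dateRange is a hypothetical parsed form of the -d argument.
type dateRange struct {
	Start, End time.Time
}

// parseDateRange parses "YYYY-MM-DD..YYYY-MM-DD" and rejects a range whose
// start falls after its end.
func parseDateRange(s string) (dateRange, error) {
	startStr, endStr, ok := strings.Cut(s, "..")
	if !ok {
		return dateRange{}, fmt.Errorf("date range %q: missing \"..\" separator", s)
	}
	start, err := time.Parse(dayLayout, startStr)
	if err != nil {
		return dateRange{}, fmt.Errorf("date range %q: bad start: %w", s, err)
	}
	end, err := time.Parse(dayLayout, endStr)
	if err != nil {
		return dateRange{}, fmt.Errorf("date range %q: bad end: %w", s, err)
	}
	if start.After(end) {
		return dateRange{}, fmt.Errorf("date range %q: start is after end", s)
	}
	return dateRange{Start: start, End: end}, nil
}

func main() {
	r, err := parseDateRange("2025-01-13..2025-01-23")
	if err != nil {
		panic(err)
	}
	days := int(r.End.Sub(r.Start).Hours() / 24)
	fmt.Printf("%s to %s (%d days apart)\n", r.Start.Format(dayLayout), r.End.Format(dayLayout), days)
}
```

Rejecting a reversed or malformed range locally is cheap, whereas sending it to the API would spend a request on a query that cannot match anything useful.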

A sustaining engineer will likely repeat this for both the vault and
vault-enterprise repositories, and for the enos-release-testing-oss and
enos-release-testing-ent workflows in addition to build, in order to
get a full picture of the last week's failures.

You can also use this utility to summarize workflows by other
criteria: workflow status, branch name, HEAD SHA, event trigger, GitHub actor, etc. For
a full list of filter arguments, pass -h to the sub-command.
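Most of these filters correspond to query parameters documented for GitHub's "List workflow runs for a workflow" REST endpoint (`actor`, `branch`, `event`, `status`, `head_sha`, `created`). This PR doesn't show how the utility maps its flags onto them, so the following is a sketch under that assumption. The `runFilter` type and `runsURL` helper are hypothetical, and unset filters are simply left out of the query.

```go
package main

import (
	"fmt"
	"net/url"
)

// runFilter is hypothetical; its fields mirror query parameters documented
// for GitHub's "List workflow runs for a workflow" endpoint.
type runFilter struct {
	Actor   string // login of the user who triggered the run
	Branch  string
	Event   string // e.g. "push" or "pull_request"
	Status  string // a status or a conclusion, e.g. "failure"
	HeadSHA string
	Created string // date range, e.g. "2025-01-13..2025-01-23"
}

// query encodes only the non-empty filters.
func (f runFilter) query() url.Values {
	q := url.Values{}
	for key, val := range map[string]string{
		"actor": f.Actor, "branch": f.Branch, "event": f.Event,
		"status": f.Status, "head_sha": f.HeadSHA, "created": f.Created,
	} {
		if val != "" {
			q.Set(key, val)
		}
	}
	return q
}

// runsURL builds the request URL for one workflow. The workflow segment may
// be a workflow file name such as "build.yml" or a numeric workflow ID.
func runsURL(owner, repo, workflow string, f runFilter) string {
	u := url.URL{
		Scheme:   "https",
		Host:     "api.github.com",
		Path:     fmt.Sprintf("/repos/%s/%s/actions/workflows/%s/runs", owner, repo, workflow),
		RawQuery: f.query().Encode(),
	}
	return u.String()
}

func main() {
	f := runFilter{Branch: "main", Status: "failure", Created: "2025-01-13..2025-01-23"}
	fmt.Println(runsURL("hashicorp", "vault", "build.yml", f))
}
```

`url.Values.Encode` sorts keys, so the generated URL is stable regardless of map iteration order, which helps when logging or comparing requests.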

Caution

Be careful not to run this without setting strict filter arguments.
Failing to do so could result in trying to summarize far too many
workflows, which could get your API token disabled for an hour.

TODO only if you're a HashiCorp employee

  • Backport Labels: If this fix needs to be backported, use the appropriate backport/ label that matches the desired release branch. Note that in the CE repo, the latest release branch will look like backport/x.x.x, but older release branches will be backport/ent/x.x.x+ent.
    • LTS: If this fixes a critical security vulnerability or severity 1 bug, it will also need to be backported to the current LTS versions of Vault. To ensure this, use all available enterprise labels.
  • ENT Breakage: If this PR either 1) removes a public function OR 2) changes the signature
    of a public function, even if that change is in a CE file, double check that
    applying the patch for this PR to the ENT repo and running tests doesn't
    break any tests. Sometimes ENT-only tests rely on public functions in CE
    files.
  • Jira: If this change has an associated Jira, it's referenced either
    in the PR description, commit message, or branch name.
  • RFC: If this change has an associated RFC, please link it in the description.
  • ENT PR: If this change has an associated ENT PR, please link it in the
    description. Also, make sure the changelog is in this PR, not in your ENT PR.


@ryancragun ryancragun added pr/no-changelog pr/no-milestone backport/ent/1.16.x+ent Changes are backported to 1.16.x+ent backport/1.18.x backport/ent/1.17.x+ent Changes are backported to 1.17.x+ent labels Jan 23, 2025
@ryancragun ryancragun requested a review from a team as a code owner January 23, 2025 23:53
@github-actions github-actions bot added the hashicorp-contributed-pr If the PR is HashiCorp (i.e. not-community) contributed label Jan 23, 2025
@ryancragun (Collaborator, Author)

An example of the output:

[Screenshot, 2025-01-23 4:58:33 PM: example output]


CI Results:
All Go tests succeeded! ✅


Build Results:
All builds succeeded! ✅

@tvo0813 (Collaborator) previously approved these changes Jan 24, 2025 and left a comment:


This is a very useful tool. The implementation looks good to me but I recommend having another reviewer assess it as well, given that I'm not highly familiar with Go.

@rebwill (Collaborator) previously approved these changes Jan 24, 2025 and left a comment:


Very excited about this feature!

@charlesn-hc previously approved these changes Jan 24, 2025
Investigating test workflow failures is a common task for engineers on the
sustaining rotation. It often requires quite a bit of manual labor:
inspecting every failed or cancelled workflow in the GitHub UI on a
per-repo/branch/workflow basis and performing root cause analysis.

As we work to improve our pipeline discoverability, this PR adds a new `github`
sub-command to the `pipeline` utility that queries for such workflows
and returns either machine-readable or human-readable summaries in a single
place. Eventually we plan to automatically send a summary of this data to
an OTEL collector, but for now sustaining engineers can
use it to query for workflows by a wide range of criteria.

A common pattern for investigating build/enos test failure workflows would be:
```shell
export GITHUB_TOKEN="YOUR_TOKEN"
go run -race ./tools/pipeline/... github list-workflow-runs -o hashicorp -r vault -d '2025-01-13..2025-01-23' --branch main --status failure build
```

This lists `build` workflow runs in the `hashicorp/vault` repo for the
`main` branch whose `status` or `conclusion` is `failure`, within the date
range `2025-01-13..2025-01-23`.

A sustaining engineer will likely repeat this for both the `vault` and
`vault-enterprise` repositories, and for the `enos-release-testing-oss` and
`enos-release-testing-ent` workflows in addition to `build`, in order to
get a full picture of the last week's failures.

You can also use this utility to summarize workflows based on other
statuses, branches, HEAD SHAs, event triggers, GitHub actors, etc. For
a full list of filter arguments, pass `-h` to the sub-command.

> [!CAUTION]
> Be careful not to run this without setting strict filter arguments.
> Failing to do so could result in trying to summarize far too many
> workflows, which could get your API token disabled for an hour.

Signed-off-by: Ryan Cragun <[email protected]>

@charlesn-hc left a comment:


Great stuff 🚀

@tvo0813 (Collaborator) left a comment:


LGTM

@ryancragun (Collaborator, Author)

Note

Even though this will likely only execute from the main branch, I'm still backporting it to all active branches so that our pipeline utility stays in sync.

@ryancragun ryancragun merged commit cda9ad3 into main Jan 31, 2025
91 of 92 checks passed
@ryancragun ryancragun deleted the VAULT-33074 branch January 31, 2025 20:48
Labels
backport/ent/1.16.x+ent Changes are backported to 1.16.x+ent backport/ent/1.17.x+ent Changes are backported to 1.17.x+ent backport/1.18.x hashicorp-contributed-pr If the PR is HashiCorp (i.e. not-community) contributed pr/no-changelog pr/no-milestone

4 participants