-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
VAULT-33074: add github
sub-command to pipeline
#29403
Conversation
CI Results: |
Build Results: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a very useful tool. The implementation looks good to me but I recommend having another reviewer assess it as well, given that I'm not highly familiar with Go.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very excited about this feature!
Investigating test workflow failures is common task that engineers on the sustaining rotation perform. This task often requires quite a bit of manual labor by manually inspecting all failed/cancelled workflows in the Github UI on per repo/branch/workflow basis and performing root cause analysis. As we work to improve our pipeline discoverability this PR adds a new `github` sub-command to the `pipeline` utility that allows querying for such workflows and returning either machine readable or human readable summaries in a single place. Eventually we plan to automate sending a summary of this data to an OTEL collector automatically but for now sustaining engineers can utilize it to query for workflows with lots of various criteria. A common pattern for investigating build/enos test failure workflows would be: ```shell export GITHUB_TOKEN="YOUR_TOKEN" go run -race ./tools/pipeline/... github list-workflow-runs -o hashicorp -r vault -d '2025-01-13..2025-01-23' --branch main --status failure build ``` This will list `build` workflow runs in `hashicorp/vault` repo for the `main` branch with the `status` or `conclusion` of `failure` within the date range of `2025-01-13..2025-01-23`. A sustaining engineer will likely do this for both `vault` and `vault-enterprise` repositories along with `enos-release-testing-oss` and `enos-release-testing-ent` workflows in addition to `build` in order to get a full picture of the last weeks failures. You can also use this utility to summarize workflows based on other statuses, branches, HEAD SHA's, event triggers, github actors, etc. For a full list of filter arguments you can pass `-h` to the sub-command. > [!CAUTION] > Be careful not to run this without setting strict filter arguments. > Failing to do so could result in trying to summarize way too many > workflows resulting in your API token being disabled for an hour. Signed-off-by: Ryan Cragun <[email protected]>
Signed-off-by: Ryan Cragun <[email protected]>
Signed-off-by: Ryan Cragun <[email protected]>
05d20f5
e4e2c41
to
05d20f5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great stuff 🚀
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Note Even though this will only likely execute from the |
Description
Investigating test workflow failures is common task that engineers on the
sustaining rotation perform. This task often requires quite a bit of
manual labor by inspecting all failed/cancelled workflows in the Github UI
on a per repo/branch/workflow basis and performing root cause analysis.
As we work to improve our pipeline discoverability and observability this PR adds a new
github
sub-command to the
pipeline
utility that allows querying for such workflowsand returning either machine readable or human readable summaries in a single
place. Eventually we plan to automate sending a summary of this data to
an OTEL collector automatically, but for now sustaining engineers can
utilize it to query for workflows with lots of various criteria.
A common pattern for investigating build/enos test failure workflows would be:
This will list
build
workflow runs in thehashicorp/vault
repo for themain
branch with thestatus
orconclusion
offailure
within the daterange of
2025-01-13..2025-01-23
.A sustaining engineer will likely do this for both
vault
andvault-enterprise
repositories along withenos-release-testing-oss
andenos-release-testing-ent
workflows in addition tobuild
in order toget a full picture of the last weeks failures.
You can also use this utility to summarize workflows based on other
workflow statuses, a branch name, HEAD SHA, event trigger, github actor, etc. For
a full list of filter arguments you can pass
-h
to the sub-command.Caution
Be careful not to run this without setting strict filter arguments.
Failing to do so could result in trying to summarize way too many
workflows resulting in your API token being disabled for an hour.
TODO only if you're a HashiCorp employee
backport/
label that matches the desired release branch. Note that in the CE repo, the latest release branch will look likebackport/x.x.x
, but older release branches will bebackport/ent/x.x.x+ent
.of a public function, even if that change is in a CE file, double check that
applying the patch for this PR to the ENT repo and running tests doesn't
break any tests. Sometimes ENT only tests rely on public functions in CE
files.
in the PR description, commit message, or branch name.
description. Also, make sure the changelog is in this PR, not in your ENT PR.