Improve visibility on failures in daily job #6071
I opened #6746 as an attempt at one minor improvement to the daily job status.
Report new errors/failures as GitHub Issues
For this purpose, we could take advantage of the xUnit files that are written for the tests. If there is any failure or error in the tests, it would be present there. An example of a real failing test:
<?xml version="1.0" encoding="UTF-8"?>
<testsuites>
<testsuite name="system" tests="3" failures="1">
<!--test suite for system tests-->
<testcase name="system test: postgresql" classname="sql_input." time="37.935391511"></testcase>
<testcase name="system test: mssql" classname="sql_input." time="27.131335712"></testcase>
<testcase name="system test: mysql" classname="sql_input." time="34.332166741">
<failure>one or more errors found in documents stored in metrics-sql.sql-ep data stream: [0] found error.message in event: cannot open connection: testing connection: dial tcp 172.18.0.7:3306: connect: connection refused</failure>
</testcase>
</testsuite>
</testsuites>
As these tests run in the daily job (local Elastic stack), they also run against a specific version of the Elastic stack, for instance 7.17.0-SNAPSHOT or 8.14-SNAPSHOT. However, this does not hold for the builds testing with Elastic Serverless. With that as a basis, I can think of two different options to create GitHub issues:
Issues created should have this information available to help the owner team:
How to avoid creating duplicated GitHub Issues
Some checks that can be performed to avoid creating duplicated issues for the same errors/failures:
Some metadata could be added to the issue description, following the example of https://github.com/probot/metadata, to help check whether an issue for that test (or tests) was already created. In Kibana, there is a similar approach: https://github.com/elastic/kibana/blob/f82d64043155736b6daf3a5c2286fa14417fc19c/packages/kbn-failed-test-reporter-cli/failed_tests_reporter/issue_metadata.ts#L45
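To make the metadata idea concrete, here is a minimal sketch of storing machine-readable data in an issue description as a hidden HTML comment, in the spirit of the two links above. The marker name `daily-job-failure`, the function names, and the metadata keys are all hypothetical; this is not the format used by probot/metadata or Kibana.

```python
import json
import re

# Hypothetical marker name; any string unlikely to appear in normal text works.
MARKER = "daily-job-failure"
_PATTERN = re.compile(r"<!-- " + re.escape(MARKER) + r" = (.*?) -->", re.DOTALL)


def read_metadata(body: str) -> dict:
    """Return the metadata embedded in an issue body, or {} if there is none."""
    match = _PATTERN.search(body)
    if match is None:
        return {}
    return json.loads(match.group(1))


def write_metadata(body: str, data: dict) -> str:
    """Embed the metadata comment in an issue body, replacing any existing one.

    Limitation of this sketch: values must not contain the sequence " -->",
    since that would end the HTML comment early.
    """
    comment = f"<!-- {MARKER} = {json.dumps(data, sort_keys=True)} -->"
    if _PATTERN.search(body):
        # A callable replacement avoids backslash escapes being interpreted.
        return _PATTERN.sub(lambda _match: comment, body, count=1)
    return f"{body.rstrip()}\n\n{comment}\n"
```

A step that is about to open an issue could call `read_metadata` on the candidate issues and compare, for example, the test name and package before deciding whether to create a new one.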
All these checks could be performed in a new step in Buildkite:
Open Questions
cc @elastic/ecosystem
Yes, it looks like a good approach. We may have failures related to a package that don't generate an xunit file. But rather than handling these failures specifically, I would prefer to enhance elastic-package so it generates xunit files for them.
It would be better to create one issue per failing test, since this is more actionable. It would be important to avoid duplicate issues.
Yes.
I think we can rely on the information provided in the xunit file, so we'd have titles like "Failing tests: build name - test title in buildkite", so for example: "Failing test: daily 7.x - system test: default (variant: v7.1.0) in couchbase.xdcr", and we can base the duplication checks on these titles. Even if the issue is not exactly the same, this indicates that a given package needs to be reviewed in a given scenario. The "build name" would indicate whether the issue happened in 7.x, 8.x/latest or serverless. I would not reference specific versions here, because the same issue can keep happening across multiple release cycles. For the same reason I would not use the package version as a dimension: a package whose usual tests don't fail could have multiple versions with issues in the daily jobs. When finding duplicates, it would be nice to include the new failing build in the description of the issue. We could keep links to the original failing build and to the N latest failures found.
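As a rough sketch of the title-based approach, the snippet below reads an xUnit document, collects the failing or errored test cases, and builds a title in the shape of the example above. The function names are hypothetical, and it assumes that `classname` has the form `<package>.<data stream>`, with a trailing dot when there is no data stream (as in the `sql_input.` sample earlier in this issue).

```python
import xml.etree.ElementTree as ET


def failing_tests(xunit_xml: str) -> list[tuple[str, str, str]]:
    """Return (classname, test name, message) for each failed or errored test case."""
    root = ET.fromstring(xunit_xml)
    results = []
    for case in root.iter("testcase"):
        for kind in ("failure", "error"):
            node = case.find(kind)
            if node is not None:
                # The message may be the element text or a "message" attribute.
                message = (node.text or node.get("message") or "").strip()
                results.append((case.get("classname", ""), case.get("name", ""), message))
                break  # report each test case at most once
    return results


def issue_title(build_name: str, classname: str, test_name: str) -> str:
    """Build a title without stack or package versions, so it is stable across releases."""
    # Assumption: a trailing dot in classname means "no data stream".
    target = classname.rstrip(".")
    return f"Failing test: {build_name} - {test_name} in {target}"
```

Because the title carries no stack or package version, an exact title match against open issues is enough to detect a duplicate for the same package and scenario.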
Yes, the owner team is in principle responsible for closing it. Maybe ecosystem should be pinged too, at least at the beginning, to discard issues created for reasons not related to the package.
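For the idea of keeping links to the original failing build and to the N latest failures, a minimal sketch of the bookkeeping could look like this. The function names, the default of five links, and the rendered layout are assumptions for illustration only.

```python
def add_failure(latest: list[str], new_build_url: str, keep: int = 5) -> list[str]:
    """Return the latest failing builds, newest first, capped at `keep` entries."""
    # Drop an existing entry for the same build so it is not listed twice.
    without_duplicate = [url for url in latest if url != new_build_url]
    return ([new_build_url] + without_duplicate)[:keep]


def render_failures(first_build_url: str, latest: list[str]) -> str:
    """Render the section of the issue description that lists failing builds."""
    lines = [f"First failure: {first_build_url}", "Latest failures:"]
    lines += [f"- {url}" for url in latest]
    return "\n".join(lines)
```

The link to the first failing build is kept separately, so it survives even after it drops out of the capped list.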
Yes, a comment could be added to the issue, or the description of the issue could be updated to include the latest failures (build links). I don't know if the description of an issue can be updated with the
We have a daily job that tests all integrations with the latest snapshot version of the stack. Failures on this job are only reported to a Slack channel, but this has been failing for some time, so we have been relying on manually checking the health of the job.
Define a better visibility strategy for this job, so we can detect issues early and notify the affected teams.
Some ideas: