[CI] retry if environmental issues #22662
❕ Build Aborted
jenkins run the tests please
Jenkinsfile
Outdated
if(fileExists('environmental-issue')) {
  sleep 10
  runCommand(args)
}
The file is not deleted explicitly. It is true that runCommand deletes the folder, but that is a hidden side effect. Also, there is no stop condition: what happens if a tool is no longer available for installation? I think that would enter an infinite loop.
- Actually, the file is created in the top-level agent, so this approach might cause issues for other stages. It seems this should be done with global variables rather than files. Good catch!
- No infinite loop at all: there are at most two runs, since it calls runCommand rather than itself.
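A minimal sketch of the two points in this exchange: an explicit stop condition, and failure state kept in a variable instead of a marker file. This is plain Groovy, not the code from this PR. `EnvironmentalIssue`, `runCommandWithRetry` and `maxAttempts` are hypothetical names, and the `command` closure stands in for the Jenkinsfile's `runCommand(args)`.

```groovy
// Sketch only: EnvironmentalIssue, runCommandWithRetry and maxAttempts are
// hypothetical names; `command` stands in for the Jenkinsfile's runCommand(args).
class EnvironmentalIssue extends RuntimeException {
  EnvironmentalIssue(String message) { super(message) }
}

def runCommandWithRetry(Map args, Closure command) {
  // Explicit stop condition: never more than maxAttempts runs (two by default).
  int maxAttempts = (args.maxAttempts ?: 2) as int
  for (int attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return command(args)
    } catch (EnvironmentalIssue e) {
      // The failure is tracked in this call's own scope, not in a marker
      // file shared with other stages.
      if (attempt == maxAttempts) {
        throw e
      }
      println "Environmental issue on attempt ${attempt}: ${e.message}; retrying"
    }
  }
}

// Usage: the first call fails with an environmental issue, the second succeeds.
int calls = 0
def result = runCommandWithRetry(maxAttempts: 2) { Map a ->
  calls++
  if (calls == 1) {
    throw new EnvironmentalIssue('tool installation failed')
  }
  'ok'
}
assert result == 'ok'
assert calls == 2
```

If the tool can never be installed, the second failure is rethrown and the stage fails instead of looping. A real Jenkinsfile would presumably also wait between attempts, as the reviewed snippet does with `sleep 10`.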
Co-authored-by: cachedout <[email protected]>
…dows-7 * upstream/master: (41 commits) Fix version parser regex for packaging (elastic#22581) Fix local_dynamic documentation and add providers inline doc. (elastic#22657) fix: use proper param name for e2e tests (elastic#22836) [Heartbeat] Fix exit on disabled monitor (elastic#22829) Update Golang to 1.14.12 (elastic#22790) docs: fix setup.template.overwrite typos (elastic#22804) Add docs section for ECS EC2 monitoring (elastic#22784) Fixing logic to keep list of unique cluster UUIDs (elastic#22808) Skip somewhat flaky UDP system test on Windows (elastic#22810) Fix polling node when it is not ready and monitor by hostname (elastic#22666) Skip Filebeat test_shutdown on windows 7 (elastic#22797) Make monitoring Namespace thread-safe (elastic#22640) Drop pkt_dstaddr and pkt_srcaddr when equals to "-" (elastic#22721) Add support for reading from UNIX datagram sockets (elastic#22699) Fix export dashboard command from Elastic Cloud (elastic#22746) Skip flaky winlogbeat test on Windows-7 (elastic#22754) Missing `>` (elastic#22763) (elastic#22766) Fix k8s watcher issue when node access to list nodes and ns (elastic#22714) [Metricbeat/Kibana/stats] Enforce `exclude_usage=true` (elastic#22732) Avoid sending non-numeric floats in cloud foundry integrations (elastic#22634) ...
…dows-7 * upstream/master: (332 commits) Use ECS v1.8.0 (elastic#24086) Add support for postgresql csv logs (elastic#23334) [Heartbeat] Refactor config system (elastic#23467) [CI] install docker-compose with retry (elastic#24069) Add nodes to filebeat-kubernetes.yaml ClusterRole - fixes elastic#24051 (elastic#24052) updating manifest files for filebeat threatintel module (elastic#24074) Add Zeek Signatures (elastic#23772) Update Beats to ECS 1.8.0 (elastic#23465) Support running Docker logging plugin on ARM64 (elastic#24034) Fix ec2 metricset fields.yml and add integration test (elastic#23726) Only build targz and zip versions of Beats if PACKAGES is set in agent (elastic#24060) [Filebeat] Add field definitions for known Netflow/IPFIX vendor fields (elastic#23773) [Elastic Agent] Enroll with Fleet Server (elastic#23865) [Filebeat] Convert logstash logEvent.action objects to strings (elastic#23944) [Ingest Management] Fix reloading of log level for services (elastic#24055) Add Agent standalone k8s manifest (elastic#23679) [Metricbeat][Kubernetes] Extend state_node with more conditions (elastic#23905) [CI] googleStorageUploadExt step (elastic#24048) Check fields are documented for aws metricsets (elastic#23887) Update go-concert to 0.1.0 (elastic#23770) ...
…dows-7 * upstream/master: Remove OSS reference for kibana and elasticsearch (elastic#24164) Skip flaky TestActions on MacOSx (elastic#23966) [Filebeat][AWS] Fix vpcflow pipeline exception: Cannot invoke "Object.getClass()" because "receiver" is null (elastic#24167) [Elastic Agent] Fix docker entrypoint for elastic-agent. (elastic#24155) [PACKAGING] Push docker images with the architecture in the version (elastic#24121) [Agent] Add agent standalone manifests for system module & Pod's log collection (elastic#23938) indicator type url is in upper case (elastic#24152) [Filebeat] Document netflow internal_networks and set default (elastic#24110) [Filebeat] Adding fixes to the TI module (elastic#24133) [Enhancement] Add RotateOnStartup feature flag for file output (elastic#19347) [Ingest Manager] Fix: Successfully installed and enrolled agent running standalone (elastic#24128) Set Elastic licence type for APM server Beats update job (elastic#24122) Add logrotation section on Running Filebeat on k8s (elastic#24120) [CI] Run if manual UI (elastic#24116) [CI] enable x-pack/heartbeat in the CI (elastic#23873) chore: comment out the E2E (elastic#24109) chore: add-backport-next (elastic#24098) Adjust the position of the architecture name in Dockerlogbeat tarball (elastic#24095) Update dependencies for M1 support in System (elastic#24019)
…dows-7 * upstream/master: [CI] Add ARM packaging (elastic#24041) Add example input autodsicover config (elastic#24157) Empty configuration options generate `<no value>` string for azure-eventhub input (elastic#24156)
/test
 * environmental issues. Therefore it passes the arguments to the runCommand.
 * For further details regarding the arguments please refer to the runCommand method.
 */
def target(Map args = [:]) {
What do you think about a more descriptive name? I acknowledge that the references would need to be updated.
def target(Map args = [:]) {
def safeRunCommand(Map args = [:]) {
Alternatives?
- runCommandWithEnviromentalIssues (longer)
I did not intend to change the name at first, though it might be worth changing for clarity.
What do you think about runCommandWithRetry?
This pull request is now in conflicts. Could you fix it? 🙏
What does this PR do?
Retry a stage with a new node if there is an environmental issue
When does an environmental issue happen?
Why is it important?
Reduce the flakiness caused by environmental issues
We can potentially add more analysis to retry in other cases as well
Issues
Caused by #22661 and potentially fixed with #22626. (It did not work as expected.)
A windows-2012 build recently failed even after deleting the workspace, which suggests an issue with the provisioner:
Figures
In the last 30 days, the master branch has failed about 13 times, probably because of environmental issues:
Some build examples with this particular environmental issue