Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Selectively skip and fix tests after PR#212 #235

Merged
merged 9 commits into from
Feb 7, 2024
Merged

Selectively skip and fix tests after PR#212 #235

merged 9 commits into from
Feb 7, 2024

Conversation

dliappis
Copy link
Contributor

@dliappis dliappis commented Feb 2, 2024

PR #212 made some changes on main that require the presence of MANIFEST_URL. However, the pipeline is still configured to run the same script when a PR (from an upstream branch) is raised and it currently fails.

This commit skips tests when triggered from another pipeline (unified release), and allows running tests when triggered by PR creation by picking up a suitable MANIFEST_URL.

PR #212 made some changes on `main` that require the presence of
`MANIFEST_URL`. However, the pipeline is still configured to run the
same script when a PR (from an upstream branch) is raised and it
currently fails.

This commit skips tests when triggered from another pipeline (unified
release), and allows running tests when triggered by PR creation by
picking up a suitable `MANIFEST_URL`.
@dliappis dliappis requested a review from a team as a code owner February 2, 2024 17:20
@dliappis
Copy link
Contributor Author

dliappis commented Feb 2, 2024

This is ready for review. Because of the multiple ways that this pipeline may run -- via unified release trigger, via branch commits or pr commits -- we need to support a different way of running depending on the occasion:

  1. If we open the PR it (e.g. the latest commit here: https://buildkite.com/elastic/elastic-stack-installers/builds/3720) it looks like:

    image

  2. If the pipeline gets triggered by the unified release it should run as before (and allow to go through the DRA publish).

  3. Special handling is done for staging builds:

    • Unified release may trigger the pipeline but only for non main branches and specifically if they target [0-9]+.[0-9]+ matching branches. Tests are skipped.
    • PRs are only allowed to run the steps (including tests) if they target non-main branch (the reason is that there is no https://staging.elastic.co/latest/main.json, so we can't deduce a valid MANIFEST_URL)
  4. We've removed runs based on push events -- e.g. cutting a new branch from main -- (which used to be allowed before); this is because it complicates unnecessarily the conditionals. This use case (and more) will be covered in a follow up PR, through a scheduled job that runs at regular intervals.

Example run: https://buildkite.com/elastic/elastic-stack-installers/builds/3720 using this PR

@dliappis dliappis self-assigned this Feb 2, 2024
@dliappis dliappis added the bug Something isn't working label Feb 2, 2024
@dliappis
Copy link
Contributor Author

dliappis commented Feb 5, 2024

@strawgate the tests seem to be failing recently with unknown flag: --delayenroll, see https://buildkite.com/elastic/elastic-stack-installers/builds/3720#018d785e-7668-4794-8a0f-2f39dbe52b5e/3827-4851

any thoughts on this?

@strawgate
Copy link
Contributor

@strawgate the tests seem to be failing recently with unknown flag: --delayenroll, see https://buildkite.com/elastic/elastic-stack-installers/builds/3720#018d785e-7668-4794-8a0f-2f39dbe52b5e/3827-4851

any thoughts on this?

That's just the debug output for a negative test case. The failing test case is:


[-] Elastic Agent MSI Installer.Basic Tests for Standalone mode.Blocks downgrades and upgrades in Standalone mode 44.52s (44.52s\|4ms)
--
  | [0] Expected $true, but got $false.
  | at Has-AgentStandaloneLog \| Should -BeTrue, C:\buildkite-agent\builds\bk-agent-prod-gcp-1707121483772742297\elastic\elastic-stack-installers\src\agent-qa\msi.tests.ps1:28
  | at <ScriptBlock>, C:\buildkite-agent\builds\bk-agent-prod-gcp-1707121483772742297\elastic\elastic-stack-installers\src\agent-qa\msi.tests.ps1:28
  | at Assert-AgentHealthy, C:\buildkite-agent\builds\bk-agent-prod-gcp-1707121483772742297\elastic\elastic-stack-installers\src\agent-qa\msi.tests.ps1:119
  | at <ScriptBlock>, C:\buildkite-agent\builds\bk-agent-prod-gcp-1707121483772742297\elastic\elastic-stack-installers\src\agent-qa\msi.tests.ps1:177
  | [1] Expected $false, because The agent should have been cleaned up already, but got $true.
  | at Is-AgentBinaryPresent \| Should -BeFalse -Because "The agent should have been cleaned up already", C:\buildkite-agent\builds\bk-agent-prod-gcp-1707121483772742297\elastic\elastic-stack-installers\src\agent-qa\msi.tests.ps1:72
  | at Check-AgentRemnants, C:\buildkite-agent\builds\bk-agent-prod-gcp-1707121483772742297\elastic\elastic-stack-installers\src\agent-qa\msi.tests.ps1:72
  | at <ScriptBlock>, C:\buildkite-agent\builds\bk-agent-prod-gcp-1707121483772742297\elastic\elastic-stack-installers\src\agent-qa\msi.tests.ps1:109

Looking into this

@strawgate
Copy link
Contributor

Pushed a commit, i think it was just a race condition between the agent starting up and us checking its log file

@strawgate
Copy link
Contributor

Build failed with:


Starting a Gradle Daemon, 1 incompatible Daemon could not be reused, use --status for details
--
  |  
  | FAILURE: Build failed with an exception.
  |  
  | * What went wrong:
  | The specified project directory '/release/project-configs/fix-pr-tests' does not exist.
  |  
  | * Try:
  | > Run with --stacktrace option to get the stack trace.
  | > Run with --info or --debug option to get more log output.
  | > Run with --scan to get full insights.
  |  
  | * Get more help at https://help.gradle.org
  |  
  | BUILD FAILED in 2s
  | 🚨 Error: The command exited with status 1

re-running build now

@dliappis
Copy link
Contributor Author

dliappis commented Feb 6, 2024

Build failed with:


Starting a Gradle Daemon, 1 incompatible Daemon could not be reused, use --status for details
--
  |  
  | FAILURE: Build failed with an exception.
  |  
  | * What went wrong:
  | The specified project directory '/release/project-configs/fix-pr-tests' does not exist.
  |  
  | * Try:
  | > Run with --stacktrace option to get the stack trace.
  | > Run with --info or --debug option to get more log output.
  | > Run with --scan to get full insights.
  |  
  | * Get more help at https://help.gradle.org
  |  
  | BUILD FAILED in 2s
  | 🚨 Error: The command exited with status 1

re-running build now

This is an unrelated failure in the DRA publish step: we allow running the publish step in dry run mode from within PRs, but it seems to me that we shouldn't: the unified release expected certain hard coded directories (e.g. 8.13, main etc.) so maybe the best approach would be to skip the DRA step altogether when running from a PR. @DaveSys911 what do you think?

@dliappis
Copy link
Contributor Author

dliappis commented Feb 6, 2024

@strawgate FYI we are still having failures related to tests after your latest commits: https://buildkite.com/elastic/elastic-stack-installers/builds/3747

@strawgate
Copy link
Contributor

@strawgate FYI we are still having failures related to tests after your latest commits: https://buildkite.com/elastic/elastic-stack-installers/builds/3747

Yep will look today, my last commit was to add some additional debug info

@strawgate
Copy link
Contributor

It looks like the last test run passed, will rerun a couple of times to make sure it's not flaky

@dliappis
Copy link
Contributor Author

dliappis commented Feb 6, 2024

@strawgate your latest commit seems to have fixed the tests: https://buildkite.com/elastic/elastic-stack-installers/builds/3750#018d7eb8-cb49-4989-ad1b-f5314ec4f0d6
The DRA publish step is failing because we are running from a PR (and the branch in the PR is not main or 7.x or 8.x). I pushed a change to skip this test, if running from within the context of a PR. Hopefully this will lead to 100% successful PR checks.

It looks like the last test run passed, will rerun a couple of times to make sure it's not flaky

Sounds like a great idea!

@dliappis
Copy link
Contributor Author

dliappis commented Feb 6, 2024

It looks like the last test run passed, will rerun a couple of times to make sure it's not flaky

We got another fresh failure in https://buildkite.com/elastic/elastic-stack-installers/builds/3751#018d7ee0-3fea-4c2a-abfd-0d6f6f6969e3

@strawgate
Copy link
Contributor

fingers crossed on this one -- if this last change doesn't fix it i'll disable the test case that triggers the race condition and fix it in a separate PR

@strawgate
Copy link
Contributor

Tests passed with latest change, running another retest to check for flaky tests

@dliappis
Copy link
Contributor Author

dliappis commented Feb 7, 2024

@strawgate I think we are good here. I just triggered one more build to check for flakiness: https://buildkite.com/elastic/elastic-stack-installers/builds/3794

could you provide a final review so that we can merge?

Also @amitkanfer could please take a look again?

@dliappis
Copy link
Contributor Author

dliappis commented Feb 7, 2024

buildkite test this

@dliappis dliappis merged commit 1518434 into main Feb 7, 2024
3 checks passed
@dliappis dliappis deleted the fix-pr-tests branch February 7, 2024 14:37
@strawgate
Copy link
Contributor

could you provide a final review so that we can merge?

done!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants