Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GitHub Actions on Windows doesn't stop on error #6668

Closed
2 of 11 tasks
justanotheranonymoususer opened this issue Nov 30, 2022 · 9 comments
Closed
2 of 11 tasks

GitHub Actions on Windows doesn't stop on error #6668

justanotheranonymoususer opened this issue Nov 30, 2022 · 9 comments

Comments

@justanotheranonymoususer
Copy link

justanotheranonymoususer commented Nov 30, 2022

Description

Here's a pipeline which demonstrates the bug, it should fail but it keeps on running, leading to unexpected consequences in the rest of the pipeline:

  build-windows-failure-bug:
    runs-on: windows-latest
    steps:
      - uses: actions/checkout@v3
      - name: Fail test
        run: |
          curl https://does-not-exist.this-will-fail/
          curl https://example.com/

Here's a more complete example:
https://github.com/justanotheranonymoususer/actions-test/blob/5c0935375762cfbcc8f6231aae60c50a6c2b284b/.github/workflows/blank.yml

Execution result:
https://github.com/justanotheranonymoususer/actions-test/actions/runs/3569092631

You can see that on ubuntu-latest, it fails properly, but in windows-latest, it doesn't fail even though it should.

Platforms affected

  • Azure DevOps
  • GitHub Actions - Standard Runners
  • GitHub Actions - Larger Runners

Runner images affected

  • Ubuntu 18.04
  • Ubuntu 20.04
  • Ubuntu 22.04
  • macOS 10.15
  • macOS 11
  • macOS 12
  • Windows Server 2019
  • Windows Server 2022

Image version and build link

windows-latest

Is it regression?

Not/don't know

Expected behavior

Windows pipeline fails on first error

Actual behavior

Windows pipeline keeps running

Repro steps

See pipeline in the description

@ddobranic
Copy link
Contributor

hello @justanotheranonymoususer, thank you! we will investigate.

@ddobranic ddobranic assigned dsame and ddobranic and unassigned dsame Dec 1, 2022
@ddobranic
Copy link
Contributor

@justanotheranonymoususer you could slightly modify you workflow and get the same results for Ubuntu and Windows.
Here is an example:

build-windows-failure-bug:
    runs-on: windows-latest
    steps:
      - uses: actions/checkout@v3
      - name: Fail test
        run: |
          curl https://dfgjndrgierngiergnregerg
          curl https://example.com/
        shell: bash

@ddobranic
Copy link
Contributor

@justanotheranonymoususer, i am going to close the issue. if you have any questions or concerns, please feel free to reopen.

@justanotheranonymoususer
Copy link
Author

@ddobranic
Copy link
Contributor

@justanotheranonymoususer it is about default behaviors of the two different shells. for bash, we set 'e' flag which gives exact behavior you are expecting - exit on error.

@justanotheranonymoususer
Copy link
Author

I did a couple of other checks. I tried these shells: bash, sh, powershell, pwsh, cmd.

  • These continue even if there's an error: pwsh, cmd
  • These abort as expected: bash, sh, powershell

Here's what the docs say:

cmd

There doesn't seem to be a way to fully opt into fail-fast behavior other than writing your script to check each error code and respond accordingly. Because we can't actually provide that behavior by default, you need to write this behavior into your script.

pwsh (default on Windows)

Fail-fast behavior when possible. For pwsh and powershell built-in shell, we will prepend $ErrorActionPreference = 'stop' to script contents.

For cmd, it's known and documented that it continues after a failure. But for pwsh, it should abort after the first error, which seems like a bug as it actually continues execution.

I'm not sure what "when possible" means in the PowerShell documentation. I think it should be fixed, and, if not possible, documented properly.

dalito added a commit to dalito/linkml that referenced this issue Jan 4, 2023
dalito added a commit to dalito/linkml that referenced this issue Jan 5, 2023
kichik added a commit to CloudSnorkel/cdk-github-runners that referenced this issue Mar 27, 2023
GitHub Actions don't check if commands are failing in the workflow while using the default shell (PowerShell). Switch to bash so we don't ignore errors.

actions/runner-images#6668
mergify bot pushed a commit to CloudSnorkel/cdk-github-runners that referenced this issue Mar 27, 2023
GitHub Actions don't check if any single commands are failing while using the default shell (PowerShell). Switch to bash so we don't ignore errors.

actions/runner-images#6668

Fixes #253
@Gustl22
Copy link

Gustl22 commented Sep 7, 2023

@ddobranic how I can achieve the same behavior (to abort) with powershell or pwsh as I need some of their commands?

@coffeebe4code
Copy link

coffeebe4code commented Sep 10, 2023

Same issue. Windows passes but linux fails. cargo test fails, but the e2e is successful. set -e should be the default

  build-win:
    runs-on: windows-latest
    steps:
      - uses: actions/checkout@v2
      - uses: actions-rs/toolchain@v1
        with:
          toolchain: stable
      - run: |
          cargo build --release
          cargo test
          ./target/release/e2e

  build-linux:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - uses: actions-rs/toolchain@v1
        with:
          toolchain: stable
      - run: |
          cargo build --release
          cargo test
          ./target/release/e2e

@beamerblvd
Copy link

Should a new bug be filed, or is it possible to re-open this bug? Per this comment, shell pwsh is not working as documented, which is a bug (either with the workflow or the documentation). Worse, shell: cannot be parameterized with the matrix, so the choices are to use Bash with everything or have a completely different workflow script for Windows vs other OSes, neither of which are ideal choices.

WardBrian added a commit to roualdes/bridgestan that referenced this issue Feb 12, 2024
WardBrian added a commit to roualdes/bridgestan that referenced this issue Feb 12, 2024
WardBrian added a commit to roualdes/bridgestan that referenced this issue Feb 12, 2024
* Manually fail for Windows RLang tests

actions/runner-images#6668

* Quoting changes in Powershell 7.4

actions/runner-images#9115
robin-aws added a commit to aws/aws-cryptographic-material-providers-library that referenced this issue Feb 12, 2024
facebook-github-bot pushed a commit to facebook/rocksdb that referenced this issue Mar 14, 2024
Summary:
Issue #12421 describes a regression in the migration from CircleCI to GitHub Actions in which failing build steps no longer fail Windows CI jobs. In GHA with pwsh (new preferred powershell command), only the last non-builtin command (or something like that) affects the overall success/failure result, and failures in external commands do not exit the script, even with `$ErrorActionPreference = 'Stop'` and `$PSNativeCommandErrorActionPreference = $true`. Switching to `powershell` causes some obscure failure (not seen in CircleCI) about the `-Lo` option to `curl`.

Here we work around this using the only reasonable-but-ugly way known: explicitly check the result after every non-trivial build step. This leaves us highly susceptible to future regressions with unchecked build steps in the future, but a clean solution is not known.

This change also fixes the build errors that were allowed to creep in because of the CI regression. Also decreased the unnecessarily long running time of DBWriteTest.WriteThreadWaitNanosCounter.

For background, this problem explicitly contradicts GitHub's documentation, and GitHub has known about the problem for more than a year, with no evidence of caring or intending to fix. actions/runner-images#6668 Somehow CircleCI doesn't have this problem. And even though cmd.exe and powershell have been perpetuating DOS-isms for decades, they still seem to be a somewhat active "hot mess" when it comes to sensible, consistent, and documented behavior.

Fixes #12421

A history of some things I tried in development is here: main...pdillinger:rocksdb:debug_windows_ci_orig

Pull Request resolved: #12426

Test Plan: CI, including #12434 where I have temporarily enabled other Windows builds on PR with this change

Reviewed By: cbi42

Differential Revision: D54903698

Pulled By: pdillinger

fbshipit-source-id: 116bcbebbbf98f347c7cf7dfdeebeaaed7f76827
pdillinger added a commit to pdillinger/rocksdb that referenced this issue Mar 14, 2024
Summary:
Issue facebook#12421 describes a regression in the migration from CircleCI to GitHub Actions in which failing build steps no longer fail Windows CI jobs. In GHA with pwsh (new preferred powershell command), only the last non-builtin command (or something like that) affects the overall success/failure result, and failures in external commands do not exit the script, even with `$ErrorActionPreference = 'Stop'` and `$PSNativeCommandErrorActionPreference = $true`. Switching to `powershell` causes some obscure failure (not seen in CircleCI) about the `-Lo` option to `curl`.

Here we work around this using the only reasonable-but-ugly way known: explicitly check the result after every non-trivial build step. This leaves us highly susceptible to future regressions with unchecked build steps in the future, but a clean solution is not known.

This change also fixes the build errors that were allowed to creep in because of the CI regression. Also decreased the unnecessarily long running time of DBWriteTest.WriteThreadWaitNanosCounter.

For background, this problem explicitly contradicts GitHub's documentation, and GitHub has known about the problem for more than a year, with no evidence of caring or intending to fix. actions/runner-images#6668 Somehow CircleCI doesn't have this problem. And even though cmd.exe and powershell have been perpetuating DOS-isms for decades, they still seem to be a somewhat active "hot mess" when it comes to sensible, consistent, and documented behavior.

Fixes facebook#12421

A history of some things I tried in development is here: facebook/rocksdb@main...pdillinger:rocksdb:debug_windows_ci_orig

Pull Request resolved: facebook#12426

Test Plan: CI, including facebook#12434 where I have temporarily enabled other Windows builds on PR with this change

Reviewed By: cbi42

Differential Revision: D54903698

Pulled By: pdillinger

fbshipit-source-id: 116bcbebbbf98f347c7cf7dfdeebeaaed7f76827
pdillinger added a commit to pdillinger/rocksdb that referenced this issue Mar 14, 2024
Summary:
Issue facebook#12421 describes a regression in the migration from CircleCI to GitHub Actions in which failing build steps no longer fail Windows CI jobs. In GHA with pwsh (new preferred powershell command), only the last non-builtin command (or something like that) affects the overall success/failure result, and failures in external commands do not exit the script, even with `$ErrorActionPreference = 'Stop'` and `$PSNativeCommandErrorActionPreference = $true`. Switching to `powershell` causes some obscure failure (not seen in CircleCI) about the `-Lo` option to `curl`.

Here we work around this using the only reasonable-but-ugly way known: explicitly check the result after every non-trivial build step. This leaves us highly susceptible to future regressions with unchecked build steps in the future, but a clean solution is not known.

This change also fixes the build errors that were allowed to creep in because of the CI regression. Also decreased the unnecessarily long running time of DBWriteTest.WriteThreadWaitNanosCounter.

For background, this problem explicitly contradicts GitHub's documentation, and GitHub has known about the problem for more than a year, with no evidence of caring or intending to fix. actions/runner-images#6668 Somehow CircleCI doesn't have this problem. And even though cmd.exe and powershell have been perpetuating DOS-isms for decades, they still seem to be a somewhat active "hot mess" when it comes to sensible, consistent, and documented behavior.

Fixes facebook#12421

A history of some things I tried in development is here: facebook/rocksdb@main...pdillinger:rocksdb:debug_windows_ci_orig

Pull Request resolved: facebook#12426

Test Plan: CI, including facebook#12434 where I have temporarily enabled other Windows builds on PR with this change

Reviewed By: cbi42

Differential Revision: D54903698

Pulled By: pdillinger

fbshipit-source-id: 116bcbebbbf98f347c7cf7dfdeebeaaed7f76827
pdillinger added a commit to facebook/rocksdb that referenced this issue Mar 15, 2024
Summary:
Issue #12421 describes a regression in the migration from CircleCI to GitHub Actions in which failing build steps no longer fail Windows CI jobs. In GHA with pwsh (new preferred powershell command), only the last non-builtin command (or something like that) affects the overall success/failure result, and failures in external commands do not exit the script, even with `$ErrorActionPreference = 'Stop'` and `$PSNativeCommandErrorActionPreference = $true`. Switching to `powershell` causes some obscure failure (not seen in CircleCI) about the `-Lo` option to `curl`.

Here we work around this using the only reasonable-but-ugly way known: explicitly check the result after every non-trivial build step. This leaves us highly susceptible to future regressions with unchecked build steps in the future, but a clean solution is not known.

This change also fixes the build errors that were allowed to creep in because of the CI regression. Also decreased the unnecessarily long running time of DBWriteTest.WriteThreadWaitNanosCounter.

For background, this problem explicitly contradicts GitHub's documentation, and GitHub has known about the problem for more than a year, with no evidence of caring or intending to fix. actions/runner-images#6668 Somehow CircleCI doesn't have this problem. And even though cmd.exe and powershell have been perpetuating DOS-isms for decades, they still seem to be a somewhat active "hot mess" when it comes to sensible, consistent, and documented behavior.

Fixes #12421

A history of some things I tried in development is here: main...pdillinger:rocksdb:debug_windows_ci_orig

Pull Request resolved: #12426

Test Plan: CI, including #12434 where I have temporarily enabled other Windows builds on PR with this change

Reviewed By: cbi42

Differential Revision: D54903698

Pulled By: pdillinger

fbshipit-source-id: 116bcbebbbf98f347c7cf7dfdeebeaaed7f76827
pusewicz added a commit to pusewicz/cute_framework that referenced this issue Aug 5, 2024
This is so that the scripts can fail if commands return non-zero.

Cmd happily continues and never returns failures.

Ref actions/runner-images#6668
Tim-Zhang added a commit to quanweiZhou/ttrpc-rust that referenced this issue Oct 9, 2024
It's important for windows to fail correctly.
Further: actions/runner-images#6668

Signed-off-by: Tim Zhang <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants