Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

An old value of MSYS with unpaired surrogates is silently replaced for fixtures #1574

Closed
EliahKagan opened this issue Sep 5, 2024 · 0 comments · Fixed by #1580
Closed

An old value of MSYS with unpaired surrogates is silently replaced for fixtures #1574

EliahKagan opened this issue Sep 5, 2024 · 0 comments · Fixed by #1580
Labels
acknowledged an issue is accepted as shortcoming to be fixed help wanted Extra attention is needed

Comments

@EliahKagan
Copy link
Member

EliahKagan commented Sep 5, 2024

Current behavior 😯

In gix-testtools, when test fixture scripts are run, the configure_command function customizes their environment. Since 0899c2e (#1444), this includes setting the MSYS variable to make it so that commands in shell scripts such as ln -s will really attempt to create symlinks rather than copies.

If MSYS was already set, the options held in its old value are kept, with the new option appended at the end so it will only suppress an option that conflicts with it, otherwise leaving the effects of preceding options in place.

But this does not happen if the old value of MSYS contained an unpaired surrogate code point. This is because env::var rather than env::var_os is used, so the unwrap_or_default method returns an empty string both when MSYS was absent and when it was present with a value that could not be represented as well-formed UTF-8.

https://github.com/Byron/gitoxide/blob/9bddeeb948397df3d1e391ec48597cc34d10acd0/tests/tools/src/lib.rs#L596-L597

https://github.com/Byron/gitoxide/blob/9bddeeb948397df3d1e391ec48597cc34d10acd0/tests/tools/src/lib.rs#L605

There are at least three scenarios where MSYS may contain an unpaired surrogate:

  • By accident.
  • The path to one's debugger that one specifies with error_start: contains one. This seems unlikely, except as a special case of...
  • Intentionally to test the behavior of some component when MSYS contains an unpaired surrogate.

The impact of this bug is therefore small. However, fixing it would help clarify the intended semantics of how gix-testtools treats the MSYS environment variable when running test fixture scripts. (This may potentially be relevant to some changes made for #1578.)

Expected behavior 🤔

Due to the nature of gix-testtools as a testing facility, both as part of gitoxide's test suite and when the published crate is used, I think any of these behaviors are fine and should be considered to fix the bug:

  1. Always panic when the previous value of MSYS contained an unpaired surrogate, whether or not this is documented.

  2. Always accept and append to the previous value of MSYS when it contains an unpaired surrogate, such as by using env::var_os and calling push (rather than env::var and calling push_str), whether or not this is documented.

    If this is done, then unwrap_or_default could still be used, because it would only be falling back to the default empty value when the environment variable was absent.

    -    let mut msys_for_git_bash_on_windows = std::env::var("MSYS").unwrap_or_default();
    -    msys_for_git_bash_on_windows.push_str(" winsymlinks:nativestrict");
    +    let mut msys_for_git_bash_on_windows = std::env::var_os("MSYS").unwrap_or_default();
    +    msys_for_git_bash_on_windows.push(" winsymlinks:nativestrict");
  3. Most other behaviors, including the current behavior, if it is documented.

While any of those would, in my opinion, satisfy users' expectations and eliminate the bug as such, I think the best behavior would be to do (1) or (2). Specifically:

  • If appending a space followed by nativesymlinks to a preexisting value of MSYS that ends in a high surrogate is effective at causing symlinks to be created, then (2) is preferred.
  • If that does not work, then (1) is preferred.

I plan to investigate this and open a PR that makes one of those changes. (This investigation has been slightly slowed by the problem described in rust-lang/libz-sys#215, but I should be able to complete it.)

Update: I have opened #1580, which changes it to use OsString instead of String and to append with OsString::push. As noted there, I verified that a value of MSYS produced this way does still have the desired effect on symlink creation.

Git behavior

Not directly applicable.

Steps to reproduce 🕹

The nice way to reproduce this would be to make a fixture that displays the value, or to add a tracing statement, or to use a debugger.

Instead, I just added a panic! to print out whatever MSYS value will be used:

diff --git a/tests/tools/src/lib.rs b/tests/tools/src/lib.rs
index b3c0102ea..127e7cab1 100644
--- a/tests/tools/src/lib.rs
+++ b/tests/tools/src/lib.rs
@@ -595,6 +595,7 @@ fn configure_command<'a>(
 ) -> &'a mut std::process::Command {
     let mut msys_for_git_bash_on_windows = std::env::var("MSYS").unwrap_or_default();
     msys_for_git_bash_on_windows.push_str(" winsymlinks:nativestrict");
+    panic!("DEBUG: {msys_for_git_bash_on_windows:?}"); // FIXME: Remove after debugging.
     cmd.args(args)
         .stdout(std::process::Stdio::piped())
         .stderr(std::process::Stdio::piped())

Then I ran a test that, while conceptually unrelated to this issue, exercises that code, ensuring the panic! is hit. The following output was warnings induced by making that change, about unused variables and unreachable code, removed. The full output can be seen in this gist.

First, a control, with well-formed Unicode:

ek@Glub MINGW64 ~/source/repos/gitoxide/tests/tools (main)
$ MSYS='error_start:nonexistent' cargo nextest run configure_command_clears_external_config

    Finished `test` profile [unoptimized + debuginfo] target(s) in 0.39s
------------
 Nextest run ID e36b5636-bba9-47d5-b9d5-0ae20e5c0f30 with nextest profile: default
    Starting 1 test across 3 binaries (6 tests skipped)
        FAIL [   0.015s] gix-testtools tests::configure_command_clears_external_config

--- STDOUT:              gix-testtools tests::configure_command_clears_external_config ---

running 1 test
test tests::configure_command_clears_external_config ... FAILED

failures:

failures:
    tests::configure_command_clears_external_config

test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 2 filtered out; finished in 0.01s


--- STDERR:              gix-testtools tests::configure_command_clears_external_config ---
thread 'tests::configure_command_clears_external_config' panicked at tests\tools\src\lib.rs:598:5:
DEBUG: "error_start:nonexistent winsymlinks:nativestrict"
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

   Canceling due to test failure
------------
     Summary [   0.017s] 1 test run: 0 passed, 1 failed, 6 skipped
        FAIL [   0.015s] gix-testtools tests::configure_command_clears_external_config
error: test run failed

A now, when the existing value contains an unpaired high surrogate:

ek@Glub MINGW64 ~/source/repos/gitoxide/tests/tools (main)
$ MSYS=$'error_start:\uD800z' cargo nextest run configure_command_clears_external_config

    Finished `test` profile [unoptimized + debuginfo] target(s) in 0.39s
------------
 Nextest run ID bc896e5b-fabd-45b5-8823-4889319cf522 with nextest profile: default
    Starting 1 test across 3 binaries (6 tests skipped)
        FAIL [   0.024s] gix-testtools tests::configure_command_clears_external_config

--- STDOUT:              gix-testtools tests::configure_command_clears_external_config ---

running 1 test
test tests::configure_command_clears_external_config ... FAILED

failures:

failures:
    tests::configure_command_clears_external_config

test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 2 filtered out; finished in 0.02s


--- STDERR:              gix-testtools tests::configure_command_clears_external_config ---
thread 'tests::configure_command_clears_external_config' panicked at tests\tools\src\lib.rs:598:5:
DEBUG: " winsymlinks:nativestrict"
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

   Canceling due to test failure
------------
     Summary [   0.026s] 1 test run: 0 passed, 1 failed, 6 skipped
        FAIL [   0.024s] gix-testtools tests::configure_command_clears_external_config
error: test run failed

Therefore, normally it is able to keep content from the old value:

DEBUG: "error_start:nonexistent winsymlinks:nativestrict"

But when it cannot be converted to UTF-8, its silently discards it:

DEBUG: " winsymlinks:nativestrict"

(The reason I added the z after the unpaired surrogate was to show a situation where it is not at the very end, which is the more common situation. The problem is not limited to when it is at the very end; rather, whether appending ought to be done, rather than panicking, should, I think, be determined by whether program built against the MSYS DLL are able to heed the appended winsymlinks option even when it the space that delimits it follows an unpaired high surrogate.)

EliahKagan added a commit to EliahKagan/gitoxide that referenced this issue Sep 6, 2024
The value of an environment variable as obtained by the facilities
in `std::env` is not always well-formed Unicode. Specifically, on
Windows the values of environment variables, like paths, are
natively UTF-16LE strings except that unpaired surrogate code
points can also occur. An `&OsStr` on Windows may accordingly not
quite be UTF-8.

When the `MSYS` variable is absent, we treat this the same as when
it is present but empty. However, as described in GitoxideLabs#1574, an `MSYS`
variable that is present but whose value contains an unpaired
surrogate would also be replaced entirely, rather than appending to
its old value.

This changes that, to instead append, retaining whatever was there
even if it was ill-formed Unicode.

An alternative change could be to panic when the old value is
ill-formed Unicode. This commit allows and appends to the old
value, rather than panicking or keeping and documenting the
previous behavior of discarding the old value, because the appended
sequence ` winsymlinks:nativestrict` is effective at causing
fixture scripts to attempt to create actual symlinks even if
the preceding code point is an unpaired Unicode high surrogate.
@Byron Byron added help wanted Extra attention is needed acknowledged an issue is accepted as shortcoming to be fixed labels Sep 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
acknowledged an issue is accepted as shortcoming to be fixed help wanted Extra attention is needed
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants