Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make gix_path::env:shell() more robust and use it in gix-command #1862

Draft
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

EliahKagan
Copy link
Member

This makes git_path::env::shell() more likely to work correctly on Windows, then attempts to use it in gix_command::Prepare when running a command with a shell due to use_shell being set to true. Full details are in the commit messages, though the changes are also summarized here.

I think this is not ready to merge yet, due to failures that happen when running the tests locally from PowerShell, but that I cannot produce otherwise. I do not know what causes these failures, and I expect I may have difficulty getting this to a point where it would be ready to merge without input. Since the failures happen in a test that was fairly recently added, I figured you might just know what is going on.

I think shell() can and should be used for gix-command. But if not then the reason will be relevant to other potential uses of shell(). In that case, assuming they leave enough good uses for shell() that it is kept, I hope to discover and articulate those reasons in the documentation for shell(). But I think most likely there is a bug or wrong test expectation somewhere that accounts for the local test failure, and that either the current implementation of shell() with the modifications included here, or something much like it, is suitable for broad use.

Making gix_path::env::shell() more robust

This makes gix_path::env::shell() more likely to work correctly on Windows by:

  • Checking that the path to a sh.exe associated with Git for Windows actually exists (and is either a regular file or a symbolic link that, when fully dereferenced, gives a file), because if that is not the case, then the fallback of just looking up sh.exe using a PATH search is more likely to succeed.

    This change is needed to use shell() more widely without breaking things on systems where sh.exe is available and usable but cannot be found in the usual location where Git for Windows supplies it.

  • Using / rather than \ directory separators when appending usr, bin, and sh.exe components to the Git for Windows installation directory path obtained via git --exec-path and going up three components. This causes the path through sh.exe is run to use all / separators except in the unusual situation that GIT_EXEC_PATH has been set to a path with \ separator. (This is unusual because people don't usually set GIT_EXEC_PATH. It would probably be best, when setting it on Windows, to use / separators, but I do not actually know how often people do that.)

    I believe this is only a minor improvement, in view of this path not being automatically passed through to the shell, as described in #1840 (reply in thread). But this change also makes tests in gix-command and gix-credentials pass that would otherwise require more extensive modification related to \ escaping, when modifying gix-command to use shell().

Using gix_path::env::shell() in gix-command

Before the changes in this PR, gix-command uses the relative path sh.exe on Windows. This works a significant fraction of the time already. In particular, the problems with the relative path bash.exe that make it frequently unusable to find a non-WSL bash on Windows, described in #1359 (comment), do not apply to sh.exe, since there is no Microsoft-provided sh.exe related to WSL. However:

  • Such an sh.exe is not always the best choice when present.
  • There may be no sh.exe that can be found in PATH. For example, if the only sh.exe is supplied by Git for Windows, and neither of the Git for Windows directories that contain an sh.exe binary are in PATH, then before the changes in this PR, a shell will not be found unless shell_program() is called (every time a command with use_shell is run).

Therefore, this uses shell(), with the improvements described above, in gix-command.


As alluded to above, I think this is not ready yet. All tests pass on CI, as well a when run locally from Git Bash. But when running them locally in PowerShell, gix-merge::merge blob::platform::merge::with_external fails. The failure does not require GIX_TEST_IGNORE_ARCHIVES to be set, and this PR does not currently modify any test fixture scripts.

Windows failures that occur when tests are run from PowerShell but not from Git Bash have usually been due to #1359 and are thus not expected since #1712, where gix-testtools finds bash.exe associated with Git for Windows and uses it to run fixtures. But that fix didn't affect the way gix-command uses a shell, so it does not provide for the kind of shell invocations occurring here.

Still, it doesn't make sense to me why the failure happens, and why it happens only when I run the tests locally from PowerShell. On this system, in the PATH, including when accessed from PowerShell, the command sh finds is a shim for the same shell that, as of this PR, gix-path finds and gix-command uses. Furthermore, the change here makes gix-path find sh.exe more like the way gix-testtools finds bash.exe, both of which are Bash when provided by Git for Windows.

Furthermore, while most of my local testing has been with Git for Windows 2.48.1, I have verified that the test fails the same way on the same system with Git for Windows 2.47.1(2), and that the experiment described below to examine the details of the failure also gives the same results with that version. This is to say that it seems like this failure is not related, even conceptually, to #1849.

C:\Users\ek\source\repos\gitoxide [run-ci/consistent-sh ≡]> cargo nextest run -p gix-merge --no-fail-fast
    Finished `test` profile [unoptimized + debuginfo] target(s) in 0.48s
────────────
 Nextest run ID 5771b786-7fff-43d7-9721-b20970cbb558 with nextest profile: default
    Starting 20 tests across 2 binaries
     Running [ 00:00:00] 0/20: 0 running, 0 passed, 0 skipped
        PASS [   0.018s] gix-merge::merge blob::builtin_driver::text::fuzzed
        PASS [   0.022s] gix-merge::merge blob::builtin_driver::text::both_sides_same_changes_are_conflict_free
        PASS [   0.025s] gix-merge::merge blob::pipeline::without_transformation
        PASS [   0.031s] gix-merge::merge blob::pipeline::non_existing
        PASS [   0.035s] gix-merge::merge blob::pipeline::binary_below_large_file_threshold
        PASS [   0.038s] gix-merge::merge blob::builtin_driver::text::both_differ_partially_resolution_is_conflicting
        PASS [   0.040s] gix-merge::merge blob::builtin_driver::binary
        PASS [   0.047s] gix-merge::merge blob::pipeline::worktree_filter
        PASS [   0.054s] gix-merge::merge blob::pipeline::above_large_file_threshold
        PASS [   0.052s] gix-merge::merge blob::platform::merge::builtin_with_conflict
        PASS [   0.054s] gix-merge::merge blob::platform::merge::builtin_text_uses_binary_if_needed
        PASS [   0.058s] gix-merge::merge blob::platform::merge::one_buffer_too_large
        PASS [   0.070s] gix-merge::merge blob::platform::merge::same_binaries_do_not_count_as_conflicted
        PASS [   0.067s] gix-merge::merge blob::platform::prepare_merge::ancestor_and_current_and_other_do_not_exist
        PASS [   0.067s] gix-merge::merge blob::platform::prepare_merge::driver_selection
        PASS [   0.059s] gix-merge::merge blob::platform::set_resource::invalid_resource_types
        PASS [   0.200s] gix-merge::merge blob::platform::merge::missing_buffers_are_empty_buffers
        FAIL [   0.245s] gix-merge::merge blob::platform::merge::with_external
──── STDOUT:             gix-merge::merge blob::platform::merge::with_external

running 1 test
test blob::platform::merge::with_external ... FAILED

failures:

failures:
    blob::platform::merge::with_external

test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 19 filtered out; finished in 0.18s

──── STDERR:             gix-merge::merge blob::platform::merge::with_external
thread 'blob::platform::merge::with_external' panicked at gix-merge\tests\merge\blob\platform.rs:297:13:
b
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

        PASS [   0.330s] gix-merge::merge blob::builtin_driver::text::run_baseline
        PASS [   0.998s] gix-merge::merge tree::run_baseline
────────────
     Summary [   1.036s] 20 tests run: 19 passed, 1 failed, 0 skipped
        FAIL [   0.245s] gix-merge::merge blob::platform::merge::with_external
error: test run failed

The failing assertion is the first of the assertions in this fragment of the failing test:

let platform_ref = platform.prepare_merge(&db, Default::default())?;
let mut buf = Vec::new();
let res = platform_ref.merge(&mut buf, default_labels(), &Default::default())?;
assert_eq!(res, (Pick::Buffer, Resolution::Complete), "merge drivers always merge ");
let mut lines = cleaned_driver_lines(&buf)?;
for tmp_file in lines.by_ref().take(3) {
assert!(tmp_file.contains_str(&b".tmp"[..]), "{tmp_file}");
}
let lines: Vec<_> = lines.collect();
assert_eq!(
lines,
[
"7",
"b",
"ancestor label",
"current label",
"other label",
"%F",
"b",
"theirs"
],
"we handle word-splitting and definitely pick-up what's written into the %A buffer"
);

The first three of the cleaned driver lines are supposed to contain .tmp, but the first one is just b, which is expected to apper later. This somehow varies based on how the tests are run, and there are at least three environments for figuring out what's going on: CI, local in Git Bash, and local in PowerShell. It seems like they shouldn't differ, but they do.

Locally, it is possible to inspect things in a debugger, but this is harder on CI because while action-tmate can be used to get a reverse shell, on Windows runners the reverse shell's environment differs from the noninteractive environment in which the tests run. Fortunately, I can break the tests in a way that makes them always fail by panicking and printing all the lines:

diff --git a/gix-merge/tests/merge/blob/platform.rs b/gix-merge/tests/merge/blob/platform.rs
index 6230377b8..62882cc69 100644
--- a/gix-merge/tests/merge/blob/platform.rs
+++ b/gix-merge/tests/merge/blob/platform.rs
@@ -293,6 +293,11 @@ theirs
         let res = platform_ref.merge(&mut buf, default_labels(), &Default::default())?;
         assert_eq!(res, (Pick::Buffer, Resolution::Complete), "merge drivers always merge ");
         let mut lines = cleaned_driver_lines(&buf)?;
+        panic!(
+            "original: {:#?}\ncleaned: {:#?}",
+            buf.lines().map(|s| s.as_bstr()).collect::<Vec<_>>(),
+            lines.collect::<Vec<_>>()
+        );
         for tmp_file in lines.by_ref().take(3) {
             assert!(tmp_file.contains_str(&b".tmp"[..]), "{tmp_file}");
         }

To be clear, while all of the following are from messages printed in failing tests, the tests fail because of that change. All of them would otherwise pass, except the local run, in Windows, from PowerShell, shown last. Also, I do not know why this is happening. I am hoping you may have some insight into it.

Based off the main branch, on CI, in Ubuntu (would pass)

In the test-fast (ubuntu-latest) run, for comparison:

original: [
    "ours/home/runner/work/gitoxide/gitoxide/gix-merge/.tmpmK4t1a",
    "/home/runner/work/gitoxide/gitoxide/gix-merge/.tmpNFsS5l",
    "/home/runner/work/gitoxide/gitoxide/gix-merge/.tmp8rvAjw",
    "7",
    "b",
    "ancestor label",
    "current label",
    "other label",
    "%F",
    "b",
    "theirs",
]
cleaned: [
    "ours/home/runner/work/gitoxide/gitoxide/gix-merge/.tmpmK4t1a",
    "/.tmpNFsS5l",
    "/.tmp8rvAjw",
    "7",
    "b",
    "ancestor label",
    "current label",
    "other label",
    "%F",
    "b",
    "theirs",
]

Based off the main branch, on CI, in Windows (would pass)

In the test-fast (windows-latest) run:

original: [
    "oursD:agitoxidegitoxidegix-merge.tmpDYZgPn",
    "D:agitoxidegitoxidegix-merge.tmpOKiCuJ",
    "D:agitoxidegitoxidegix-merge.tmpOahQgz",
    "7",
    "b",
    "ancestor label",
    "current label",
    "other label",
    "%F",
    "b",
    "theirs",
]
cleaned: [
    "oursD:agitoxidegitoxidegix-merge.tmpDYZgPn",
    "D:agitoxidegitoxidegix-merge.tmpOKiCuJ",
    "D:agitoxidegitoxidegix-merge.tmpOahQgz",
    "7",
    "b",
    "ancestor label",
    "current label",
    "other label",
    "%F",
    "b",
    "theirs",
]

Note how backslashes are being treated as escape characters and removed, which seems like it is not intended, but this does not in and of itself cause a problem with this test: it happens even on the main branch, and even without the changes from this PR.

Based off the main branch, locally, in Windows, from Git Bash (would pass)

Via cargo nextest run -p gix-merge --no-fail-fast:

original: [
    "oursC:Userseksourcereposgitoxidegix-merge.tmpORUoAj",
    "C:Userseksourcereposgitoxidegix-merge.tmpuUGgiZ",
    "C:Userseksourcereposgitoxidegix-merge.tmpsYzC12",
    "7",
    "b",
    "ancestor label",
    "current label",
    "other label",
    "%F",
    "b",
    "theirs",
]
cleaned: [
    "oursC:Userseksourcereposgitoxidegix-merge.tmpORUoAj",
    "C:Userseksourcereposgitoxidegix-merge.tmpuUGgiZ",
    "C:Userseksourcereposgitoxidegix-merge.tmpsYzC12",
    "7",
    "b",
    "ancestor label",
    "current label",
    "other label",
    "%F",
    "b",
    "theirs",
]

Based off the main branch, locally, in Windows, from PowerShell (would pass)

Via cargo nextest run -p gix-merge --no-fail-fast:

original: [
    "oursC:Userseksourcereposgitoxidegix-merge.tmpPNy4Pa",
    "C:Userseksourcereposgitoxidegix-merge.tmp5Fkmzy",
    "C:Userseksourcereposgitoxidegix-merge.tmp9aFhwO",
    "7",
    "b",
    "ancestor label",
    "current label",
    "other label",
    "%F",
    "b",
    "theirs",
]
cleaned: [
    "oursC:Userseksourcereposgitoxidegix-merge.tmpPNy4Pa",
    "C:Userseksourcereposgitoxidegix-merge.tmp5Fkmzy",
    "C:Userseksourcereposgitoxidegix-merge.tmp9aFhwO",
    "7",
    "b",
    "ancestor label",
    "current label",
    "other label",
    "%F",
    "b",
    "theirs",
]

Based off this feature branch, on CI, in Ubuntu (would pass)

In the test-fast (ubuntu-latest) run, for comparison:

original: [
    "ours/home/runner/work/gitoxide/gitoxide/gix-merge/.tmpvKkoxy",
    "/home/runner/work/gitoxide/gitoxide/gix-merge/.tmpr402lE",
    "/home/runner/work/gitoxide/gitoxide/gix-merge/.tmp8bJjIe",
    "7",
    "b",
    "ancestor label",
    "current label",
    "other label",
    "%F",
    "b",
    "theirs",
]
cleaned: [
    "ours/home/runner/work/gitoxide/gitoxide/gix-merge/.tmpvKkoxy",
    "/.tmpr402lE",
    "/.tmp8bJjIe",
    "7",
    "b",
    "ancestor label",
    "current label",
    "other label",
    "%F",
    "b",
    "theirs",
]

That is the same as based off the main branch, which is very much as expected, because the changes on this feature branch do not produce a different value for gix_path::env::shell() on systems other than Windows.

Based off this feature branch, on CI, on Windows (would pass)

In the test-fast (windows-latest) run:

original: [
    "oursD:agitoxidegitoxidegix-merge.tmpyuHJtT",
    "D:agitoxidegitoxidegix-merge.tmpC2s55H",
    "D:agitoxidegitoxidegix-merge.tmp06rraO",
    "7",
    "b",
    "ancestor label",
    "current label",
    "other label",
    "%F",
    "b",
    "theirs",
]
cleaned: [
    "oursD:agitoxidegitoxidegix-merge.tmpyuHJtT",
    "D:agitoxidegitoxidegix-merge.tmpC2s55H",
    "D:agitoxidegitoxidegix-merge.tmp06rraO",
    "7",
    "b",
    "ancestor label",
    "current label",
    "other label",
    "%F",
    "b",
    "theirs",
]

This is similar to the main branch CI and local Git Bash runs shown above.

Based off this feature branch, locally, in Windows, from Git Bash (would pass)

Via cargo nextest run -p gix-merge --no-fail-fast:

original: [
    "oursC:Userseksourcereposgitoxidegix-merge.tmpUfxnE8",
    "C:Userseksourcereposgitoxidegix-merge.tmpiUdSfX",
    "C:Userseksourcereposgitoxidegix-merge.tmpmkVmRq",
    "7",
    "b",
    "ancestor label",
    "current label",
    "other label",
    "%F",
    "b",
    "theirs",
]
cleaned: [
    "oursC:Userseksourcereposgitoxidegix-merge.tmpUfxnE8",
    "C:Userseksourcereposgitoxidegix-merge.tmpiUdSfX",
    "C:Userseksourcereposgitoxidegix-merge.tmpmkVmRq",
    "7",
    "b",
    "ancestor label",
    "current label",
    "other label",
    "%F",
    "b",
    "theirs",
]

Based off this feature branch, locally, in Windows, from PowerShell (would fail)

Via cargo nextest run -p gix-merge --no-fail-fast:

original: [
    "b",
    "theirserseksourcereposgitoxidegix-merge.tmpTDKDff",
    "C:Userseksourcereposgitoxidegix-merge.tmp5WJaSb",
    "C:Userseksourcereposgitoxidegix-merge.tmpyQDnZt",
    "7",
    "b",
    "ancestor label",
    "current label",
    "other label",
    "%F",
]
cleaned: [
    "b",
    "theirserseksourcereposgitoxidegix-merge.tmpTDKDff",
    "C:Userseksourcereposgitoxidegix-merge.tmp5WJaSb",
    "C:Userseksourcereposgitoxidegix-merge.tmpyQDnZt",
    "7",
    "b",
    "ancestor label",
    "current label",
    "other label",
    "%F",
]

Note how b comes first, and the three lines with paths to files named containing .tmp follow it. This causes this assertion, which is run for the first three lines, to fail in the first iteration (i.e. for the first line):

assert!(tmp_file.contains_str(&b".tmp"[..]), "{tmp_file}");

But I do not understand why. I'm not very familiar with the details of gix-merge. I'm also not entirely clear on which kinds of shell transformations (from parsing % items into fields, from shell expansions on unquoted parameter expansion, and from the implicit effect of joining on spaces when passing multiple arguments arising from an unquoted expansion to echo) are intended to occur in the test command, and with what effects.

Is this likely to be caused by different environments across shells run in different ways? Or is it somehow due to the name the shell is called with, even though that does not become its $0 here and, furthermore, its $0 does not seem to be expanded? What could be going on?

`gix_command::Prepare` previously used `sh` on Windows rather than
first checking for a usable `sh` implementation associated with a
Git for Windows installation. This changes it to use the `gix-path`
facility for finding what is likely the best 'sh' implementation
for POSIX shell scripts that will operate on Git repositories. This
increases consistency across how different crates find 'sh', and
brings the benefit of preferring the Git for Windows `sh.exe` on
Windows when it can be found reliably.
Because `gix-command` uses `gix_path::env::shell()` to find sh,
and `gix-credentials` uses `gix-command`.
This makes the path returned by `gix_path::env::shell()` on Windows
more usable by:

1. Adding components with `/` separators. While in principle a `\`
   should work, the path of the shell itself is used in shell
   scripts (script files and `sh -c` operands) that may not account
   for the presence of backslashes, and it is also harder to read
   paths with `\` in contexts where it appears escaped, which may
   include various messages from Rust code and shell scripts.

   The path before what we add will already use `/` and never `\`,
   unless `GIT_EXEC_PATH` has been set to a strange value, because
   it is based on `git --exec-path`, which by default gives a path
   with `/` separators. Thus, ensuring that the part we add uses `/`
   should be sufficient to get a path without `\` in all cases when
   it is clearly reasonable to do so. This therefore also usually
   increases stylistic consistency of the path, which is another
   factor that makes it more user-friendly in messages.

   This is needed to get tests to pass since changing `gix-command`
   to use `gix_path::env::shell()` on Windows, where a path is
   formatted in away that sometimes quotes `\` characters. Their
   expectations could be adjusted, but it seems likely that various
   other software, much of which may otherwise be working, has
   similar expectations. Using `/` instead of `\` works whether `\`
   is expected to be displayed quoted or not.

2. Check that the path to the shell plausibly has a shell there,
   only using it if it a file or a non-broken file symlink. When
   this is not the case, the fallback short name is used instead.

3. The fallback short name is changed from `sh` to `sh.exe`, since
   the `.exe` suffix is appended in other short names on Windows,
   such as `git.exe`, as well as being part of the filename
   component of the path we build for the shell when using the
   implementation provided as part of Git for Windows.

Those changes only affect Windows.

This also adds tests for (1) and (2) above, as well as for the
expectation that we get an absolute path, to make sure we don't
build a path that would be absolute on a Unix-like system but is
relative on Windows (a path that starts with just one `/` or `\`).

These tests are not Windows-specific, since all these expectations
should already hold on Unix-like systems, where currently we are
using the hard-coded path `/bin/sh`, which is an absolute path on
those systems. (Some Unix-like systems may technically not have
`/bin/sh` or it may not be the best path to use for a shell that
should be POSIX-compatible, but we are already relying on this,
and handling that better is outside the scope of the changes here.)
@Byron
Copy link
Member

Byron commented Feb 24, 2025

Thanks you very much for tackling this!

I didn't look at the code of this PR but want to share some thoughts on the failure.

Here is the program that is run:

for arg in  %O %A %B %L %P %S %X %Y %F; do echo $arg >> \"%A\"; done; cat \"%O\" \"%B\" >> \"%A\""

The failing output has 10 lines, whereas a valid output has 11 lines. Probably that is related. My feeling is that one of the arguments in the for loop substitutions is empty. It would disappear then, and everything is offset.
If these were quoted that shouldn't happen and is probably a good idea just to make the test-script more robust.

From there it should become more obvious which parameter comes out empty, and from there it should become clear what the problem really is.

I hope that helps.

@EliahKagan
Copy link
Member Author

The failing output has 10 lines, whereas a valid output has 11 lines. Probably that is related.

Yes, in the failing run, the theirs line is absent, and one of the b lines is moved to the top.

@EliahKagan
Copy link
Member Author

EliahKagan commented Feb 24, 2025

A couple things didn't mention in #1862 (comment):

My feeling is that one of the arguments in the for loop substitutions is empty. It would disappear then, and everything is offset.

It doesn't look like that specific mechanism accounts for the failure, because in the failing case a b line appears as the first line of the output, while otherwise b does not appear until the fifth line.

As an early step (before even doing most of the testing described above), I tried adding ' ' quotes around the format specifiers and " " quotes around the parameter expansion in the merge driver command. The effect was more splitting. For example, ancestor label became two lines, ancestor and label. I think part of what using unquoted parameter expansion is currently achieving for the echo arguments is split on newlines, when then effectively get replaced with spaces because when echo is run with multiple arguments it joins them with spaces. I am very unclear on what the exact intended behavior of this merge driver test command is.

The broader issue for me examining this specific test is that I am familiar neither with the details of gix-merge and its tests nor (even though this is general Git knowledge) with the format expected for custom merge drivers.

Thanks you very much for tackling this!

No problem. My goal that this is related to is to be able to fix some other bugs where we do not run scripts correctly, such as bugs in the special casing of scripts with shebangs when use_shell is false. Fixing such bugs runs the risk of producing worse problems if there are other bugs whose effects are worse than the effect of existing bugs. (For example, running a wrong command, instead of failing to run anything, would often be worse. Likewise, running a wrong command in circumstances that arise more readily could be worse than running a wrong command in circumstances that arise less readily.) It also runs the risk that I introduce new bugs that weren't present at all before.

But that also means I risk become separated from the original bug-fixing goals, which this is peripheral to, if I shift my focus too much onto this. If possible, I'd like to avoid become too embroiled in the details of the gix-merge tests. It might not be possible to avoid this, but there are two other approaches I may attempt first to try to figure out what is going on:

  • Write something to examine how the environment is changed before and after the changes here. Manual testing with similar but non-identical ways of running shell processes suggests that the environment changes are not very great on my local machine (which is where they would have to be significant in order to be the cause of the problems I am observing on that machine but not CI). But this has the advantage that (a) it might be readily adaptable into a regression test, and (b) if I can accurately state how the environments are different, then your much greater familiarity with the failing gix-merge test and the intended gix-merge behavior might make the problem obvious even if it is not obvious to me.
  • Investigate failures on a conceptually related feature branch where I am trying to improve how gix-testtools finds bash. I'm not sure yet because this could be the result of a mistake (and unfortunately, the full test suite takes quite some time to run with GIX_TEST_IGNORE_ARCHIVES=1 on my local Windows 10 development machine), but it looks like there are comparable failures--and many of them, though most may be from a single failing fixture script--when I try to make changes there that harmonize how gix-testtools runs bash with how this PR tries to make gix-path find sh. The changes I'm working on in gix-testtools seem much smaller than those here, yet I have far worse local test failures, so that's interesting. I'll try to give more information on that soon, possibly in another draft PR about that.

Edit: Reworded for clarity and reordered the sections.

@Byron
Copy link
Member

Byron commented Feb 25, 2025

theirs is empty, even though it should be a be something that the caller knows.

I am very unclear on what the exact intended behavior of this merge driver test command is.

It attempts to make the arguments that the merge-driver was called with observable. I think there is no quoting because each of the arguments is known not to have spaces in it. However, the lack of quoting also makes empty arguments unobservable, a definitive shortcoming of the current implementation.

The broader issue for me examining this specific test is that I am familiar neither with the details of gix-merge and its tests nor (even though this is general Git knowledge) with the format expected for custom merge drivers.

Here is the docs, but the short version is that it's a script into which various parameters can be substituted into - this substitution is performed by the the caller of the merge-driver. That caller does a bare string substitution, without any care for quotes or whitespace in general.

I'd like to avoid become too embroiled in the details of the gix-merge tests.

I definitely wouldn't want you to suffer unnecessarily. The key for me to solve this certainly is to reproduce the error. For now, it doesn't make much sense to me either, unless… it's a substitution engine, what if one of the arguments that it substitutes also contains substitutable characters? Probably that's not what's happening here though, as % should be uncommon enough.

This is a patch that should tell exactly what's executed, and from there it would become more obvious what's really happening.

diff --git a/gix-merge/src/blob/platform/merge.rs b/gix-merge/src/blob/platform/merge.rs
index 058a30128..324742a00 100644
--- a/gix-merge/src/blob/platform/merge.rs
+++ b/gix-merge/src/blob/platform/merge.rs
@@ -211,7 +211,7 @@ pub(super) mod inner {
                     cmd.extend_from_slice(token);
                 }
 
-                Ok(merge::Command {
+                Ok(dbg!(merge::Command {
                     cmd: gix_command::prepare(gix_path::from_bstring(cmd))
                         .with_context(context)
                         .command_may_be_shell_script()
@@ -223,7 +223,7 @@ pub(super) mod inner {
                     current_path: ours_path,
                     ancestor: base_tmp,
                     other: theirs_tmp,
-                })
+                }))
             }
 
             /// Return the configured driver program for use with [`Self::prepare_external_driver()`], or `Err`

I hope that helps.

@EliahKagan
Copy link
Member Author

EliahKagan commented Feb 25, 2025

Thanks--you're right that the actual arguments that are passed are important information. I had glossed over this because it did not look like they differed between the passing and failing environment, but that doesn't mean they aren't needed to know what's going on.

The results, with the patch you gave that reveals the Command instance with dbg!, when running the test on this feature branch from PowerShell where it fails, are:

C:\Users\ek\source\repos\gitoxide [run-ci/consistent-sh ≡ +0 ~1 -0 ~]> cargo nextest run -p gix-merge with_external
    Finished `test` profile [unoptimized + debuginfo] target(s) in 0.46s
────────────
 Nextest run ID 42b38560-812f-4bcd-9548-97bb529b272d with nextest profile: default
    Starting 1 test across 2 binaries (19 tests skipped)
     Running [ 00:00:00] 0/1: 0 running, 0 passed, 0 skipped
        FAIL [   0.158s] gix-merge::merge blob::platform::merge::with_external
──── STDOUT:             gix-merge::merge blob::platform::merge::with_external

running 1 test
test blob::platform::merge::with_external ... FAILED

failures:

failures:
    blob::platform::merge::with_external

test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 19 filtered out; finished in 0.15s

──── STDERR:             gix-merge::merge blob::platform::merge::with_external
[gix-merge\src\blob\platform\merge.rs:214:20] merge::Command
{
    cmd:
    gix_command::prepare(gix_path::from_bstring(cmd)).with_context(context).command_may_be_shell_script().stdin(Stdio::null()).stdout(Stdio::inherit()).stderr(Stdio::inherit()).into(),
    current: ours_tmp, current_path: ours_path, ancestor: base_tmp, other:
    theirs_tmp,
} = "C:/Users/ek/scoop/apps/git/2.48.1/usr/bin/sh.exe" "-c" "for arg in  C:\\Users\\ek\\source\\repos\\gitoxide\\gix-merge\\.tmp8ULmDC C:\\Users\\ek\\source\\repos\\gitoxide\\gix-merge\\.tmpmHJGGo C:\\Users\\ek\\source\\repos\\gitoxide\\gix-merge\\.tmpLByBCI 7 \'b\' \'ancestor label\' \'current label\' \'other label\' %F; do echo $arg >> \"C:\\Users\\ek\\source\\repos\\gitoxide\\gix-merge\\.tmpmHJGGo\"; done; cat \"C:\\Users\\ek\\source\\repos\\gitoxide\\gix-merge\\.tmp8ULmDC\" \"C:\\Users\\ek\\source\\repos\\gitoxide\\gix-merge\\.tmpLByBCI\" >> \"C:\\Users\\ek\\source\\repos\\gitoxide\\gix-merge\\.tmpmHJGGo\"" "--"

thread 'blob::platform::merge::with_external' panicked at gix-merge\tests\merge\blob\platform.rs:297:13:
b
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

  Cancelling due to test failure
────────────
     Summary [   0.167s] 1 test run: 0 passed, 1 failed, 19 skipped
        FAIL [   0.158s] gix-merge::merge blob::platform::merge::with_external
error: test run failed

That's a bit hard to read due to the way the Command sub-object formats, and it does not automatically show output in the case that the test passes. So I have also tested with this change instead:

diff --git a/gix-merge/src/blob/platform/merge.rs b/gix-merge/src/blob/platform/merge.rs
index 058a30128..39e27e51f 100644
--- a/gix-merge/src/blob/platform/merge.rs
+++ b/gix-merge/src/blob/platform/merge.rs
@@ -399,6 +399,7 @@ impl<'parent> PlatformRef<'parent> {
         match self.configured_driver() {
             Ok(driver) => {
                 let mut cmd = self.prepare_external_driver(driver.command.clone(), labels, context.clone())?;
+                panic!("program: {:?}\nargs: {:#?}", cmd.get_program(), cmd.get_args());
                 let status = cmd.status().map_err(|err| Error::SpawnExternalDriver {
                     cmd: format!("{:?}", cmd.cmd),
                     source: err,

Then, when run from PowerShell, the output included:

program: "C:/Users/ek/scoop/apps/git/2.48.1/usr/bin/sh.exe"
args: CommandArgs {
    inner: [
        Regular(
            "-c",
        ),
        Regular(
            "for arg in  C:\\Users\\ek\\source\\repos\\gitoxide\\gix-merge\\.tmpayn7BE C:\\Users\\ek\\source\\repos\\gitoxide\\gix-merge\\.tmpEvLyJo C:\\Users\\ek\\source\\repos\\gitoxide\\gix-merge\\.tmpCbHa1a 7 \'b\' \'ancestor label\' \'current label\' \'other label\' %F; do echo $arg >> \"C:\\Users\\ek\\source\\repos\\gitoxide\\gix-merge\\.tmpEvLyJo\"; done; cat \"C:\\Users\\ek\\source\\repos\\gitoxide\\gix-merge\\.tmpayn7BE\" \"C:\\Users\\ek\\source\\repos\\gitoxide\\gix-merge\\.tmpCbHa1a\" >> \"C:\\Users\\ek\\source\\repos\\gitoxide\\gix-merge\\.tmpEvLyJo\"",
        ),
        Regular(
            "--",
        ),
    ],
}

When run from Git Bash, where the test passes when allowed to proceed, the output included this instead:

program: "C:/Users/ek/scoop/apps/git/2.48.1/usr/bin/sh.exe"
args: CommandArgs {
    inner: [
        Regular(
            "-c",
        ),
        Regular(
            "for arg in  C:\\Users\\ek\\source\\repos\\gitoxide\\gix-merge\\.tmpfax7ph C:\\Users\\ek\\source\\repos\\gitoxide\\gix-merge\\.tmpcPJLm3 C:\\Users\\ek\\source\\repos\\gitoxide\\gix-merge\\.tmp5fMWd7 7 \'b\' \'ancestor label\' \'current label\' \'other label\' %F; do echo $arg >> \"C:\\Users\\ek\\source\\repos\\gitoxide\\gix-merge\\.tmpcPJLm3\"; done; cat \"C:\\Users\\ek\\source\\repos\\gitoxide\\gix-merge\\.tmpfax7ph\" \"C:\\Users\\ek\\source\\repos\\gitoxide\\gix-merge\\.tmp5fMWd7\" >> \"C:\\Users\\ek\\source\\repos\\gitoxide\\gix-merge\\.tmpcPJLm3\"",
        ),
        Regular(
            "--",
        ),
    ],
}

This is to say that only the randomly generated suffixes after .tmp in temporary filenames are different (and they would differ across separate identical runs, too).

On the main branch, in PowerShell, where the test also passes if allowed to proceed, the only differences besides those temporary files' names is the shell command:

program: "sh"
args: CommandArgs {
    inner: [
        Regular(
            "-c",
        ),
        Regular(
            "for arg in  C:\\Users\\ek\\source\\repos\\gitoxide\\gix-merge\\.tmpD7kDyR C:\\Users\\ek\\source\\repos\\gitoxide\\gix-merge\\.tmppnvU83 C:\\Users\\ek\\source\\repos\\gitoxide\\gix-merge\\.tmpkYgWkr 7 \'b\' \'ancestor label\' \'current label\' \'other label\' %F; do echo $arg >> \"C:\\Users\\ek\\source\\repos\\gitoxide\\gix-merge\\.tmppnvU83\"; done; cat \"C:\\Users\\ek\\source\\repos\\gitoxide\\gix-merge\\.tmpD7kDyR\" \"C:\\Users\\ek\\source\\repos\\gitoxide\\gix-merge\\.tmpkYgWkr\" >> \"C:\\Users\\ek\\source\\repos\\gitoxide\\gix-merge\\.tmppnvU83\"",
        ),
        Regular(
            "--",
        ),
    ],
}

State from previous runs in this experiment is not likely to be a contributing factor; I ran gix clean -xde in between.

On main, sh is used rather than a full. That is looked up to C:\Users\ek\scoop\shims\sh.exe on this system, which is a scoop shim that delegates to C:\Users\ek\scoop\apps\git\current\bin\sh.exe, which is the Git for Windows bin\sh.exe shim that delegates to the Git for Windows usr\bin\sh.exe. That latter distinction--that is, whether the Git for Windows sh.exe is involved--seems to be an essential element here, because the failure goes away under this change:

diff --git a/gix-path/src/env/mod.rs b/gix-path/src/env/mod.rs
index 78e2da294..cd6203df7 100644
--- a/gix-path/src/env/mod.rs
+++ b/gix-path/src/env/mod.rs
@@ -51,7 +51,7 @@ pub fn shell() -> &'static OsStr {
                     // more readable messages, append literally with `/` separators. The path from
                     // `git --exec-path` will already have all `/` separators (and no trailing `/`)
                     // unless it was explicitly overridden to an unusual value via `GIT_EXEC_PATH`.
-                    raw_path.push("/usr/bin/sh.exe");
+                    raw_path.push("/bin/sh.exe");
                     raw_path
                 })
                 .filter(|raw_path| {

But I am reluctant to apply that change without understanding what it is about the effect of the Git for Windows shim that is helping, to know that this shim really is preferable rather than the test happening to hinge on a quirk that is different. One reason I am reluctant is that, under that change, if git is provided by the Git for Windows SDK, its sh.exe would not be found. If the non-shim Git for Windows sh.exe is usable but just moderately less preferred, then we could check for the shim first and then fall back to the non-shim. But I would want to understand the situation better before jumping into that, in case really neither is preferred, or in case the non-shim executable does not work well enough for our needs to be worth using directly.

My guess is that the relevant effect of the shim is in changing the environment, but I am not certain it refrains entirely from adjusting arguments. (I think the shim does not adjust arguments directly. But even if that is right, msys-2.0.dll affects interpretation of arguments--that is, what ends up in the argv that a program that uses that DLL as its libc uses--and does so in a way that is affected by some environment variables.)

I'll try to modify the command to report its arguments and environment in a way that they cannot be confused with anything else, such as by writing them to a file. However, since this will necessarily modifying the command or its environment, it is not certain to be successful at revealing the cause.


Edit: A possible contributing factor is that I also have some bin directories associated with a regular MSYS2 installation (separate from Git for Windows) in my PATH on Windows.

When the shim is used, two directories belonging to the MSYS2 installation that is associated with the running shell, /mingw64/bin and /usr/bin, as well as my per-user bin at /c/Users/ek/bin (which I think is less important in this case), are prepended, to take precedence.

The reason I say /mingw64/bin and /usr/bin belong to the MSYS2 installation associated with the running shell is that they are Unix-style paths that are automatically treated that way and translated based on that when calling native external commands.

C:\Users\ek\scoop\apps\git\2.48.1\usr\bin\sh.exe -c 'echo "$PATH" | tr ":" "\n" | sort' > ubp.txt
C:\Users\ek\scoop\apps\git\2.48.1\bin\sh.exe -c 'echo "$PATH" | tr ":" "\n" | sort' > bp.txt
git --no-pager diff --no-index ubp.txt bp.txt

Where the output of that last command is:

diff --git a/ubp.txt b/bp.txt
index 7390dca90..7190bd5b3 100644
--- a/ubp.txt
+++ b/bp.txt
@@ -1,3 +1,6 @@
+/mingw64/bin
+/usr/bin
+/c/Users/ek/bin
 /c/Users/ek/scoop/apps/pwsh/7.5.0
 /c/Program Files (x86)/VMware/VMware Workstation/bin
 /c/Windows/system32

Note that this experiment is not equivalent to checking the $0, $@, and the environment variables in an actual merge driver command, and I am not certain if this is the cause. Even if it is, I am not clear on how it is the cause.

@Byron
Copy link
Member

Byron commented Feb 26, 2025

This is very strange and puzzling.
I understand that the way the shell is called is the same no matter what, so the reason for the difference is in how the shell scripted merge-driver is interpreted.

And that's particularly strange as there isn't even any argument to interpret, the values are quite literally baked in. All I can think of is poking around in the script, maybe by starting to execute it directly. It's pretty clear what is invoked in the working and non-working case, so that could be generalized to invocations that are repeatable on the command-line directly. From there it can be reduced and probed to see what exactly is causing the observable difference (assuming it reproduces in the first place). If it doesn't, we'd know that some environment variable is affecting it as well.

And I say this without even implying that you spend more time on this, I am just sharing thoughts.

@EliahKagan
Copy link
Member Author

EliahKagan commented Feb 26, 2025

And I say this without even implying that you spend more time on this, I am just sharing thoughts.

Actually, that is no problem. What I wanted to avoid was becoming too distracted trying to understand the semantics of external merge drivers, how gix-merge uses them, and the claims about them that the test cases are asserting.

But it looks like the problem is due to unexpected behavior of sh/bash on Windows when called through a path obtained by gix_path::env::shell(), which in this case is triggered by the particular command run through gix-merge but which is unexpected even if one is aware of the details of how that is supposed to work. But that is what I was worried about being distracted from. So I am willing to spend more time on this--investigating just this kind of weirdness one of the things I wanted to save time to be able to do.

Of course, I still hope I manage to come to a sufficient understanding sooner rather than later, since it is itself a fragment of the larger issue of how to make gix-command use shells more robustly.

I understand that the way the shell is called is the same no matter what, so the reason for the difference is in how the shell scripted merge-driver is interpreted.

Yes. I believe even more strongly, now, that the effect of the shim is the key. The effect is probably through its customization of environment variables, though I am not certain.

I continue to suspect that the presence of a separate MSYS2 installation with a bin directory in my PATH is a contributing factor. But when gitoxide uses a shell, it should work properly even in such a case, especially since most things work on this system, including running the gitoxide test suite from PowerShell since #1712.

With the shim sh.exe provided by Git for Windows (which the scoop shim delegates to, though here I am running it directly):

C:\Users\ek> C:\Users\ek\scoop\apps\git\2.48.1\bin\sh.exe -c 'type -a sh cygpath'
sh is /usr/bin/sh
sh is /c/Users/ek/scoop/shims/sh
sh is /c/msys64/usr/bin/sh
cygpath is /usr/bin/cygpath
cygpath is /c/msys64/usr/bin/cygpath

With the non-shim sh.exe provided by Git for Windows:

C:\Users\ek> C:\Users\ek\scoop\apps\git\2.48.1\usr\bin\sh.exe -c 'type -a sh cygpath'
sh is /c/Users/ek/scoop/shims/sh
sh is /c/msys64/usr/bin/sh
cygpath is /c/msys64/usr/bin/cygpath

So the PATH changes effected by the shim make the environment a lot more reasonable. Without them, even the cygpath executable--that I might otherwise use as part of checking some of the intended behavior--is from a different MSYS2 installation than the shell from which it would be run. There is an argument to be made that my environment is broken, such that the failures I am having locally are not actually a problem and could be ignored. I think that argument is weak, though.

If we are going to prefer an sh.exe provided by Git for Windows even if another one appears earlier in the PATH--as gix_path::env::shell() does--then it should work even if its associated bin directories are not already in PATH, or even if something else with some same-named binaries appears first. This situation on my system is like that, just with the scoop shim (which delegates to the Git for Windows shim without modifying the environment, i.e., it runs sh.exe with the environment modifications of the Git for Windows shim and no others) appearing even earlier in the path.

More broadly, as I understand it, the Git for Windows shim exists for two reasons. It delegates to the "real" executable. But it also modifies the environment. Shims don't have to modify the environment; for example, the scoop shim does not. Presumably these environment modifications are sometimes valuable. So unless we can know we don't need them, which is hard to do in the general case of gix_command::Prepare, we should probably prefer to call the shim, but fall back to the non-shim if it is absent (which probably does about as well as can be done in the Git for Windows SDK, which doesn't provide the shims).

This might be separate from the issue of whether to prefer one of the shims of git.exe over the "real" git.exe, because running that non-shim directly is deliberately supported by Git for Windows since git-for-windows/git#2506. Currently we usually find a shim for git.exe, since that's what's usually in the path, but if we don't, we use the non-shim found in an expected installation location. (This was the case at since #1419, and both before and after #1456, though more consistently afterwards. See also the "Simpler paths could be used" section of #1456, as well as this code comment and the part of #1758 (comment) about clangarm64 and shims.)

Switching to using the shell through its shim makes sense both for sh.exe here and for bash.exe in #1864 (where it would be switching back, i.e. undoing the part of that PR that stopped using the shim). So I am tempted to just make those changes and--after retesting, just in case--say that it is fixed. Although I think that will be the solution, I don't want to jump to it just yet, because:

  • Until I understand, or at least have some better idea of, what the effect of the shim is in the failures I've observed, I will not be certain that a change to preferring it is robust. It could be that some equally reasonable (or unreasonable) environment would be broken by preferring the shim, rather than fixed. I doubt that, but I don't know.
  • It may be that there are contributing factors that are important problems for gix-command in their own right, and that I am not otherwise aware of. I'd rather not pass up the opportunity to find out.
  • Knowing about what depends on the effect of the shim may shed light on whether there is anything further that needs to be documented as a caveat in gix-command, and what that would be.
  • Knowing the effect of the shim and its implications may make it possible to implement an optimization later that does the work of the shim and thus avoids the extra subprocess creation, which on Windows is somewhat expensive. I certainly do not consider that as a blocker for this, but I don't want to pass up an opportunity to gather information that would be essential to it.

Further investigation, so far

As for why I believe the effect of the shim is the cause, I tried this modification:

diff --git a/gix-merge/tests/merge/blob/platform.rs b/gix-merge/tests/merge/blob/platform.rs
index 6230377b8..2dc09ebe5 100644
--- a/gix-merge/tests/merge/blob/platform.rs
+++ b/gix-merge/tests/merge/blob/platform.rs
@@ -265,7 +265,7 @@ theirs
             [gix_merge::blob::Driver {
                 name: "b".into(),
                 command:
-                    "for arg in  %O %A %B %L %P %S %X %Y %F; do echo $arg >> \"%A\"; done; cat \"%O\" \"%B\" >> \"%A\""
+                    "(printf '[%q]\\n' \"$0\" \"$@\"; set -o posix; set) >args+env; for arg in  %O %A %B %L %P %S %X %Y %F; do echo $arg >> \"%A\"; done; cat \"%O\" \"%B\" >> \"%A\""
                         .into(),
                 ..Default::default()
             }],

That makes it create a file called args+env showing:

  • $0 for the shell, which is always -- (due to #1842) but this confirms it.
  • The positional parameters for the shell ($@). There are none, but this confirms it.
  • All environment variables. I produced the environment variables with set -o posix; set rather than a more typical approach of running printenv or env with no arguments, in order to avoid confounding effects arising from the possibility of calling a separate external command that might be looked up in a directory belonging to a different MSYS2 installation. (This is in a ( ) subshell, so set -o posix does not have a broader effect.)

I ran cargo nextest run -p gix-merge with_external on this feature branch, on the Windows 10 system where I observed the failures, with that change, twice:

  • With no further change.
  • With shell() modified to use the shim (already known to make the failure go away).

In the second run, the change to shell() so it uses the shim was:

diff --git a/gix-path/src/env/mod.rs b/gix-path/src/env/mod.rs
index 78e2da294..cd6203df7 100644
--- a/gix-path/src/env/mod.rs
+++ b/gix-path/src/env/mod.rs
@@ -51,7 +51,7 @@ pub fn shell() -> &'static OsStr {
                     // more readable messages, append literally with `/` separators. The path from
                     // `git --exec-path` will already have all `/` separators (and no trailing `/`)
                     // unless it was explicitly overridden to an unusual value via `GIT_EXEC_PATH`.
-                    raw_path.push("/usr/bin/sh.exe");
+                    raw_path.push("/bin/sh.exe");
                     raw_path
                 })
                 .filter(|raw_path| {

After each run, I moved the just-created args+env file from the gix-merge to the top-level directory of the working tree and renamed it descriptively, to no-shim.txt in the first run and with-shim.txt in the second run. The changed lines between them, obtained by running git --no-pager diff -U0 --no-index no-shim.txt with-shim.txt, are:

diff --git a/no-shim.txt b/with-shim.txt
index 3f51928c0..11cbfa69b 100644
--- a/no-shim.txt
+++ b/with-shim.txt
@@ -4 +4 @@ APPDATA='C:\Users\ek\AppData\Roaming'
-BASH=/usr/bin/sh
+BASH=/usr/bin/bash
@@ -10 +10 @@ BASH_CMDS=()
-BASH_EXECUTION_STRING='(printf '\''[%q]\n'\'' "$0" "$@"; set -o posix; set) >args+env; for arg in  C:\Users\ek\source\repos\gitoxide\gix-merge\.tmpLx3byu C:\Users\ek\source\repos\gitoxide\gix-merge\.tmp7p4vnw C:\Users\ek\source\repos\gitoxide\gix-merge\.tmpnxT3lk 7 '\''b'\'' '\''ancestor label'\'' '\''current label'\'' '\''other label'\'' %F; do echo $arg >> "C:\Users\ek\source\repos\gitoxide\gix-merge\.tmp7p4vnw"; done; cat "C:\Users\ek\source\repos\gitoxide\gix-merge\.tmpLx3byu" "C:\Users\ek\source\repos\gitoxide\gix-merge\.tmpnxT3lk" >> "C:\Users\ek\source\repos\gitoxide\gix-merge\.tmp7p4vnw"'
+BASH_EXECUTION_STRING='(printf '\''[%q]\n'\'' "$0" "$@"; set -o posix; set) >args+env; for arg in  C:\Users\ek\source\repos\gitoxide\gix-merge\.tmp5rjEOo C:\Users\ek\source\repos\gitoxide\gix-merge\.tmpaOs19F C:\Users\ek\source\repos\gitoxide\gix-merge\.tmp62jrFQ 7 '\''b'\'' '\''ancestor label'\'' '\''current label'\'' '\''other label'\'' %F; do echo $arg >> "C:\Users\ek\source\repos\gitoxide\gix-merge\.tmpaOs19F"; done; cat "C:\Users\ek\source\repos\gitoxide\gix-merge\.tmp5rjEOo" "C:\Users\ek\source\repos\gitoxide\gix-merge\.tmp62jrFQ" >> "C:\Users\ek\source\repos\gitoxide\gix-merge\.tmpaOs19F"'
@@ -39,0 +40 @@ EUID=197609
+EXEPATH='C:\Users\ek\scoop\apps\git\2.48.1\bin'
@@ -56,0 +58 @@ MACHTYPE=x86_64-pc-msys
+MSYSTEM=MINGW64
@@ -61 +63 @@ NEXTEST_PROFILE=default
-NEXTEST_RUN_ID=203cd7ac-794f-4c11-af4a-c96333d95745
+NEXTEST_RUN_ID=ab48dadf-b347-40f2-8770-a73e91c8eb7d
@@ -72 +74 @@ PAGER='less -F'
-PATH='/c/Users/ek/source/repos/gitoxide/target/debug/deps:/c/Users/ek/source/repos/gitoxide/target/debug:/c/Users/ek/.rustup/toolchains/stable-x86_64-pc-windows-msvc/lib/rustlib/x86_64-pc-windows-msvc/lib:/c/Users/ek/scoop/apps/pwsh/7.5.0:/c/Program Files (x86)/VMware/VMware Workstation/bin:/c/Windows/system32:/c/Windows:/c/Windows/System32/Wbem:/c/Windows/System32/WindowsPowerShell/v1.0:/c/Windows/System32/OpenSSH:/c/Program Files (x86)/Windows Kits/10/Windows Performance Toolkit:/c/Program Files/dotnet:/c/Users/ek/scoop/apps/vscodium/current/bin:/c/Users/ek/scoop/apps/nodejs-lts/current/bin:/c/Users/ek/scoop/apps/nodejs-lts/current:/c/Users/ek/scoop/apps/openjdk23/current/bin:/c/Users/ek/scoop/apps/ghostscript/current/lib:/c/Users/ek/scoop/apps/gpg/current/bin:/c/Users/ek/scoop/apps/mpv/current:/c/Users/ek/scoop/apps/maven/current/bin:/c/Users/ek/.cargo/bin:/c/Users/ek/bin:/c/Users/ek/.local/share/gem/ruby/3.0.0/bin:/c/Users/ek/scoop/shims:/c/Users/ek/AppData/Local/Packages/PythonSoftwareFoundation.Python.3.12_qbz5n2kfra8p0/LocalCache/local-packages/Python312/Scripts:/c/Users/ek/AppData/Local/Microsoft/WindowsApps:/c/Users/ek/.dotnet/tools:/c/Users/ek/AppData/Local/Programs/Microsoft VS Code/bin:/c/msys64/ucrt64/bin:/c/msys64/ucrt64/sbin:/c/msys64/usr/bin:/c/Users/ek/AppData/Local/Programs/MiKTeX/miktex/bin/x64:/c/users/ek/.local/bin:/c/Users/ek/AppData/Local/JetBrains/Toolbox/scripts:/c/Program Files/IronPython 3.4:/c/Users/ek/AppData/Local/Programs/Microsoft VS Code Insiders/bin:/c/Users/ek/.dotnet/tools'
+PATH='/mingw64/bin:/usr/bin:/c/Users/ek/bin:/c/Users/ek/source/repos/gitoxide/target/debug/deps:/c/Users/ek/source/repos/gitoxide/target/debug:/c/Users/ek/.rustup/toolchains/stable-x86_64-pc-windows-msvc/lib/rustlib/x86_64-pc-windows-msvc/lib:/c/Users/ek/scoop/apps/pwsh/7.5.0:/c/Program Files (x86)/VMware/VMware Workstation/bin:/c/Windows/system32:/c/Windows:/c/Windows/System32/Wbem:/c/Windows/System32/WindowsPowerShell/v1.0:/c/Windows/System32/OpenSSH:/c/Program Files (x86)/Windows Kits/10/Windows Performance Toolkit:/c/Program Files/dotnet:/c/Users/ek/scoop/apps/vscodium/current/bin:/c/Users/ek/scoop/apps/nodejs-lts/current/bin:/c/Users/ek/scoop/apps/nodejs-lts/current:/c/Users/ek/scoop/apps/openjdk23/current/bin:/c/Users/ek/scoop/apps/ghostscript/current/lib:/c/Users/ek/scoop/apps/gpg/current/bin:/c/Users/ek/scoop/apps/mpv/current:/c/Users/ek/scoop/apps/maven/current/bin:/c/Users/ek/.cargo/bin:/c/Users/ek/bin:/c/Users/ek/.local/share/gem/ruby/3.0.0/bin:/c/Users/ek/scoop/shims:/c/Users/ek/AppData/Local/Packages/PythonSoftwareFoundation.Python.3.12_qbz5n2kfra8p0/LocalCache/local-packages/Python312/Scripts:/c/Users/ek/AppData/Local/Microsoft/WindowsApps:/c/Users/ek/.dotnet/tools:/c/Users/ek/AppData/Local/Programs/Microsoft VS Code/bin:/c/msys64/ucrt64/bin:/c/msys64/ucrt64/sbin:/c/msys64/usr/bin:/c/Users/ek/AppData/Local/Programs/MiKTeX/miktex/bin/x64:/c/users/ek/.local/bin:/c/Users/ek/AppData/Local/JetBrains/Toolbox/scripts:/c/Program Files/IronPython 3.4:/c/Users/ek/AppData/Local/Programs/Microsoft VS Code Insiders/bin:/c/Users/ek/.dotnet/tools'
@@ -74,0 +77 @@ PIPESTATUS=([0]="0")
+PLINK_PROTOCOL=ssh

(The PATH line is one of the changed lines, as indicated by the leading - and +, even though GitHub is failing to highlight it as such for some reason.)

The changed lines that I think are very unlikely to contribute in any way to the difference in behavior are:

  • BASH_EXECUTION_STRING, which differs only in the names of the temporary files, which conform to the same template and are reasonable in both cases.
  • NEXTEST_RUN_ID, which differs in the expected way across separate cargo nextest runs. Both have the correct format, and the tested code, and test cases, do not plausibly use this variable.
  • PLINK_PROTOCOL, since no transport operations are involved here. (Also, while I do have PuTTY and plink installed, I am not using them with Git or gitoxide, which find an OpenSSH implementation.)

In contrast, the changed lines for environment variables that I think could plausibly make a difference are:

  • BASH, which changed from /usr/bin/sh to /usr/bin/bash when the shim was used. This is usually the value of argv[0] is it is seen by the shell. (bash creates a BASH variable whether it is running in POSIX mode or not.)

    I don't know whether the path it shows changing from /usr/bin/sh to /usr/bin/bash this means that the sh.exe shim is actually running the non-shim bash.exe instead of non-shim sh.exe. The simplest explanation is that it is actually running bash.exe, which would be surprising (there is a separate shim for bash.exe, after all).

    But whatever the filename of the executable that is loaded, the way bash is invoked affects whether it enters POSIX mode automatically. When invoked by the name sh, it automatically enters POSIX mode (after running commands from any startup files that it would have run commands from). This suggests that the shim may cause the shell not to run in POSIX mode. And that seems to be confirmed by the following:

    C:\Users\ek> C:\Users\ek\scoop\apps\git\2.48.1\usr\bin\sh.exe -c 'echo "$SHELLOPTS"'
    braceexpand:hashall:interactive-comments:posix
    C:\Users\ek> C:\Users\ek\scoop\apps\git\2.48.1\bin\sh.exe -c 'echo "$SHELLOPTS"'
    braceexpand:hashall:interactive-comments
    

    That is certainly interesting in its own right, and potentially important to adjacent work on gitoxide. It even suggests that defaulting to sh.exe on Windows, when it is thought to be a shim provided by Git for Windows, might be a reasonable way to run bash scripts in Windows while getting around the problem--described in http://gitpython-developers/GitPython#1791 and GitoxideLabs/gitoxide#1359 (comment)--that the bash.exe that is usually found is the one associated with WSL! (If it is reliably the case across different installations and versions, and not considered a bug in Git for Windows.)

    But I am not sure why, or if, that would make a difference for the command run in this gix-merge test.

  • EXEPATH, which was newly defined when the shim was used. I am not sure what effect this has. gix-path itself will try to use it to avoid having to run git config -l. But this is in a subprocess of the executable that uses gix-path and other gitoxide crates, so it does not have an effect on this.

    (This path is also different from the one that it usually has in a Git Bash shell and that the EXEPATH optimization in gix-path would need; typically the path would be one level up, i.e., it would not have the bin component. It would also typically be a Windows path. So it would usually be C:\Users\ek\scoop\apps\git\2.48.1. However, I believe that is a separate issue. Somehow the git-bash.exe program that runs Git Bash, typically in a mintty window, causes EXEPATH to be customized differently. The somewhat unusual value of EXEPATH is also set when I use Git Bash in Windows Terminal, where I made the profile because scoop does not automatically configure that. To the best of my knowledge, there is no hard guarantee about whether EXEPATH exists or what its value is. But even if the value EXEPATH takes on my system is considered wrong, it is not likely the cause of the test failure here, which happens in the case that EXEPATH is not set because it is run outside of a Git Bash environment and without the shim.)

  • MSYSTEM, which was newly defined when the shim was used. But I am not sure what effect that has, if any, once the shell is already running. But perhaps it already has had an effect on the shell itself due to being set beforehand in the shim?

  • PATH, which with the shim has /mingw64/bin:/usr/bin:/c/Users/ek/bin: prepended.

    To be clear, this PATH is automatically converted into a ;-separated PATH whose entries like /c/... are turned into entries like C:\ when a native Windows program is called. (I think that might technically always happen and it just is always converted back in the called program when that program uses msys-2.0.dll as its libc, but I am not sure.) So that the path is shown in this form in the shell is not itself a problem.

    However, how this conversion--as well as other conversions such as in command-line parsing--are done by msys-2.0.dll is affected by the environment, reflects the environment for which it is done, and I believe also differs in some ways across Cygwin-like environments of different kinds (beyond just the DLL that implements it being named differently).

    Those are other aspects of running MSYS2 programs, and of running things from MSYS2 programs that may or may not be MSYS2 programs, whose implications for gix-command I have been looking into already and concurrently. It applies to running shells like sh and bash, which are often MSYS2 programs, including if provided by Git for Windows (except they may have been MSYS1 programs in very old versions). It applies to the case where use_shell is true, but also the case where gix-command's own shebang interpretation would or should cause a shell like sh or bash to be used.

    I had hoped that this issue was separate from that. Manual experiments--both before opening this issue and now--made me think that was not a contributing factor here. The subsequently described experiment (see below) rules out some variations of this. But I think it is still a plausible cause.

The differences in BASH_EXECUTION_STRING and PATH were verified using this diff tool.

I did another experiment where I applies this change instead, tracing the shell commands and recording standard error including from external processes such as cat:

diff --git a/gix-merge/tests/merge/blob/platform.rs b/gix-merge/tests/merge/blob/platform.rs
index 6230377b8..050e97873 100644
--- a/gix-merge/tests/merge/blob/platform.rs
+++ b/gix-merge/tests/merge/blob/platform.rs
@@ -265,7 +265,7 @@ theirs
             [gix_merge::blob::Driver {
                 name: "b".into(),
                 command:
-                    "for arg in  %O %A %B %L %P %S %X %Y %F; do echo $arg >> \"%A\"; done; cat \"%O\" \"%B\" >> \"%A\""
+                    "exec 2>messages; set -x; for arg in  %O %A %B %L %P %S %X %Y %F; do echo $arg >> \"%A\"; done; cat \"%O\" \"%B\" >> \"%A\""
                         .into(),
                 ..Default::default()
             }],

Analogously to the above-described experiment, I ran the test with that change twice, once without the shim and once with the shim, moving and renaming the log file after each run.

The diffs showed only the change in temporary-file names. In neither run was anything in a different order. Nor were there any error messages, such as I would expect if the shell could not open a file for redirection or if cat could not open a file for read.

All I can think of is poking around in the script, maybe by starting to execute it directly.

Running modified versions of the script that write to different locations I can easily inspect within a test directory does not reveal anything. But I am not sure this captures what is relevant. I think the next step along these lines is to preserve the temporary files that the test uses and examine them.

@Byron
Copy link
Member

Byron commented Feb 27, 2025

That's good to know.

Of course, I still hope I manage to come to a sufficient understanding sooner rather than later, since it is itself a fragment of the larger issue of how to make gix-command use shells more robustly.

This is incredibly valued work, and work that I can't even do as it, I wouldn't last long. I guess that's another way to thank you for your incredible work!

Although I think that will be the solution, I don't want to jump to it just yet, because:

My initial intuition was to jump to it and be done, but the reasoning to not do that just yet is very sound. I found the "do the work of the shim and avoid calling it that way" very interesting. Understanding this better could at least lead to documentation that helps to one day implement it, even if it's not done in the initial implementation.

I mostly skimmed over the details that followed, but saw the details of the merge-driver runs. It's strange that it has the wrong output despite the calls being the same, and that it seems so elusive to find out why it does that. The caller is known, the input is known, the binary is known, the script itself is known, so what could possibly be causing the different result. I wonder if it can be executed directly while reproducing the issue, allowing it to be prodded and probed with impunity until it reveals its secret.

The diff-tool is very interesting by the way, I starred the repo and would hope to benefit from it one day.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants