Delayed process filters suppress +x in `gix clone` #1783

EliahKagan · 2025-01-20T03:07:12Z

Current behavior 😯

gix clone checks out a file that should have executable permissions but fails to set those permissions, when all of the following are true:

The repository has a .gitattributes file that applies an attribute to the file that should be executable.
Configuration exists in a scope used during gix clone, such as the command scope with -c, that enables a smudge filter for that attribute.
The configured smudge filter is a long running filter (that is, a process filter).
The filter process supports delays. Unlike (2) and (3), this is a runtime behavior of the filter process itself; the process indicates its support for delays by sending a can-delay=1 "packet" to the calling process.

Each of those elements is required, to trigger the bug. If the attribute is not applied to a particular file (even if it is applied to other files and the smudge filter is configured), or if the smudge filter is not configured (regardless of attributes), or if the smudge filter is not a process filter, or if the smudge filter is a process filter but it does not support delays, then the file that should be executable is still checked out as executable.

Relationship to CVE-2025-22620

This is conceptually related to GHSA-fqmf-w4xh-33rh (RUSTSEC-2025-0001), and I found it while researching that. But this is a separate bug. I do not believe this bug is a security vulnerability. Quite the opposite: this bug is the main reason gix clone itself appears not to be affected by GHSA-fqmf-w4xh-33rh even when gitoxide is built against a vulnerable version of gix-worktree-state.

I've confirmed that this bug occurs both before and after the fix for GHSA-fqmf-w4xh-33rh in #1764, by testing with both gitoxide 0.40.0 (using a build from before gix-worktree-state 0.17.0 was released) and gitoxide 0.41.0. I would also not expect the changes in #1764 to have fixed or otherwise affected this, because that causes finalize_entry to increase permissions less than they were increased before (i.e., chmod is called equally or, in the TOCTOU case, slightly less often, and the bits that are flipped from 0 to 1 are a subset of those that were before).

What seems to be happening

When a long running smudge filter is used for a file during checkout, and it allows delays, gix-worktree-state defers setting +x permissions, planning to set them after the file has been rewritten by the filter, rather than before the filter operates. But finalize_entry does not always set +x.

Specifically, it looks like, if a file is checked out with destination_is_initially_empty: true in the checkout::Options but a process smudge filter with delays applies to that file, setting +x is deferred because a filter is to be used, but then not applied, due to destination_is_initially_empty. However, I am not certain that it always prevents that from happening, in part because all of my testing on this gix-worktree-state bug has been with gix clone (other than with git clone to compare).

Concrete example

If a repository tracks files a, b, and c, where a and b have an attribute that subjects them to a long-running smudge filter with delays enabled, and where b and c are tracked as executable, then the only file gix clone checks out as executable is c. After checkout:

$ git ls-files -s -- a b c
100644 f70f10e4db19068f79bc43844b49f3eece45c4e8 0       a
100755 223b7836fb19fdf64ba2d3cd6173c6a283141f78 0       b
100755 3cc58df83752123644fef39faab2393af643b1d2 0       c
$ ls -lF a b c
-rw-r--r-- 1 ek ek 5 Jan 19 17:02 a
-rw-r--r-- 1 ek ek 5 Jan 19 17:02 b
-rwxr-xr-x 1 ek ek 2 Jan 19 17:02 c*

See the "Expected behavior" section for a comparison, and the "Steps to reproduce" section for the detailed testing methodology.

Expected behavior 🤔

All files, even those to which a long running smudge filter with delays is applied, should be checked out with executable permissions corresponding to what the repository tracks (provided that the operating system and filesystem support Unix-style executable permissions, and no other configuration option prevents them).

Of the concrete example

In the above example, both b and c should end up executable so the file modes on disk are consistent with respect the presence or absence of +x with their modes in the tree object, and we should see:

$ ls -lF a b c
-rw-r--r-- 1 ek ek 5 Jan 19 17:02 a
-rwxr-xr-x 1 ek ek 2 Jan 19 17:02 b*
-rwxr-xr-x 1 ek ek 2 Jan 19 17:02 c*

Design considerations

I think it is not immediately clear what the best way is to bring about this change.

If destination_is_initially_empty: true always produces this effect in a checkout with delayed process filters (including in comparable checkouts performed when calling gix_worktree_state::checkout from an application other than the gitoxide gix executable), then one possible view of this bug is that it is a combination of:

A documentation bug, where this effect of destination_is_initially_empty is not made clear.
A bug in gitoxide-core, where checkouts use destination_is_initially_empty even when long-running smudge filters with delays enabled apply to at least one file.

However, I find that view unsatisfying, for five reasons, where each except (1) points the way to another possible approach:

The full performance and semantics (including collision detection) of an exclusive checkout, which is brought about by destination_is_initially_empty: true, are valued for gix clone.
An option called destination_is_initially_empty should, if possible, do the right thing when the destination is initially empty. The top-level checkout function is the only public one, and it takes a bool for destination_is_initially_empty in its options, which it passes down to a helper that checks out an individual file.

But if destination_is_initially_empty needs to be set differently for different files depending on what filters are set up, then that option should accept more than two values, or no longer be passed directly to the helper, or both. This would go well beyond updating documentation in gix-worktree-state and modifying gitoxide-core. I don't favor this, because it seems like a design change that is broader than (3) below and likely more complicated, while not addressing the considerations of (4) or (5).
The current intended behavior of long running smudge filters with delays enabled--as I understand it--is to cause finalize_entry to run after the filter is known to be done with the file, even when destination_is_initially_empty: true. That is, it looks like long running smudge filters with delays are the one intended exception to destination_is_initially_empty: true eagerly calling finalize_entry.

If so, making that happen could be a fairly narrowly scoped fix for this bug. This would require no change to gitoxide-core and no major change to the documentation, though minor refinement of the documentation could accompany it. Further improvements related to points (4) and (5) below could come later. This would have exacerbated GHSA-fqmf-w4xh-33rh, but now that it is fixed, I am not aware of any strong reason to avoid this approach.
Do long running smudge filters with delays actually require waiting to set executable permissions? There are three reasons I can think of to wait, but none seem very strong:
1. To avoid marking files executable before they are smudged. For example, to avoid marking stubs for Git LFS executable, when only the file they stand for would actually be reasonable to execute. The problem with this is that creating stubs without executable permissions and adding them later would only address that in the case that the filter is actually enabled. If it is not enabled in configuration variables, then the tracked +x is applied eagerly. So this would only be keeping +x away during checkouts, and not in the also-common case that the filter is not applied at all.
2. In case the effect of the smudge filter is ever to replace the file with a new one rather than writing to the existing file. But if I understand the recommendations in gitattributes(5) for filters, smudge and clean filters are not supposed to do that kind of thing, and it is not guaranteed to work if they do. The filter reads from stdin and writes to stdout:
  
  Upon checkout, when the smudge command is specified, the command is fed the blob object from its standard input, and its standard output is used to update the worktree file. Similarly...
  
  The filter can decide what to do based on the path but should not read from it (and this seems like it applies to writing as well):
  
  Note that "%f" is the name of the path that is being worked on. Depending on the version that is being filtered, the corresponding file on disk may not exist, or may have different contents. So, smudge and clean commands should not try to access the file on disk, but only act as filters on the content provided to them on standard input.
  
  The filter should provide the updated contents on stdout, and this is the case even for process filters, which use the packetline protocol to do so:
  
  The filter is expected to respond with a list of "key=value" pairs terminated with a flush packet. If the filter does not experience problems then the list must contain a "success" status. Right after these packets the filter is expected to send the content in zero or more pkt-line packets and a flush packet at the end. Finally, a second list of "key=value" pairs terminated with a flush packet is expected.
  
  With delays, the path is more important, since that is how a specific blob is requested, but the filter still obtains its contents and provides its updated contents in a way that does not involve reading and writing that path:
  
  After Git received the pathnames, it will request the corresponding blobs again. These requests contain a pathname and an empty content section. The filter is expected to respond with the smudged content in the usual way as explained above.
  
  It also seems to me that, if a filter does replace the file, then it should be responsible for doing so with the correct mode, and that its ability to do so would be enhanced rather than diminished by setting the intended permissions before the filter operates rather than after.
3. In case the filter changes the permissions to something wrong. But that is just a broken filter.
  
  More broadly, if a filter does change the mode, or even if the filter replaces the file, I am not sure we should try to countermand that even partially, since it could be attempting to smudge in filesystem metadata that Git cannot track but that is supported on the system. We could try to add +x on top of that, but it might be hard to do safely, since adding +x is not always safe.
  
  For example, as noted in the "Impact" section of GHSA-fqmf-w4xh-33rh, setuid and setgid bits were among those cleared when setting 777 permissions. Clearing those bits when +x is important, especially for setgid bits, which have a different meaning on some systems when no executable bits are set, pertaining to mandatory file locking (so they may be set without intending to cause anyone to be allowed to run the file with a changed effective group ID). To avoid setting 777 permissions except in the very rare case it is warranted, the fix for GHSA-fqmf-w4xh-33rh in #1764 does examine the existing mode and preserve parts of it, but it takes care still to clear those bits.
  
  That specific case of concern is covered by the existing logic, but others may not be. If a smudge filter changes file type, mode, or access through a mechanism other than mode-based permissions--such as by a mechanism setfacl or chattr--it seems to me that that's more of a reason to have performed any of our own chmod operations earlier, rather than later.
  
  (#1764 also adds logic to avoid +x in a TOCTOU typechange, which provides some protection against this in the case of replacing the file with an entry of a different type. But this is probably a slight further reason to set +x earlier so that, if the file is replaced with one of a different type that meaningfully supports +x permissions, they can be preserved.)

Does any scenario ever actually require waiting to set executable permissions? Currently we always do this unless destination_is_initially_empty: true. The reason seems to be to avoid changing permissions of a file that, if an error or cancellation occurs, may not have its contents replaced. But:

If that is to be avoided, then developers using gix-worktree-state should be cautioned in documentation about the current effect of keeping both destination_is_initially_empty and overwrite_existing at their default values of false in a checkout. This does not write to the file, but does then set its permissions. (This is mentioned in GHSA-fqmf-w4xh-33rh as relevant to impact, but it is not part of that vulnerability and #1764 does not change it.)
It seems like this could be avoided by treating a mode change more like a type change with respect to how a file is checked out, and instead of reusing the existing file, deleting (unlinking) it and replacing it with a new one, or doing so unless their contents are the same.

Examining git checkout --force with strace, it looks like it always does this, even if only metadata and not contents are changed. It looks like Git either primarily or exclusively modifies files in a force checkout by replacing them. It seems like that may solve various problems if done here, too. (Specific known cases where it makes sense to reuse a file could still be admitted, for example if the type and mode are the same and the link count is 1.)

If this is done as the primary strategy, with existing files not reused or only reused as an optimization in simple situations with no mode changes, then that would also eliminate the bit-manipulation from #1764 and instead always do something that is automatically correct because permissions are always set through open calls that automatically respect the umask. This would also arguably behave more the way a user wants, in the case that the user sets an unusual umask in which execute bits are masked differently from their corresponding read bits.

One aspect of the separation of writing to a file and setting its permissions is that, for some reason, we are setting the permissions using the path, even though we have an open file descriptor.

gitoxide/gix-worktree-state/src/checkout/entry.rs

Lines 285 to 292 in 8df0db2

    
           // For possibly existing, overwritten files, we must change the file mode explicitly. 
        
           #[cfg(unix)] 
        
           if let Some(path) = set_executable_after_creation { 
        
               let old_perm = std::fs::symlink_metadata(path)?.permissions(); 
        
               if let Some(new_perm) = set_mode_executable(old_perm) { 
        
                   std::fs::set_permissions(path, new_perm)?; 
        
               } 
        
           }

We have a File object, from which we can presumably perform any operation that is valid on an open file descriptor. The immediately subsequent code is:

gitoxide/gix-worktree-state/src/checkout/entry.rs

Lines 293 to 296 in 8df0db2

    
           // NOTE: we don't call `file.sync_all()` here knowing that some filesystems don't handle this well. 
        
           //       revisit this once there is a bug to fix. 
        
           entry.stat = Stat::from_fs(&gix_index::fs::Metadata::from_file(&file)?)?; 
        
           file.close()?;

File permissions on a Unix-like system can be set through an open file descriptor, without using the path. (If an OS is not Unix-like then we are not setting +x anyway.) POSIX standardizes fchmod for this purpose.

I believe gix_index::fs, used there, already depends on rustix where cfg!(unix). One way to call fchmod is through rustix::fs::fchmod. (It does not support file descriptors opened with the O_PATH flag, but I believe that is inapplicable here, because that's only useful when the file contents are not to be accessed in any way, whereas in contrast the file descriptor associated with our File object represents a file that is open in the full traditional sense and that has, presumably, just been written to.)

Besides the possibility that this would mitigate some remaining TOCTOU behavior if changed within the finalize_entry function itself, I wonder if this would make it more acceptable to set the executable permissions earlier if fchmod were used at the time the file is first opened instead. Of course, if we just replace the file in all but the most trivial situations, as described at the end of (ii), then this is inapplicable because we would never need to do anything like chmod or fchmod.

Git behavior

In the example with a, b, and c, with git clone, executable permissions are set on both b and c, rather than being omitted for b:

$ git ls-files -s -- a b c
100644 f70f10e4db19068f79bc43844b49f3eece45c4e8 0       a
100755 223b7836fb19fdf64ba2d3cd6173c6a283141f78 0       b
100755 3cc58df83752123644fef39faab2393af643b1d2 0       c
$ ls -lF a b c
-rw-r--r-- 1 ek ek 5 Jan 19 20:49 a
-rwxr-xr-x 1 ek ek 5 Jan 19 20:49 b*
-rwxr-xr-x 1 ek ek 2 Jan 19 20:49 c*

This is the expected behavior. It also matches the behavior of gix clone when the preconditions for the bug, listed at the top of the "Current behavior" section above, do not apply.

As noted above in "Expected behavior" in 5(ii), Git makes greater use of deleting and replacing files.

Steps to reproduce 🕹

I used the arrow.rs filter example in gix-filter, invoked as a long-running filter, as the smudge filter for producing the bug in gix clone as well as for verifying that it does not occur in git clone. The test setup can be obtained from this filter-experiment repository. The readme there includes instructions for running the experiment after cloning it, as well as more details on some aspects of the experiment.

However, I think it valuable for this issue to be self-contained. Furthermore, the script itself is descriptively valuable for clarifying the circumstances under which the bug occurs, even if one does not run it.

Setup

If not cloning filter-experiment, first create a directory alongside (i.e. as a sibling of) the gitoxide repository working tree. Enter the new directory and create an executable file run-experiment with the contents:

#!/bin/sh
set -eu
gitoxide_repo='../gitoxide'  # For the `arrow` example filter.

# Read command-line arguments.
gitlike_cmd="${1:?pass the command to test (git or gix) as an argument}"
shift

# Stop if repositories are left over from a previous run.
if test -e remote-repo || test -e local-clone; then
    printf 'Found repos from previous experiment, stopping.\n' >&2
    exit 2
fi

# Build the arrow example and symlink to it, if necessary.
(cd -- "$gitoxide_repo" && cargo build -p gix-filter --example arrow)
test -e arrow || ln -s -- "$gitoxide_repo/target/debug/examples/arrow"

# Report the experiment.
printf 'Testing: %s\n' "$gitlike_cmd $*"
set -x

# Compose the command that will run the arrow example using an absolute path.
# (git and gix treat relative paths as relative to different locations, here.)
sq="'"
bs="\\"
filter_path="$(realpath arrow)"
filter_cmd="'$(printf '%s\n' "$filter_path" | sed "s/$sq/$sq$bs$bs$sq$sq/g")' process"

# Create the remote repository.
git init -q remote-repo
echo '[ab] filter=arrow-example' >remote-repo/.gitattributes
echo A >remote-repo/a
echo B >remote-repo/b
echo C >remote-repo/c
chmod +x remote-repo/[bc]
git -C remote-repo add .
git -C remote-repo commit -m 'Initial commit'

# Clone it and show the clone.
"$gitlike_cmd" "$@" -c filter.arrow-example.process="$filter_cmd" \
    clone "file://$(realpath remote-repo)" local-clone
ls --color -AlF local-clone

Between separate runs of that script, rm -rf remote-repo local-clone should be run to remove the repositories it creates and uses for the test. (It will not proceed if they are already present.)

Results

Running ./run-experiment git tests git clone, showing that both b and c are marked executable. This demonstrates the Git behavior, and is the expected behavior in the absence of a bug.

Running ./run-experiment gix tests gix clone, showing that only c and not b is marked executable. This demonstrates the current behavior.

I did these tests on Arch Linux, using QuickInstall builds of:

gitoxide 0.40.0, with gix-worktree-state 0.16.0, from before GHSA-fqmf-w4xh-33rh was fixed, and
gitoxide 0.41.0, with gix-worktree-state 0.17.0, which has the #1764 fix for GHSA-fqmf-w4xh-33rh.

gix clone showed the same results with both. (As detailed above in "Current behavior," under "Relationship to CVE-2025-22620," I would not expect them to be different.)

It doesn't happen without delays

The arrow.rs filter enables delays, and reports can-delay to the caller, unless disallow-delay is passed. To confirm that a long running smudge filter without delays will not produce the bug, I redid the experiment with this change to the run-experiment script:

-filter_cmd="'$(printf '%s\n' "$filter_path" | sed "s/$sq/$sq$bs$bs$sq$sq/g")' process"
+filter_cmd="'$(printf '%s\n' "$filter_path" | sed "s/$sq/$sq$bs$bs$sq$sq/g")' process disallow-delay"

The bug no longer occurred then. gix clone produced the same result as git clone.

It does happen with `required`

As described above, the filter.<driver>.required was not used, even though, as described in #1781, the gix-filter tests that use arrow.rs do set required to true. This was originally because I wasn't familiar with required, even though both Git and gitoxide recognize it.

To confirm that this does not change the result of the experiment, I also reran the experiment with this change to the run-experiment script (though the effect could also be achieved by passing -c filter.arrow-example.required=true as arguments to ./run-experiment):

 "$gitlike_cmd" "$@" -c filter.arrow-example.process="$filter_cmd" \
+    -c filter.arrow-example.required=true \
     clone "file://$(realpath remote-repo)" local-clone

As expected, this does not make a difference. To be clear, that does not prove it never makes a difference, since I don't think the filter ever actually failed. (That is, this does not explore what happens when a smudge filter fails, it only shows that neither required nor its absence are preconditions for the bug.)

Does the "schedule" matter?

The experiment uses few files: four, including .gitattributes; but only two to which the filter is applied. Delay behavior might differ with more files, or other different conditions. I suspect that does not make a difference, but I don't know it doesn't, and the experiment here doesn't speak to that.

How much do `gitoxide-core` specifics matter?

Because this is only testing gitoxide through gix clone commands, it does not distinguish behavior that is due to the details of how code in gitoxide-core calls gix_worktree_state::checkout from behavior that is due to code in gix_worktree_state itself.

The text was updated successfully, but these errors were encountered:

Byron · 2025-01-22T06:30:25Z

Thanks so much for gathering everything there is to know about this issue!
I'd hope that one of the existing tests for delayed process filters can be used to also validate executable bit addition (and removal) at some point.

Each of those elements is required, to trigger the bug. If the attribute is not applied to a particular file (even if it is applied to other files and the smudge filter is configured), or if the smudge filter is not configured (regardless of attributes), or if the smudge filter is not a process filter, or if the smudge filter is a process filter but it does not support delays, then the file that should be executable is still checked out as executable.

This is when I knew you actually tested all this! Thanks to you I know I can't think of myself as somebody who works thoroughly anymore 😁.

Byron · 2025-01-22T06:38:13Z

Regarding possible fixes, and knowing that I merely skimmed the issue after the first paragraph to not spend my time budget on this issue alone, I can share that I agree there should be no need for delayed application (or removal) of the executable bit. The current logic isn't particularly well planned and could be fundamentally changed to better suit the needs, particularly when also dealing with executable it removal.

This was referenced Jan 20, 2025

Nonexclusive checkout never removes executable permissions #1784

Open

finalize_entry uses path and FD that may diverge, could use just FD #1786

Open

overwrite_existing: false is not documented to allow +x be set #1787

Open

Byron added the acknowledged an issue is accepted as shortcoming to be fixed label Jan 22, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Delayed process filters suppress +x in `gix clone` #1783

Delayed process filters suppress +x in `gix clone` #1783

EliahKagan commented Jan 20, 2025

Byron commented Jan 22, 2025

Byron commented Jan 22, 2025

Delayed process filters suppress +x in gix clone #1783

Delayed process filters suppress +x in gix clone #1783

Comments

EliahKagan commented Jan 20, 2025

Current behavior 😯

Relationship to CVE-2025-22620

What seems to be happening

Concrete example

Expected behavior 🤔

Of the concrete example

Design considerations

Git behavior

Steps to reproduce 🕹

Setup

Results

It doesn't happen without delays

It does happen with required

Does the "schedule" matter?

How much do gitoxide-core specifics matter?

Byron commented Jan 22, 2025

Byron commented Jan 22, 2025

Delayed process filters suppress +x in `gix clone` #1783

Delayed process filters suppress +x in `gix clone` #1783

It does happen with `required`

How much do `gitoxide-core` specifics matter?