Skip to content

Commit

Permalink
Fix check-mode.sh display of leading-whitespace paths
Browse files Browse the repository at this point in the history
In GitoxideLabs#1708, `check-mode.sh` deliberately parsed `git ls-files -sz`
records using default nonempty `IFS`, to conveniently read fields
into variables. But this would strip leading and trailing
whitespace from the paths. Trailing whitespace is not currently
present, since only files with `.sh` suffixes are processed, so the
only paths that would be displayed incorrectly were those with
leading whitespace. (We do not intentionally have such files in the
repository, but they can arise accidentally, and they might also,
in principle, be added at some point to support tests.)

This was *sort of* okay, because the script never accesses the
repository or filesystem using the paths, but only displays them to
the user. But this was still bad for a few reasons:

- The paths were at least in principle ambiguous.

- A path with accidental leading whitespace would not be noticed.

- The `path` variable looks like it would be usable, so it could be
  easily misused in future changes, if not fixed.

- Having to think through whether or not this parsing bug is
  excessively serious, when reasoning about the code, may make the
  code more cumbersome to understand and maintain.

This is to say that I should never have done it in that way.

Therefore, this replaces the default-`IFS` logic with empty-`IFS`
calls to `read` and regular-expression processing, using `bash`'s
`[[` keyword, which supports a `=~` operator that matches against a
regular expression and populates the `BASH_REMATCH` array variable
with captures.

(This approach to parsing `git ls-file -sz` is based on code I wrote
in https://github.com/EliahKagan/gitscripts/blob/master/git-lsx.)

As discussed in GitoxideLabs#1708, this script may be converted to Rust. This
change is not intended as a substitute for that. It should actually
make that slightly easier, in that a regex approach has clearer
semantics, whether it is translated into code that uses regex or
not. (If regex are used, it will probably still need adjustment,
as not all dialects reocognize POSIX classes like `[[:digit:]]`.)

This commit includes some files to test the fix in the script. They
are at the top level of the repository, since the bug this fixes
only occurs with whitespace at the beginning of the first path
component (relative to the root of the working tree). These files
will be removed shortly. The messages for them are:

    mode +x but no shebang: $'\tb.sh'
    mode -x but has shebang: \ a.sh

Before the fix, they would have been shown as `a.sh` instead of
`\ a.sh`, and `b.sh` instead of `$'\tb.sh'`. Note that the approach
to quoting has not itself changed: the `%q` format specifier for
the `printf` builtin in `bash` formats them this, when the paths
are parsed in such a way that the characters that would need
quoting if used in the shell are not lost.
  • Loading branch information
EliahKagan committed Nov 28, 2024
1 parent 34efe03 commit 58855c4
Show file tree
Hide file tree
Showing 3 changed files with 11 additions and 3 deletions.
Empty file added b.sh
Empty file.
1 change: 1 addition & 0 deletions a.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
#!/usr/bin/env bash
13 changes: 10 additions & 3 deletions etc/check-mode.sh
Original file line number Diff line number Diff line change
Expand Up @@ -10,7 +10,7 @@ cd -- "$root"
symbolic_shebang="$(printf '#!' | od -An -ta)"
status=0

function check () {
function check_item () {
local mode="$1" oid="$2" path="$3" symbolic_magic

# Extract the first two bytes (or less if shorter) and put in symbolic form.
Expand All @@ -28,11 +28,18 @@ function check () {
status=1
}

readonly record_pattern='^([0-7]+) ([[:xdigit:]]+) [[:digit:]]+'$'\t''(.+)$'

# Check regular files named with a `.sh` suffix.
while read -rd '' mode oid _stage_number path; do
while IFS= read -rd '' record; do
[[ $record =~ $record_pattern ]]
mode="${BASH_REMATCH[1]}"
oid="${BASH_REMATCH[2]}"
path="${BASH_REMATCH[3]}"

case "$mode" in
100644 | 100755)
check "$mode" "$oid" "$path"
check_item "${BASH_REMATCH[@]:1:3}"
;;
esac
done < <(git ls-files -sz -- '*.sh')
Expand Down

0 comments on commit 58855c4

Please sign in to comment.