From 58855c4d3455823d75d044d72122d317eb7d87af Mon Sep 17 00:00:00 2001 From: Eliah Kagan Date: Thu, 28 Nov 2024 09:18:15 -0500 Subject: [PATCH] Fix `check-mode.sh` display of leading-whitespace paths In #1708, `check-mode.sh` deliberately parsed `git ls-files -sz` records using default nonempty `IFS`, to conveniently read fields into variables. But this would strip leading and trailing whitespace from the paths. Trailing whitespace is not currently present, since only files with `.sh` suffixes are processed, so the only paths that would be displayed incorrectly were those with leading whitespace. (We do not intentionally have such files in the repository, but they can arise accidentally, and they might also, in principle, be added at some point to support tests.) This was *sort of* okay, because the script never accesses the repository or filesystem using the paths, but only displays them to the user. But this was still bad for a few reasons: - The paths were at least in principle ambiguous. - A path with accidental leading whitespace would not be noticed. - The `path` variable looks like it would be usable, so it could be easily misused in future changes, if not fixed. - Having to think through whether or not this parsing bug is excessively serious, when reasoning about the code, may make the code more cumbersome to understand and maintain. This is to say that I should never have done it in that way. Therefore, this replaces the default-`IFS` logic with empty-`IFS` calls to `read` and regular-expression processing, using `bash`'s `[[` keyword, which supports a `=~` operator that matches against a regular expression and populates the `BASH_REMATCH` array variable with captures. (This approach to parsing `git ls-file -sz` is based on code I wrote in https://github.com/EliahKagan/gitscripts/blob/master/git-lsx.) As discussed in #1708, this script may be converted to Rust. This change is not intended as a substitute for that. It should actually make that slightly easier, in that a regex approach has clearer semantics, whether it is translated into code that uses regex or not. (If regex are used, it will probably still need adjustment, as not all dialects reocognize POSIX classes like `[[:digit:]]`.) This commit includes some files to test the fix in the script. They are at the top level of the repository, since the bug this fixes only occurs with whitespace at the beginning of the first path component (relative to the root of the working tree). These files will be removed shortly. The messages for them are: mode +x but no shebang: $'\tb.sh' mode -x but has shebang: \ a.sh Before the fix, they would have been shown as `a.sh` instead of `\ a.sh`, and `b.sh` instead of `$'\tb.sh'`. Note that the approach to quoting has not itself changed: the `%q` format specifier for the `printf` builtin in `bash` formats them this, when the paths are parsed in such a way that the characters that would need quoting if used in the shell are not lost. --- "\tb.sh" | 0 a.sh | 1 + etc/check-mode.sh | 13 ++++++++++--- 3 files changed, 11 insertions(+), 3 deletions(-) create mode 100755 "\tb.sh" create mode 100644 a.sh diff --git "a/\tb.sh" "b/\tb.sh" new file mode 100755 index 00000000000..e69de29bb2d diff --git a/ a.sh b/ a.sh new file mode 100644 index 00000000000..f1f641af19b --- /dev/null +++ b/ a.sh @@ -0,0 +1 @@ +#!/usr/bin/env bash diff --git a/etc/check-mode.sh b/etc/check-mode.sh index 4bc15b79ba5..44ac3bc4bbf 100755 --- a/etc/check-mode.sh +++ b/etc/check-mode.sh @@ -10,7 +10,7 @@ cd -- "$root" symbolic_shebang="$(printf '#!' | od -An -ta)" status=0 -function check () { +function check_item () { local mode="$1" oid="$2" path="$3" symbolic_magic # Extract the first two bytes (or less if shorter) and put in symbolic form. @@ -28,11 +28,18 @@ function check () { status=1 } +readonly record_pattern='^([0-7]+) ([[:xdigit:]]+) [[:digit:]]+'$'\t''(.+)$' + # Check regular files named with a `.sh` suffix. -while read -rd '' mode oid _stage_number path; do +while IFS= read -rd '' record; do + [[ $record =~ $record_pattern ]] + mode="${BASH_REMATCH[1]}" + oid="${BASH_REMATCH[2]}" + path="${BASH_REMATCH[3]}" + case "$mode" in 100644 | 100755) - check "$mode" "$oid" "$path" + check_item "${BASH_REMATCH[@]:1:3}" ;; esac done < <(git ls-files -sz -- '*.sh')