Sparse Index: Integrate with 'git add' #999

Closed
15 changes: 12 additions & 3 deletions builtin/add.c
@@ -144,8 +144,6 @@ static int renormalize_tracked_files(const struct pathspec *pathspec, int flags)
 {
 	int i, retval = 0;

-	/* TODO: audit for interaction with sparse-index. */
-	ensure_full_index(&the_index);
 	for (i = 0; i < active_nr; i++) {
 		struct cache_entry *ce = active_cache[i];

@@ -192,13 +190,21 @@ static int refresh(int verbose, const struct pathspec *pathspec)
 	struct string_list only_match_skip_worktree = STRING_LIST_INIT_NODUP;

On the Git mailing list, Elijah Newren wrote (reply to this):

On Wed, Jul 21, 2021 at 2:07 PM Derrick Stolee via GitGitGadget
<[email protected]> wrote:
>
> From: Derrick Stolee <[email protected]>
>
> Since b243012 (refresh_index(): add flag to ignore SKIP_WORKTREE
> entries, 2021-04-08), 'git add --refresh <path>' will output a warning
> message when the path is outside the sparse-checkout definition. The
> implementation of this warning happened in parallel with the
> sparse-index work to add ensure_full_index() calls throughout the
> codebase.
>
> Update this loop to have the proper logic that checks to see if the
> pathspec is outside the sparse-checkout definition. This avoids the need
> to expand the sparse directory entry and determine if the path is
> tracked, untracked, or ignored. We simply avoid updating the stat()
> information because there isn't even an entry that matches the path!
>
> Signed-off-by: Derrick Stolee <[email protected]>
> ---
>  builtin/add.c                            | 10 +++++++++-
>  t/t1092-sparse-checkout-compatibility.sh |  6 +-----
>  2 files changed, 10 insertions(+), 6 deletions(-)
>
> diff --git a/builtin/add.c b/builtin/add.c
> index c76e6ddd359..d512ece655b 100644
> --- a/builtin/add.c
> +++ b/builtin/add.c
> @@ -192,13 +192,21 @@ static int refresh(int verbose, const struct pathspec *pathspec)
>         struct string_list only_match_skip_worktree = STRING_LIST_INIT_NODUP;
>         int flags = REFRESH_IGNORE_SKIP_WORKTREE |
>                     (verbose ? REFRESH_IN_PORCELAIN : REFRESH_QUIET);
> +       struct pattern_list pl = { 0 };
> +       int sparse_checkout_enabled = !get_sparse_checkout_patterns(&pl);
>
>         seen = xcalloc(pathspec->nr, 1);
>         refresh_index(&the_index, flags, pathspec, seen,
>                       _("Unstaged changes after refreshing the index:"));
>         for (i = 0; i < pathspec->nr; i++) {
>                 if (!seen[i]) {
> -                       if (matches_skip_worktree(pathspec, i, &skip_worktree_seen)) {
> +                       const char *path = pathspec->items[i].original;
> +                       int dtype = DT_REG;
> +
> +                       if (matches_skip_worktree(pathspec, i, &skip_worktree_seen) ||
> +                           (sparse_checkout_enabled &&
> +                            !path_matches_pattern_list(path, strlen(path), NULL,
> +                                                       &dtype, &pl, &the_index))) {

I was slightly worried from the description in the commit message
about the case where you have a file without the SKIP_WORKTREE bit set
despite not matching sparsity paths.  I was worried that you'd skip
refreshing it, but I tweaked your testcases and couldn't trigger it.

>                                 string_list_append(&only_match_skip_worktree,
>                                                    pathspec->items[i].original);
>                         } else {
> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
> index 73c48a71d89..c61424e2074 100755
> --- a/t/t1092-sparse-checkout-compatibility.sh
> +++ b/t/t1092-sparse-checkout-compatibility.sh
> @@ -347,7 +347,7 @@ test_expect_success 'status/add: outside sparse cone' '
>         test_all_match git commit -m folder1/newer
>  '
>
> -test_expect_failure 'add: pathspec within sparse directory' '
> +test_expect_success 'add: pathspec within sparse directory' '
>         init_repos &&
>
>         run_on_sparse mkdir folder1 &&
> @@ -357,10 +357,6 @@ test_expect_failure 'add: pathspec within sparse directory' '
>         # This "git add folder1/a" fails with a warning
>         # in the sparse repos, differing from the full
>         # repo. This is intentional.
> -       #
> -       # However, in the sparse-index, folder1/a does not
> -       # match any cache entry and fails with a different
> -       # error message. This needs work.
>         test_sparse_match test_must_fail git add folder1/a &&
>         test_sparse_match test_must_fail git add --refresh folder1/a &&
>         test_all_match git status --porcelain=v2
> --
> gitgitgadget

This and Patch 4/5 look good to me.
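
The new condition in refresh() reports a pathspec that matched no index entry when it falls outside the sparse-checkout definition. The sketch below is a simplified, hypothetical stand-in for that test and is not Git's path_matches_pattern_list(): `in_cone` is an invented helper that treats top-level files as always included and otherwise only checks whether the path lies under one of the listed directories. Real cone-mode patterns also include files directly inside the parent directories of a listed directory, which this sketch ignores.

```shell
#!/bin/sh
# in_cone CANDIDATE DIR...
# Hypothetical helper: succeeds when CANDIDATE is a top-level file or
# lies under one of the listed directories. Simplified on purpose.
in_cone () {
	candidate=$1
	shift
	case "$candidate" in
	*/*) ;;         # has a directory component: check the list below
	*) return 0 ;;  # top-level file: always treated as inside
	esac
	for dir in "$@"
	do
		case "$candidate" in
		"$dir"/*) return 0 ;;
		esac
	done
	return 1
}

# Assumed layout, borrowed from the names used in the thread:
# "deep" is inside the cone, "folder1" is outside.
for p in README.md deep/deeper1/a folder1/a
do
	if in_cone "$p" deep
	then
		echo "$p: inside the sparse-checkout definition"
	else
		echo "$p: outside, would be reported"
	fi
done
```

With the assumed layout, only folder1/a is classified as outside, which corresponds to the case where the patch appends the pathspec to only_match_skip_worktree instead of expanding the sparse directory.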

 	int flags = REFRESH_IGNORE_SKIP_WORKTREE |
 		    (verbose ? REFRESH_IN_PORCELAIN : REFRESH_QUIET);
+	struct pattern_list pl = { 0 };
+	int sparse_checkout_enabled = !get_sparse_checkout_patterns(&pl);

 	seen = xcalloc(pathspec->nr, 1);
 	refresh_index(&the_index, flags, pathspec, seen,
 		      _("Unstaged changes after refreshing the index:"));
 	for (i = 0; i < pathspec->nr; i++) {
 		if (!seen[i]) {
-			if (matches_skip_worktree(pathspec, i, &skip_worktree_seen)) {
+			const char *path = pathspec->items[i].original;
+			int dtype = DT_REG;
+
+			if (matches_skip_worktree(pathspec, i, &skip_worktree_seen) ||
+			    (sparse_checkout_enabled &&
+			     !path_matches_pattern_list(path, strlen(path), NULL,
+							&dtype, &pl, &the_index))) {
 				string_list_append(&only_match_skip_worktree,
 						   pathspec->items[i].original);
 			} else {
@@ -528,6 +534,9 @@ int cmd_add(int argc, const char **argv, const char *prefix)
 	add_new_files = !take_worktree_changes && !refresh_only && !add_renormalize;

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Derrick Stolee via GitGitGadget" <[email protected]> writes:

> From: Derrick Stolee <[email protected]>
>
> Disable command_requires_full_index for 'git add'. This does not require
> any additional removals of ensure_full_index(). The main reason is that
> 'git add' discovers changes based on the pathspec and the worktree
> itself. These are then inserted into the index directly, and calls to
> index_name_pos() or index_file_exists() already call expand_to_path() at
> the appropriate time to support a sparse-index.

OK.  With that explained, it still is quite surprising that we only
need this change (eh, rather, doing this change is safe without
doing anything else).

> -	# This "git add folder1/a" fails with a warning
> -	# in the sparse repos, differing from the full
> -	# repo. This is intentional.
> -	test_sparse_match test_must_fail git add folder1/a &&
> -	test_sparse_match test_must_fail git add --refresh folder1/a &&
> -	test_all_match git status --porcelain=v2 &&

And nice to see a known limitation lifted.

>  	test_all_match git add . &&
>  	test_all_match git status --porcelain=v2 &&
>  	test_all_match git commit -m folder1/new &&
> @@ -635,7 +628,12 @@ test_expect_success 'sparse-index is not expanded' '
>  	git -C sparse-index reset --hard &&
>  	ensure_not_expanded checkout rename-out-to-out -- deep/deeper1 &&
>  	git -C sparse-index reset --hard &&
> -	ensure_not_expanded restore -s rename-out-to-out -- deep/deeper1
> +	ensure_not_expanded restore -s rename-out-to-out -- deep/deeper1 &&
> +
> +	echo >>sparse-index/README.md &&
> +	ensure_not_expanded add -A &&
> +	echo >>sparse-index/extra.txt &&
> +	ensure_not_expanded add extra.txt
>  '
>  
>  # NEEDSWORK: a sparse-checkout behaves differently from a full checkout

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 7/21/21 6:19 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <[email protected]> writes:
> 
>> From: Derrick Stolee <[email protected]>
>>
>> Disable command_requires_full_index for 'git add'. This does not require
>> any additional removals of ensure_full_index(). The main reason is that
>> 'git add' discovers changes based on the pathspec and the worktree
>> itself. These are then inserted into the index directly, and calls to
>> index_name_pos() or index_file_exists() already call expand_to_path() at
>> the appropriate time to support a sparse-index.
> 
> OK.  With that explained, it still is quite surprising that we only
> need this change (eh, rather, doing this change is safe without
> doing anything else).

Yes, all of the hard work was done by the earlier work to expand
a sparse index when we search for a specific path that lands
within a sparse directory. See 95e0321 (read-cache: expand on query
into sparse-directory entry, 2021-04-01) for the specifics.

>> -	# This "git add folder1/a" fails with a warning
>> -	# in the sparse repos, differing from the full
>> -	# repo. This is intentional.
>> -	test_sparse_match test_must_fail git add folder1/a &&
>> -	test_sparse_match test_must_fail git add --refresh folder1/a &&
>> -	test_all_match git status --porcelain=v2 &&
> 
> And nice to see a known limitation lifted.

Thank you for pointing this out. This actually starts to _fail_ now
that we allow sparse indexes in 'git add', but it's because the error
messages don't match, not that the 'test_must_fail' is violated.

Patch 4 adds a similar test that is then set to work in patch 5. That
allows us a clear way to describe the behavior change and to motivate
the fix in patch 5. This could be explained better, perhaps by merging
Patch 4 into this one. That helps describe how this specific case
changes behavior (for the worse) in this patch, but is handled in a
careful way later, once the behavior change is documented.

If there is a better way to reorganize these patches, then I could
try another approach.

Thanks,
-Stolee

On the Git mailing list, Elijah Newren wrote (reply to this):

On Wed, Jul 21, 2021 at 2:07 PM Derrick Stolee via GitGitGadget
<[email protected]> wrote:
>
> From: Derrick Stolee <[email protected]>
>
> Disable command_requires_full_index for 'git add'. This does not require
> any additional removals of ensure_full_index(). The main reason is that
> 'git add' discovers changes based on the pathspec and the worktree
> itself. These are then inserted into the index directly, and calls to
> index_name_pos() or index_file_exists() already call expand_to_path() at
> the appropriate time to support a sparse-index.

Nice.

> Add a test to check that 'git add -A' and 'git add <file>' does not
> expand the index at all, as long as <file> is not within a sparse
> directory. This does not help the global 'git add .' case.

Good idea.

> We can measure the improvement using p2000-sparse-operations.sh with
> these results:
>
> Test                                  HEAD~1           HEAD
> ------------------------------------------------------------------------------
> 2000.6: git add -A (full-index-v3)    0.35(0.30+0.05)  0.37(0.29+0.06) +5.7%
> 2000.7: git add -A (full-index-v4)    0.31(0.26+0.06)  0.33(0.27+0.06) +6.5%
> 2000.8: git add -A (sparse-index-v3)  0.57(0.53+0.07)  0.05(0.04+0.08) -91.2%
> 2000.9: git add -A (sparse-index-v4)  0.58(0.55+0.06)  0.05(0.05+0.06) -91.4%
>
> While the 91% improvement seems impressive, it's important to recognize
> that previously we had significant overhead for expanding the
> sparse-index. Comparing to the full index case, 'git add -A' goes from
> 0.37s to 0.05s, which is "only" an 86% improvement.

Hehe.  Yep, it's so "disappointing" to "only" have the code be 7x faster.  :-)
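
For reference, the percentages and the "7x" figure follow directly from the wall-clock times in the table. A small sketch of the arithmetic, where `percent_change` and `speedup` are helper names made up for this illustration:

```shell
#!/bin/sh
# percent_change OLD NEW: relative change from OLD to NEW, in percent.
percent_change () {
	awk -v old="$1" -v new="$2" \
		'BEGIN { printf "%.1f%%\n", (new - old) / old * 100 }'
}

# speedup OLD NEW: how many times faster NEW is than OLD.
speedup () {
	awk -v old="$1" -v new="$2" 'BEGIN { printf "%.1fx\n", old / new }'
}

percent_change 0.57 0.05  # sparse-index-v3, HEAD~1 vs HEAD: -91.2%
percent_change 0.37 0.05  # full-index-v3 at HEAD vs sparse index: -86.5%
speedup 0.37 0.05         # the same comparison as a ratio: 7.4x
```

The second figure is the one the commit message rounds to "only" 86%.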

Out of curiosity, IIRC any operation involving the index took ~10s on
some of the Microsoft repos.  What does the speedup look like over
there for these changes to git-add?

>
> Signed-off-by: Derrick Stolee <[email protected]>
> ---
>  builtin/add.c                            |  3 +++
>  t/t1092-sparse-checkout-compatibility.sh | 14 ++++++--------
>  2 files changed, 9 insertions(+), 8 deletions(-)
>
> diff --git a/builtin/add.c b/builtin/add.c
> index b773b5a4993..c76e6ddd359 100644
> --- a/builtin/add.c
> +++ b/builtin/add.c
> @@ -528,6 +528,9 @@ int cmd_add(int argc, const char **argv, const char *prefix)
>         add_new_files = !take_worktree_changes && !refresh_only && !add_renormalize;
>         require_pathspec = !(take_worktree_changes || (0 < addremove_explicit));
>
> +       prepare_repo_settings(the_repository);
> +       the_repository->settings.command_requires_full_index = 0;
> +
>         hold_locked_index(&lock_file, LOCK_DIE_ON_ERROR);
>
>         /*
> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
> index a3c01d588d8..a11d9d7f35d 100755
> --- a/t/t1092-sparse-checkout-compatibility.sh
> +++ b/t/t1092-sparse-checkout-compatibility.sh
> @@ -340,13 +340,6 @@ test_expect_success 'status/add: outside sparse cone' '
>
>         test_sparse_match git status --porcelain=v2 &&
>
> -       # This "git add folder1/a" fails with a warning
> -       # in the sparse repos, differing from the full
> -       # repo. This is intentional.
> -       test_sparse_match test_must_fail git add folder1/a &&
> -       test_sparse_match test_must_fail git add --refresh folder1/a &&
> -       test_all_match git status --porcelain=v2 &&
> -

Why was this chunk removed?  Nothing in the commit message mentions
this, and it's not clear to me the reason for it.

I tried adding it back in at the end of the series and it still works
(and further I can't change test_sparse_match to test_all_match and
have the test work).

>         test_all_match git add . &&
>         test_all_match git status --porcelain=v2 &&
>         test_all_match git commit -m folder1/new &&
> @@ -635,7 +628,12 @@ test_expect_success 'sparse-index is not expanded' '
>         git -C sparse-index reset --hard &&
>         ensure_not_expanded checkout rename-out-to-out -- deep/deeper1 &&
>         git -C sparse-index reset --hard &&
> -       ensure_not_expanded restore -s rename-out-to-out -- deep/deeper1
> +       ensure_not_expanded restore -s rename-out-to-out -- deep/deeper1 &&
> +
> +       echo >>sparse-index/README.md &&
> +       ensure_not_expanded add -A &&
> +       echo >>sparse-index/extra.txt &&
> +       ensure_not_expanded add extra.txt

...and here's the extra test you mentioned in the commit message.  Looks good.

>  '
>
>  # NEEDSWORK: a sparse-checkout behaves differently from a full checkout
> --
> gitgitgadget

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 7/23/2021 1:45 PM, Elijah Newren wrote:
> On Wed, Jul 21, 2021 at 2:07 PM Derrick Stolee via GitGitGadget
> <[email protected]> wrote:
...
>> Test                                  HEAD~1           HEAD
>> ------------------------------------------------------------------------------
>> 2000.6: git add -A (full-index-v3)    0.35(0.30+0.05)  0.37(0.29+0.06) +5.7%
>> 2000.7: git add -A (full-index-v4)    0.31(0.26+0.06)  0.33(0.27+0.06) +6.5%
>> 2000.8: git add -A (sparse-index-v3)  0.57(0.53+0.07)  0.05(0.04+0.08) -91.2%
>> 2000.9: git add -A (sparse-index-v4)  0.58(0.55+0.06)  0.05(0.05+0.06) -91.4%
>>
>> While the 91% improvement seems impressive, it's important to recognize
>> that previously we had significant overhead for expanding the
>> sparse-index. Comparing to the full index case, 'git add -A' goes from
>> 0.37s to 0.05s, which is "only" an 86% improvement.
> 
> Hehe.  Yep, it's so "disappointing" to "only" have the code be 7x faster.  :-)
> 
> Out of curiosity, IIRC any operation involving the index took ~10s on
> some of the Microsoft repos.  What does the speedup look like over
> there for these changes to git-add?

The latest numbers I have for a repo with ~2 million tracked files are that
index reads take about half a second (because of the threaded reads) and
writes take at least one second. There was a lot of work by Ben Peart, Jeff
Hostetler, and Kevin Willford to reduce this cost as much as possible a few
years ago. VFS for Git is still limited by this bottleneck, but Scalar's
use of sparse-checkout enables the use of the sparse index.

We have an experimental release [1] out to users right now, and I will
report to the mailing list about how that went after we get sufficient
adoption that the data can be significant. When focusing on individual
users I can find things like one user seeing "git commit" going from 4.3s
to 0.35s and "git add" going from 6.1s to 0.13s. (The "git add" time
might also be conflated with a change from the FS Monitor hook to the
builtin FS Monitor.)

[1] https://github.com/microsoft/git/releases/tag/v2.32.0.vfs.0.102.exp

Thanks,
-Stolee

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 7/23/2021 1:45 PM, Elijah Newren wrote:
> On Wed, Jul 21, 2021 at 2:07 PM Derrick Stolee via GitGitGadget
> <[email protected]> wrote:
>>
...
>> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
>> index a3c01d588d8..a11d9d7f35d 100755
>> --- a/t/t1092-sparse-checkout-compatibility.sh
>> +++ b/t/t1092-sparse-checkout-compatibility.sh
>> @@ -340,13 +340,6 @@ test_expect_success 'status/add: outside sparse cone' '
>>
>>         test_sparse_match git status --porcelain=v2 &&
>>
>> -       # This "git add folder1/a" fails with a warning
>> -       # in the sparse repos, differing from the full
>> -       # repo. This is intentional.
>> -       test_sparse_match test_must_fail git add folder1/a &&
>> -       test_sparse_match test_must_fail git add --refresh folder1/a &&
>> -       test_all_match git status --porcelain=v2 &&
>> -
> 
> Why was this chunk removed?  Nothing in the commit message mentions
> this, and it's not clear to me the reason for it.
> 
> I tried adding it back in at the end of the series and it still works
> (and further I can't change test_sparse_match to test_all_match and
> have the test work).

I mentioned this in a reply to Junio, but this hunk removal is confusing.

As of this patch, this hunk causes a failure due to an error message not
matching, specifically this error:


+ diff -u sparse-checkout-err sparse-index-err
--- sparse-checkout-err 2021-07-26 13:30:50.304291264 +0000
+++ sparse-index-err    2021-07-26 13:30:50.308291259 +0000
@@ -1,5 +1 @@
-The following pathspecs didn't match any eligible path, but they do match index
-entries outside the current sparse checkout:
-folder1/a
-hint: Disable or modify the sparsity rules if you intend to update such entries.
-hint: Disable this message with "git config advice.updateSparsePath false"
+fatal: pathspec 'folder1/a' did not match any files


A similar test is added as a failure case in patch 4, then marked as success
in patch 5. These test changes could be organized better, so I will work on
that in v2, along with your other suggestions.
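
The distinction between "both commands fail" and "both commands fail the same way" can be sketched in plain shell. This is only an illustration under stated assumptions: `add_in_sparse_checkout` and `add_in_sparse_index` are hypothetical stand-ins that mimic the two messages in the diff above, and `both_fail` and `same_stderr` are invented helpers, not the test_must_fail or test_sparse_match helpers from the test suite.

```shell
#!/bin/sh
# Hypothetical stand-ins for "git add folder1/a" in the two sparse repos.
# Both fail, but each prints a different message to stderr.
add_in_sparse_checkout () {
	echo "The following pathspecs didn't match any eligible path" >&2
	return 1
}
add_in_sparse_index () {
	echo "fatal: pathspec 'folder1/a' did not match any files" >&2
	return 1
}

# Succeeds when both commands exit non-zero.
both_fail () {
	! "$1" 2>/dev/null && ! "$2" 2>/dev/null
}

# Succeeds only when both commands print identical stderr.
same_stderr () {
	err1=$("$1" 2>&1 >/dev/null) || :
	err2=$("$2" 2>&1 >/dev/null) || :
	test "$err1" = "$err2"
}

if both_fail add_in_sparse_checkout add_in_sparse_index
then
	echo "both commands fail: yes"
else
	echo "both commands fail: no"
fi

if same_stderr add_in_sparse_checkout add_in_sparse_index
then
	echo "stderr matches: yes"
else
	echo "stderr matches: no"
fi
```

The first check passes and the second does not, which is the situation described here: the failure expectation still holds, but the comparison of error output between the two repos breaks.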

Thanks,
-Stolee

 	require_pathspec = !(take_worktree_changes || (0 < addremove_explicit));

+	prepare_repo_settings(the_repository);
+	the_repository->settings.command_requires_full_index = 0;
+
 	hold_locked_index(&lock_file, LOCK_DIE_ON_ERROR);

 	/*
2 changes: 0 additions & 2 deletions pathspec.c
@@ -37,8 +37,6 @@ void add_pathspec_matches_against_index(const struct pathspec *pathspec,
 			num_unmatched++;

On the Git mailing list, Elijah Newren wrote (reply to this):

On Wed, Jul 21, 2021 at 2:07 PM Derrick Stolee via GitGitGadget
<[email protected]> wrote:
>
> From: Derrick Stolee <[email protected]>
>
> The add_pathspec_matches_against_index() focuses on matching a pathspec
> to file entries in the index. This already works correctly for its only
> use: checking if untracked files exist in the index.
>
> The compatibility checks in t1092 already test that 'git add <dir>'
> works for a directory outside of the sparse cone. That provides coverage
> for removing this guard.
>
> This finalizes our ability to run 'git add .' without expanding a sparse
> index to a full one. This is evidenced by an update to t1092 and by
> these performance numbers for p2000-sparse-operations.sh:
>
> Test                                    HEAD~1            HEAD
> --------------------------------------------------------------------------------
> 2000.10: git add . (full-index-v3)      0.37(0.28+0.07)   0.36(0.27+0.06) -2.7%
> 2000.11: git add . (full-index-v4)      0.33(0.26+0.06)   0.32(0.28+0.05) -3.0%
> 2000.12: git add . (sparse-index-v3)    0.57(0.53+0.07)   0.06(0.06+0.07) -89.5%
> 2000.13: git add . (sparse-index-v4)    0.57(0.53+0.07)   0.05(0.03+0.09) -91.2%
>
> While the ~90% improvement is shown by the test results, it is worth
> noting that expanding the sparse index was adding overhead in previous
> commits. Comparing to the full index case, we see the performance go
> from 0.33s to 0.05s, an 85% improvement.

These perf improvements are pretty sweet.

> Signed-off-by: Derrick Stolee <[email protected]>
> ---
>  pathspec.c                               | 2 --
>  t/t1092-sparse-checkout-compatibility.sh | 7 +++----
>  2 files changed, 3 insertions(+), 6 deletions(-)
>
> diff --git a/pathspec.c b/pathspec.c
> index 08f8d3eedc3..44306fdaca2 100644
> --- a/pathspec.c
> +++ b/pathspec.c
> @@ -37,8 +37,6 @@ void add_pathspec_matches_against_index(const struct pathspec *pathspec,
>                         num_unmatched++;
>         if (!num_unmatched)
>                 return;
> -       /* TODO: audit for interaction with sparse-index. */
> -       ensure_full_index(istate);
>         for (i = 0; i < istate->cache_nr; i++) {
>                 const struct cache_entry *ce = istate->cache[i];
>                 if (sw_action == PS_IGNORE_SKIP_WORKTREE && ce_skip_worktree(ce))
> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
> index a11d9d7f35d..f9e2f5f4aa1 100755
> --- a/t/t1092-sparse-checkout-compatibility.sh
> +++ b/t/t1092-sparse-checkout-compatibility.sh
> @@ -322,9 +322,6 @@ test_expect_success 'commit including unstaged changes' '
>  test_expect_success 'status/add: outside sparse cone' '
>         init_repos &&
>
> -       # adding a "missing" file outside the cone should fail
> -       test_sparse_match test_must_fail git add folder1/a &&
> -

So this is removed because of non-matching errors.  In particular,
sparse-checkout shows
"""
The following pathspecs didn't match any eligible path, but they do match index
entries outside the current sparse checkout:
folder1/a
hint: Disable or modify the sparsity rules if you intend to update such entries.
hint: Disable this message with "git config advice.updateSparsePath false"
"""
while sparse-index now shows:
"""
fatal: pathspec 'folder1/a' did not match any files
"""

The new error seems entirely reasonable to me.  No objection here.


But allow me to go on a bit of a diversion...

If we modify this setup slightly by running:

$ mkdir folder1
$ echo garbage >folder1/a
$ git add folder1/a

Then you'll get the first of those errors in both the sparse-index and
the sparse-checkout.  I also like this behavior.

If you unset the SKIP_WORKTREE bit manually, and then add:

$ git update-index --no-skip-worktree folder1/a
$ git add folder1/a

Then the file is added with no error or warning.  I like this behavior too.

If you further change the setup with:

$ echo more garbage >folder1/z
$ git add folder1/z

Then you get no error, despite folder1/z being an untracked file
outside of sparsity paths.  No bueno.  :-(

>         # folder1 is at HEAD, but outside the sparse cone
>         run_on_sparse mkdir folder1 &&
>         cp initial-repo/folder1/a sparse-checkout/folder1/a &&
> @@ -633,7 +630,9 @@ test_expect_success 'sparse-index is not expanded' '
>         echo >>sparse-index/README.md &&
>         ensure_not_expanded add -A &&
>         echo >>sparse-index/extra.txt &&
> -       ensure_not_expanded add extra.txt
> +       ensure_not_expanded add extra.txt &&
> +       echo >>sparse-index/untracked.txt &&
> +       ensure_not_expanded add .

:-)

>  '
>
>  # NEEDSWORK: a sparse-checkout behaves differently from a full checkout
> --

So I added a lot of comments here, in part because I thought I'd test
a bit more of what I said in response to your cover letter and see how
close to it we were.  The patch in question looks fine.

I just added an aside as a convenient place to check whether the
behavior at the end of the series matches what you proposed in the
cover letter, or what I proposed in response.  It appears it matches
neither (though that's not due to this specific patch).

 	if (!num_unmatched)
 		return;
-	/* TODO: audit for interaction with sparse-index. */
-	ensure_full_index(istate);
 	for (i = 0; i < istate->cache_nr; i++) {
 		const struct cache_entry *ce = istate->cache[i];
 		if (sw_action == PS_IGNORE_SKIP_WORKTREE && ce_skip_worktree(ce))
67 changes: 58 additions & 9 deletions t/t1092-sparse-checkout-compatibility.sh
@@ -114,6 +114,16 @@ test_expect_success 'setup' '
 		git add . &&

On the Git mailing list, Elijah Newren wrote (reply to this):

On Wed, Jul 21, 2021 at 2:07 PM Derrick Stolee via GitGitGadget
<[email protected]> wrote:
>
> From: Derrick Stolee <[email protected]>
>
> Signed-off-by: Derrick Stolee <[email protected]>
> ---
>  t/t1092-sparse-checkout-compatibility.sh | 37 ++++++++++++++++++++++++
>  1 file changed, 37 insertions(+)
>
> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
> index 91e30d6ec22..a3c01d588d8 100755
> --- a/t/t1092-sparse-checkout-compatibility.sh
> +++ b/t/t1092-sparse-checkout-compatibility.sh
> @@ -114,6 +114,16 @@ test_expect_success 'setup' '
>                 git add . &&
>                 git commit -m "file to dir" &&
>
> +               for side in left right
> +               do
> +                       git checkout -b merge-$side base &&
> +                       echo $side >>deep/deeper2/a &&
> +                       echo $side >>folder1/a &&
> +                       echo $side >>folder2/a &&
> +                       git add . &&
> +                       git commit -m "$side" || return 1

Why is this "|| return 1" here?

It looks like there are a number of other cases of this in the file
too, which I must have overlooked previously, because I don't
understand any of them.

> +               done &&
> +
>                 git checkout -b deepest base &&
>                 echo "updated deepest" >deep/deeper1/deepest/a &&
>                 git commit -a -m "update deepest" &&
> @@ -482,6 +492,33 @@ test_expect_success 'merge' '
>         test_all_match git rev-parse HEAD^{tree}
>  '
>
> +test_expect_success 'merge with conflict outside cone' '
> +       init_repos &&
> +
> +       test_all_match git checkout -b merge-tip merge-left &&
> +       test_all_match git status --porcelain=v2 &&
> +       test_all_match test_must_fail git merge -m merge merge-right &&
> +       test_all_match git status --porcelain=v2 &&
> +
> +       # resolve the conflict in different ways:
> +       # 1. revert to the base
> +       test_all_match git checkout base -- deep/deeper2/a &&
> +       test_all_match git status --porcelain=v2 &&
> +
> +       # 2. add the file with conflict markers
> +       test_all_match git add folder1/a &&
> +       test_all_match git status --porcelain=v2 &&
> +
> +       # 3. rename the file to another sparse filename

But...that doesn't resolve the conflict.  Shouldn't this be titled
"accept the conflict & rename the file elsewhere"?

> +       run_on_all mv folder2/a folder2/z &&
> +       test_all_match git add folder2 &&

'mv' rather than 'git mv', then followed by 'git add'?  Any reason for
this order rather than git add followed by git mv?

Also, if you really do want to move first, did you use mv instead of
"git mv" due to the latter's shortcoming of only operating on stage 0?
(https://lore.kernel.org/git/CABPp-BGJdwpwhQUp4Wa4bKBp4hQFB9OM3N1FXH7SzY0mvLDa7Q@mail.gmail.com/)

Regardless of order, though, I still think mv or add should require a
--force to rename or add a file outside the sparsity paths given the
deferred negative surprises for users around such files.  (Or come up
with a solid way to remove those surprises.)

> +       test_all_match git status --porcelain=v2 &&
> +
> +       test_all_match git merge --continue &&
> +       test_all_match git status --porcelain=v2 &&
> +       test_all_match git rev-parse HEAD^{tree}
> +'
> +
>  test_expect_success 'merge with outside renames' '
>         init_repos &&

On the Git mailing list, Eric Sunshine wrote (reply to this):

On Fri, Jul 23, 2021 at 1:34 PM Elijah Newren <[email protected]> wrote:
> On Wed, Jul 21, 2021 at 2:07 PM Derrick Stolee via GitGitGadget
> <[email protected]> wrote:
> > +               for side in left right
> > +               do
> > +                       git checkout -b merge-$side base &&
> > +                       echo $side >>deep/deeper2/a &&
> > +                       echo $side >>folder1/a &&
> > +                       echo $side >>folder2/a &&
> > +                       git add . &&
> > +                       git commit -m "$side" || return 1
>
> Why is this "|| return 1" here?
>
> It looks like there are a number of other cases of this in the file
> too, which I must have overlooked previously, because I don't
> understand any of them.

A shell for-loop won't automatically terminate just because some
command in its body fails. Instead it will run to completion and
return the status of the last command of the last iteration, which may
not be the iteration which failed, thus a failure can be hidden.
Therefore, we need to proactively stop the loop iteration _and_ ensure
that the return status of the loop itself reflects the failure, which
we do by `|| return 1`. (If this loop was inside a subshell, we'd use
`|| exit 1` instead.)
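
The behavior described above can be reproduced in a few lines of plain shell. This is a minimal sketch: `step`, `loop_unchecked`, and `loop_checked` are made-up names for illustration and are not part of the test suite; `step` simply fails for the first item and succeeds for the second.

```shell
#!/bin/sh
# Hypothetical stand-in for a loop body that fails only for "left".
step () {
	test "$1" != left
}

# Without "|| return 1": the failure for "left" is hidden, because the
# loop's status is that of the last iteration ("right"), which succeeds.
loop_unchecked () {
	for side in left right
	do
		step "$side"
	done
}

# With "|| return 1": the function returns as soon as an iteration fails.
loop_checked () {
	for side in left right
	do
		step "$side" || return 1
	done
}

if loop_unchecked
then
	echo "unchecked: success"
else
	echo "unchecked: failure"
fi

if loop_checked
then
	echo "checked: success"
else
	echo "checked: failure"
fi
```

Running it prints "unchecked: success" followed by "checked: failure": only the second form lets a surrounding `&&` chain notice that the first iteration went wrong.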

On the Git mailing list, Elijah Newren wrote (reply to this):

On Fri, Jul 23, 2021 at 10:44 AM Eric Sunshine <[email protected]> wrote:
>
> On Fri, Jul 23, 2021 at 1:34 PM Elijah Newren <[email protected]> wrote:
> > On Wed, Jul 21, 2021 at 2:07 PM Derrick Stolee via GitGitGadget
> > <[email protected]> wrote:
> > > +               for side in left right
> > > +               do
> > > +                       git checkout -b merge-$side base &&
> > > +                       echo $side >>deep/deeper2/a &&
> > > +                       echo $side >>folder1/a &&
> > > +                       echo $side >>folder2/a &&
> > > +                       git add . &&
> > > +                       git commit -m "$side" || return 1
> >
> > Why is this "|| return 1" here?
> >
> > It looks like there are a number of other cases of this in the file
> > too, which I must have overlooked previously, because I don't
> > understand any of them.
>
> A shell for-loop won't automatically terminate just because some
> command in its body fails. Instead it will run to completion and
> return the status of the last command of the last iteration, which may
> not be the iteration which failed, thus a failure can be hidden.
> Therefore, we need to proactively stop the loop iteration _and_ ensure
> that the return status of the loop itself reflects the failure, which
> we do by `|| return 1`. (If this loop was inside a subshell, we'd use
> `|| exit 1` instead.)

Ah, thanks for the explanation.

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 7/23/2021 1:34 PM, Elijah Newren wrote:
> On Wed, Jul 21, 2021 at 2:07 PM Derrick Stolee via GitGitGadget
> <[email protected]> wrote:
>>
>> From: Derrick Stolee <[email protected]>

...

>> +       # 3. rename the file to another sparse filename
> 
> But...that doesn't resolve the conflict.  Shouldn't this be titled
> "accept the conflict & rename the file elsewhere"?

Sure. I'm less focused on the content of the file and more on the steps
the user might have taken to resolve the conflict.
 
>> +       run_on_all mv folder2/a folder2/z &&
>> +       test_all_match git add folder2 &&
> 
> 'mv' rather than 'git mv', then followed by 'git add'?  Any reason for
> this order rather than git add followed by git mv?

I'm trying to mimic that a user might realize that a filename might
need to be renamed (say, because a naming convention changed that is
causing the conflict) and I don't expect users to use 'git mv' to do
this action.

> Also, if you really do want to move first, did you use mv instead of
> "git mv" due to the latter's shortcoming of only operating on stage 0?
> (https://lore.kernel.org/git/CABPp-BGJdwpwhQUp4Wa4bKBp4hQFB9OM3N1FXH7SzY0mvLDa7Q@mail.gmail.com/)

'git mv' had not occurred to me as a thing to do in this case. I'm
focused on ensuring that 'git add' works as expected to update the
index in response to filesystem changes.

> Regardless of order, though, I still think mv or add should require a
> --force to rename or add a file outside the sparsity paths given the
> deferred negative surprises for users around such files.  (Or come up
> with a solid way to remove those surprises.)

--force is focused on _ignored_ files. I imagine that it could be
repurposed to allow entries outside of the sparse-checkout definition,
but we would want to be careful for users who are adding the entire
directory, not just the individual files, as they might _also_ get any
ignored files that exist in that directory. That might justify creating
a new option instead.

Further, the error message reported when adding something outside of
the sparse cone should probably mention whatever option exists as a
way for users to bypass this limitation. I'll collect my thoughts (in
response to your detailed thoughts shared on my cover letter) and
start a new thread about hardening this behavior. I've got an internal
ticket tracking this, and I want to wrap my head around all of the
interesting commands (add, mv, rm, update-index?) and create a full
recommendation to bring as an RFC.

Of course, if someone else wants to create this clear vision in the
meantime I will not complain.

Thanks,
-Stolee

git commit -m "file to dir" &&

for side in left right
do
git checkout -b merge-$side base &&
echo $side >>deep/deeper2/a &&
echo $side >>folder1/a &&
echo $side >>folder2/a &&
git add . &&
git commit -m "$side" || return 1
done &&

git checkout -b deepest base &&
echo "updated deepest" >deep/deeper1/deepest/a &&
git commit -a -m "update deepest" &&
@@ -312,9 +322,6 @@ test_expect_success 'commit including unstaged changes' '
test_expect_success 'status/add: outside sparse cone' '
init_repos &&

# adding a "missing" file outside the cone should fail
test_sparse_match test_must_fail git add folder1/a &&

# folder1 is at HEAD, but outside the sparse cone
run_on_sparse mkdir folder1 &&
cp initial-repo/folder1/a sparse-checkout/folder1/a &&
@@ -330,21 +337,23 @@ test_expect_success 'status/add: outside sparse cone' '

test_sparse_match git status --porcelain=v2 &&

# This "git add folder1/a" fails with a warning
# in the sparse repos, differing from the full
# repo. This is intentional.
# Adding the path outside of the sparse-checkout cone should fail.
test_sparse_match test_must_fail git add folder1/a &&
test_sparse_match test_must_fail git add --refresh folder1/a &&
test_all_match git status --porcelain=v2 &&

# NEEDSWORK: Adding a newly-tracked file outside the cone succeeds
test_sparse_match git add folder1/new &&

test_all_match git add . &&
test_all_match git status --porcelain=v2 &&
test_all_match git commit -m folder1/new &&
test_all_match git rev-parse HEAD^{tree} &&

run_on_all ../edit-contents folder1/newer &&
test_all_match git add folder1/ &&
test_all_match git status --porcelain=v2 &&
test_all_match git commit -m folder1/newer
test_all_match git commit -m folder1/newer &&
test_all_match git rev-parse HEAD^{tree}
'

test_expect_success 'checkout and reset --hard' '
@@ -482,6 +491,39 @@ test_expect_success 'merge' '
test_all_match git rev-parse HEAD^{tree}
'

# NEEDSWORK: This test is documenting current behavior, but that
# behavior can be confusing to users so there is desire to change it.
# Right now, users might be using this flow to work through conflicts,
# so any solution should present advice to users who try this sequence
# of commands to follow whatever new method we create.
test_expect_success 'merge with conflict outside cone' '
init_repos &&

test_all_match git checkout -b merge-tip merge-left &&
test_all_match git status --porcelain=v2 &&
test_all_match test_must_fail git merge -m merge merge-right &&
test_all_match git status --porcelain=v2 &&

# Resolve the conflict in different ways:
# 1. Revert to the base
test_all_match git checkout base -- deep/deeper2/a &&
test_all_match git status --porcelain=v2 &&

# 2. Add the file with conflict markers
test_all_match git add folder1/a &&
test_all_match git status --porcelain=v2 &&

# 3. Rename the file to another sparse filename and
# accept conflict markers as resolved content.
run_on_all mv folder2/a folder2/z &&
test_all_match git add folder2 &&
test_all_match git status --porcelain=v2 &&

test_all_match git merge --continue &&
test_all_match git status --porcelain=v2 &&
test_all_match git rev-parse HEAD^{tree}
'

test_expect_success 'merge with outside renames' '
init_repos &&

@@ -598,7 +640,14 @@ test_expect_success 'sparse-index is not expanded' '
git -C sparse-index reset --hard &&
ensure_not_expanded checkout rename-out-to-out -- deep/deeper1 &&
git -C sparse-index reset --hard &&
ensure_not_expanded restore -s rename-out-to-out -- deep/deeper1
ensure_not_expanded restore -s rename-out-to-out -- deep/deeper1 &&

echo >>sparse-index/README.md &&
ensure_not_expanded add -A &&
echo >>sparse-index/extra.txt &&
ensure_not_expanded add extra.txt &&
echo >>sparse-index/untracked.txt &&
ensure_not_expanded add .
'

# NEEDSWORK: a sparse-checkout behaves differently from a full checkout