'--tree-ish' is broken #30

Open
BenWiederhake opened this issue Oct 13, 2015 · 6 comments
Comments

@BenWiederhake
Contributor

BenWiederhake commented Oct 13, 2015

I don't understand what the following line (currently at https://github.com/fabacab/git-archive-all.sh/blob/master/git-archive-all.sh#L242 ) attempted to achieve:

TREEISH=$(git submodule | grep "^ .*${path%/} " | cut -d ' ' -f 2)

This is completely

First of all, git submodule only lists the "direct" submodules, not the "transitive" ones. This may be related to #2. Consider using something like git submodule foreach --recursive pwd.

The grep part assumes that the current state of the submodule is clean (the first char is a space for "clean", + for "changes made", etc.). That's not guaranteed. Indeed, git-archive-all.sh --tree-ish only really makes sense when the given tree-ish is different from HEAD.

The cut part tries to finish the regex matching that should have been done in grep; see grep -o.

It never uses the original --tree-ish argument at all.
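To illustrate the point about the status character, here is a minimal sketch of reading `git submodule status` output without assuming the submodule is clean. The function name `submodule_sha_from_status` is hypothetical and not part of git-archive-all.sh. It assumes the usual line layout (one status character, the hash, the path, and an optional description in parentheses), and it only sees whatever submodules the status output lists.

```shell
# Hypothetical helper, not part of git-archive-all.sh.
# Reads `git submodule status` lines on stdin and prints the hash recorded
# for the submodule path given as $1 (a trailing slash is tolerated).
submodule_sha_from_status() {
    want=${1%/}
    while IFS= read -r line; do
        rest=${line#?}      # drop the one-character status flag (' ', '+', '-', 'U')
        sha=${rest%% *}     # the hash is everything up to the first space
        rest=${rest#* }     # what remains is "path" or "path (description)"
        case $rest in
            *" ("*) path=${rest% (*} ;;   # strip the trailing " (description)"
            *)      path=$rest ;;
        esac
        if [ "$path" = "$want" ]; then
            printf '%s\n' "$sha"
            return 0
        fi
    done
    return 1    # path not listed
}

# Possible usage inside a repository (assumption: run from the top-level directory):
#   git submodule status --cached | submodule_sha_from_status "$path"
```

This replaces the grep/cut pair with one pass over the lines, so a leading + or - no longer makes the match fail. It still does nothing about the --tree-ish argument itself.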

@fabacab
Owner

fabacab commented Oct 15, 2015

Consider using something like git submodule foreach --recursive pwd.

Git's foreach command did not exist when this code was written. IIRC, it wasn't available for about two years after this script was published.

I think this line passed the current submodule commit head to the submodule's git archive command. It's been years; this could probably use some updating.

@KSR-Yasuda

In short, is there no way to know the exact submodule commit recorded at the parent repo's target commit?

I tried to illustrate it.


Consider the following case:

git archive-all -t a01 archive.tar
[repo A]                        [repo B] (submodule)

                                <uncommitted change>
commit a02 (HEAD)   ----------> commit b02 (HEAD)
commit a01 (TARGET) ----------> commit b01

git submodule prints something like:

+__HASH_FOR_THE_UNCOMMITTED_CHANGE__ b (heads/master)

git submodule --cached fixes the problem of pointing at the <uncommitted change>. However,
it only returns the submodule commit recorded in the parent repo's current commit:

Repo A's HEAD is now a02, to which b02 of repo B is bound.
So git submodule --cached now prints

+b02xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx b (b02xxxx)

To archive with -t a01, you need to know the commit of repo B bound to it, b01;
but there is no way to find that out unless you check out the target commit a01, right?

Of course, the script should not change the state of the working directory.
So it cannot know the submodule's exact state in the past.


The change to use git submodule --cached is below.
But, as said above, it does not solve everything.

--- a/git-archive-all.sh
+++ b/git-archive-all.sh
@@ -249,9 +249,9 @@ fi
 if [ $VERBOSE -eq 1 ]; then
     echo -n "archiving submodules..."
 fi
-git submodule >>"$TMPLIST"
+git submodule --cached >>"$TMPLIST"
 while read path; do
-    TREEISH=$(grep "^ .*${path%/} " "$TMPLIST" | cut -d ' ' -f 2) # git submodule does not list trailing slashes in $path
+    TREEISH=$(grep "^.* ${path%/} " "$TMPLIST" | sed -e 's/^.//' | cut -d ' ' -f 1) # git submodule does not list trailing slashes in $path
     cd "$path"
     rm -f "$TMPDIR"/"$(echo "$path" | sed -e 's/\//./g')"$FORMAT
     git archive --format=$FORMAT --prefix="${PREFIX}$path" $ARCHIVE_OPTS ${TREEISH:-HEAD} > "$TMPDIR"/"$(echo "$path" | sed -e 's/\//./g')"$FORMAT

Also, git submodule status lists submodules that have not been committed yet.
This is also a problem.

That is, submodules that were added with git submodule add, but whose addition has not been committed yet.

[repo C]                        [repo A]                        [repo B] (submodule)

commit c03 (HEAD) <------------ <uncommitted change>            <uncommitted change>
                                commit a02 (HEAD)    ---------> commit b02 (HEAD)
                                commit a01 (TARGET)  ---------> commit b01

In this case, git submodule status lists the submodule being added, C, regardless of the --cached option.

% git submodule
+__HASH_FOR_THE_UNCOMMITTED_CHANGE__ b (heads/master)
 c03xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx c (c03xxxx)
% git submodule --cached
+b02xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx b (b02xxxx)
 c03xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx c (c03xxxx)

In the end, some command is needed to get the submodules' state at a specific commit.
Without such a command, --tree-ish cannot be fixed.

For now, sad to say, you had better not rely on --tree-ish,
but check out the target commit yourself and archive HEAD.

@KSR-Yasuda

How about calling git ls-tree for each submodule path obtained from git submodule status? (see #42)
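A rough sketch of that idea, not the code from the PR: git ls-tree prints a submodule entry as mode 160000 with type "commit", followed by the hash and the path, so the hash bound to a given parent commit can be read without checking anything out. The function names `gitlink_sha` and `submodule_commit_at` are made up for illustration.

```shell
# Hypothetical helpers, not part of git-archive-all.sh.

# Reads `git ls-tree` output on stdin and prints the hash of the first
# submodule (gitlink) entry, i.e. a line with mode 160000 and type "commit".
gitlink_sha() {
    awk '$1 == "160000" && $2 == "commit" { print $3; exit }'
}

# Prints the submodule commit recorded at parent tree-ish $1 for path $2.
# Prints nothing if $2 is not a submodule at that tree-ish.
submodule_commit_at() {
    git ls-tree "$1" -- "${2%/}" | gitlink_sha
}

# Possible usage, with the names from the illustration above (a01, b):
#   TREEISH=$(submodule_commit_at a01 b/)
```

This only answers the question for submodules that are direct children of the repository it runs in. For a nested submodule the same call would have to be repeated inside the parent submodule (for example with git -C), which assumes that submodule is checked out and contains the commit in question.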

@KSR-Yasuda

How about calling git ls-tree for each submodule path obtained from git submodule status? (see #42)

git ls-tree did not work well for sub-submodules (recursively nested submodules).

Instead, there is now git submodule --recursive --cached.
Perhaps this would work?

@KSR-Yasuda

Instead, there is now git submodule --recursive --cached. Perhaps this would work?

Unfortunately, that was not complete either.

It only reports the submodule commits bound to the top repo's HEAD.

@KSR-Yasuda

KSR-Yasuda commented Feb 8, 2024

Fixed the PR to use

  • git ls-tree, if available
    (for the top repo's direct submodules, and also for indirect ones as far as possible)
  • otherwise, git submodule --recursive --cached
  • if neither succeeds, the submodule's HEAD
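The fallback order in the list above could look roughly like the sketch below. This is an illustration under assumptions, not the code from the PR: the function names `pick_treeish` and `resolve_submodule_treeish` are hypothetical, the status lookup is spelled git submodule status --cached --recursive, and paths containing spaces are not handled.

```shell
# Hypothetical helpers, not part of git-archive-all.sh.

# Prints the first non-empty argument, or HEAD if all of them are empty.
pick_treeish() {
    for candidate in "$@"; do
        if [ -n "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    printf 'HEAD\n'
}

# $1 = tree-ish of the parent repo, $2 = submodule path.
resolve_submodule_treeish() {
    parent=$1
    path=${2%/}
    # 1. The gitlink recorded in the parent's tree at the requested tree-ish.
    from_tree=$(git ls-tree "$parent" -- "$path" 2>/dev/null |
        awk '$1 == "160000" { print $3; exit }')
    # 2. Otherwise, the commit recorded for the current index (strip the status flag).
    from_status=$(git submodule status --cached --recursive 2>/dev/null |
        awk -v want="$path" '{ sub(/^[-+U]/, "", $1) } $2 == want { print $1; exit }')
    # 3. Otherwise, fall back to the submodule's HEAD.
    pick_treeish "$from_tree" "$from_status"
}

# Possible usage in the archiving loop (assumption: run from the top-level repo):
#   TREEISH=$(resolve_submodule_treeish "$TREEISH_ARG" "$path")
```

For simplicity both lookups always run and the first non-empty result wins. Only the first step takes the requested tree-ish into account; the other two describe the current checkout, so they are a best effort rather than an exact answer.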

3 participants