Partial clone ux #13

Open
wants to merge 3 commits into
base: partial-clone-ux
87 changes: 86 additions & 1 deletion Documentation/git-clone.txt
@@ -15,7 +15,8 @@ SYNOPSIS
[--dissociate] [--separate-git-dir <git dir>]
[--depth <depth>] [--[no-]single-branch] [--no-tags]
[--recurse-submodules[=<pathspec>]] [--[no-]shallow-submodules]
[--[no-]remote-submodules] [--jobs <n>] [--sparse] [--] <repository>
[--[no-]remote-submodules] [--jobs <n>] [--sparse]
[--partial[=<size>]|--filter=<filter>] [--] <repository>
[<directory>]

DESCRIPTION
@@ -162,6 +163,18 @@ objects from the source repository into a pack in the cloned repository.
of the repository. The sparse-checkout file can be
modified to grow the working directory as needed.

--partial[=<size>]::
--filter=<filter-spec>::
Use the partial clone feature and request that the server sends
a subset of reachable objects according to a given object filter.
When using `--filter`, the supplied `<filter-spec>` is used for
the partial clone filter. When using `--partial` with no `<size>`,
the `blob:none` filter is applied to filter all blobs. When using
`--partial=<size>` the `blob:limit=<size>` filter is applied to
filter all blobs with size larger than `<size>`. For more details
on filter specifications, see the `--filter` option in
linkgit:git-rev-list[1].
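
As a rough illustration of the mapping described above, the following shell sketch mirrors how this PR's proposed `--partial[=<size>]` option would translate into a filter spec. `partial_to_filter` is a hypothetical helper name used only here, and the `0` special case is an assumption taken from the `opt_set_blob_none_filter` callback in this same diff, not from released Git.

```shell
# Hypothetical helper (not part of Git): translate the proposed
# --partial[=<size>] option into the equivalent filter spec.
partial_to_filter() {
    case "$1" in
    --partial|--partial=0)
        # No size (or a size of 0) means "omit every blob".
        echo "blob:none"
        ;;
    --partial=*)
        # Any other size becomes a blob size limit.
        echo "blob:limit=${1#--partial=}"
        ;;
    *)
        echo "not a --partial option: $1" >&2
        return 1
        ;;
    esac
}

partial_to_filter --partial      # prints "blob:none"
partial_to_filter --partial=1m   # prints "blob:limit=1m"
```

Under this sketch, `git clone --partial=1m <repository>` would be expected to behave like `git clone --filter=blob:limit=1m <repository>`.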

--mirror::
Set up a mirror of the source repository. This implies `--bare`.
Compared to `--bare`, `--mirror` not only maps local branches of the
@@ -297,6 +310,78 @@ or `--mirror` is given)
for `host.xz:foo/.git`). Cloning into an existing directory
is only allowed if the directory is empty.

Owner

Comments on commit message:

clone: document partial clone section

This is a good, concise title!

Partial clone belongs to a feature of clone, but there is no related
help information in git-clone document, so add a relevant section will
make user a better understand to partial clone.

Instead of "Partial clone belongs to a feature of clone" I would say "Partial clones are created using 'git clone'".

"information in the git-clone document" (missing "the")

This sentence is a run-on. I suggest splitting the sentence after "in the git-clone document." then "Add a relevant section..." (insert period, drop "so")

The last bit is a little confusing. Here is a retry at the paragraph:

Partial clones are created using 'git clone', but there is no related
help information in the git-clone documentation. Add a relevant
section to help users understand what partial clones are and how
they differ from normal clones.

The section briefly introduces the applicable scenarios and some
precautions of partial clone. If users want to know more about its
technical design and other detailsn, users can view the link of
git-partial-clone(7) according to the guidelines in the section.

typo: "detailsn" should be "details".

You'll want the "Signed-off-by:" to match your name and email. If you run git commit -s --amend, it will amend the commit and add the proper sign-off for you at the end. Do not add sign-offs for others. This statement means "To the best of my knowledge I know I can release the copyright of this material to the GPL" or something like that (I am not a lawyer).

Author

How can I test the documentation?

Owner

You would need to run make inside the Documentation directory.

You might need to install asciidoc or some other dependencies you don't need for the normal build.

Partial Clone
-------------

By default, `git clone` will download every reachable object, including
every version of every file in the history of the repository. The
**partial clone** feature allows Git to transfer fewer objects and
request them from the remote only when they are needed, so some
reachable objects can be omitted from the initial `git clone` and
subsequent `git fetch` operations.

To use the partial clone feature, you can run `git clone` with the
`--filter=<filter-spec>` option. If you want to clone a repository
without downloading any blobs, the form `--filter=blob:none` will omit
all the blobs. If the repository has some large blobs and you want to
avoid downloading blobs above a given size threshold, the form
`--filter=blob:limit=<n>[kmg]` omits blobs larger than n bytes
or units (see linkgit:git-rev-list[1]).
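
To make the `<n>[kmg]` suffixes concrete, here is a small sketch that expands a size into bytes. It assumes the 1024-based multipliers that Git's numeric option parsing uses for `k`, `m`, and `g`; `size_to_bytes` is a hypothetical helper for illustration only, not a Git command.

```shell
# Hypothetical helper (not part of Git): expand "<n>[kmg]" into bytes,
# assuming 1024-based units and case-insensitive suffixes.
size_to_bytes() {
    # Strip one trailing unit letter, if present, to get the number.
    num=${1%[kKmMgG]}
    case "$1" in
    *[kK]) echo $((num * 1024)) ;;
    *[mM]) echo $((num * 1024 * 1024)) ;;
    *[gG]) echo $((num * 1024 * 1024 * 1024)) ;;
    *)     echo "$num" ;;
    esac
}

size_to_bytes 1m     # prints 1048576
size_to_bytes 512k   # prints 524288
```

So `--filter=blob:limit=1m` would apply a threshold of 1048576 bytes under these assumptions.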

As mentioned before, a partially cloned repository may have to request
the missing objects when they are needed. As a result, some 'local'
commands may fail without a network connection to the remote repository.

For example, suppose the <repository> contains two branches named
'master' and 'topic'. We clone the repository with

$ git clone --filter=blob:none --no-checkout <repository>

With the `--filter=blob:none` option, Git omits all the blobs, and with
the `--no-checkout` option, Git does not perform a checkout of HEAD
after the clone is complete. Then, we check out the remote-tracking
'topic' branch with

$ git checkout -b topic origin/topic

The output looks like

------------
remote: Enumerating objects: 1, done.
remote: Counting objects: 100% (1/1), done.
remote: Total 1 (delta 0), reused 0 (delta 0), pack-reused 0
Receiving objects: 100% (1/1), 43 bytes | 43.00 KiB/s, done.
Branch 'topic' set up to track remote branch 'topic' from 'origin'.
Switched to a new branch 'topic'
------------

The output is a bit surprising, but it shows how partial clone works.
When we check out the branch 'topic', Git requests the missing blobs
because they are needed. Then, we can switch back to branch 'master' with

$ git checkout master

This time the output looks like

------------
Switched to branch 'master'
Your branch is up to date with 'origin/master'.
------------

It shows that when we switch back to the previous location, the checkout
is done without a download because the repository has all the blobs that
were downloaded previously.
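
The on-demand behaviour can be reproduced locally with a throwaway repository. The sketch below assumes a Git version with partial clone support. All paths and file names are made up for illustration, and the two `uploadpack.*` settings are assumed to be what a `file://` remote needs in order to honor filters and serve single objects. It reads one blob with `git cat-file` instead of performing a full checkout, which is enough to trigger a lazy fetch.

```shell
#!/bin/sh
# Sketch: watch a blob-less clone fetch a missing blob on demand.
set -e
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# Build a tiny "server" repository with a single blob.
git init -q "$work/srv"
git -C "$work/srv" config user.name "Example User"
git -C "$work/srv" config user.email "user@example.com"
echo "one" >"$work/srv/file.txt"
git -C "$work/srv" add file.txt
git -C "$work/srv" commit -q -m "initial"
git -C "$work/srv" config uploadpack.allowFilter true
git -C "$work/srv" config uploadpack.allowAnySHA1InWant true

# Clone without blobs and without populating the working tree.
git clone -q --no-checkout --filter=blob:none "file://$work/srv" "$work/pc"

# Count objects reachable from HEAD that are absent locally.
count_missing() {
    git -C "$work/pc" rev-list --objects --missing=print HEAD |
    grep -c '^?' || true
}

before=$(count_missing)
# Reading the blob makes Git request it from the remote.
git -C "$work/pc" cat-file -p HEAD:file.txt >/dev/null
after=$(count_missing)

echo "missing before: $before, after: $after"
```

Once the blob has been fetched, later commands that need it run without contacting the remote again, which matches the second checkout in the walkthrough above.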

`git log` may also behave surprisingly with partial clones. `git log
-- <pathspec>` will not cause downloads with the blob filters, because
it only reads commits and trees. Options that require Git to look at
the contents of blobs, like "-p" and "--stat", will trigger
lazy/on-demand fetching of blobs. So will options that cause Git to
report pathnames, like "--summary" and "--raw", because the blobs are
needed to detect inexact renames.

linkgit:partial-clone[1]

:git-clone: 1
include::urls.txt[]

18 changes: 18 additions & 0 deletions list-objects-filter-options.c
@@ -270,6 +270,24 @@ int opt_parse_list_objects_filter(const struct option *opt,
return 0;
}

int opt_set_blob_none_filter(const struct option *opt,
			     const char *arg, int unset)
{
	struct strbuf filter_arg = STRBUF_INIT;
	struct list_objects_filter_options *filter_options = opt->value;

	if (unset || !arg || !strcmp(arg, "0")) {
		parse_list_objects_filter(filter_options, "blob:none");
		return 0;
	}

	strbuf_addf(&filter_arg, "blob:limit=%s", arg);
	parse_list_objects_filter(filter_options, filter_arg.buf);
	strbuf_release(&filter_arg);

	return 0;
}

const char *list_objects_filter_spec(struct list_objects_filter_options *filter)
{
if (!filter->filter_spec.nr)

8 changes: 7 additions & 1 deletion list-objects-filter-options.h
@@ -62,6 +62,7 @@ struct list_objects_filter_options {

/* Normalized command line arguments */
#define CL_ARG__FILTER "filter"
#define CL_ARG__PARTIAL "partial"

void list_objects_filter_die_if_populated(
struct list_objects_filter_options *filter_options);
@@ -80,11 +81,16 @@ void parse_list_objects_filter(

int opt_parse_list_objects_filter(const struct option *opt,
const char *arg, int unset);
int opt_set_blob_none_filter(const struct option *opt,
const char *arg, int unset);

#define OPT_PARSE_LIST_OBJECTS_FILTER(fo) \
{ OPTION_CALLBACK, 0, CL_ARG__FILTER, fo, N_("args"), \
N_("object filtering"), 0, \
opt_parse_list_objects_filter }
opt_parse_list_objects_filter }, \
{ OPTION_CALLBACK, 0, CL_ARG__PARTIAL, fo, N_("size"), \
N_("partial clone with blob filter"), \
PARSE_OPT_OPTARG | PARSE_OPT_NONEG , opt_set_blob_none_filter }

/*
* Translates abbreviated numbers in the filter's filter_spec into their

42 changes: 32 additions & 10 deletions t/t5616-partial-clone.sh
@@ -33,17 +33,39 @@ test_expect_success 'setup bare clone for server' '
# confirm we are missing all of the known blobs.
# confirm partial clone was registered in the local config.
test_expect_success 'do partial clone 1' '
git clone --no-checkout --filter=blob:none "file://$(pwd)/srv.bare" pc1 &&

git -C pc1 rev-list --quiet --objects --missing=print HEAD >revs &&
awk -f print_1.awk revs |
sed "s/?//" |
sort >observed.oids &&
for option in "--filter=blob:none" "--partial"
do
rm -rf pc1 &&
git clone --no-checkout "$option" "file://$(pwd)/srv.bare" pc1 &&

git -C pc1 rev-list --quiet --objects --missing=print HEAD >revs &&
awk -f print_1.awk revs |
sed "s/?//" |
sort >observed.oids &&

test_cmp expect_1.oids observed.oids &&
test "$(git -C pc1 config --local core.repositoryformatversion)" = "1" &&
test "$(git -C pc1 config --local remote.origin.promisor)" = "true" &&
test "$(git -C pc1 config --local remote.origin.partialclonefilter)" = "blob:none"
done
'

test_cmp expect_1.oids observed.oids &&
test "$(git -C pc1 config --local core.repositoryformatversion)" = "1" &&
test "$(git -C pc1 config --local remote.origin.promisor)" = "true" &&
test "$(git -C pc1 config --local remote.origin.partialclonefilter)" = "blob:none"
test_expect_success 'do partial clone with size limit' '
for option in "--filter=blob:limit=1" "--partial=1"
do
rm -rf pc-limit &&
git clone --no-checkout "$option" "file://$(pwd)/srv.bare" pc-limit &&

git -C pc-limit rev-list --quiet --objects --missing=print HEAD >revs &&
awk -f print_1.awk revs |
sed "s/?//" |
sort >observed.oids &&

test_cmp expect_1.oids observed.oids &&
test "$(git -C pc-limit config --local core.repositoryformatversion)" = "1" &&
test "$(git -C pc-limit config --local remote.origin.promisor)" = "true" &&
test "$(git -C pc-limit config --local remote.origin.partialclonefilter)" = "blob:limit=1"
done
'

test_expect_success 'verify that .promisor file contains refs fetched' '