
Update README to include Scalar CLI details #377

Merged 11 commits on Jun 17, 2021
107 changes: 85 additions & 22 deletions README.md
@@ -1,55 +1,107 @@
`microsoft/git` and the Scalar CLI
==================================

[![CI/PR](https://github.com/microsoft/git/actions/workflows/main.yml/badge.svg)](https://github.com/microsoft/git/actions/workflows/main.yml)

This is `microsoft/git`, a special Git distribution to support monorepo scenarios. If you are _not_
working in a monorepo, you are likely searching for
[Git for Windows](http://git-for-windows.github.io/) instead of this codebase.

In addition to the Git command-line interface (CLI), `microsoft/git` includes the Scalar CLI to
further enable working with extremely large repositories. Scalar is a tool to apply the latest
recommendations and use the most advanced Git features. You can read
[the Scalar CLI documentation](contrib/scalar/scalar.txt) or read our
[Scalar user guide](contrib/scalar/docs/index.md) including
[the philosophy of Scalar](contrib/scalar/docs/philosophy.md).

If you encounter problems with `microsoft/git`, please report them as
[GitHub issues](https://github.com/microsoft/git/issues).

Why is this fork needed?
=========================================================

Git is awesome - it's a fast, scalable, distributed version control system with an unusually rich
command set that provides both high-level operations and full access to internals. What more could
you ask for?

Well, because Git is a distributed version control system, each Git repository has a copy of all
files in the entire history. As large repositories, aka _monorepos_, grow, Git can struggle to
manage all that data. As Git commands like `status` and `fetch` get slower, developers stop waiting
and start switching context. And context switches harm developer productivity.

`microsoft/git` is focused on addressing these performance woes and making the monorepo developer
experience first-class. The Scalar CLI packages all of its recommendations into a simple set of
commands.

One major feature that Scalar recommends is [partial clone](https://github.blog/2020-12-21-get-up-to-speed-with-partial-clone-and-shallow-clone/),
which reduces the amount of data transferred in order to work with a Git repository. While several
services such as GitHub support partial clone, Azure Repos instead has an older version of this
functionality called
[the GVFS protocol](https://docs.microsoft.com/en-us/azure/devops/learn/git/gvfs-architecture#gvfs-protocol).
The integration with the GVFS protocol present in `microsoft/git` is not appropriate to include in
the core Git client because partial clone is the official version of that functionality.

Downloading and Installing
=========================================================

If you're working in a monorepo and want to take advantage of the performance boosts in
`microsoft/git`, then you can download the latest version installer for your OS from the
[Releases page](https://github.com/microsoft/git/releases). Alternatively, you can opt to install
via the command line, using the below instructions for supported OSes:

## Windows

__Note:__ Winget is still in public preview, meaning you currently
[need to take special installation steps](https://docs.microsoft.com/en-us/windows/package-manager/winget/#install-winget):
either manually install the `.appxbundle` available at the
[preview version of App Installer](https://www.microsoft.com/p/app-installer/9nblggh4nns1?ocid=9nblggh4nns1_ORSEARCH_Bing&rtc=1&activetab=pivot:overviewtab),
or participate in the
[Windows Insider flight ring](https://insider.windows.com/),
since `winget` is available by default on preview versions of Windows.

To install with Winget, run

```shell
winget install microsoft/git
```

Double-check that you have the right version by running these commands,
which should have the same output:

```shell
git version
scalar version
```

To upgrade `microsoft/git`, use the following Git command, which will download and install the latest
release.

```shell
git update-microsoft-git
```

You may also be alerted with a notification to upgrade, which presents a single-click process for
running `git update-microsoft-git`.

## macOS

To install `microsoft/git` on macOS, first [be sure that Homebrew is installed](https://brew.sh/), then
install the `microsoft-git` cask with these steps:

```shell
brew tap microsoft/git
brew install --cask microsoft-git
```

Double-check that you have the right version by running these commands,
which should have the same output:

```shell
git version
scalar version
```

To upgrade `microsoft/git`, you can run the necessary `brew` commands:

```shell
brew update
…
```

Or you can run the `git update-microsoft-git` command, which will run those brew …

## Linux

For Ubuntu/Debian distributions, `apt-get` support is coming soon. For now, please use the most
recent [`.deb` package](https://github.com/microsoft/git/releases). For example, you can download a
specific version as follows:

```shell
wget -O microsoft-git.deb https://github.com/microsoft/git/releases/download/v2.32.0.vfs.0.2/git-vfs_2.32.0.vfs.0.2.deb
sudo dpkg -i microsoft-git.deb
```

Double-check that you have the right version by running these commands,
which should have the same output:

```shell
git version
scalar version
```

For other distributions, you will need to compile and install `microsoft/git` from source:

```shell
git clone https://github.com/microsoft/git microsoft-git
cd microsoft-git
make -j12 prefix=/usr/local INCLUDE_SCALAR=YesPlease
sudo make -j12 prefix=/usr/local INCLUDE_SCALAR=YesPlease install
```

For more assistance building Git from source, see
[the INSTALL file in the core Git project](https://github.com/git/git/blob/master/INSTALL).

Contributing
=========================================================
51 changes: 51 additions & 0 deletions contrib/scalar/docs/faq.md
@@ -0,0 +1,51 @@
Frequently Asked Questions
==========================

Using Scalar
------------

### I don't want a sparse clone, I want every file after I clone!

Run `scalar clone --full-clone <url>` to initialize your repo to include
every file. You can switch to a sparse-checkout later by running
`git sparse-checkout init --cone`.

### I already cloned without `--full-clone`. How do I get everything?

Run `git sparse-checkout disable`.
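
Both answers can be tried with plain Git, without Scalar or a remote server. The minimal sketch below builds a throwaway local repository (the `origin` and `full` directory names and the files in them are made up for illustration), clones it in full, switches the clone to a cone-mode sparse-checkout that keeps only root-level files, and then runs `git sparse-checkout disable` to bring every file back.

```shell
set -e
# Throwaway commit identity and scratch directory for this demo.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
demo=$(mktemp -d)

# A tiny repository: one file at the root and two directories below it.
git init -q "$demo/origin"
mkdir -p "$demo/origin/app" "$demo/origin/lib"
echo root > "$demo/origin/README.md"
echo app > "$demo/origin/app/main.c"
echo lib > "$demo/origin/lib/util.c"
git -C "$demo/origin" add .
git -C "$demo/origin" commit -q -m "initial"

# A full clone: every file is on disk.
git clone -q "$demo/origin" "$demo/full"
cd "$demo/full"

# Switch to sparse-checkout after the fact: only root-level files remain.
git sparse-checkout init --cone
test -f README.md      # root-level files are still present
test ! -e app/main.c   # files inside directories are removed from disk

# Expand back to all files.
git sparse-checkout disable
```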

Scalar Design Decisions
-----------------------

There may be many design decisions within Scalar that are confusing at first
glance. Some of them may cause friction when you use Scalar with your existing
repos and existing habits.

> Scalar has the most benefit when users design repositories
> with efficient patterns.

For example: Scalar uses the sparse-checkout feature to limit the size of the
working directory within a large monorepo. It is designed to work efficiently
with monorepos that are highly componentized, where most developers need only
a small subset of the files in their daily work.

### Why does `scalar clone` create a `<repo>/src` folder?

Scalar uses a file system watcher to keep track of changes under this `src` folder.
Any activity in this folder is assumed to be important to Git operations. By
creating the `src` folder, we are making it easy for your build system to
create output folders outside the `src` directory. We commonly see systems
create folders for build outputs and package downloads. Scalar itself creates
these folders during its builds.

Your build system may create build artifacts such as `.obj` or `.lib` files
next to your source code. These are commonly "hidden" from Git using
`.gitignore` files. Having such artifacts in your source tree creates
additional work for Git because it needs to look at these files and match them
against the `.gitignore` patterns.

By following the `src` pattern that Scalar tries to establish, and placing your
build intermediates and outputs next to the `src` folder rather than inside it,
you can help optimize Git command performance for developers in the repository
by limiting the number of files Git needs to consider for many common
operations.
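
A minimal sketch of the difference, assuming a hypothetical enlistment whose Git worktree is `enlistment/src` (the `bin` and `out` folder names are made up): an ignored `bin/` folder inside `src` is still reported when Git scans the worktree, while an `out/` folder next to `src` lies outside the worktree, so Git never looks at it.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
demo=$(mktemp -d)
src="$demo/enlistment/src"

# The Git worktree lives in <enlistment>/src, mirroring the `scalar clone` layout.
mkdir -p "$src"
git init -q "$src"
echo 'bin/' > "$src/.gitignore"
echo 'int main(void) { return 0; }' > "$src/main.c"
git -C "$src" add .
git -C "$src" commit -q -m "initial"

# Build output inside the worktree: ignored, but Git still has to find it
# and match it against the .gitignore patterns.
mkdir -p "$src/bin"
touch "$src/bin/main.obj"

# Build output next to the worktree: Git never scans it.
mkdir -p "$demo/enlistment/out"
touch "$demo/enlistment/out/main.obj"

# Report untracked and ignored paths that Git found while scanning.
git -C "$src" status --porcelain --ignored
```

The status output should list only `!! bin/`; nothing under `out/` appears.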
113 changes: 113 additions & 0 deletions contrib/scalar/docs/getting-started.md
@@ -0,0 +1,113 @@
Getting Started
===============

Registering existing Git repos
------------------------------

To add a repository to the list of registered repos, run `scalar register [<path>]`.
If `<path>` is not provided, then the "current repository" is discovered from
the working directory by scanning the parent paths for a path containing a `.git`
folder, possibly inside a `src` folder.

To see which repositories are currently tracked by the service, run
`scalar list`.

Run `scalar unregister [<path>]` to remove the repo from this list.
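
The discovery rule can be sketched as a small shell function. This is not Scalar's implementation: `find_scalar_repo` is a hypothetical helper that starts at a directory, walks up through its parents, and prints the first directory that has a `.git` folder either directly inside it or inside a `src` subfolder.

```shell
set -e
# Hypothetical helper, NOT Scalar's source code.
find_scalar_repo() {
    dir=$(cd "${1:-.}" && pwd)           # absolute path to start from
    while :; do
        if [ -d "$dir/.git" ]; then      # plain repository
            echo "$dir"
            return 0
        fi
        if [ -d "$dir/src/.git" ]; then  # Scalar-style enlistment root
            echo "$dir/src"
            return 0
        fi
        [ "$dir" = / ] && return 1       # no repository above the start point
        dir=$(dirname "$dir")
    done
}

# Usage on a fake enlistment layout (only the directories matter here).
demo=$(mktemp -d)
mkdir -p "$demo/enlistment/src/.git" "$demo/enlistment/src/app/module"
find_scalar_repo "$demo/enlistment/src/app/module"  # prints .../enlistment/src
find_scalar_repo "$demo/enlistment"                 # prints .../enlistment/src
```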

Creating a new Scalar clone using the GVFS Protocol
---------------------------------------------------

The `clone` verb creates a local enlistment of a remote repository, such as one
hosted in Azure Repos, using the
[GVFS protocol](https://github.com/microsoft/VFSForGit/blob/HEAD/Protocol.md).

```
scalar clone [options] <url> [<dir>]
```

Create a local copy of the repository at `<url>`. If specified, create the `<dir>`
directory and place the repository there. Otherwise, the last section of the `<url>`
will be used for `<dir>`.
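
As a rough illustration of that default, the hypothetical helper below takes everything after the last `/` of the URL. Stripping a single trailing `/` first is an assumption made for the sketch, and the Azure Repos URL is made up; Scalar's own URL parsing may differ.

```shell
# Hypothetical helper, NOT Scalar's source code.
default_clone_dir() {
    url=${1%/}          # assumption: ignore one trailing slash
    echo "${url##*/}"   # keep the last section of the URL
}

default_clone_dir https://dev.azure.com/contoso/project/_git/MyRepo   # MyRepo
default_clone_dir https://dev.azure.com/contoso/project/_git/MyRepo/  # MyRepo
```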

At the end, the repo is located at `<dir>/src`. By default, the sparse-checkout
feature is enabled and the only files present are those in the root of your
Git repository. Use `git sparse-checkout set` to expand the set of directories
you want to see, or `git sparse-checkout disable` to expand to all files. You
can explore the subdirectories outside your sparse-checkout specification using
`git ls-tree HEAD`.

### Sparse Repo Mode

By default, Scalar reduces your working directory to only the files at the
root of the repository. You need to add the folders you care about to build up
your working set.

* `scalar clone <url>`
  * Please choose the **Clone with HTTPS** option in the `Clone Repository` dialog in Azure Repos, not **Clone with SSH**.
* `cd <root>\src`
* At this point, your `src` directory only contains files that appear in your root
  tree. No folders are populated.
* Set the directory list for your sparse-checkout using:
  1. `git sparse-checkout set <dir1> <dir2> ...`
  2. `git sparse-checkout set --stdin < dir-list.txt`
* Run git commands as you normally would.
* To fully populate your working directory, run `git sparse-checkout disable`.
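
The directory-list step can be rehearsed with plain Git on a throwaway local repository. In this sketch, `git sparse-checkout init --cone` stands in for the root-only state that `scalar clone` leaves you in (an assumption so that the example runs without Scalar or Azure Repos); the `app`, `docs`, and `lib` directories and `dir-list.txt` are hypothetical.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
demo=$(mktemp -d)

# A tiny stand-in for a monorepo with three top-level directories.
git init -q "$demo/origin"
for d in app docs lib; do
    mkdir -p "$demo/origin/$d"
    echo "$d" > "$demo/origin/$d/file.txt"
done
echo root > "$demo/origin/README.md"
git -C "$demo/origin" add .
git -C "$demo/origin" commit -q -m "initial"

# Clone, then restrict the working directory to root-level files only.
git clone -q "$demo/origin" "$demo/work"
cd "$demo/work"
git sparse-checkout init --cone

# Build up the working set from a list of directories, one per line.
# The list lives outside the worktree so it is not an untracked file.
printf '%s\n' app docs > "$demo/dir-list.txt"
git sparse-checkout set --stdin < "$demo/dir-list.txt"

# Show the directories now in the sparse-checkout cone.
git sparse-checkout list
```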

If instead you want to start with all files on-disk, you can clone with the
`--full-clone` option. To enable sparse-checkout after the fact, run
`git sparse-checkout init --cone`. This will initialize your sparse-checkout
patterns to only match the files at root.

If you are unfamiliar with what directories are available in the repository,
then you can run `git ls-tree -d --name-only HEAD` to discover the directories
at root, or `git ls-tree -d --name-only HEAD <path>/` to discover the directories
in `<path>`. Keep the trailing slash: without it, `git ls-tree` lists only the
`<path>` entry itself.
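
For example, on a hypothetical repository with `app/` and `lib/sub/` directories, the commands below list directories straight from the committed tree, so they also report directories that the sparse-checkout has not populated. With `-d`, a `<path>` argument needs a trailing slash to list the directories inside it; without the slash, only the `<path>` entry itself is printed.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
repo=$(mktemp -d)/repo

git init -q "$repo"
mkdir -p "$repo/app" "$repo/lib/sub"
echo a > "$repo/app/a.txt"
echo b > "$repo/lib/sub/b.txt"
echo r > "$repo/README.md"
git -C "$repo" add .
git -C "$repo" commit -q -m "initial"

# Keep only root-level files on disk; the directories still exist in HEAD.
git -C "$repo" sparse-checkout init --cone

git -C "$repo" ls-tree -d --name-only HEAD        # app, lib
git -C "$repo" ls-tree -d --name-only HEAD lib/   # lib/sub
git -C "$repo" ls-tree -d --name-only HEAD lib    # lib (the entry itself)
```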

### Options

These options allow a user to customize their initial enlistment.

* `--full-clone`: If specified, do not initialize the sparse-checkout feature.
All files will be present in your `src` directory. This behaves very similarly
to a Git partial clone in that blobs are downloaded on demand. However, it
will use the GVFS protocol to download all Git objects.

* `--cache-server-url=<url>`: If specified, set the intended cache server to
the specified `<url>`. All object queries will use the GVFS protocol to this
`<url>` instead of the origin remote. If the remote supplies a list of
cache servers via the `<url>/gvfs/config` endpoint, then the `clone` command
will select a nearby cache server from that list.

* `--branch=<ref>`: Specify the branch to check out after the clone.

* `--local-cache-path=<path>`: Use this option to override the path for the
local Scalar cache. If not specified, then Scalar will select a default
path to share objects with your other enlistments. On Windows, this path
is a subdirectory of `<Volume>:\.scalarCache\`. On macOS, this path is a
subdirectory of `~/.scalarCache/`. The default cache path is recommended so
that multiple enlistments of the same remote repository share objects on the
same device.

### Advanced Options

The options below are not intended for use by a typical user. These are
usually used by build machines to create a temporary enlistment that
operates on a single commit.

* `--single-branch`: Use this option to only download metadata for the branch
that will be checked out. This is helpful for build machines that target
a remote with many branches. Any `git fetch` commands after the clone will
still ask for all branches.

* `--no-prefetch`: Use this option to skip prefetching commits after the clone.
Without that prefetch, commands like `git log` or `git pull` become extremely
slow, so this option is not recommended for anyone planning to use their
clone for history traversal.

Removing a Scalar Clone
-----------------------

Since the `scalar clone` command sets up a file-system watcher (when available),
that watcher could prevent deleting the enlistment. Run `scalar delete <path>`
from outside of your enlistment to unregister the enlistment from the filesystem
watcher and delete the enlistment at `<path>`.
54 changes: 54 additions & 0 deletions contrib/scalar/docs/index.md
@@ -0,0 +1,54 @@
Scalar: Enabling Git at Scale
=============================

Scalar is a tool that helps Git scale to some of the largest Git repositories.
It achieves this by enabling some advanced Git features, such as:

* *Partial clone:* reduces time to get a working repository by not
downloading all Git objects right away.

* *Background prefetch:* downloads Git object data from all remotes every
hour, reducing the amount of time for foreground `git fetch` calls.

* *Sparse-checkout:* limits the size of your working directory.

* *File system monitor:* tracks the recently modified files and eliminates
the need for Git to scan the entire worktree.

* *Commit-graph:* accelerates commit walks and reachability calculations,
speeding up commands like `git log`.

* *Multi-pack-index:* enables fast object lookups across many pack-files.

* *Incremental repack:* repacks the packed Git data into fewer pack-files
without disrupting concurrent commands by using the multi-pack-index.
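
Several of these are ordinary Git features that can also be enabled by hand. As a minimal sketch on a throwaway repository, the commands below write a commit-graph and a multi-pack-index with plain Git. This only illustrates the underlying features; it is not how Scalar itself configures a repository.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
repo=$(mktemp -d)/repo

git init -q "$repo"
cd "$repo"
for i in 1 2 3; do
    echo "$i" > "file$i.txt"
    git add "file$i.txt"
    git commit -q -m "commit $i"
done

# Commit-graph: precomputed commit metadata for faster history walks.
git commit-graph write --reachable

# Move the loose objects into a pack-file, then index all pack-files
# together with a multi-pack-index.
git repack -q -d
git multi-pack-index write

# Both files can be checked for consistency.
git commit-graph verify
git multi-pack-index verify
```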

By running `scalar register` in any Git repo, Scalar will automatically enable
these features for that repo (except partial clone) and start running suggested
maintenance in the background using
[the `git maintenance` feature](https://git-scm.com/docs/git-maintenance).

Repos cloned with the `scalar clone` command use partial clone or the
[GVFS protocol](https://github.com/microsoft/VFSForGit/blob/HEAD/Protocol.md)
to significantly reduce the amount of data required to get started
using a repository. By delaying all blob downloads until they are required,
Scalar allows you to work with very large repositories quickly. The GVFS
protocol allows a network of _cache servers_ to serve objects with lower
latency and higher throughput. The cache servers also reduce load on the
central server.
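
The effect of delaying blob downloads can be observed with Git's native partial clone against a local stand-in for a server (the `server` and `client` directories and the file contents are made up, and no GVFS protocol is involved). The sketch counts objects that are missing locally before and after the working directory is populated.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
demo=$(mktemp -d)

# A local "server" repository with one file.
git init -q "$demo/server"
echo "large file contents" > "$demo/server/data.bin"
git -C "$demo/server" add data.bin
git -C "$demo/server" commit -q -m "add data"
# Allow filtered clones, and later on-demand fetches of single objects.
git -C "$demo/server" config uploadpack.allowFilter true
git -C "$demo/server" config uploadpack.allowAnySHA1InWant true

# Partial clone: commits and trees now, no blobs yet. The file:// URL
# matters, because plain local-path clones ignore --filter.
git clone -q --no-checkout --filter=blob:none "file://$demo/server" "$demo/client"
cd "$demo/client"

# Count objects reachable from HEAD that are not present locally.
count_missing() {
    git rev-list --objects --missing=print HEAD | awk '/^[?]/ { n++ } END { print n + 0 }'
}
missing_before=$(count_missing)   # the blob for data.bin

# Populating the working directory fetches only the blobs it needs.
git reset -q --hard
missing_after=$(count_missing)
```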

Documentation
-------------

* [Getting Started](getting-started.md): Get started with Scalar.
Includes `scalar register`, `scalar unregister`, `scalar clone`, and
`scalar delete`.

* [Troubleshooting](troubleshooting.md):
Collect diagnostic information or update custom settings. Includes
`scalar diagnose` and `scalar cache-server`.

* [The Philosophy of Scalar](philosophy.md): Why does Scalar work the way
it does, and how do we make decisions about its future?

* [Frequently Asked Questions](faq.md)