git: spawn a separate git process for network operations #5228

bsdinis · 2025-01-02T01:38:22Z

Reasoning:

jj fails to push/fetch over ssh depending on the system.
Issue #4979 lists over 20 related issues on this and proposes spawning
a git subprocess for tasks related to the network (in fact, just push/fetch
are enough).

This PR implements this.
Users can either enable shelling out to git in a config file:

[git]
subprocess = true

Implementation Details:

This PR implements shelling out to git via std::process::Command.
There are 2 sharp edges with the patch:

it relies on having to parse out git errors to match the error codes
(and parsing git2's errors in one particular instance to match the
error behaviour). This seems mostly unavoidable
to ensure matching behaviour with git2, the tests are maintained across the
two implementations. This is done using test_case, as with the rest
of the codebase

Testing:

Run the rust tests:

$ cargo test

Build:

$ cargo build

Clone a private repo:

$ path/to/jj git clone --shell <REPO_SSH_URL>

Create new commit and push

$ echo "TEST" > this_is_a_test_file.txt
$ path/to/jj describe -m 'test commit'
$ path/to/jj git push --shell -b <branch>

Issues Closed

With a grain of salt, but most of these problems should be fixed (or at least checked if they are fixed). They are the ones listed in #4979 .

SSH:

Clone/fetch/push/pull:

Notable Holdouts:

Interactive HTTP authentication (jj git clone doesn't support interactive authentication #401, Support authenticated HTTP #469)
libssh2-sys dependency on windows problem (can only be removed if/when we get rid of libgit2): libssh2-sys dependency won't compile on windows #3984

Checklist

If applicable:

I have updated CHANGELOG.md
I have updated the documentation (README.md, docs/, demos/)
I have updated the config schema (cli/src/config-schema.json)
I have added tests to cover my changes

google-cla · 2025-01-02T01:38:26Z

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

yuja · 2025-01-02T09:07:12Z

it relies on having to parse out git errors to match the error codes (and parsing git2's errors in one particular instance to match the error behaviour). This seems unavoidable

That seems okay. It's also good to reorganize the current error enum as needed.

it is using a new feature flag shell to switch on to shelling out. This doesn't seem the best approach, and it would be great to get some feedback on what would be best. A flag on jj git + adding it to the jj config seems to be a good enough idea

I think it's best to add a config knob to turn the shelling-out backend on. If the implementation gets stable enough, we can change the default, and eventually remove the git2 backend. I think the flag can be checked by CLI layer, but that's not a requirement.
This refactoring PR #4960 might help as it splits fetch() into "fetch from remote" part and import_refs().

Regarding the code layout, maybe better to add new module for shelling-out implementation? The current git.rs isn't small, and there wouldn't be many codes to be shared.

Thanks!

cli/tests/test_git_clone.rs

bsdinis · 2025-01-02T18:45:46Z

Hi! Thanks for the feedback!

In the meantime, I was trying to make CI pass and did away with the feature flag (which was always my intention anyways but wanted to receive feedback).

~~Regarding the knob being present in config, I would argue for it:~~

~~I'd basically always have this on, so passing --shell on every command seems impractical~~
I'm not sure removing git2 would be necessarily the best, it might get good support for the ssh problem and overall seems it'd follow the codebase either way. So keeping the knob in config would allow a way to specify intended behaviour

EDIT: I misread a comment where I thought it was saying no config knob. sorry about that

PhilipMetzger · 2025-01-02T18:57:29Z

Can you also squash it down to one commit and change the commit message/PR title to cli: shell out to git to conform to our commit message style outlined here https://github.com/jj-vcs/jj/blob/main/docs/contributing.md#code-reviews.

thoughtpolice · 2025-01-02T19:31:49Z

I'm not sure removing git2 would be necessarily the best, it might get good support for the ssh problem and overall seems it'd follow the codebase either way. So keeping the knob in config would allow a way to specify intended behaviour

Getting rid of libgit2 is likely inevitable at this rate. It is only used for push/pull and is a large 3rd party dependency that we have probably outgrown by now, I'm almost certain the number one most-repeated bug report is "push does not work", with many users compiling themselves to fix. In reality I think it will probably remain a long tail of issues, things that won't work right, because the existing Git code, tools, and installers have a lot of quirks figured out for many various OS/workflow combinations, e.g. something like Git for Windows Credential Manager, or auto-ssh key unlock via secure enclaves, etc.

In the future if it supported this stuff really well, bringing back the code probably wouldn't be too bad. Or maybe Gitoxide will get good support, which would be even better. But in the meantime, assuming this improves the user experience now, and assuming we turn it on by default at some point, it's probably no longer worth keeping git2 around.

emilazy · 2025-01-02T19:50:48Z

Thanks for taking up this task! I’ll try to do a more thorough review later but for now I wanted to mention that this should use the fancy Git --force-with-lease=<refname>:<expect> option to preserve the current behaviour of only pushing to a ref if it matches Jujutsu’s expectation for the bookmark, as discussed on the Discord a while ago. I also agree with Yuya that it makes sense to base this on the refactors in #4960.

(Also +1 to getting rid of git2 – I think this should be a configuration option, at first defaulting to git2, then the next release to shelling out, then hopefully we can just get rid of the old path entirely the release after if all goes well. In the long run, Gitoxide growing proper push support or Git librarification may offer nicer options that don’t involve shelling out, but I doubt that libgit2 will ever be the right thing for us, based on our extensive experience using it so far, and as long as we don’t run into any unexpected blockers with shelling out I don’t think it makes sense to keep a C dependency on an entirely separate Git implementation just for “jj git {fetch,push}, but worse”.)

arxanas

Exciting work!

cli/src/commands/git/clone.rs

cli/tests/test_git_clone.rs

lib/src/git.rs

bsdinis · 2025-01-02T20:53:47Z

Thanks for taking up this task! I’ll try to do a more thorough review later but for now I wanted to mention that this should use the fancy Git --force-with-lease=<refname>:<expect> option to preserve the current behaviour of only pushing to a ref if it matches Jujutsu’s expectation for the bookmark, as discussed on the Discord a while ago.

I looked into this, but am unsure this completely mimics the behaviour as it stands right now. The patch uses git ls-remote to find the remote ref and then uses the existing logic to figure out if jj accepts it or not

bsdinis · 2025-01-02T20:56:09Z

This refactoring PR #4960 might help as it splits fetch() into "fetch from remote" part and import_refs().

I agree. In fact, there is a current clippy warning about a function with too many arguments that is hard to go around without it.

PS: I intend to work on the remaining issues on top of main and then rebasing onto the PR when that's the only thing left

emilazy · 2025-01-02T20:56:12Z

The version with the argument lets you specify the exact expected commit for each remote ref, so you should be able to implement the current logic on top of it. I think git ls-remote won’t work correctly as it introduces a race condition.

cli/src/commands/git/clone.rs

cli/src/git_util.rs

lib/src/git_shell.rs

joyously · 2025-01-03T01:30:59Z

Does this share functionality with #4759 ? At a surface level, it seems similar.

bsdinis · 2025-01-03T02:43:14Z

Does this share functionality with #4759 ? At a surface level, it seems similar.

I don't think it does. This PR is about using git as a subprocess for remote network operations, instead of the git2 library. The mentioned PR seems to be about more generic exec capabilities (if I understand correctly, related with the scripting language).

bsdinis · 2025-01-03T03:07:29Z

Re: --force-with-lease.
It worked better than I expected, but there is a (slight) mismatch in behaviour.

The particular test case that fails regards if we are pushing a branch deletion, and the branch was already deleted on the remote (and as such, is not at the expected commit) jj accepts it (with PushAllowReason::UnexpectedNoop) whilst git does not. There are a couple things we could do here:

Accept the differing behaviour;
Make jj error out here.

Would be great to get perspective on what to do here!

cc @emilazy

yuja

(Just dropping random comments. I haven't reviewed the code thoroughly.)

The issue with clippy is that jj_lib::git::fetch takes too many arguments.

I think there are three ways to fix this:

Silence the warning

Add a GitFetchArgs struct that has the logical things we want to find and fetch (the remote and the branch names). This would get it back under the limit

Do this on top of git: update jj git clone|fetch to use new GitFetch api directly. #4960

I think 1 is good assuming it's temporary. Thanks.

cli/src/commands/git/clone.rs

cli/src/git_util.rs

lib/src/git_subprocess.rs

martinvonz

I'm not done reviewing, but this is a pretty large PR so I'll have to do it in increments

.github/workflows/build.yml

lib/src/config/misc.toml

lib/src/git.rs

cli/tests/common/mod.rs

lib/src/git.rs

lib/src/git_subprocess.rs

lib/src/git.rs

lib/src/git_subprocess.rs

.github/workflows/build.yml

CHANGELOG.md

martinvonz · 2025-01-10T23:52:51Z

lib/src/git.rs

+    for full_refspec in refspecs {
+        let remote_ref = full_refspec.split(":").last();
+        let expected_remote_location = remote_ref.and_then(|remote_ref| {
+            qualified_remote_refs_expected_locations
+                .get(remote_ref)
+                .and_then(|x| x.map(|y| y.to_string()))
+        });
+        git_ctx.spawn_push(
+            remote_name,
+            remote_ref,
+            full_refspec,
+            expected_remote_location.as_deref(),
+            &mut failed_ref_matches,
+        )?;
+        if let Some(remote_ref) = remote_ref {
+            remaining_remote_refs.remove(remote_ref);
+        }
+    }


Same as for git fetch above: This seems to run one git push per refspec. We should be able to call it just once for all updates, right?

With git push it makes it very much easier to call it once per ref to collect the failing ones into the vec

I think it’ll result in one SSH auth per ref in many common configurations, which doesn’t seem shippable. I think we need to find a way to do the fetches and pushes in one go.

i agree with the sentiment, but my experience with it was that it was fundamentally hard to match the existing behaviour when doing it in one go.

However, now that the codebase is in a better state maybe it's time to look back at it (i tried with fetch and it wasn't great).

One possible strategy is to reach for --atomic, parse the error to figure out which branches are failing and then retry without those. But I'll look into it

I personally think slightly different semantics are better than multiple SSH connections, as long as the semantics don’t seem too unreasonable.

martinvonz · 2025-01-10T23:55:14Z

lib/src/git.rs

+        for refspec in refspecs {
+            git_ctx.spawn_fetch(remote_name, self.depth, &refspec, &mut prunes)?;
+        }


This seems to run git fetch once per branch. We should be able to fetch all requested branches in a single call, right?

So, here's what i've tried;

getting all the refs in git fetch

adding --atomic

They all fail in test_git_fetch_bookmarks_some_missing, particularly when we're fetching two branches (rem1 and rem2), each only existing on one of two remotes present (named after the branches). I can't say I understand the particulars of how this works exactly, would be happy to push a separate commit with this for people to try out.

Log of cargo test test_git_fetch:

[...] failures: ---- test_git_fetch::test_git_fetch_bookmarks_some_missing::spawn_a_git_subprocess_for_remote_calls stdout ---- ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ Snapshot Summary ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ Snapshot: git_fetch_bookmarks_some_missing Source: cli/tests/test_git_fetch.rs:1064 ───────────────────────────────────────────────────────────────────────────────────────────────────────── Expression: stderr ───────────────────────────────────────────────────────────────────────────────────────────────────────── -old snapshot +new results ────────────┬──────────────────────────────────────────────────────────────────────────────────────────── 0 │-bookmark: rem1@rem1 [new] tracked 1 │-bookmark: rem2@rem2 [new] tracked 0 │+Warning: No branch matching `rem1` found on any specified/configured remote 1 │+Warning: No branch matching `rem2` found on any specified/configured remote 2 │+Nothing changed. ────────────┴────────────────────────────────────────────────────────────────────────────────────────────

lib/src/git_subprocess.rs

yuja · 2025-01-11T02:13:31Z

lib/src/git_subprocess.rs

+            if e.kind() == std::io::ErrorKind::NotFound {
+                if self.git_path.is_absolute() {
+                    GitSubprocessError::GitCommandNotFound(self.git_path.to_path_buf())
+                } else {
+                    GitSubprocessError::GitCommandNotFoundInPath(self.git_path.to_path_buf())
+                }
+            } else {
+                GitSubprocessError::Spawn(e)
+            }


nit: another simpler (but less user-friendly-message) option is:

#[error("Failed to execute git command at path {path}")] GitSubprocessError::Spawn { path: PathBuf, // this will be printed in the next line #[source] error: io::Error, }

I think it's important to be as user friendly with this error as possible: this will undoubtedly cause a lot of pain on systems where the git installation is peculiar (e.g.: windows). The error message should be as crisp as possible and the current error structure enables that. The tradeoff of a slightly more complex error variant seems worth it

We often add user-friendly error message as a hint in command_error.rs. It might be nicer than splitting the same error into three variants.

I've been trying this out and by the time we get to handling this error in CLI the error is now a variant of SubprocessError which seems weird to parse out.

The main thing I think is important here is to be able to report back to the user that some PATH resolution failed vs. some particular path did not work

Hmm, I don't think it's bad to generate error hint based on property of inner error, but I don't have a strong feeling that we should deduplicate error variants, either.

FWIW, it's better to add executable path to the Spawn error because execution may fail due to other path-related reasons (permission, executable format, etc.)

lib/src/git_subprocess.rs

Reasoning: `jj` fails to push/fetch over ssh depending on the system. Issue jj-vcs#4979 lists over 20 related issues on this and proposes spawning a `git` subprocess for tasks related to the network (in fact, just push/fetch are enough). This PR implements this. Users can either enable shelling out to git in a config file: ```toml [git] subprocess = true ``` Implementation Details: This PR implements shelling out to `git` via `std::process::Command`. There are 2 sharp edges with the patch: - it relies on having to parse out git errors to match the error codes (and parsing git2's errors in one particular instance to match the error behaviour). This seems mostly unavoidable - to ensure matching behaviour with git2, the tests are maintained across the two implementations. This is done using test_case, as with the rest of the codebase Testing: Run the rust tests: ``` $ cargo test ``` Build: ``` $ cargo build ``` Clone a private repo: ``` $ path/to/jj git clone --shell <REPO_SSH_URL> ``` Create new commit and push ``` $ echo "TEST" > this_is_a_test_file.txt $ path/to/jj describe -m 'test commit' $ path/to/jj git push --shell -b <branch> ```  Issues Closed With a grain of salt, but most of these problems should be fixed (or at least checked if they are fixed). They are the ones listed in jj-vcs#4979 . SSH: - jj-vcs#63 - jj-vcs#440 - jj-vcs#1455 - jj-vcs#1507 - jj-vcs#2931 - jj-vcs#2958 - jj-vcs#3322 - jj-vcs#4101 - jj-vcs#4333 - jj-vcs#4386 - jj-vcs#4488 - jj-vcs#4591 - jj-vcs#4802 - jj-vcs#4870 - jj-vcs#4937 - jj-vcs#4978 - jj-vcs#5120 - jj-vcs#5166 Clone/fetch/push/pull: - jj-vcs#360 - jj-vcs#1278 - jj-vcs#1957 - jj-vcs#2295 - jj-vcs#3851 - jj-vcs#4177 - jj-vcs#4682 - jj-vcs#4719 - jj-vcs#4889 - jj-vcs#5147 - jj-vcs#5238 Notable Holdouts: - Interactive HTTP authentication (jj-vcs#401, jj-vcs#469) - libssh2-sys dependency on windows problem (can only be removed if/when we get rid of libgit2): jj-vcs#3984

bnjmnt4n reviewed Jan 2, 2025

View reviewed changes

cli/tests/test_git_clone.rs Outdated Show resolved Hide resolved

arxanas reviewed Jan 2, 2025

View reviewed changes

bsdinis force-pushed the shell-out branch from 807bc10 to 9eae40b Compare January 2, 2025 20:01

bsdinis force-pushed the shell-out branch from 9eae40b to 7eaf92c Compare January 2, 2025 21:01

bsdinis changed the title ~~feat(shell-out): allow shelling out to git for network related calls~~ cli: shell out to git Jan 2, 2025

bsdinis changed the title ~~cli: shell out to git~~ cli: shell out to git for network operations Jan 2, 2025

bsdinis force-pushed the shell-out branch 4 times, most recently from daa498e to 5696825 Compare January 2, 2025 22:48

martinvonz reviewed Jan 3, 2025

View reviewed changes

cli/src/commands/git/clone.rs Outdated Show resolved Hide resolved

cli/src/git_util.rs Outdated Show resolved Hide resolved

lib/src/git_shell.rs Outdated Show resolved Hide resolved

bsdinis force-pushed the shell-out branch from 5696825 to fe605df Compare January 3, 2025 02:41

bsdinis force-pushed the shell-out branch 6 times, most recently from 9a733d2 to bb5e58c Compare January 3, 2025 04:51

bsdinis force-pushed the shell-out branch from 43b7405 to 59e3c9c Compare January 9, 2025 00:05

yuja reviewed Jan 9, 2025

View reviewed changes

bsdinis force-pushed the shell-out branch 4 times, most recently from 9fbd714 to d455d71 Compare January 9, 2025 13:54

This was referenced Jan 9, 2025

Unable to connect to Sourcehut over SSH on Windows with JJ #4802

Open

FR: Honor remote refspec definitions when fetching from git colocated workspaces #5323

Open

martinvonz reviewed Jan 10, 2025

View reviewed changes

bsdinis force-pushed the shell-out branch 5 times, most recently from 9a0ab4d to 271c05d Compare January 10, 2025 15:57

martinvonz reviewed Jan 10, 2025

View reviewed changes

lib/src/git_subprocess.rs Outdated Show resolved Hide resolved

bsdinis force-pushed the shell-out branch from 271c05d to fe7eb75 Compare January 10, 2025 18:25

martinvonz reviewed Jan 10, 2025

View reviewed changes

bsdinis force-pushed the shell-out branch 6 times, most recently from 7b73aa3 to b85e3cc Compare January 11, 2025 01:44

yuja reviewed Jan 11, 2025

View reviewed changes

lib/src/git_subprocess.rs Outdated Show resolved Hide resolved

bsdinis force-pushed the shell-out branch 3 times, most recently from e0075e2 to fd77709 Compare January 11, 2025 02:33

bsdinis force-pushed the shell-out branch from fd77709 to c7d913c Compare January 11, 2025 02:50

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

git: spawn a separate git process for network operations #5228

git: spawn a separate git process for network operations #5228

bsdinis commented Jan 2, 2025 •

edited

Loading

google-cla bot commented Jan 2, 2025

yuja commented Jan 2, 2025

bsdinis commented Jan 2, 2025 •

edited

Loading

PhilipMetzger commented Jan 2, 2025

thoughtpolice commented Jan 2, 2025 •

edited

Loading

emilazy commented Jan 2, 2025 •

edited

Loading

arxanas left a comment

bsdinis commented Jan 2, 2025

bsdinis commented Jan 2, 2025 •

edited

Loading

emilazy commented Jan 2, 2025

joyously commented Jan 3, 2025

bsdinis commented Jan 3, 2025

bsdinis commented Jan 3, 2025 •

edited

Loading

yuja left a comment

martinvonz left a comment

martinvonz Jan 10, 2025

bsdinis Jan 11, 2025

emilazy Jan 11, 2025

bsdinis Jan 11, 2025

emilazy Jan 11, 2025

martinvonz Jan 10, 2025

bsdinis Jan 11, 2025

yuja Jan 11, 2025

bsdinis Jan 11, 2025 •

edited

Loading

yuja Jan 11, 2025

bsdinis Jan 12, 2025

yuja Jan 12, 2025

git: spawn a separate git process for network operations #5228

Are you sure you want to change the base?

git: spawn a separate git process for network operations #5228

Conversation

bsdinis commented Jan 2, 2025 • edited Loading

Reasoning:

Implementation Details:

Testing:

Build:

Clone a private repo:

Create new commit and push

Issues Closed

Checklist

google-cla bot commented Jan 2, 2025

yuja commented Jan 2, 2025

bsdinis commented Jan 2, 2025 • edited Loading

PhilipMetzger commented Jan 2, 2025

thoughtpolice commented Jan 2, 2025 • edited Loading

emilazy commented Jan 2, 2025 • edited Loading

arxanas left a comment

Choose a reason for hiding this comment

bsdinis commented Jan 2, 2025

bsdinis commented Jan 2, 2025 • edited Loading

emilazy commented Jan 2, 2025

joyously commented Jan 3, 2025

bsdinis commented Jan 3, 2025

bsdinis commented Jan 3, 2025 • edited Loading

yuja left a comment

Choose a reason for hiding this comment

martinvonz left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

bsdinis Jan 11, 2025 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

bsdinis commented Jan 2, 2025 •

edited

Loading

bsdinis commented Jan 2, 2025 •

edited

Loading

thoughtpolice commented Jan 2, 2025 •

edited

Loading

emilazy commented Jan 2, 2025 •

edited

Loading

bsdinis commented Jan 2, 2025 •

edited

Loading

bsdinis commented Jan 3, 2025 •

edited

Loading

bsdinis Jan 11, 2025 •

edited

Loading