Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

git: remove git2 from git subprocessing code paths in lib #5443

Closed
wants to merge 1 commit into from

Conversation

bsdinis
Copy link
Contributor

@bsdinis bsdinis commented Jan 23, 2025

Currently, the subprocessing code paths in the library were relying on git2 to figure out if a remote exists. This is because when git errors out for a remote not existing it outputs the same error as when the repository is not found.

As the cli tool makes a distinction between the two (e.g., after cloning, the remote must exist and as such we cannot send out an error that is shared by no remote found and no repository found), this was a hack put in place to sort out the difference.

It calls out to git remote to list out the remotes.

@yuja
Copy link
Contributor

yuja commented Jan 24, 2025

Can't we use gix API instead?
https://docs.rs/gix/latest/gix/struct.Repository.html#method.try_find_remote

It's unlikely we'll drop gix dependency, and the gix API is usually nicer and more reliable than shelling out git and parsing the output.

Perhaps, GitFetch::new() should obtain git2/gix repo instance from mut_repo (as we do in import_refs().)

EDIT: fwiw, if there are a few related small patches, I prefer including them in one PR. One PR per patch is conceptually nice, but is uneasy to manage without external tooling.

@bsdinis
Copy link
Contributor Author

bsdinis commented Jan 24, 2025

EDIT: fwiw, if there are a few related small patches, I prefer including them in one PR. One PR per patch is conceptually nice, but is uneasy to manage without external tooling.

Happy to close this one and compare the top of the stack to main

@bsdinis
Copy link
Contributor Author

bsdinis commented Jan 24, 2025

So on trying to use gix, I'm running into a weird thing.
When cloning an empty repo (as in

fn test_git_clone(subprocess: bool) {
) gix reports no remotes, while suprocessing does find the correct remote

fetching origin
gix: {} # output of gix_repo.remote_names()
git: {"origin"} # output of `git remote`

@bsdinis
Copy link
Contributor Author

bsdinis commented Jan 24, 2025

So on trying to use gix, I'm running into a weird thing.
When cloning an empty repo (as in

fn test_git_clone(subprocess: bool) {
) gix reports no remotes, while suprocessing does find the correct remote

fetching origin
gix: {} # output of gix_repo.remote_names()
git: {"origin"} # output of `git remote`

Will look more into this later, but it's definitely confusing2

@bsdinis
Copy link
Contributor Author

bsdinis commented Jan 24, 2025

So on trying to use gix, I'm running into a weird thing.
When cloning an empty repo (as in

fn test_git_clone(subprocess: bool) {
) gix reports no remotes, while suprocessing does find the correct remote

fetching origin
gix: {} # output of gix_repo.remote_names()
git: {"origin"} # output of `git remote`

Will look more into this later, but it's definitely confusing

@martinvonz
Copy link
Member

martinvonz commented Jan 24, 2025

Without knowing what your code looks, that sounds like it might be a bug in Gitoxide. GitHub user Byron (Gitoxide author) is usually very responsive, so we can probably just loop him in here if/when you feel confident that Gitoxide is behaving strangely.

@yuja
Copy link
Contributor

yuja commented Jan 25, 2025

gix::Repository caches config file, but we updates config file externally by using git2 API. That's probably why.
https://docs.rs/gix/latest/gix/struct.Repository.html#method.config_snapshot

We might have to reload our Arc<ReadonlyRepo> after adding the remote configuration.

git::add_remote(..)?;
let repo = repo.reload_at(repo.operation())?;
command.for_workable_repo(ui, workspace, repo)

@emilazy
Copy link
Contributor

emilazy commented Jan 25, 2025

IIRC remote/config management, fetches, and pulls were the only things remaining on git2 a year+ ago. Is that still the case? If so I could try to revive my old patch to migrate the former to Gitoxide which might help with this and pave the way to a git2-less build.

@yuja
Copy link
Contributor

yuja commented Jan 25, 2025

git::add_remote(..)?;
let repo = repo.reload_at(repo.operation())?;
command.for_workable_repo(ui, workspace, repo)

Oh, but the store is still reused iirc, so we'll need to reload the workspace (or RepoLoader)?

IIRC remote/config management, fetches, and pulls were the only things remaining on git2 a year+ ago. Is that still the case? If so I could try to revive my old patch to migrate the former to Gitoxide which might help with this and pave the way to a git2-less build.

+1

@bsdinis bsdinis force-pushed the bsdinis/push-zosznwvkmtnt branch from 077fcf3 to ede9576 Compare January 25, 2025 23:24
@bsdinis
Copy link
Contributor Author

bsdinis commented Jan 26, 2025

I tried a very hacky way to get this to work (it didn't).
@yuja how do you feel about keeping with the subprocessing to git remote in the meantime.
It is honestly the simplest of them all to parse (it outputs a list of remote names, nothing else), and I'll add a note to remove when the gix problem is hopefully solved by @emilazy 's patch.

I only ask because this is blocking a bunch of the next refactorings, which require a strict separation between the git2 and subprocessing path (which is the direction I feel we'd like to move towards)

@bsdinis bsdinis force-pushed the bsdinis/push-zosznwvkmtnt branch from ede9576 to f7fbd21 Compare January 26, 2025 04:21
Currently, the subprocessing code paths in the library were relying on
git2 to figure out if a remote exists. This is because when git errors
out for a remote not existing it outputs the same error as when the
repository is not found.

As the cli tool makes a distinction between the two (e.g., after
cloning, the remote must exist and as such we cannot send out an error
that is shared by `no remote found` and `no repository found`), this was a
hack put in place to sort out the difference.

It calls out to `git remote` to list out the remotes.
@bsdinis bsdinis force-pushed the bsdinis/push-zosznwvkmtnt branch from f7fbd21 to 1857065 Compare January 26, 2025 04:23
@yuja
Copy link
Contributor

yuja commented Jan 26, 2025

how do you feel about keeping with the subprocessing to git remote in the meantime.

I feel this patch is going backwards because git2 API just works.

remove when the gix problem is hopefully solved by @emilazy 's patch.

I don't think it will solve the problem. Porting remote management API to gix is nice, but it wouldn't mean the remote configuration can be updated without reloading a repo instance.

this is blocking a bunch of the next refactorings, which require a strict separation between the git2 and subprocessing path (which is the direction I feel we'd like to move towards)

Oh, are you going to move git2/subprocess handling to caller? I would rather not, and leave it in library code. For example, GitFetch might look like:

struct GitFetch {
    inner: enum Git2(..)|Subprocess(..),
    ...
}

impl GitFetch {
    pub fn new(mut_repo, git_settings) -> Result<Self, ..> {
        get_git_backend(mut_repo.store()).map_err(..)
        ...
    }

@yuja
Copy link
Contributor

yuja commented Jan 26, 2025

Can you try this PR? #5476

@bsdinis
Copy link
Contributor Author

bsdinis commented Jan 26, 2025

Oh, are you going to move git2/subprocess handling to caller? I would rather not, and leave it in library code.

There are several good reasons to do this:

  1. For the UNC handling, if we want the GitSubprocessContext to be constructed with the workspace to have the root and the path to git handled correctly, we need to be aware of it at the caller.
  2. git2 is being used directly in the caller, would be good (IMO) to make it clear when it stops being needed.

Overall, I think a side by side implementation needs to holistically separate, so turning one of them off should be simple

@yuja
Copy link
Contributor

yuja commented Jan 27, 2025

  1. For the UNC handling, if we want the GitSubprocessContext to be constructed with the workspace to have the root and the path to git handled correctly, we need to be aware of it at the caller.

This also applies to git2 (and gix) implementation if we add one later. The problem is not UNC paths, but is that relative remote paths are resolved differently from the git CLI. If git2 provided an API to resolve configured remote by caller, we could do workspace_root.join(configured_remote_path) somewhere in the git2 code path.

  1. git2 is being used directly in the caller, would be good (IMO) to make it clear when it stops being needed.

Hmm, I have feeling that we should hide git2 use in library code so we can phase it out easily by replacing the implementation by gix or subprocessing. FWIW, I think the main reason we do instantiate git2 repo by caller is that, unlike git_repo() -> gix::Repository, open_git_repo() -> git2::Repository does non-trivial work, so the instance should be reused (by caller) if possible.

@bsdinis
Copy link
Contributor Author

bsdinis commented Jan 27, 2025

This also applies to git2 (and gix) implementation if we add one later. The problem is not UNC paths, but is that relative remote paths are resolved differently from the git CLI. If git2 provided an API to resolve configured remote by caller, we could do workspace_root.join(configured_remote_path) somewhere in the git2 code path.

The UNC problem I think is hard to solve with lib is unrelated to remotes. It's more that we have to pass in --git-dir to git and to avoid UNC problems we want that path to be relative to the workspace dir (which we would pass as Command::current_dir). This is the hard bit to figure out from the lib alone.

Hmm, I have feeling that we should hide git2 use in library code so we can phase it out easily by replacing the implementation by gix or subprocessing. FWIW, I think the main reason we do instantiate git2 repo by caller is that, unlike git_repo() -> gix::Repository, open_git_repo() -> git2::Repository does non-trivial work, so the instance should be reused (by caller) if possible.

From looking at the code, cli is using a git2::Repository for doing a bunch of work. My idea with separating it at the caller is to make it explicit that, from that point on, the git2 repository is not being used again. It would also future proof the Subprocessing path against using git2 in the future (which I think is desirable). Then we can work on migrating the cli code to gitoxide independently.

@yuja
Copy link
Contributor

yuja commented Jan 27, 2025

This also applies to git2 (and gix) implementation if we add one later. The problem is not UNC paths, but is that relative remote paths are resolved differently from the git CLI. If git2 provided an API to resolve configured remote by caller, we could do workspace_root.join(configured_remote_path) somewhere in the git2 code path.

The UNC problem I think is hard to solve with lib is unrelated to remotes. It's more that we have to pass in --git-dir to git and to avoid UNC problems we want that path to be relative to the workspace dir (which we would pass as Command::current_dir). This is the hard bit to figure out from the lib alone.

If we don't have relative configured remote paths, we can simply run git at the <git-dir>, right?
My point is that, the workspace_root path is needed by all implementations in order to make them behave the same. (git2 doesn't need it just because we have no way to fix the problem.)

Hmm, I have feeling that we should hide git2 use in library code so we can phase it out easily by replacing the implementation by gix or subprocessing. FWIW, I think the main reason we do instantiate git2 repo by caller is that, unlike git_repo() -> gix::Repository, open_git_repo() -> git2::Repository does non-trivial work, so the instance should be reused (by caller) if possible.

From looking at the code, cli is using a git2::Repository for doing a bunch of work. My idea with separating it at the caller is to make it explicit that, from that point on, the git2 repository is not being used again. It would also future proof the Subprocessing path against using git2 in the future (which I think is desirable). Then we can work on migrating the cli code to gitoxide independently.

If @emilazy's patch gets merged, I expect most of git2 uses are removed from CLI (except for tests.) Remaining parts would be just fetch and pull.

@bsdinis
Copy link
Contributor Author

bsdinis commented Jan 27, 2025

Okay, I see your point. If that's the case, I think I might abandon most of the chain of work I've been doing, will rebase the relevant changes and get them out

@bsdinis
Copy link
Contributor Author

bsdinis commented Jan 28, 2025

Closing

@bsdinis bsdinis closed this Jan 28, 2025
@bsdinis bsdinis deleted the bsdinis/push-zosznwvkmtnt branch February 1, 2025 02:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants