scalar: make GVFS Protocol a forced choice #648

Merged
merged 1 commit into from
May 14, 2024


derrickstolee

In the Office monorepo, we've recently had an uptick in issues with scalar clone. These issues didn't make sense at first and seemed like the users weren't using microsoft/git but instead the upstream version's scalar clone. Instead of using GVFS cache servers, they were attempting to use the Git protocol's partial clone (which times out).

It turns out that what's actually happening is that some network issue is causing the connection with Azure DevOps to error out during the /gvfs/config request. In the Git traces, we see the following error during this request:

(curl:56) Failure when receiving data from the peer [transient]

This doesn't happen 100% of the time, but the failure rate has increased enough to cause problems for a variety of users.

The solution proposed in this pull request is to remove the fall-back mechanism and instead require an explicit choice to use the GVFS protocol. To avoid significant disruption to Azure DevOps customers (the vast majority of microsoft/git users who use scalar clone, based on my understanding), I added logic that infers a default value from the clone URL.
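To make the "infer a default from the clone URL" idea concrete, here is a minimal sketch in Python. It is not the code from this pull request (that lives in the C implementation of `scalar`). The host suffixes, the function names, and the `explicit_choice` parameter are assumptions chosen for illustration only.

```python
from typing import Optional
from urllib.parse import urlparse

# Assumption: hosts treated as Azure DevOps for this illustration.
# The actual check in the pull request may differ.
ADO_HOST_SUFFIXES = ("dev.azure.com", "visualstudio.com")


def infer_gvfs_default(clone_url: str) -> bool:
    """Guess whether the GVFS protocol should be the default for this URL."""
    host = (urlparse(clone_url).hostname or "").lower()
    return any(
        host == suffix or host.endswith("." + suffix)
        for suffix in ADO_HOST_SUFFIXES
    )


def use_gvfs_protocol(clone_url: str,
                      explicit_choice: Optional[bool] = None) -> bool:
    """An explicit choice always wins; otherwise infer from the URL."""
    if explicit_choice is not None:
        return explicit_choice
    return infer_gvfs_default(clone_url)
```

With this shape, an Azure DevOps URL keeps today's behavior by default, while any user can still opt in or out explicitly. Note that this sketch only handles URLs with a scheme such as `https://`; SCP-style SSH remotes would need separate parsing.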

This fallback mechanism was first implemented in the C# version of Scalar in microsoft/scalar#339. This was an attempt to make the Scalar client interesting to non-Azure DevOps customers, especially as GitHub was about to launch the availability of partial clones. Now that the scalar client is available upstream, users don't need the GVFS-enabled version to get these benefits.

In addition, this will resolve #384 since those requests won't happen against non-ADO URLs unless requested.

@derrickstolee derrickstolee self-assigned this May 1, 2024
@derrickstolee
Author

This is a draft because I'm not sure how to feel about the change of behavior. My commit message is lacking, as well.

I was trying very hard to get this issue diagnosed and fixed before the 2.45.0 release, but I didn't have the necessary information until it reproduced in machines my team owns.

In the Office monorepo, we've recently had an uptick in issues with
`scalar clone`. These issues didn't make sense at first and seemed like
the users weren't using `microsoft/git` but instead the upstream
version's `scalar clone`. Instead of using GVFS cache servers, they were
attempting to use the Git protocol's partial clone (which times out).

It turns out that what's actually happening is that some network issue
is causing the connection with Azure DevOps to error out during the
`/gvfs/config` request. In the Git traces, we see the following error
during this request:

  (curl:56) Failure when receiving data from the peer [transient]

This isn't 100% of the time, but has increased enough to cause problems
for a variety of users.

The solution being proposed in this pull request is to remove the
fall-back mechanism and instead have an explicit choice to use the GVFS
protocol. To avoid significant disruption to Azure DevOps customers (the
vast majority of `microsoft/git` users who use `scalar clone` based on
my understanding), I added some inferring of a default value from the
clone URL.

This fallback mechanism was first implemented in the C# version of
Scalar in microsoft/scalar#339. This was an attempt to make the Scalar
client interesting to non-Azure DevOps customers, especially as GitHub
was about to launch the availability of partial clones. Now that the
`scalar` client is available upstream, users don't need the GVFS-enabled
version to get these benefits.

In addition, this will resolve git#384 since those requests won't happen
against non-ADO URLs unless requested.

Signed-off-by: Derrick Stolee <[email protected]>
@derrickstolee
Author

While pairing with one of my engineers, we were able to isolate the reason different machines were having issues with the gvfs/config endpoint: the http.sslBackend setting was different. Something about the Azure DevOps network stack changed recently in a way that the gvfs/config endpoint stopped working with OpenSSL, while switching to schannel works. Users already using schannel in their system config were fine, while the others had OpenSSL.

This change is now less of an emergency, because the recent increase in failures is understood. It would still be good to merge this because:

  1. Users who expect the GVFS protocol should not fall back to partial clone.
  2. Users who want to use partial clone against ADO should be able to choose that option.
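The two points above can be sketched as a small decision function. This is a hypothetical Python illustration of the "forced choice" behavior, not the actual `scalar` code: `plan_clone`, `GvfsConfigError`, and the `fetch_gvfs_config` callback are invented names standing in for the real `/gvfs/config` request.

```python
from typing import Callable


class GvfsConfigError(Exception):
    """Raised when the GVFS protocol was chosen but its config request failed."""


def plan_clone(use_gvfs: bool, fetch_gvfs_config: Callable[[], None]) -> str:
    """Pick a clone strategy without silently switching protocols."""
    if not use_gvfs:
        # The user (or the inferred default) chose partial clone.
        return "partial-clone"
    try:
        # Hypothetical stand-in for the /gvfs/config request.
        fetch_gvfs_config()
    except OSError as err:
        # Previously a failure here fell back to partial clone.
        # With a forced choice, the failure is surfaced instead.
        raise GvfsConfigError("gvfs/config request failed") from err
    return "gvfs"
```

The design choice is that a transient network error becomes a visible failure the user can retry, rather than a slow partial clone the user never asked for.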

@dscho dscho merged commit d074acc into microsoft:vfs-2.45.0 May 14, 2024
129 of 137 checks passed
@derrickstolee derrickstolee mentioned this pull request May 14, 2024
2 tasks
Development

Successfully merging this pull request may close these issues.

Scalar clone: make 404s on gvfs/config silent
2 participants