Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gvfs-helper: auto-retry after network errors, resource throttling, split GET and POST semantics #208

Conversation

jeffhostetler
Copy link

@jeffhostetler jeffhostetler commented Oct 15, 2019

I'm marking this WIP because I haven't done a cleanup round nor squashed things yet.
But I want to get the CI builds a chance to run tonight.

This series attempts to:
[x] auto-retry after network outages
[x] throttle back when requested (or demanded) by the server.

Questions:
[done] What is the right default for the network retry limit?
[done] Should the throttle back have a time limit? (It's one thing to wait 3 or 4 minutes between
packfiles because we hit it too hard, but another if it says we should wait an hour or two.)
[no] Should the network retry look at the amount of data received and try to resume it?
[no] Should the network retry split large packfile requests if we can tell the user's network is flakey?

Basic testing with 5 concurrent fetches shows that it is pretty easy to get throttled and
makes me wonder if we should even bother with multi-threading this. Perhaps we just
limit it to waiting for index-pack in another thread, but only plan to have 1 network thread.
Or maybe that is just when talking to the main server -- we might be able to multithread
when talking to the cache-server.

@wilbaker
Copy link
Member

wilbaker commented Oct 16, 2019

What is the right default for the network retry limit?

We might decide we no longer want this approach, but the approach that's been used by VFS4G is:

@jeffhostetler
Copy link
Author

@wilbaker Thanks for the pointers.

derrickstolee added a commit to microsoft/scalar that referenced this pull request Oct 17, 2019
See microsoft/git#208 for the Git changes. This should make the gvfs-helper more robust to intermittent failures when a single network call fails.
@derrickstolee
Copy link

/azp run Microsoft.git

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@derrickstolee
Copy link

@jeffhostetler after the 503 update, I was unable to repro any network failures in the C# functional tests. (I got a different flaky failure in the C# code, but that's different ;) )

I'm happy to re-test and approve after you do the cleanup to make this not WIP.

@jeffhostetler
Copy link
Author

Thanks for the confirmation! Almost finished....

@jeffhostetler
Copy link
Author

/azp run Microsoft.git

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Add robust-retry mechanism to automatically retry a request after network
errors.  This includes retry after:
   [] transient network problems reported by CURL.
   [] http 429 throttling (with associated Retry-After)
   [] http 503 server unavailable (with associated Retry-After)

Add voluntary throttling using Azure X-RateLimit-* hints to avoid being
soft-throttled (tarpitted) or hard-throttled (429) on later requests.

Add global (outside of a single request) azure-throttle data to track the
rate limit hints from the cache-server and main Git server independently.

Add exponential retry backoff.  This is used for transient network problems
when we don't have a Retry-After hint.

Move the call to index-pack earlier in the response/error handling sequence
so that if we receive a 200 but yet the packfile is truncated/corrupted, we
can use the regular retry logic to get it again.

Refactor the way we create tempfiles for packfiles to use
<odb>/pack/tempPacks/ rather than working directly in the <odb>/pack/
directory.

Move the code to create a new tempfile to the start of a single request
attempt (initial and retry attempts), rather than at the overall start
of a request.  This gives us a fresh tempfile for each network request
attempt.  This simplifies the retry mechanism and isolates us from the file
ownership issues hidden within the tempfile class.  And avoids the need to
truncate previous incomplete results.  This was necessary because index-pack
was pulled into the retry loop.

Minor: Add support for logging X-VSS-E2EID to telemetry on network errors.

Minor: rename variable:
    params.b_no_cache_server --> params.b_permit_cache_server_if_defined.
This variable is used to indicate whether we should try to use the
cache-server when it is defined.  Got rid of double-negative logic.

Minor: rename variable:
    params.label --> params.tr2_label
Clarify that this variable is only used with trace2 logging.

Minor: Move the code to automatically map cache-server 400 responses
to normal 401 response earlier in the response/error handling sequence
to simplify later retry logic.

Minor: Decorate trace2 messages with "(cs)" or "(main)" to identify the
server in log messages.  Add params->server_type to simplify this.

Signed-off-by: Jeff Hostetler <[email protected]>
@jeffhostetler jeffhostetler force-pushed the users/jeffhost/gvfs-helper-robust-retry-take2 branch 2 times, most recently from 9a844ff to 0519894 Compare October 22, 2019 14:56
}

/*
* Get exactly 1 object immediately.
* Ignore any queued objects.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, we are assuming that any queued objects will get a flush request eventually? That sounds reasonable.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I've split the queued and immediate usage now. The dry-run/pre-scan loops already handle the queue and drain (for missing blobs usually). The main difference now is that any missing trees (or commits) during those loops will be immediately fetched in isolation, but the queue will remain.

Expose the differences in the semantics of GET and POST for
the "gvfs/objects" API:

    HTTP GET: fetches a single loose object over the network.
              When a commit object is requested, it just returns
	      the single object.

    HTTP POST: fetches a batch of objects over the network.
               When the oid-set contains a commit object, all
	       referenced trees are also included in the response.

gvfs-helper is updated to take "get" and "post" command line options.
the gvfs-helper "server" mode is updated to take "objects.get" and
"objects.post" verbs.

For convenience, the "get" option and the "objects.get" verb
do allow more than one object to be requested.  gvfs-helper will
automatically issue a series of (single object) HTTP GET requests
and creating a series of loose objects.

The "post" option and the "objects.post" verb will perform bulk
object fetching using the batch-size chunking.  Individual HTTP
POST requests containing more than one object will be created
as a packfile.  A HTTP POST for a single object will create a
loose object.

This commit also contains some refactoring to eliminate the
assumption that POST is always associated with packfiles.

In gvfs-helper-client.c, gh_client__get_immediate() now uses the
"objects.get" verb and ignores any currently queued objects.

In gvfs-helper-client.c, the OIDSET built by gh_client__queue_oid()
is only processed when gh_client__drain_queue() is called.  The queue
is processed using the "object.post" verb.

Signed-off-by: Jeff Hostetler <[email protected]>
@jeffhostetler jeffhostetler force-pushed the users/jeffhost/gvfs-helper-robust-retry-take2 branch from a5d67f2 to 5c65e9a Compare October 23, 2019 17:21
@jeffhostetler
Copy link
Author

@derrickstolee I think I'm done tinkering with this one. 2.20191023.1 has already been thru the functional tests and is good. Just did a final squash on my fixups and am running 2.20191023.2 thru the its paces.

Copy link

@derrickstolee derrickstolee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll rebase these commits onto tentative/features/sparse-checkout-2.24.0 for #214 after you merge.

@jeffhostetler jeffhostetler changed the title WIP auto-retry after network errors and resource throttling gvfs-helper: auto-retry after network errors, resource throttling, split GET and POST semantics Oct 23, 2019
@jeffhostetler jeffhostetler merged commit a782a7e into microsoft:features/sparse-checkout-2.23.0 Oct 23, 2019
@jeffhostetler
Copy link
Author

Thanks for all your help!

derrickstolee added a commit to microsoft/scalar that referenced this pull request Oct 24, 2019
…out-2.23.0

Upgrade to 2.20191023.7-sc which corresponds to commit microsoft/git@a782a7e and includes the following changes (since 2.20191015.2-sc):

- microsoft/git#208
- microsoft/git#210
derrickstolee added a commit to microsoft/scalar that referenced this pull request Oct 25, 2019
Resolves #195.

Includes the following updates to `microsoft/git`:

* microsoft/git#208: gvfs-helper: auto-retry after network errors, resource throttling, split GET and POST semantics.
* microsoft/git#215: gvfs-helper: dramatically reduce progress noise.
derrickstolee pushed a commit that referenced this pull request Jun 1, 2020
Includes commits from these pull requests:

	#191
	#205
	#206
	#207
	#208
	#215
	#220
	#221

Signed-off-by: Derrick Stolee <[email protected]>
vdye pushed a commit that referenced this pull request Feb 27, 2024
Includes commits from these pull requests:

	#191
	#205
	#206
	#207
	#208
	#215
	#220
	#221

Signed-off-by: Derrick Stolee <[email protected]>
dscho pushed a commit that referenced this pull request Apr 23, 2024
Includes commits from these pull requests:

	#191
	#205
	#206
	#207
	#208
	#215
	#220
	#221

Signed-off-by: Derrick Stolee <[email protected]>
dscho pushed a commit that referenced this pull request Apr 23, 2024
Includes commits from these pull requests:

	#191
	#205
	#206
	#207
	#208
	#215
	#220
	#221

Signed-off-by: Derrick Stolee <[email protected]>
dscho pushed a commit that referenced this pull request Apr 23, 2024
Includes commits from these pull requests:

	#191
	#205
	#206
	#207
	#208
	#215
	#220
	#221

Signed-off-by: Derrick Stolee <[email protected]>
dscho pushed a commit that referenced this pull request Apr 24, 2024
Includes commits from these pull requests:

	#191
	#205
	#206
	#207
	#208
	#215
	#220
	#221

Signed-off-by: Derrick Stolee <[email protected]>
dscho pushed a commit that referenced this pull request Apr 29, 2024
Includes commits from these pull requests:

	#191
	#205
	#206
	#207
	#208
	#215
	#220
	#221

Signed-off-by: Derrick Stolee <[email protected]>
dscho pushed a commit that referenced this pull request May 14, 2024
Includes commits from these pull requests:

	#191
	#205
	#206
	#207
	#208
	#215
	#220
	#221

Signed-off-by: Derrick Stolee <[email protected]>
dscho pushed a commit that referenced this pull request May 14, 2024
Includes commits from these pull requests:

	#191
	#205
	#206
	#207
	#208
	#215
	#220
	#221

Signed-off-by: Derrick Stolee <[email protected]>
dscho pushed a commit that referenced this pull request Jun 3, 2024
Includes commits from these pull requests:

	#191
	#205
	#206
	#207
	#208
	#215
	#220
	#221

Signed-off-by: Derrick Stolee <[email protected]>
dscho pushed a commit that referenced this pull request Jul 17, 2024
Includes commits from these pull requests:

	#191
	#205
	#206
	#207
	#208
	#215
	#220
	#221

Signed-off-by: Derrick Stolee <[email protected]>
dscho pushed a commit that referenced this pull request Jul 17, 2024
Includes commits from these pull requests:

	#191
	#205
	#206
	#207
	#208
	#215
	#220
	#221

Signed-off-by: Derrick Stolee <[email protected]>
dscho pushed a commit that referenced this pull request Jul 17, 2024
Includes commits from these pull requests:

	#191
	#205
	#206
	#207
	#208
	#215
	#220
	#221

Signed-off-by: Derrick Stolee <[email protected]>
dscho pushed a commit that referenced this pull request Jul 18, 2024
Includes commits from these pull requests:

	#191
	#205
	#206
	#207
	#208
	#215
	#220
	#221

Signed-off-by: Derrick Stolee <[email protected]>
mjcheetham pushed a commit that referenced this pull request Jul 23, 2024
Includes commits from these pull requests:

	#191
	#205
	#206
	#207
	#208
	#215
	#220
	#221

Signed-off-by: Derrick Stolee <[email protected]>
dscho pushed a commit that referenced this pull request Jul 25, 2024
Includes commits from these pull requests:

	#191
	#205
	#206
	#207
	#208
	#215
	#220
	#221

Signed-off-by: Derrick Stolee <[email protected]>
mjcheetham pushed a commit that referenced this pull request Jul 29, 2024
Includes commits from these pull requests:

	#191
	#205
	#206
	#207
	#208
	#215
	#220
	#221

Signed-off-by: Derrick Stolee <[email protected]>
dscho pushed a commit that referenced this pull request Sep 18, 2024
Includes commits from these pull requests:

	#191
	#205
	#206
	#207
	#208
	#215
	#220
	#221

Signed-off-by: Derrick Stolee <[email protected]>
dscho pushed a commit that referenced this pull request Sep 24, 2024
Includes commits from these pull requests:

	#191
	#205
	#206
	#207
	#208
	#215
	#220
	#221

Signed-off-by: Derrick Stolee <[email protected]>
dscho pushed a commit that referenced this pull request Oct 8, 2024
Includes commits from these pull requests:

	#191
	#205
	#206
	#207
	#208
	#215
	#220
	#221

Signed-off-by: Derrick Stolee <[email protected]>
mjcheetham pushed a commit that referenced this pull request Dec 3, 2024
Includes commits from these pull requests:

	#191
	#205
	#206
	#207
	#208
	#215
	#220
	#221

Signed-off-by: Derrick Stolee <[email protected]>
dscho pushed a commit that referenced this pull request Dec 17, 2024
Includes commits from these pull requests:

	#191
	#205
	#206
	#207
	#208
	#215
	#220
	#221

Signed-off-by: Derrick Stolee <[email protected]>
dscho pushed a commit that referenced this pull request Dec 18, 2024
Includes commits from these pull requests:

	#191
	#205
	#206
	#207
	#208
	#215
	#220
	#221

Signed-off-by: Derrick Stolee <[email protected]>
dscho pushed a commit that referenced this pull request Dec 18, 2024
Includes commits from these pull requests:

	#191
	#205
	#206
	#207
	#208
	#215
	#220
	#221

Signed-off-by: Derrick Stolee <[email protected]>
dscho pushed a commit that referenced this pull request Jan 1, 2025
Includes commits from these pull requests:

	#191
	#205
	#206
	#207
	#208
	#215
	#220
	#221

Signed-off-by: Derrick Stolee <[email protected]>
dscho pushed a commit that referenced this pull request Jan 1, 2025
Includes commits from these pull requests:

	#191
	#205
	#206
	#207
	#208
	#215
	#220
	#221

Signed-off-by: Derrick Stolee <[email protected]>
dscho pushed a commit that referenced this pull request Jan 1, 2025
Includes commits from these pull requests:

	#191
	#205
	#206
	#207
	#208
	#215
	#220
	#221

Signed-off-by: Derrick Stolee <[email protected]>
dscho pushed a commit that referenced this pull request Jan 1, 2025
Includes commits from these pull requests:

	#191
	#205
	#206
	#207
	#208
	#215
	#220
	#221

Signed-off-by: Derrick Stolee <[email protected]>
dscho pushed a commit that referenced this pull request Jan 1, 2025
Includes commits from these pull requests:

	#191
	#205
	#206
	#207
	#208
	#215
	#220
	#221

Signed-off-by: Derrick Stolee <[email protected]>
dscho pushed a commit that referenced this pull request Feb 10, 2025
Includes commits from these pull requests:

	#191
	#205
	#206
	#207
	#208
	#215
	#220
	#221

Signed-off-by: Derrick Stolee <[email protected]>
dscho pushed a commit that referenced this pull request Feb 27, 2025
Includes commits from these pull requests:

	#191
	#205
	#206
	#207
	#208
	#215
	#220
	#221

Signed-off-by: Derrick Stolee <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants