Feature #18: Retry FileDownloads #23

Merged
Dtphelan1 merged 28 commits into main from dtp/18-manual-retry on Jan 5, 2024

Conversation

@Dtphelan1 (Collaborator) commented Dec 29, 2023

I'm having trouble debugging some weird test issues, so I figured I'd open a PR and two-birds-one-stone the debugging along with the code review.

Addresses #18 by manually triggering re-downloads of files whenever FileDownload's streaming requests receive one of several retryable HTTP status codes in response. Additionally, this PR adds two new configuration values (yay, more config variables) for customizing this retry behavior: fileDownloadMaxRetries and fileDownloadRetryAfterMSec.
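
For reference, here is a minimal sketch of how the two new options might look in a client config file. The option names come from this PR; the values shown are illustrative, not the project's defaults:

    // config/my-config.js (sketch; values are illustrative, not defaults)
    module.exports = {
        // ...existing options...

        // Maximum number of times to re-attempt a failed file download
        fileDownloadMaxRetries: 5,

        // Milliseconds to wait between file download attempts
        fileDownloadRetryAfterMSec: 5000,
    };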

To test, modify the bulk-data-server repo to track the number of attempts made for a file download before letting it succeed. I'd push a branch, but I don't have push permissions right now, so for the moment forgive the awkward git-diff formatting:

@@ -292,7 +299,7 @@ class ExportManager
 const ABORT_CONTROLLERS = new Map();
+///////////////////////////////////
+// TEMP
+const TRIES_PER_FILE = new Map();
+// TEMP
+///////////////////////////////////
+

@@ -910,6 +917,25 @@ class ExportManager
         const shouldGzip     = (/\bgzip\b/.test(acceptEncoding));

+        ///////////////////////////////////
+        // TEMP
+        // Fail the first two tries for each file, then succeed
+        const fileId   = req.params.file.split(".")[1];
+        const numTries = TRIES_PER_FILE.get(fileId) ?? 0;
+        console.log("NUM TRIES:", numTries, "FOR FILE:", fileId);
+        TRIES_PER_FILE.set(fileId, numTries + 1);
+        if (numTries <= 1) {
+            console.log("IN FAIL STATE");
+            return res.set("x-debugging-header", "someValue")
+                .status(500)
+                .end("");
+        }
+        // TEMP
+        ///////////////////////////////////
+
         // set the response headers
         res.set({
             "Content-Type": exportTypes[this.outputFormat].contentType,

Things to consider:

  • Why are the tests failing? Specifically, the attachment-download-related test seems to fail with the addition of this new retry logic. I've been staring at the file for a bit too long and can't puzzle out why, so I'm hoping to get another pair of eyes on it.
  • Do we want to combine fileDownloadRetryAfterMSec with retryAfterMSec?
    • Arguments against: the latter is only supposed to be used when bulk servers don't respond with a retry-after header, and we don't want to make that variable's responsibility ambiguous. Plus, the rate at which we retry status updates might reasonably differ from the rate at which we retry failed downloads.
    • Arguments for: aren't we adding too many config variables? The meaning of this variable is pretty similar in both cases, and there probably aren't many use cases where the wait time for retrying status updates differs much from the wait time for retrying failed downloads.
  • We could get rid of these specific variables altogether and instead expect users to configure the request option directly.
    • Argument for: minimizes the number of config variables, and improves consistency between retries handled manually by us and retries handled automatically by Got.
    • Argument against: it's a little awkward to expect users of the client to understand Got's config logic. Additionally, Got doesn't have a concept of a fixed retryAfterMSec; it supports a calculateDelay function instead. Our configs are .js files, so users could define one inline, but it feels cumbersome to have users specify a delay function instead of a number (see the sketch after this list).
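
To make that last point concrete, here is a sketch of what users would have to write in their config if we exposed got's retry option directly. A fixed delay has to be expressed as a calculateDelay function; the limit and delay values are illustrative:

    module.exports = {
        requests: {
            retry: {
                limit: 5,
                // got has no fixed-delay option; a constant delay must be a function.
                // computedValue === 0 means got decided not to retry, so pass that through.
                calculateDelay: ({ computedValue }) => computedValue === 0 ? 0 : 5000
            }
        }
    };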

@vlad-ignatov (Contributor):

Now that we (hopefully) know why the test was failing, we know that this is doable. Unfortunately, it's messy and will require a long discussion and some strategic decisions, so let's begin by splitting the problem into multiple topics and trying to agree on them one at a time. Here is the first one:

The client makes several types of requests:

  • Authorization requests to get (or renew) a token - Do we need to (does it make sense to) retry?
  • Bulk Data Kick-off - Do we need to (does it make sense to) retry?
  • Status polling - built-in retry logic following the bulk data spec. Not to be confused with what we are trying to do here!
  • Bulk Data download (as stream) - Use custom retry because got's retry does not work on streams (see the sketch after this list)
  • Attachments download - Use got's built-in retry capabilities
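
Since got's retry is a no-op for streams, the custom retry for the bulk file downloads has to look roughly like the sketch below. Names, status codes, and defaults are illustrative, not the PR's actual code:

    const got = require("got");
    const fs = require("fs");
    const { pipeline } = require("stream/promises");

    // Status codes worth retrying; an illustrative list, not the PR's
    const RETRYABLE_STATUS_CODES = [408, 429, 500, 502, 503, 504];

    async function downloadWithRetry(url, destination, maxRetries = 5, retryAfterMSec = 5000) {
        for (let attempt = 0; ; attempt++) {
            try {
                // got's built-in retry does not apply to streams, hence the manual loop
                await pipeline(got.stream(url), fs.createWriteStream(destination));
                return;
            } catch (err) {
                const status = err.response && err.response.statusCode;
                if (attempt >= maxRetries || !RETRYABLE_STATUS_CODES.includes(status)) {
                    throw err; // out of retries, or a non-retryable failure
                }
                await new Promise(resolve => setTimeout(resolve, retryAfterMSec));
            }
        }
    }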

My intuition here is to:

  • Not retry the kick-off because it is the first request in the sequence and can be reviewed and retried manually if needed.
  • Same for the token request. Perhaps any error should be fatal in this case.

What do you think?

@Dtphelan1 (Collaborator, Author):

I agree with all your intuitions! To repeat:

  • (No Action) Authorization requests to get (or renew) a token - if there is an error, it should probably be fatal. Any errors related to token expiration should be handled by BulkDataClient.ts itself, since every this.request call invokes this.getAccessToken, which requests a new token if the current one is expired.
  • (No Action) Bulk Data Kick-off - I agree, this can be done manually by the user if there is an error.
  • (No Action) Status polling - the built-in retry logic following the bulk data spec is all we need here.
  • (Already Done 🤞) Bulk Data download (as stream) - Agreed, and relieved that you too see the issues with got's stream-retry logic. Sad, though: it would be nice to take advantage of it if it worked.
  • (To-Fix) Attachments download - Use got's built-in retry capabilities, which aren't working as expected right now (see the sketch after this list).
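
For contrast with the stream case, the attachment path uses got's promise API, where the built-in retry is supposed to work out of the box. Roughly (an illustrative sketch, not the actual BulkDataClient code):

    const got = require("got");

    async function downloadAttachment(url) {
        // got's promise API (unlike its stream API) retries retryable
        // status codes and network errors by itself, with built-in backoff
        return got(url, {
            responseType: "buffer",
            retry: { limit: 3 }
        });
    }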

Now that we agree on which requests should actually retry, did you have next steps in mind?

@vlad-ignatov (Contributor):

OK, Yes, multiple next steps :) Next question:

We have 2 separate retries to handle. Let's call them got and custom for simplicity. The user should not have to know that these are different things. Ideally they should be configured in a single place and behave the same.

  • Option 1 - Use requests.retry as the single source of retry config, read by the custom retry method as well
  • Option 2 - Use custom config variables, which are also used to override the requests.retry variables where they apply

If we choose option 1, it means that we have to:

  • TODO: Override retry.limit to 1 for kick-off and auth requests
  • TODO: Explain in comments and docs that retry options are ignored for kick-off and auth requests, even though the requests config variable is supposed to affect every request.
  • Make sure the custom retry uses settings from requests.retry (which should be easily doable)

If we choose option 2, it means that we have to:

  • Duplicate settings that are otherwise provided in requests.retry
  • Make sure our custom settings override requests.retry wherever it makes sense
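
From the user's point of view, the two options produce configs shaped roughly like these two alternative (and illustrative) config files:

    // Option 1: the user tunes the generic got retry options, and the
    // custom stream retry reads its settings from the same place
    module.exports = {
        requests: {
            retry: { limit: 5 }
        }
    };

    // Option 2: a dedicated variable, which the client also applies
    // on top of requests.retry for the downloads it governs
    module.exports = {
        fileDownloadRetries: 5
    };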

What do you think?

P.S.:

  • We probably don't need to provide a custom calculateDelay (and fileDownloadRetryAfterMSec) and can use the one from got. Since we will only retry the file downloads, the single most useful variable we need is retry.limit, and got's built-in backoff should be good enough.
  • Given the previous comment, regardless of the chosen option we may have to add some code to override/ignore retry settings passed to kick-off and auth (see the sketch below).
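
That override could be as small as the sketch below. One assumption worth flagging: in got, retry.limit counts retries rather than total attempts, so 0 (not 1) is what fully disables automatic retrying; baseRequestOptions is a hypothetical name for the user's requests config:

    function withoutRetries(baseRequestOptions) {
        return {
            ...baseRequestOptions,
            retry: { limit: 0 } // never auto-retry kick-off or token requests
        };
    }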

@Dtphelan1 (Collaborator, Author):

Personally, I lean towards Option 2 because it's more direct: if people want to retry file downloads, they can read through the options and README and find the fileDownloadMaxRetries option (or fileDownloadRetries; I'm open to renaming it). If we were enabling wider retry logic across other types of requests, I'd lean towards Option 1, since we'd be affecting all requests, but we're really homing in on retrying in two limited scenarios. Pointing people to a generic retry option, only to explain that it affects only fileDownload requests, seems confusing… What do you think?

If we do go with Option 2, I'll make sure that the order of arguments is such that fileDownloadRetries overrides other retry options (see the sketch below).
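
Concretely, since later object spreads win, something like this (illustrative) guarantees the override:

    const downloadOptions = {
        ...config.requests,
        retry: {
            ...config.requests?.retry,
            // placed last so it beats anything the user set in requests.retry
            limit: config.fileDownloadRetries
        }
    };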

In either case, as for losing access to calculateDelay, I'm sad, but I agree it's not worth the effort right now. If users want a long delay, we can encourage them to set retries high; the built-in exponential backoff should help with that.

@vlad-ignatov (Contributor):

OK, I am not a big fan of either option and it is quite hard to decide, so let's skip it for now and hope that it will become clearer while doing the rest. The next one should be simple.

Even though it is not configured this way by default, one might need to be able to retry successful requests: for example, retrying on 304 (Not Modified) until you get a 200. I know you don't exactly support such a config now, but that is a question for another step. That said, can we agree that this makes sense, and can you move https://github.com/smart-on-fhir/bulk-data-client/blob/dtp/18-manual-retry/src/lib/FileDownload.ts#L152-L159 outside (before) its containing if statement, if (res.statusCode >= 400) {?

Also, do we need to wait to call downloadRequest.destroy(), or can we do that immediately?
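
For illustration, a hypothetical restructuring of that handler (the real FileDownload.ts code differs; shouldRetry and retry stand in for the linked check and the manual re-download):

    downloadRequest.on("response", res => {
        // Proposed: evaluate the retry check for every response, not only
        // errors, so that e.g. a 304 can be retried until a 200 arrives
        if (this.shouldRetry(res)) {
            downloadRequest.destroy();
            return this.retry();
        }

        if (res.statusCode >= 400) {
            // handle the non-retryable error...
        }

        // ...continue streaming the body...
    });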

@Dtphelan1 (Collaborator, Author):

I agree, we can move the manual shouldRetry check outside of the wrapping if-statement. Good catch, I'll do that now.

> I am not a big fan of either option and it is quite hard to decide

I could not agree more ^ 😆

@vlad-ignatov (Contributor):

Thanks Dylan! I tested it and it seems to work fine!

Dtphelan1 merged commit b2a14a7 into main on Jan 5, 2024.
vlad-ignatov deleted the dtp/18-manual-retry branch on February 7, 2025.