Retry policies don't sleep after operations time out #18548

chlowell · 2021-05-06T18:47:41Z

After an operation times out, _configure_timeout raises a subclass of AzureError. Async/RetryPolicy handles AzureError by sleeping and continuing the retry_active loop. The policy won't actually send another request, because _configure_timeout will raise at the start of every loop iteration, but it will pointlessly sleep after each iteration until it exhausts its allowed retries.

This PR prevents this pointless sleeping by re-raising _configure_timeout exceptions. This is safe today because no other policy raises those exceptions. For example, transports raise ServiceRequest/ResponseError when connections time out. However, in principle something downstream from RetryPolicy could raise e.g. ServiceResponseTimeoutError given a request that should be retried; with this change, the request would not be retried. I don't expect that scenario, but I point out the possibility so you can disagree with me as you like 😸

sdk/core/azure-core/azure/core/pipeline/policies/_retry.py

xiangyan99 · 2021-05-06T19:42:54Z

sdk/core/azure-core/azure/core/pipeline/policies/_retry_async.py

@@ -156,6 +158,9 @@ async def send(self, request):  # pylint:disable=invalid-overridden-method
                # the authentication policy failed such that the client's request can't
                # succeed--we'll never have a response to it, so propagate the exception
                raise
+            except (ServiceRequestTimeoutError, ServiceResponseTimeoutError):


Given this scenario, timeout setting is 10 min and default timeout is 5 min.

We should fail on the second TimeoutError but not the first one, right?

I don't understand what you mean by a first vs. second error. _configure_timeout raises one of these errors when absolute_timeout <= 0. I believe we never want to retry when that's true.

I meant (ServiceRequestTimeoutError, ServiceResponseTimeoutError) might be raised from transport. In this case, absolute_timeout may still >0 and we do want to retry.

Nothing else in the library raises these errors today but it's a fair point that a transport or policy could do so. I've removed this except block in favor of checking absolute_timeout in the broader handler.

xiangyan99 · 2021-05-24T21:01:06Z

sdk/core/azure-core/azure/core/pipeline/policies/_retry.py

@@ -455,7 +455,7 @@ def send(self, request):
                # succeed--we'll never have a response to it, so propagate the exception
                raise
            except AzureError as err:
-                if self._is_method_retryable(retry_settings, request.http_request):
+                if absolute_timeout > 0 and self._is_method_retryable(retry_settings, request.http_request):


If absolute_timeout <= 0, we already raised in https://github.com/Azure/azure-sdk-for-python/blob/master/sdk/core/azure-core/azure/core/pipeline/policies/_retry.py#L295?

Not true?

True, but the problem is that we handle that error here by sleeping and iterating again.

I might mis-understand, why absolute_timeout can be <= 0 here?

Do you want to move

end_time = time.time() if absolute_timeout: absolute_timeout -= (end_time - start_time)

here?

This code is tricky to refactor without breaking something. For now we need to decrement absolute_timeout in the finally block so it's done after every iteration, including ones that end with the continue in the try block. I'm picking low hanging fruit in this PR. I think fixing everything (e.g. #18552) will require more thought and a larger change.

As for how absolute_timeout can be non-positive here, note that nothing breaks the loop when it is. _configure_timeout raises an exception when absolute_timeout <= 0 but this except block handles that exception because it's an AzureError. If the request is retryable, the except block sleeps and we go back to the try block, where _configure_timeout raises again. Repeat until retries are exhausted.

xiangyan99

LGTM.

Please update changelog. :)

Review request for Microsoft.ContainerService to add version 2022-03-02-preview (Azure#18633) * Adds base for updating Microsoft.ContainerService from version preview/2022-02-02-preview to version 2022-03-02-preview * Updates readme * Updates API version in new specs and examples * feat: add `/rotateServiceAccountSigningKeys` API (Azure#18359) * feat: add `/rotateServiceAccountSigningKeys` API * chore: remove duplicated description * feat: add `workloadIdentity` settings to `SecurityProfile` (Azure#18360) * fix: update example name (Azure#18382) * add creationData to mc data (Azure#18414) * add creationData to mc data * fix test * fix format * Add API properties and example JSON for Web App Routing of IngressProfile (Azure#18564) * Add API properties and example JSON for Web App Routing of IngressProfile. * Add a ending period for description to match the style in all other "descriptions" in the same file. * Update readmes for the 2022-03-02-preview dev branch of container service (Azure#18358) * update readme * update sdk readmes * Agentpool alias minor version 2022-03-02-preview (Azure#18381) * Add field currentOrchestratorVersion to support Agentpool alias minor version * Add new exmaple for alias minor version * Remove example for PrettierCheck * Fix typo * Latest patch version supported is 1.22.7 at the moment * Address comments - refine descriptions for fields * feat: add ManagedCluster StorageProfile in 0302preview (Azure#18590) Signed-off-by: Ji An Liu <[email protected]> * Swagger change for ignore-pod-disruption-budget (Azure#18548) * Swagger change for ignore-pod-disruption-budget * Change ignorePodDisruptionBudget to string in example file. * Change ignorePodDisruptionBudget to boolean type. * add effectiveNoProxy for AKS (Azure#18544) * add effectiveNoProxy for AKS * add to correct api * fix lowercase -> uppercase O * spelling * Replace common type definitions with references since 2022-03-02-preview for Microsoft.ContainerService (Azure#18567) * replace Resource * replace SystemData * replace parameters * change track2 to python Co-authored-by: hbc <[email protected]> Co-authored-by: Qingqing <[email protected]> Co-authored-by: Yi Zhang <[email protected]> Co-authored-by: Thalia Wang <[email protected]> Co-authored-by: Ji'an Liu <[email protected]> Co-authored-by: Tong Chen <[email protected]> Co-authored-by: Ace Eldeib <[email protected]>

chlowell added 2 commits May 6, 2021 11:24

retry policy doesn't sleep after operation times out

dc5d78b

add tests

b43631b

chlowell added the Azure.Core label May 6, 2021

chlowell requested review from lmazuel and xiangyan99 as code owners May 6, 2021 18:47

chlowell commented May 6, 2021

View reviewed changes

sdk/core/azure-core/azure/core/pipeline/policies/_retry.py Outdated Show resolved Hide resolved

xiangyan99 reviewed May 6, 2021

View reviewed changes

chlowell mentioned this pull request May 6, 2021

allow configuring max timeout for get_secret function in azure-keyvault-secrets #17470

Closed

chlowell requested a review from xiangyan99 May 12, 2021 15:55

check absolute_timeout when handling AzureError

3b07f71

xiangyan99 reviewed May 24, 2021

View reviewed changes

xiangyan99 approved these changes May 25, 2021

View reviewed changes

chlowell merged commit f9aa2ea into Azure:master May 25, 2021

chlowell deleted the no-sleep-after-timeout branch May 25, 2021 15:26

rakshith91 pushed a commit to rakshith91/azure-sdk-for-python that referenced this pull request Jun 1, 2021

Retry policies don't sleep after operations time out (Azure#18548)

12c1ba3

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Retry policies don't sleep after operations time out #18548

Retry policies don't sleep after operations time out #18548

chlowell commented May 6, 2021 •

edited

Loading

xiangyan99 May 6, 2021 •

edited

Loading

chlowell May 6, 2021

xiangyan99 May 12, 2021

chlowell May 12, 2021

xiangyan99 May 24, 2021

chlowell May 24, 2021

xiangyan99 May 24, 2021

chlowell May 24, 2021

xiangyan99 left a comment

Retry policies don't sleep after operations time out #18548

Retry policies don't sleep after operations time out #18548

Conversation

chlowell commented May 6, 2021 • edited Loading

xiangyan99 May 6, 2021 • edited Loading

Choose a reason for hiding this comment

chlowell May 6, 2021

Choose a reason for hiding this comment

xiangyan99 May 12, 2021

Choose a reason for hiding this comment

chlowell May 12, 2021

Choose a reason for hiding this comment

xiangyan99 May 24, 2021

Choose a reason for hiding this comment

chlowell May 24, 2021

Choose a reason for hiding this comment

xiangyan99 May 24, 2021

Choose a reason for hiding this comment

chlowell May 24, 2021

Choose a reason for hiding this comment

xiangyan99 left a comment

Choose a reason for hiding this comment

chlowell commented May 6, 2021 •

edited

Loading

xiangyan99 May 6, 2021 •

edited

Loading