TARDIS-3783-Analyze and Improve Short Polling Implementation #16

Merged: 1 commit from TARDIS-3783-ImproveShortPolling into master on Sep 6, 2019

Conversation

mhamzak008 (Contributor)

No description provided.

@mhamzak008 mhamzak008 requested a review from zfr July 19, 2019 10:26
@@ -77,6 +75,11 @@ def __init__(self, request_id=None, took=0.0, result=None, data=None, url=None):
         else:
             self.url = None

+        if api_client is not None:

This seems a bit unnecessary.

mhamzak008 (Contributor, Author)

Agreed. I am gonna update the pull request.

@mhamzak008 mhamzak008 requested a review from bkaganyildiz July 19, 2019 10:42
@mhamzak008 mhamzak008 force-pushed the TARDIS-3783-ImproveShortPolling branch from c6a543a to 3d69f08 on July 19, 2019 10:42

         while condition:
             if attempt_count > 0:
-                time.sleep(2 * attempt_count * 0.2)
+                sleep_time = random.uniform(0, (0.2 * 2 ** attempt_count))

So the polling delay now starts at 0.1 s and doubles on every attempt, with a uniformly distributed jitter of +/-100%.

Combined with the short_polling_max_retries default value of 10 (and assuming the off-by-one bug I noted gets fixed), that gives an expected maximum aggregate sleep time of 51 seconds and a worst case of 102 seconds.

This plus any delays in calling get_[incident_]request_status would be the maximum blocking time for any client code calling retrieve_result(), directly or indirectly.

Seems reasonable...
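For reference, a minimal sketch of the backoff scheme described above, assuming a hypothetical fetch_status() stand-in for get_[incident_]request_status() and reusing the MAX_NUMBER_OF_RETRIES and 0.2 base factor that appear in the diff:

```python
import random
import time

MAX_NUMBER_OF_RETRIES = 10  # mirrors the short_polling_max_retries default
BASE_DELAY = 0.2            # seconds; matches the 0.2 factor in the diff


def poll_until_done(fetch_status):
    """Poll fetch_status() with exponential backoff and full jitter."""
    attempt_count = 0
    while True:
        result = fetch_status()
        if result is not None:  # the request has completed, stop polling
            return result
        attempt_count += 1
        if attempt_count > MAX_NUMBER_OF_RETRIES:
            raise TimeoutError("result not ready after %d polls" % attempt_count)
        # The cap doubles on every attempt (0.4, 0.8, 1.6, ...); drawing
        # uniformly from [0, cap] gives an expected sleep of half the cap.
        time.sleep(random.uniform(0, BASE_DELAY * 2 ** attempt_count))
```

The factor of two between the expected (51 s) and worst-case (102 s) totals quoted above is exactly this full-jitter property: a uniform draw from [0, cap] averages cap / 2.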


attempt_count += 1

if should_retry and attempt_count < MAX_NUMBER_OF_RETRIES:

Off-by-one error: if attempt_count is incremented before this comparison, it should be attempt_count <= MAX_NUMBER_OF_RETRIES.

mhamzak008 (Contributor, Author)

Good catch. I will fix it right away.
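For illustration, a tiny self-contained sketch of the boundary in question (the retries_allowed helper is hypothetical; the real loop also sleeps and re-polls, which is omitted here):

```python
MAX_NUMBER_OF_RETRIES = 10


def retries_allowed(comparison):
    """Count how many retries happen when attempt_count is incremented
    before the comparison is evaluated."""
    attempt_count, retries = 0, 0
    while True:
        attempt_count += 1  # incremented before the check
        if not comparison(attempt_count, MAX_NUMBER_OF_RETRIES):
            return retries
        retries += 1        # a retry actually happens here


print(retries_allowed(lambda a, m: a < m))   # 9  -> one retry short
print(retries_allowed(lambda a, m: a <= m))  # 10 -> intended behaviour
```

Checking the count before incrementing would let < keep its meaning; with the ordering in the diff, <= is what yields the full ten retries.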

@@ -263,6 +263,7 @@ def request(self, method, url, query_params=None, headers=None,
             exception = exceptions.build_exception(response=r)
             raise exception

+        self.retries = -1

Tracking retry count here seems a bit hacky and error-prone. You've fixed one problem here but I think there are more lurking. For example, what happens when all retries are exhausted? This method raises and doesn't reset retries to -1 (and it has no way to know that retries have been exhausted in order to do the reset). So I think retry_count is going to be wrong until after the next successful request?

Seems like retry counts should be maintained at the level where the retrying is being done, and passed in somehow?

mhamzak008 (Contributor, Author)

Yup, I agree. In this particular pull request, I was more focused on improving the short polling implementation, but then I realized that the retry_count wasn't being reset in the previous implementation. So, off the top of my head, I just reset it there. I am gonna dig a little deeper into it. Thanks for the heads up :))

mhamzak008 (Contributor, Author)

Upon further investigation, I realized that this particular retry_count is only used for metrics publishing and not for the actual retry policy of the SDK. The number of times the SDK retries an action is managed by the retry library and is independent of this variable. I have added this information to an existing internal issue about improving the retry logic of the SDK, and hopefully the implementation will be improved soon.

mhamzak008 (Contributor, Author)

It's fixed in the following pull request: #17
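For what it's worth, a minimal sketch of the reviewer's suggestion above, with the layer that performs the retries owning the counter and the request layer only consuming it for metrics. Everything here (publish_metric, the retry_count keyword argument, the backoff values) is an illustrative assumption, not the SDK's actual interface:

```python
import time


def publish_metric(name, value):
    print("metric %s=%s" % (name, value))  # stand-in for the real metrics sink


def request_with_retries(api_client, method, url, max_retries=3):
    """Own the retry counter here and pass it down on every attempt,
    so nothing has to be reset on the client after success or failure."""
    for retry_count in range(max_retries + 1):  # 0 is the first attempt
        try:
            response = api_client.request(method, url, retry_count=retry_count)
            publish_metric("request.retry_count", retry_count)
            return response
        except Exception:
            if retry_count == max_retries:
                publish_metric("request.retry_count", retry_count)
                raise
            time.sleep(0.2 * 2 ** retry_count)  # same backoff shape as the polling loop
```

Because the count is recomputed per call rather than stored on the client, it cannot go stale when request() raises, which is the failure mode pointed out above.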

@mhamzak008 mhamzak008 force-pushed the TARDIS-3783-ImproveShortPolling branch from 3d69f08 to de10f77 on July 20, 2019 09:45
@mhamzak008 mhamzak008 force-pushed the TARDIS-3783-ImproveShortPolling branch from de10f77 to 601ec75 on July 26, 2019 14:43
@zfr zfr merged commit d0d981a into master on Sep 6, 2019
@zfr zfr deleted the TARDIS-3783-ImproveShortPolling branch on September 6, 2019 13:59