TARDIS-3783: Analyze and Improve Short Polling Implementation #16
Conversation
@@ -77,6 +75,11 @@ def __init__(self, request_id=None, took=0.0, result=None, data=None, url=None):
         else:
             self.url = None

+        if api_client is not None:
This seems a bit unnecessary.
Agreed. I'm going to update the pull request.
Force-pushed from c6a543a to 3d69f08
 while condition:
     if attempt_count > 0:
-        time.sleep(2 * attempt_count * 0.2)
+        sleep_time = random.uniform(0, (0.2 * 2 ** attempt_count))
So the polling delay now starts at 0.1 s and doubles on every attempt, with a uniformly distributed jitter of +/-100%.
Combined with the short_polling_max_retries default value of 10 (and assuming the off-by-one bug I noted gets fixed), that gives an expected maximum aggregate sleep time of 51 seconds and a worst case of 102 seconds.
This, plus any delays in calling get_[incident_]request_status, would be the maximum blocking time for any client code calling retrieve_result(), directly or indirectly.
Seems reasonable...
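As a minimal sketch of the backoff scheme being discussed (the 0.2 base delay, `attempt_count`, and the retry cap of 10 come from the diff and the comment above; `poll_status`, the `"done"` sentinel, and `retrieve_result_sketch` are hypothetical stand-ins, not the SDK's actual API):

```python
import random
import time

BASE_DELAY = 0.2            # seconds; matches the constant in the diff
MAX_NUMBER_OF_RETRIES = 10  # short_polling_max_retries default

def retrieve_result_sketch(poll_status):
    """Exponential backoff with full jitter: each sleep is drawn
    uniformly from [0, BASE_DELAY * 2 ** attempt_count], so the cap
    doubles per attempt and the expected sleep is half the cap."""
    attempt_count = 0
    while True:
        if attempt_count > 0:
            time.sleep(random.uniform(0, BASE_DELAY * 2 ** attempt_count))
        if poll_status() == "done":  # hypothetical terminal state
            return
        attempt_count += 1
        if attempt_count > MAX_NUMBER_OF_RETRIES:
            raise TimeoutError("polling gave up after %d attempts" % attempt_count)
```

The exact aggregate sleep bound depends on how many attempts actually sleep, which is where the off-by-one discussion below comes in.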
     attempt_count += 1

     if should_retry and attempt_count < MAX_NUMBER_OF_RETRIES:
Off-by-one error: if attempt_count is incremented before this comparison, it should be attempt_count <= MAX_NUMBER_OF_RETRIES.
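To make the off-by-one concrete, here is a tiny standalone sketch (the values are hypothetical; only the increment-then-compare pattern is taken from the diff):

```python
MAX_NUMBER_OF_RETRIES = 3
should_retry = True

retries_performed = 0
attempt_count = 0
while True:
    attempt_count += 1  # incremented before the comparison, as in the diff
    if should_retry and attempt_count < MAX_NUMBER_OF_RETRIES:
        retries_performed += 1
    else:
        break

# With `<` this prints 2: one retry short of the configured maximum.
# Using `<=` instead yields the full 3 retries.
print(retries_performed)
```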
Good catch. I will fix it right away.
@@ -263,6 +263,7 @@ def request(self, method, url, query_params=None, headers=None,
             exception = exceptions.build_exception(response=r)
             raise exception

+        self.retries = -1
Tracking the retry count here seems a bit hacky and error-prone. You've fixed one problem, but I think there are more lurking. For example, what happens when all retries are exhausted? This method raises and doesn't reset retries to -1 (and it has no way of knowing that retries have been exhausted in order to do the reset). So I think retry_count is going to be wrong until after the next successful request?
It seems like retry counts should be maintained at the level where the retrying is being done, and passed in somehow.
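One hedged sketch of that shape (`ApiClient`, `call_with_retries`, and `_publish_metric` are illustrative names, not the SDK's actual API): the loop that performs the retries owns the counter and passes it down, so the transport layer never has to guess when to reset it.

```python
class ApiClient:
    def _publish_metric(self, name, value):
        pass  # stand-in for whatever metrics backend the SDK uses

    def request(self, method, url, retry_count=0):
        # `retry_count` arrives from the caller purely for metrics
        # tagging; this method holds no retry state of its own.
        self._publish_metric("request.retry_count", retry_count)
        # ... perform the actual HTTP call here, raising on failure ...

def call_with_retries(client, method, url, max_retries=10):
    # The retry loop is the single owner of the counter, so the count
    # stays correct even when retries are exhausted and the last
    # exception propagates to the caller.
    last_exc = None
    for retry_count in range(max_retries + 1):
        try:
            return client.request(method, url, retry_count=retry_count)
        except Exception as exc:  # a real loop would catch more narrowly
            last_exc = exc
    raise last_exc
```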
Yup, I agree. In this particular pull request I was focused on improving the short polling implementation, but then I realized that retry_count wasn't being reset in the previous implementation, so, off the top of my head, I just reset it there. I'm going to dig a little deeper into it. Thanks for the heads up :))
Upon further investigation, I realized that this particular retry_count is only used for publishing metrics, not for the SDK's actual retry policy. The number of times the SDK retries an action is managed by the retry library and is independent of this variable. I have added this information to an existing internal issue about improving the SDK's retry logic, and hopefully the implementation will be improved soon.
It's fixed in the following pull request: #17
Force-pushed from 3d69f08 to de10f77
Force-pushed from de10f77 to 601ec75