Use jitter in waiters and account for overflows
This approach adds jitter to waiters and accounts for overflows.
This update ensures that the computed exponential wait time does not
cause integer overflows for larger numbers of attempts. Instead, we
compute the attempt at which the exponential delay would exceed maxDelay
and stop computing an exponential increase beyond that point.
mtdowling committed Dec 4, 2020
1 parent 781dc44 commit 40c28ac
Showing 1 changed file with 92 additions and 30 deletions.
122 changes: 92 additions & 30 deletions docs/source/1.0/spec/waiters.rst
@@ -157,25 +157,36 @@ Waiter retries

 Waiter implementations MUST delay for a period of time before attempting a
 retry. The amount of time a waiter delays between retries is computed using
-`exponential backoff`_ through the following algorithm:
+exponential backoff with jitter through the following algorithm:

 * Let ``attempt`` be the number of retry attempts.
+* Let ``attemptCeiling`` be the computed number of attempts necessary before
+  ``delay`` with exponential backoff exceeds ``maxDelay``. This is necessary
+  to prevent integer overflows for larger numbers of retries.
 * Let ``minDelay`` be the minimum amount of time to delay between retries in
   seconds, specified by the ``minDelay`` property of a
   :ref:`waiter <waiter-structure>` with a default of 2.
 * Let ``maxDelay`` be the maximum amount of time to delay between retries in
   seconds, specified by the ``maxDelay`` property of a
   :ref:`waiter <waiter-structure>` with a default of 120.
 * Let ``min`` be a function that returns the smaller of two integers.
 * Let ``max`` be a function that returns the larger of two integers.
-* Let ``maxWaitTime`` be the amount of time in seconds a user is willing to
-  wait for a waiter to complete.
-* Let ``remainingTime`` be the amount of seconds remaining before the waiter
-  has exceeded ``maxWaitTime``.
+* Let ``random`` be a function that returns a random value between two
+  inclusive integers.
+* Let ``log`` be a function that returns the natural logarithm for an integer.
+* Let ``maxWaitTime`` be a user-provided amount of time in seconds a user is
+  willing to wait for a waiter to complete.
+* Let ``remainingTime`` be the computed amount of seconds remaining before the
+  waiter has exceeded ``maxWaitTime``.

 .. code-block:: python

-    delay = min(maxDelay, minDelay * 2 ** (attempt - 1))
+    attemptCeiling = (log(maxDelay / minDelay) / log(2)) + 1
+
+    if attempt > attemptCeiling:
+        delay = maxDelay
+    else:
+        delay = minDelay * 2 ** (attempt - 1)
+
+    delay = random(minDelay, delay)

     if remainingTime - delay <= minDelay:
         delay = remainingTime - minDelay
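
The new side of the hunk above can be sketched as runnable Python. This is only an illustration under stated assumptions, not the reference implementation: the names `attempt_ceiling`, `compute_delay`, and the `rng` parameter are hypothetical, `random.randint` stands in for the spec's inclusive `random` function, and `min_delay` is assumed to be at least 1.

```python
import math
import random


def attempt_ceiling(min_delay: int, max_delay: int) -> float:
    """Attempt number past which the exponential delay is pinned to max_delay."""
    return math.log(max_delay / min_delay) / math.log(2) + 1


def compute_delay(attempt, remaining_time, min_delay=2, max_delay=120, rng=random):
    """Return the delay in seconds before retry number `attempt` (1-based).

    `rng` is a hypothetical hook: anything with a randint(a, b) method works,
    including the `random` module itself or a seeded random.Random instance.
    """
    # Past the ceiling, skip the exponent entirely so 2 ** (attempt - 1)
    # is never evaluated for very large attempt counts.
    if attempt > attempt_ceiling(min_delay, max_delay):
        delay = max_delay
    else:
        delay = min_delay * 2 ** (attempt - 1)

    # Jitter: pick uniformly between min_delay and the exponential value.
    delay = rng.randint(min_delay, delay)

    # Leave min_delay seconds for one last request before the deadline.
    if remaining_time - delay <= min_delay:
        delay = remaining_time - min_delay

    return delay
```

With the defaults, `compute_delay(1, 300)` is always 2 because the jitter range collapses to a single value, while attempt 7 and later draw from the full 2 to 120 second range. Python integers do not overflow, so the ceiling matters mainly for ports to fixed-width integer languages.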
@@ -187,7 +198,7 @@ needlessly only to exceed ``maxWaitTime`` before issuing a final request.

 Using the default ``minDelay`` of 2, the default ``maxDelay`` of 120, a caller
 provided ``maxWaitTime`` of 300 (5 minutes), and assuming that requests
-complete in 0 seconds (for example purposes only), delays are computed as
+complete in 0 seconds (for example purposes only), delays might be computed as
 follows:

 .. list-table::
@@ -202,34 +213,85 @@ follows:
       - 2
       - 298
     * - 2
-      - 4
-      - 6
-      - 294
+      - 3
+      - 5
+      - 295
     * - 3
-      - 8
-      - 14
-      - 286
+      - 6
+      - 11
+      - 289
     * - 4
-      - 16
-      - 30
-      - 270
+      - 6
+      - 17
+      - 283
     * - 5
-      - 32
-      - 62
-      - 238
+      - 22
+      - 39
+      - 261
     * - 6
-      - 64
-      - 126
-      - 174
+      - 62
+      - 101
+      - 199
     * - 7
-      - 120
-      - 254
-      - 46
-    * - 8 (last attempt)
-      - 44
+      - 43
+      - 144
+      - 156
+    * - 8
+      - 24
+      - 168
+      - 132
+    * - 9
+      - 71
+      - 239
+      - 61
+    * - 10
+      - 42
+      - 281
+      - 19
+    * - 11
+      - 9
+      - 290
+      - 10
+    * - 12
+      - 6
+      - 296
+      - 4
+    * - 13 (last attempt)
+      - 2
       - 298
       - N/A

+.. note::
+
+    Because waiters use jitter, waiters might use different delays than the
+    example table above.
+
+
+Why exponential backoff with jitter?
+------------------------------------
+
+`Exponential backoff with full jitter`_ is used as opposed to other retry
+strategies like linear backoff because it should work for most use cases,
+balancing the cost to the caller spent waiting on a resource to stabilize,
+the cost of the service in responding to polling requests, and the overhead
+associated with potentially violating a service level agreement and getting
+throttled. Waiters that poll for resources that quickly stabilize will
+complete within the first few calls, whereas waiters that could take hours
+to complete will send fewer requests as the number of retries increases.
+
+By generally increasing the amount of delay between retries as the number of
+retry attempts increases, waiters will not overload services with unnecessary
+polling calls, and it protects customers from violating service level
+agreements that could counter-intuitively cause waiters to take longer to
+complete or even fail due to request throttling. By introducing
+randomness with jitter, waiters will retry slightly more aggressively to
+improve the time to completion while still maintaining the general increase
+in delay between retries.
+
+Note that linear backoff is still possible to configure with waiters. By
+setting ``minDelay`` and ``maxDelay`` to the same value, a waiter will retry
+using linear backoff.
+
+
 .. _waiter-structure:
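
To see how a schedule like the example table in this hunk could arise, the following sketch simulates one waiter run under the table's assumption that requests complete in 0 seconds. The function name `simulate_schedule` and its `seed` parameter are hypothetical, `min_delay` is assumed to be at least 1, and the values produced depend on the seed, so they are not expected to match the table.

```python
import math
import random


def simulate_schedule(max_wait_time=300, min_delay=2, max_delay=120, seed=0):
    """Return rows of (attempt, delay, cumulative_time, remaining_time).

    remaining_time is None on the last attempt, mirroring "N/A" in the table.
    """
    rng = random.Random(seed)  # seeded so a given run is repeatable
    ceiling = math.log(max_delay / min_delay) / math.log(2) + 1
    elapsed, attempt, rows = 0, 0, []
    while True:
        attempt += 1
        remaining = max_wait_time - elapsed
        # Upper bound of the jitter range: exponential until the ceiling.
        cap = max_delay if attempt > ceiling else min_delay * 2 ** (attempt - 1)
        delay = rng.randint(min_delay, cap)
        # The final delay is shortened so one last request fits in the budget.
        last = remaining - delay <= min_delay
        if last:
            delay = remaining - min_delay
        elapsed += delay
        rows.append((attempt, delay, elapsed, None if last else max_wait_time - elapsed))
        if last:
            return rows


for row in simulate_schedule():
    print(row)
```

Whatever the seed, the first delay is 2 and the cumulative time on the last attempt is `max_wait_time - min_delay` (298 with the defaults). Calling `simulate_schedule(min_delay=10, max_delay=10)` collapses the jitter range and yields a constant 10-second delay on every attempt, which is the configuration the patch describes as linear backoff.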

Expand Down Expand Up @@ -796,4 +858,4 @@ the ``StartResource`` API operation.
.. _CommonMark: https://spec.commonmark.org/
.. _JMESPath: https://jmespath.org/
.. _JMESPath types: https://jmespath.org/specification.html#data-types
.. _exponential backoff: https://aws.amazon.com/builders-library/timeouts-retries-and-backoff-with-jitter/
.. _Exponential backoff with full jitter: https://aws.amazon.com/builders-library/timeouts-retries-and-backoff-with-jitter/#Jitter
