Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Retry ChunkInsert and tune retry settings #51

Merged
merged 12 commits into from
Aug 30, 2018
Merged

Conversation

bbuchalter
Copy link

@bbuchalter bbuchalter commented Aug 27, 2018

NOTE: This PR is best reviewed commit by commit.

This PR has three parts:

  • Implements SqlRetry as a replacement for RetryHelper.
  • Update ChunkerInsert, AtomicSwitcher, Entangler to all use SqlRetry.
  • Tune retry settings for all components to:
    • Wait 1 second between retries, up to 10 times per unit work (e.g. each chunk)
    • Retry for deadlocks as well as wait locks

@bbuchalter
Copy link
Author

This PR depends on a feature introduced in retriable in v3.0.0. This dependency requirement needs to be set correctly in the gemspec.

Brian Buchalter added 12 commits August 28, 2018 13:36
This is a re-implementation of the RetryHelper that was introduced
in #39.

By being a class instead of a module, it has a much more clear interface
and requires less setup. It features a true integration test for lock conditions
which were replaced in #20 with mocks.

It's configuration is currently identical to the configuration of RetryHelper.
As we have all components use SqlRetry, we're going to need better logging
to know what is being retried. Copying the whole proc for the retry log
message into each component's configuration is going to be very painful.
Let's add better support for this into SqlRetry.
The retry helper assumes access to a @connection, so it is better to
pass this in when initializing than in a particular method.
After a lengthy conversation an exploration in #44
it's been decided it's best to err on the side of being conservative and retry
only 10 times. This can be adjusted and tuned later.
These are also common causes of the need to manually retry. Let's
automate the retry of these error messages too.
We rely on kamui/retriable#24 with our
default configuration of retriable which was released in 3.0.0.
Copy link

@insom insom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great 👍

I appreciate the comments around the testing of LWTOs in timeouts too -- it's complicated and the comments are short but have the important information in them, so: cool.

base_interval went from having "opinionated defaults" to just being 1s everywhere unless overridden -- do you see us moving that all to Shopify/shopify and Shopify/shopify_lhm in the end? I'm not sure that the default of 1s is great, but I don't want to bike shed here if you expect we'll set this elsewhere.

Also I don't love the

{}.merge!(options.fetch(:something, {}))

pattern, but at the same time -- I don't have a better syntax suggestion so ... 🚢? 😁

@bbuchalter
Copy link
Author

base_interval went from having "opinionated defaults" to just being 1s everywhere unless overridden -- do you see us moving that all to Shopify/shopify and Shopify/shopify_lhm in the end

The problem with this right now is that it can't be overridden by end users of the LHM library. There needs to be external configuration exposed for this at the library level. Right now the only configuration exposed is internally, to make tests not slow. This is a separate issue that I've filed #53.

@bbuchalter bbuchalter merged commit 7f5c1bf into master Aug 30, 2018
@bbuchalter bbuchalter deleted the retry_refactor branch August 30, 2018 22:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants