Rename lhmn_ tables to lhma_ to avoid IBP stalls #41
The amount of behavior implemented in the base Lhm module was excessive. The code there was intentionally kept terse to limit how much lived in the module. By extracting it to its own class, we can be more expressive, which will make future refactoring easier.
We're about to change the behavior of the current cleanup and I'd like to have more explicit tests about exactly what will be executed.
In the next commit, we'll need to be able to generate timestamps so let's extract this logic first.
When an LHM worker fails, `cleanup_current_run` must be called to remove the triggers and "new" tables (which start with lhmn_). The previous behavior was to drop the table immediately. However, if this is an active table, the InnoDB buffer pool (IBP) can be full of pages related to this "lhmn_" table. Dropping it forces the IBP to clear those pages and can cause MySQL to become unresponsive. By instead renaming this table with the archive prefix (lhma_), we can let the buffer pool unload the relevant pages over time, and then later safely drop the archive tables as part of regular scheduled maintenance.
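A minimal sketch of the rename-instead-of-drop idea described above, with no database involved. The helper `cleanup_statements`, the trigger name, the archive naming scheme, and the SQL shapes are assumptions for illustration, not the code in this PR.

```ruby
# Hypothetical helper: builds the SQL a cleanup run might issue.
# Prefixes come from the description above; everything else is assumed.
NEW_PREFIX = "lhmn_"
ARCHIVE_PREFIX = "lhma_"

def cleanup_statements(tables, triggers, stamp)
  # Triggers are still dropped outright.
  drops = triggers.map { |t| "DROP TRIGGER IF EXISTS `#{t}`" }

  # "New" tables are renamed into the archive namespace instead of dropped,
  # so the buffer pool is not forced to evict their pages all at once.
  renames = tables.select { |t| t.start_with?(NEW_PREFIX) }.map do |t|
    archived = "#{ARCHIVE_PREFIX}#{stamp}_#{t.delete_prefix(NEW_PREFIX)}"
    "RENAME TABLE `#{t}` TO `#{archived}`"
  end

  drops + renames
end

statements = cleanup_statements(
  ["lhmn_users"],            # illustrative table name
  ["lhmt_ins_users"],        # illustrative trigger name
  "2014_01_01_00_00_00_000"  # illustrative timestamp
)
statements.each { |sql| puts sql }
```

The archived tables would then be dropped later by whatever scheduled maintenance already handles lhma_ tables.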
7326422 to a9a4349
def all_triggers_for_origin
  @all_triggers_for_origin ||= connection.select_values("show triggers like '%#{origin_table_name}'").collect do |trigger|
    trigger.respond_to?(:trigger) ? trigger.trigger : trigger
I realize this isn't your code but wtf? I wonder if this is Mysql vs Mysql2 stuff.
That's likely the cause.
that is exactly what it is, yes.
@jordanwheeler do you happen to know which is the mysql2 syntax?
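For readers following this thread, here is a small sketch of what the `respond_to?(:trigger)` fallback does: it accepts either a row object exposing a `trigger` method or a plain string. `TriggerRow` and `trigger_name` are hypothetical names, and the sketch makes no claim about which adapter returns which shape.

```ruby
# Hypothetical stand-in for a row object that exposes the name via #trigger.
TriggerRow = Struct.new(:trigger)

# Same normalization as in the diff: use #trigger when the row has it,
# otherwise assume the row already is the trigger name.
def trigger_name(row)
  row.respond_to?(:trigger) ? row.trigger : row
end

rows = [TriggerRow.new("lhmt_ins_users"), "lhmt_del_users"]
names = rows.map { |row| trigger_name(row) }
puts names.inspect
```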
Nice to follow commit by commit so I can see how the factoring took shape. 🚢
  @time = time
end

def to_s
pretty fancy there guy
i'd feel better if the timestamp stuff was tested better, but it's the same code which you haven't actually changed, and testing it nicely there would likely require timecop or something, which is a lot of effort for such a small change.
i just thought i'd mention that. i like what you've done here 👍
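One possible way to test the timestamp logic without timecop, sketched under assumptions: since the diff shows the time being passed in (`@time = time`), a test can hand in a fixed `Time` and assert on `to_s`. The class name `Timestamp` and the format string here are hypothetical, not taken from the PR.

```ruby
# Hypothetical value object; the class name and format are assumptions.
class Timestamp
  def initialize(time)
    @time = time
  end

  # Formats the injected time down to milliseconds.
  def to_s
    millis = format("%03d", @time.usec / 1000)
    "#{@time.strftime('%Y_%m_%d_%H_%M_%S')}_#{millis}"
  end
end

# A fixed time makes the output deterministic, so no clock stubbing is needed.
fixed = Time.utc(2014, 1, 2, 3, 4, 5, 6_000)
puts Timestamp.new(fixed).to_s # prints "2014_01_02_03_04_05_006"
```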