Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

watchdog: Optimize WatchdogImpl::touch in preparation to more frequent petting of the watchdog. #13103

Merged
merged 17 commits into from
Oct 20, 2020
Merged
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
9 changes: 6 additions & 3 deletions source/server/guarddog_impl.cc
Original file line number Diff line number Diff line change
Expand Up @@ -84,6 +84,10 @@ GuardDogImpl::GuardDogImpl(Stats::Scope& stats_scope, const Server::Configuratio
GuardDogImpl::~GuardDogImpl() { stop(); }

void GuardDogImpl::step() {
// Hold mutex_ for the duration of the step() function to ensure that watchdog liveness checks and
// test interlock operations happen in the expected order. Calls to forceCheckForTest() should
// result in a full iteration of the step() function to process recent watchdog touches and
// monotonic time changes.
Thread::LockGuard guard(mutex_);
if (!run_thread_) {
return;
Expand All @@ -103,9 +107,8 @@ void GuardDogImpl::step() {
static_cast<size_t>(ceil(multi_kill_fraction_ * watched_dogs_.size())));

for (auto& watched_dog : watched_dogs_) {
if (watched_dog->dog_->touchCount() > 0) {
// Watchdog was touched since the guard dog last checked; update time and reset the count.
watched_dog->dog_->resetTouchCount();
if (watched_dog->dog_->getTouchedAndReset()) {
// Watchdog was touched since the guard dog last checked; update last check-in time.
watched_dog->last_checkin_ = now;
continue;
}
Expand Down
5 changes: 2 additions & 3 deletions source/server/guarddog_impl.h
Original file line number Diff line number Diff line change
Expand Up @@ -82,9 +82,8 @@ class GuardDogImpl : public GuardDog {
const std::chrono::milliseconds loopIntervalForTest() const { return loop_interval_; }

/**
* Test hook to force a step() to catch up with the current simulated
* time. This is inlined so that it does not need to be present in the
* production binary.
* Test hook to force a step() to catch up with the current watchdog state and simulated time.
* This is inlined so that it does not need to be present in the production binary.
*/
void forceCheckForTest() {
Thread::LockGuard guard(mutex_);
Expand Down
17 changes: 9 additions & 8 deletions source/server/watchdog_impl.h
Original file line number Diff line number Diff line change
Expand Up @@ -23,20 +23,21 @@ class WatchDogImpl : public WatchDog {
: thread_id_(thread_id), timer_interval_(interval) {}

Thread::ThreadId threadId() const override { return thread_id_; }
// Accessors for use by GuardDogImpl to tell if the watchdog was touched recently and reset the
// count back to 0 when it has.
uint32_t touchCount() const { return touch_count_.load(); }
void resetTouchCount() { touch_count_.store(0); }
// Used by GuardDogImpl determine if the watchdog was touched recently and reset the touch status.
bool getTouchedAndReset() { return touched_.exchange(false, std::memory_order_acq_rel); }

// Server::WatchDog
void startWatchdog(Event::Dispatcher& dispatcher) override;
void touch() override { touch_count_.fetch_add(1); }
void touch() override {
// Set touched_ if not already set.
bool expected = false;
touched_.compare_exchange_strong(expected, true, std::memory_order_release,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not an atomics expert, but I'm pretty sure you can use store with memory_order_relaxed on this one, and use memory_order_relaxed on the exchange above.

The whole thing is eventually consistent so I'm not sure any ordering matters and that would probably be faster?

The only thing I'm not sure of is whether this would effect test determinism, however, if you are using locks to serialize the tests that should force consistency of all previous operations.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm no expert either. After thinking about it some more, I think you're totally right - relaxed memory order is sufficient here. Switched to that.

std::memory_order_acquire);
}

private:
const Thread::ThreadId thread_id_;
// Use a 32-bit atomic counter since 64-bit atomics are not generally available in 32-bit ARM and
// 32-bit is large enough for our purposes.
std::atomic<uint32_t> touch_count_ = 0;
std::atomic<bool> touched_{false};
Event::TimerPtr timer_;
const std::chrono::milliseconds timer_interval_;
};
Expand Down