-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add request timeout for ValidatorSites: #2902
Conversation
Fixes: RIPD-1737 Enforce a 20s timeout for making validator list requests.
Jenkins Build SummaryBuilt from this commit Built at 20190415 - 20:59:27 Test Results
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Nice change! I left a few nits that you can address, or not, as you see fit.
That said, I wish I could say that your change is blindingly, obviously, safe. I, personally, see no problems. However I find that working with timers in asio can be extremely subtle. I looked hard. And, given my limited understanding of asio timers and the lock usage here, I didn't spot any issues. I'm afraid that's the best I can say without taking time to write additional asynchronous tests.
public: | ||
ValidatorSite ( | ||
boost::asio::io_service& ios, | ||
ValidatorList& validators, | ||
beast::Journal j); | ||
beast::Journal j, | ||
std::chrono::seconds timeout = std::chrono::seconds{20}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just an observation. When I first saw this my inclination was to move the Journal
to the end of the arguments, since that matches our usual parameter order. I see you made an exception to that expectation so you could default the timeout.
After playing around with it a bit I'm comfortable with your choice. There's (currently) only one place where the timeout is explicitly passed in by the caller, and that's in a unit test.
Not asking for a change. I just thought you'd want to know that I played around with the options and agree that you made the right choice.
auto timeoutCancel = | ||
[this] () | ||
{ | ||
std::unique_lock <std::mutex> lock_state{state_mutex_}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Taking this lock looks safe assuming that the previously existing code was safe. Both onSiteFetch()
and onTextFetch()
acquire state_mutex_
while they are not holding sites_mutex_
.
I'll add a nit, however. This file is using a mix of std::unique_lock
and std::lock_guard
. In this situation I'd personally be inclined to use the smallest necessary hammer in each of the situations. So I'd be inclined to change this instance to a std::lock_guard
(the smaller hammer).
If you agree with my assessment that using the smaller hammer is the best choice, then I found a total of six unique_lock
s in this file that could be downsized to lock_guard
(including this one).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed - thanks for pointing that out.
60u*24); | ||
sites_[siteIdx].refreshInterval = std::chrono::minutes{refresh}; | ||
sites_[siteIdx].nextRefresh = | ||
clock_type::now() + sites_[siteIdx].refreshInterval; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the following code is a bit more consistent with time units and possibly easier to read:
using namespace std::chrono_literals;
std::chrono::minutes const refresh =
boost::algorithm::clamp(
std::chrono::minutes {body["refresh_interval"].asUInt ()},
1min,
1h * 24);
sites_[siteIdx].refreshInterval = refresh;
sites_[siteIdx].nextRefresh =
clock_type::now() + sites_[siteIdx].refreshInterval;
Your choice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 - fixed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion... 1h * 24
-> 24h
// start a timer for the request, which shouldn't take more | ||
// than requestTimeout_ to complete | ||
std::unique_lock <std::mutex> lock_state{state_mutex_}; | ||
timer_.expires_at ( clock_type::now() + requestTimeout_); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have an opinion about this, but I'm curious why you called expires_at
instead of expires_after
. Calling expires_after
would look like this:
timer_.expires_after (requestTimeout_);
Also, is the extra space in front of clock_type::now()
intentional?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep, expires_after
is better here - thanks.
std::unique_lock <std::mutex> lock_state{state_mutex_}; | ||
timer_.expires_at ( clock_type::now() + requestTimeout_); | ||
timer_.async_wait ( std::bind ( | ||
&ValidatorSite::onRequestTimeout, this, siteIdx, std::placeholders::_1)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I personally prefer lambdas over std::bind
; lambdas are a more common idiom (and hence easier to read) these days. So consider rewriting this call like this:
timer_.async_wait ([this, siteIdx] (boost::system::error_code const& ec)
{
this->onRequestTimeout (siteIdx, err);
});
If you make this change you could consider a similar update to the one other use of std::bind
in this file.
boost::algorithm::join (paths | | ||
boost::adaptors::transformed( | ||
[](M::value_type t){ return t.path_; }), | ||
", "); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like you're only using M
once, and the function being invoked is not a template. So I'd be inclined to simplify things and remove a bit of the abstraction. Consider something like this:
testcase << "Fetch list - " <<
boost::algorithm::join (paths |
boost::adaptors::transformed(
[](FetchListConfig const& cfg){ return cfg.path_; }),
", ");
src/test/app/ValidatorSite_test.cpp
Outdated
@@ -208,16 +222,14 @@ class ValidatorSite_test : public beast::unit_test::suite | |||
std::vector<Validator> list; | |||
std::string uri; | |||
std::string expectMsg; | |||
bool shouldFail; | |||
bool failFetch; | |||
bool failApply; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Nice to draw the distinction on why it fails.
src/test/app/ValidatorSite_test.cpp
Outdated
bool isRetry; | ||
}; | ||
std::vector<publisher> servers; | ||
|
||
auto const sequence = 1; | ||
auto const version = 1; | ||
using namespace std::chrono_literals; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be good to move this using
down closer to the only line that needs it, around line 275.
list_ += ",\"signature\":\"" + strHex(sig) + "\""; | ||
list_ += ",\"manifest\":\"" + manifest + "\""; | ||
list_ += ",\"version\":" + std::to_string(version) + '}'; | ||
getList_ = [blob, sig, manifest, version](int interval) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Nice solution to passing in the interval.
} | ||
else if (boost::starts_with(path, "/sleep/")) | ||
{ | ||
auto sleep_sec = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
auto const
?
std::unique_lock <std::mutex> lock_state{state_mutex_}; | ||
error_code ec; | ||
timer_.cancel_one(ec); | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This overload of cancel_one
is deprecated. You can use the non-error_code overload. eg. timer_.cancel_one();
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
src/test/app/ValidatorSite_test.cpp
Outdated
bool failFetch_ = false; | ||
bool failApply_ = false; | ||
int serverVersion_ = 1; | ||
std::chrono::seconds expiresFromNow_ = std::chrono::seconds{3600}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Our format typically does not use a trailing underscore denoting private on public members.
else if (boost::starts_with(path, "/sleep/")) | ||
{ | ||
auto sleep_sec = | ||
boost::lexical_cast<unsigned int>(path.substr(7)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can throw.
int refresh = 5; | ||
if (boost::starts_with(path, "/validators/refresh")) | ||
refresh = | ||
boost::lexical_cast<unsigned int>(path.substr(20)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can throw.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 LGTM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
Fixes: RIPD-1737
Enforce a 20s timeout for making validator list requests.