Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add request timeout for ValidatorSites: #2902

Closed
wants to merge 3 commits into from
Closed

Add request timeout for ValidatorSites: #2902

wants to merge 3 commits into from

Conversation

mellery451
Copy link
Contributor

Fixes: RIPD-1737

Enforce a 20s timeout for making validator list requests.

Fixes: RIPD-1737

Enforce a 20s timeout for making validator list requests.
@ripplelabs-jenkins
Copy link
Collaborator

ripplelabs-jenkins commented Apr 10, 2019

Jenkins Build Summary

Built from this commit

Built at 20190415 - 20:59:27

Test Results

Build Type Log Result Status
rpm logfile 1166 cases, 0 failed, t: n/a PASS ✅
msvc.Debug logfile 1166 cases, 0 failed, t: 594s PASS ✅
gcc.Release
-Dassert=ON,
MANUAL_TESTS=true
logfile 915 cases, 0 failed, t: 5m9s PASS ✅
docs,
TARGET=docs
logfile 1 cases, 0 failed, t: 0m1s PASS ✅
msvc.Debug,
NINJA_BUILD=true
logfile 1166 cases, 0 failed, t: 633s PASS ✅
clang.Debug logfile 1166 cases, 0 failed, t: 2m56s PASS ✅
gcc.Debug
-Dcoverage=ON,
TARGET=coverage_report,
SKIP_TESTS=true
logfile 1166 cases, 0 failed, t: 17m8s PASS ✅
clang.Debug
-Dunity=OFF
logfile 1165 cases, 0 failed, t: 2m5s PASS ✅
gcc.Debug
-Dunity=OFF
logfile 1165 cases, 0 failed, t: 1m25s PASS ✅
gcc.Debug logfile 1166 cases, 0 failed, t: 3m6s PASS ✅
clang.Release
-Dassert=ON
logfile 1166 cases, 0 failed, t: 4m56s PASS ✅
gcc.Release
-Dassert=ON
logfile 1166 cases, 0 failed, t: 5m12s PASS ✅
msvc.Debug
-Dunity=OFF
logfile 1165 cases, 0 failed, t: 1124s PASS ✅
gcc.Debug
-Dstatic=OFF
logfile 1166 cases, 0 failed, t: 3m4s PASS ✅
gcc.Debug
-Dstatic=OFF -DBUILD_SHARED_LIBS=ON
logfile 1166 cases, 0 failed, t: 3m6s PASS ✅
msvc.Release logfile 1166 cases, 0 failed, t: 419s PASS ✅
gcc.Debug,
NINJA_BUILD=true
logfile 1166 cases, 0 failed, t: 2m57s PASS ✅
clang.Debug
-Dunity=OFF -Dsan=address,
PARALLEL_TESTS=false,
DEBUGGER=false
logfile 1165 cases, 0 failed, t: 11m18s PASS ✅
clang.Debug
-Dunity=OFF -Dsan=undefined,
PARALLEL_TESTS=false
logfile 1165 cases, 0 failed, t: 9m32s PASS ✅

Copy link
Collaborator

@scottschurr scottschurr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Nice change! I left a few nits that you can address, or not, as you see fit.

That said, I wish I could say that your change is blindingly, obviously, safe. I, personally, see no problems. However I find that working with timers in asio can be extremely subtle. I looked hard. And, given my limited understanding of asio timers and the lock usage here, I didn't spot any issues. I'm afraid that's the best I can say without taking time to write additional asynchronous tests.

public:
ValidatorSite (
boost::asio::io_service& ios,
ValidatorList& validators,
beast::Journal j);
beast::Journal j,
std::chrono::seconds timeout = std::chrono::seconds{20});
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just an observation. When I first saw this my inclination was to move the Journal to the end of the arguments, since that matches our usual parameter order. I see you made an exception to that expectation so you could default the timeout.

After playing around with it a bit I'm comfortable with your choice. There's (currently) only one place where the timeout is explicitly passed in by the caller, and that's in a unit test.

Not asking for a change. I just thought you'd want to know that I played around with the options and agree that you made the right choice.

auto timeoutCancel =
[this] ()
{
std::unique_lock <std::mutex> lock_state{state_mutex_};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Taking this lock looks safe assuming that the previously existing code was safe. Both onSiteFetch() and onTextFetch() acquire state_mutex_ while they are not holding sites_mutex_.

I'll add a nit, however. This file is using a mix of std::unique_lock and std::lock_guard. In this situation I'd personally be inclined to use the smallest necessary hammer in each of the situations. So I'd be inclined to change this instance to a std::lock_guard (the smaller hammer).

If you agree with my assessment that using the smaller hammer is the best choice, then I found a total of six unique_locks in this file that could be downsized to lock_guard (including this one).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed - thanks for pointing that out.

60u*24);
sites_[siteIdx].refreshInterval = std::chrono::minutes{refresh};
sites_[siteIdx].nextRefresh =
clock_type::now() + sites_[siteIdx].refreshInterval;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the following code is a bit more consistent with time units and possibly easier to read:

        using namespace std::chrono_literals;
        std::chrono::minutes const refresh =
            boost::algorithm::clamp(
                std::chrono::minutes {body["refresh_interval"].asUInt ()},
                1min,
                1h * 24);
        sites_[siteIdx].refreshInterval = refresh;
        sites_[siteIdx].nextRefresh =
            clock_type::now() + sites_[siteIdx].refreshInterval;

Your choice.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 - fixed

Copy link
Contributor

@miguelportilla miguelportilla Apr 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion... 1h * 24 -> 24h

// start a timer for the request, which shouldn't take more
// than requestTimeout_ to complete
std::unique_lock <std::mutex> lock_state{state_mutex_};
timer_.expires_at ( clock_type::now() + requestTimeout_);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have an opinion about this, but I'm curious why you called expires_at instead of expires_after. Calling expires_after would look like this:

    timer_.expires_after (requestTimeout_);

Also, is the extra space in front of clock_type::now() intentional?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep, expires_after is better here - thanks.

std::unique_lock <std::mutex> lock_state{state_mutex_};
timer_.expires_at ( clock_type::now() + requestTimeout_);
timer_.async_wait ( std::bind (
&ValidatorSite::onRequestTimeout, this, siteIdx, std::placeholders::_1));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I personally prefer lambdas over std::bind; lambdas are a more common idiom (and hence easier to read) these days. So consider rewriting this call like this:

    timer_.async_wait ([this, siteIdx] (boost::system::error_code const& ec)
        {
            this->onRequestTimeout (siteIdx, err);
        });

If you make this change you could consider a similar update to the one other use of std::bind in this file.

boost::algorithm::join (paths |
boost::adaptors::transformed(
[](M::value_type t){ return t.path_; }),
", ");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like you're only using M once, and the function being invoked is not a template. So I'd be inclined to simplify things and remove a bit of the abstraction. Consider something like this:

        testcase << "Fetch list - " <<
            boost::algorithm::join (paths |
                boost::adaptors::transformed(
                    [](FetchListConfig const& cfg){ return cfg.path_; }),
                ", ");

@@ -208,16 +222,14 @@ class ValidatorSite_test : public beast::unit_test::suite
std::vector<Validator> list;
std::string uri;
std::string expectMsg;
bool shouldFail;
bool failFetch;
bool failApply;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Nice to draw the distinction on why it fails.

bool isRetry;
};
std::vector<publisher> servers;

auto const sequence = 1;
auto const version = 1;
using namespace std::chrono_literals;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be good to move this using down closer to the only line that needs it, around line 275.

list_ += ",\"signature\":\"" + strHex(sig) + "\"";
list_ += ",\"manifest\":\"" + manifest + "\"";
list_ += ",\"version\":" + std::to_string(version) + '}';
getList_ = [blob, sig, manifest, version](int interval) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Nice solution to passing in the interval.

}
else if (boost::starts_with(path, "/sleep/"))
{
auto sleep_sec =
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

auto const?

std::unique_lock <std::mutex> lock_state{state_mutex_};
error_code ec;
timer_.cancel_one(ec);
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This overload of cancel_one is deprecated. You can use the non-error_code overload. eg. timer_.cancel_one();

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

bool failFetch_ = false;
bool failApply_ = false;
int serverVersion_ = 1;
std::chrono::seconds expiresFromNow_ = std::chrono::seconds{3600};
Copy link
Contributor

@miguelportilla miguelportilla Apr 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Our format typically does not use a trailing underscore denoting private on public members.

else if (boost::starts_with(path, "/sleep/"))
{
auto sleep_sec =
boost::lexical_cast<unsigned int>(path.substr(7));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can throw.

int refresh = 5;
if (boost::starts_with(path, "/validators/refresh"))
refresh =
boost::lexical_cast<unsigned int>(path.substr(20));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can throw.

Copy link
Collaborator

@scottschurr scottschurr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 LGTM.

Copy link
Contributor

@miguelportilla miguelportilla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@nbougalis nbougalis mentioned this pull request Apr 27, 2019
@seelabs seelabs mentioned this pull request Apr 29, 2019
@mellery451 mellery451 deleted the vl-timer branch April 29, 2019 18:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants