Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add request timeout for ValidatorSites: #2902

Closed
wants to merge 3 commits into from
Closed
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
15 changes: 13 additions & 2 deletions src/ripple/app/misc/ValidatorSite.h
Original file line number Diff line number Diff line change
Expand Up @@ -58,7 +58,8 @@ namespace ripple {

@li @c "version": 1

@li @c "refreshInterval" (optional)
@li @c "refreshInterval" (optional, integer minutes).
This value is clamped internally to [1,1440] (1 min - 1 day)
*/
class ValidatorSite
{
Expand Down Expand Up @@ -125,11 +126,15 @@ class ValidatorSite
// The configured list of URIs for fetching lists
std::vector<Site> sites_;

// time to allow for requests to complete
const std::chrono::seconds requestTimeout_;

public:
ValidatorSite (
boost::asio::io_service& ios,
ValidatorList& validators,
beast::Journal j);
beast::Journal j,
std::chrono::seconds timeout = std::chrono::seconds{20});
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just an observation. When I first saw this my inclination was to move the Journal to the end of the arguments, since that matches our usual parameter order. I see you made an exception to that expectation so you could default the timeout.

After playing around with it a bit I'm comfortable with your choice. There's (currently) only one place where the timeout is explicitly passed in by the caller, and that's in a unit test.

Not asking for a change. I just thought you'd want to know that I played around with the options and agree that you made the right choice.

~ValidatorSite ();

/** Load configured site URIs.
Expand Down Expand Up @@ -187,6 +192,12 @@ class ValidatorSite
void
setTimer ();

/// request took too long
void
onRequestTimeout (
std::size_t siteIdx,
error_code const& ec);

/// Fetch site whose time has come
void
onTimer (
Expand Down
85 changes: 67 additions & 18 deletions src/ripple/app/misc/impl/ValidatorSite.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -26,15 +26,15 @@
#include <ripple/basics/Slice.h>
#include <ripple/json/json_reader.h>
#include <ripple/protocol/JsonFields.h>
#include <boost/algorithm/clamp.hpp>
#include <boost/regex.hpp>
#include <algorithm>

namespace ripple {

// default site query frequency - 5 minutes
auto constexpr DEFAULT_REFRESH_INTERVAL = std::chrono::minutes{5};
auto constexpr ERROR_RETRY_INTERVAL = std::chrono::seconds{30};
unsigned short constexpr MAX_REDIRECTS = 3;
auto constexpr DEFAULT_REFRESH_INTERVAL = std::chrono::minutes{5};
auto constexpr ERROR_RETRY_INTERVAL = std::chrono::seconds{30};
unsigned short constexpr MAX_REDIRECTS = 3;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll express a preference. My preference is to reserve ALL_CAPS for macros. When I see ALL_CAPS in source code I start to get suspicious, and in this case my suspicions would not be justified. There are certainly other places in the code base that use ALL_CAPS to indicate constants, but I've slowly been converting them over time. At any rate, I think it would be good to convert these from ALL_CAPS, but your call whether you want to.


ValidatorSite::Site::Resource::Resource (std::string uri_)
: uri {std::move(uri_)}
Expand Down Expand Up @@ -90,14 +90,16 @@ ValidatorSite::Site::Site (std::string uri)
ValidatorSite::ValidatorSite (
boost::asio::io_service& ios,
ValidatorList& validators,
beast::Journal j)
beast::Journal j,
std::chrono::seconds timeout)
: ios_ (ios)
, validators_ (validators)
, j_ (j)
, timer_ (ios_)
, fetching_ (false)
, pending_ (false)
, stopping_ (false)
, requestTimeout_ (timeout)
{
}

Expand Down Expand Up @@ -168,10 +170,12 @@ ValidatorSite::stop()
{
std::unique_lock<std::mutex> lock{state_mutex_};
stopping_ = true;
cv_.wait(lock, [&]{ return ! fetching_; });

// work::cancel() must be called before the
// cv wait in order to kick any asio async operations
// that might be pending.
if(auto sp = work_.lock())
sp->cancel();
cv_.wait(lock, [&]{ return ! fetching_; });

error_code ec;
timer_.cancel(ec);
Expand Down Expand Up @@ -210,17 +214,30 @@ ValidatorSite::makeRequest (
fetching_ = true;
sites_[siteIdx].activeResource = resource;
std::shared_ptr<detail::Work> sp;
auto timeoutCancel =
[this] ()
{
std::unique_lock <std::mutex> lock_state{state_mutex_};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Taking this lock looks safe assuming that the previously existing code was safe. Both onSiteFetch() and onTextFetch() acquire state_mutex_ while they are not holding sites_mutex_.

I'll add a nit, however. This file is using a mix of std::unique_lock and std::lock_guard. In this situation I'd personally be inclined to use the smallest necessary hammer in each of the situations. So I'd be inclined to change this instance to a std::lock_guard (the smaller hammer).

If you agree with my assessment that using the smaller hammer is the best choice, then I found a total of six unique_locks in this file that could be downsized to lock_guard (including this one).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed - thanks for pointing that out.

error_code ec;
timer_.cancel_one(ec);
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This overload of cancel_one is deprecated. You can use the non-error_code overload. eg. timer_.cancel_one();

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

auto onFetch =
[this, siteIdx] (error_code const& err, detail::response_type&& resp)
[this, siteIdx, timeoutCancel] (
error_code const& err, detail::response_type&& resp)
{
timeoutCancel ();
onSiteFetch (err, std::move(resp), siteIdx);
};

auto onFetchFile =
[this, siteIdx] (error_code const& err, std::string const& resp)
{
onTextFetch (err, resp, siteIdx);
};
[this, siteIdx, timeoutCancel] (
error_code const& err, std::string const& resp)
{
timeoutCancel ();
onTextFetch (err, resp, siteIdx);
};

JLOG (j_.debug()) << "Starting request for " << resource->uri;

if (resource->pUrl.scheme == "https")
{
Expand Down Expand Up @@ -252,6 +269,32 @@ ValidatorSite::makeRequest (

work_ = sp;
sp->run ();
// start a timer for the request, which shouldn't take more
// than requestTimeout_ to complete
std::unique_lock <std::mutex> lock_state{state_mutex_};
timer_.expires_at ( clock_type::now() + requestTimeout_);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have an opinion about this, but I'm curious why you called expires_at instead of expires_after. Calling expires_after would look like this:

    timer_.expires_after (requestTimeout_);

Also, is the extra space in front of clock_type::now() intentional?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep, expires_after is better here - thanks.

timer_.async_wait ( std::bind (
&ValidatorSite::onRequestTimeout, this, siteIdx, std::placeholders::_1));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I personally prefer lambdas over std::bind; lambdas are a more common idiom (and hence easier to read) these days. So consider rewriting this call like this:

    timer_.async_wait ([this, siteIdx] (boost::system::error_code const& ec)
        {
            this->onRequestTimeout (siteIdx, err);
        });

If you make this change you could consider a similar update to the one other use of std::bind in this file.

}

void
ValidatorSite::onRequestTimeout (
std::size_t siteIdx,
error_code const& ec)
{
if (ec)
return;

{
std::unique_lock <std::mutex> lock_site{sites_mutex_};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both the unique_locks here could be lock_guards.

Copy link
Contributor

@miguelportilla miguelportilla Apr 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While you are in the neighborhood... it seems the caught exception at line 327, onSiteFetch is called while already locked on sites_mutex. I believe that is UB.

It would also be nice to add a documenting param to setTimer to make it clear that it is always called with a lock on state_mutex_

Copy link
Contributor Author

@mellery451 mellery451 Apr 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep - good point. I'll move the lock inside the try and that should fix it. I'll also remove that assert since we don't have a state lock here and I'll add the lock param to setTimer too.

JLOG (j_.warn()) <<
"Request for " << sites_[siteIdx].activeResource->uri <<
" took too long";
}

std::unique_lock<std::mutex> lock_state{state_mutex_};
if(auto sp = work_.lock())
sp->cancel();
}

void
Expand Down Expand Up @@ -370,10 +413,14 @@ ValidatorSite::parseJsonResponse (
if (body.isMember ("refresh_interval") &&
body["refresh_interval"].isNumeric ())
{
// TODO: should we sanity check/clamp this value
// to something reasonable?
sites_[siteIdx].refreshInterval =
std::chrono::minutes{body["refresh_interval"].asUInt ()};
auto refresh =
boost::algorithm::clamp(
body["refresh_interval"].asUInt (),
1u,
60u*24);
sites_[siteIdx].refreshInterval = std::chrono::minutes{refresh};
sites_[siteIdx].nextRefresh =
clock_type::now() + sites_[siteIdx].refreshInterval;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the following code is a bit more consistent with time units and possibly easier to read:

        using namespace std::chrono_literals;
        std::chrono::minutes const refresh =
            boost::algorithm::clamp(
                std::chrono::minutes {body["refresh_interval"].asUInt ()},
                1min,
                1h * 24);
        sites_[siteIdx].refreshInterval = refresh;
        sites_[siteIdx].nextRefresh =
            clock_type::now() + sites_[siteIdx].refreshInterval;

Your choice.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 - fixed

Copy link
Contributor

@miguelportilla miguelportilla Apr 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion... 1h * 24 -> 24h

}
}

Expand Down Expand Up @@ -435,6 +482,8 @@ ValidatorSite::onSiteFetch(
{
{
std::lock_guard <std::mutex> lock_sites{sites_mutex_};
JLOG (j_.debug()) << "Got completion for "
<< sites_[siteIdx].activeResource->uri;
auto onError = [&](std::string const& errMsg, bool retry)
{
sites_[siteIdx].lastRefreshStatus.emplace(
Expand Down Expand Up @@ -503,7 +552,7 @@ ValidatorSite::onSiteFetch(
sites_[siteIdx].activeResource.reset();
}

std::lock_guard <std::mutex> lock_state{state_mutex_};
std::unique_lock <std::mutex> lock_state{state_mutex_};
fetching_ = false;
if (! stopping_)
setTimer ();
Expand Down Expand Up @@ -544,7 +593,7 @@ ValidatorSite::onTextFetch(
sites_[siteIdx].activeResource.reset();
}

std::lock_guard <std::mutex> lock_state{state_mutex_};
std::unique_lock <std::mutex> lock_state{state_mutex_};
fetching_ = false;
if (! stopping_)
setTimer ();
Expand Down
Loading