-
Notifications
You must be signed in to change notification settings - Fork 73
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[4.0] P2P: Use magnitude safe time types #1316
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't quite understand what was the issue with tstamp. I felt that standardizing on the number of nanoseconds since epoch was nice and simple, but I'm clearly missing what the issue was here.
void connection::check_heartbeat( tstamp current_time ) { | ||
if( latest_msg_time > 0 ) { | ||
void connection::check_heartbeat( std::chrono::system_clock::time_point current_time ) { | ||
if( latest_msg_time > std::chrono::system_clock::time_point::min() ) { | ||
if( current_time > latest_msg_time + hb_timeout ) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The issue was here. current_time
was in nanoseconds. lastest_msg_time
was also in nanoseconds, but hb_timeout
was in milliseconds
. So when the timer ran the latest_msg_time
had to be within 10000 nanoseconds (10 microseconds) or it would close the connection.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the issue was that hb_timeout
was in milliseconds. It seems incorrect to me, as it is defined as std::chrono::system_clock::duration::rep
, so I feel we should consistently use system_clock
duration in tstamp (on my linux system, system_clock
's duration
is defined as chrono::nanoseconds
.
I assume that by using std::chrono
types, we will benefit from an automatic conversion in if( current_time > latest_msg_time + hb_timeout )
which will prevent the error. But tstamp
was already a std::chrono duration, so I wonder if most of the changes, including this one, are really needed.
My feeling is that the change at line 610 (hb_timeout = msec;
) would have been enough for the fix, as the problem was that we used count()
which just returned a tick count, so we didn't benefit from the conversion from milliseconds to nanoseconds.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could be. Either way, this is much cleaner/safer code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see how this makes the code cleaner or safer. I think if you changed line 610 to hb_timeout = msec.count()
the issue would come back exactly as before.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm confused at what you are suggesting should be done. time_point
, milliseconds
, nanoseconds
types are well defined and arithmetic operations are handled correctly. tstamp
(std::chrono::system_clock::duration::rep
) on the other hand is defined as rep
- signed arithmetic type representing the number of ticks in the clock's duration
which is system dependent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is nanoseconds too sensitive in P2P?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, nanoseconds is kind of ridiculous to use. However, that is what we were already using for time_message
and time
of handshake_message
. Internally now with this change we just use whatever std::chrono::system_clock::now()
returns.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think tstamp
should be either defined as:
std::chrono::nanoseconds
(in which case we would benefit from safe type checking and conversions),- or as
int64_t
(in which case it would be clear it is simply a number and up to the programmer to ensure we always store nanoseconds into it).
As far as I can see, the bug was that we are in case #2 (although it is somewhat obfuscated and unreliable because we use a std::chrono internal type) and that we stored milliseconds in a tstamp
(line #611).
Your change fixes the bug at line 611, and in addition replaces the use of tstamp
with std::chrono::system_clock::time_point
. I think it makes the code less readable.
My preference would be to implement #1 above, i.e. redefine tstamp
as std::chrono::nanoseconds
, which I think would provide the additional safety in a cleaner way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Completely removed tstamp
type and backported the better time_message
handling from main
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Kevin for the change. I'm not quite sure is the initialization with std::chrono::system_clock::time_point::min()
has a specific benefit, I kinda feel that 0
is more appropriate than a negative value. And I also thing that defining tstamp
as std::chrono::nanoseconds
and using it for our connection
time tracking members would be better. But maybe we can talk about this further for 5.0.
…'t trip heartbeat timer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to add some tests for the original problem and/or those changes?
Use
std::chrono
types instead oftstamp
so that magnitude is maintained. Fixes issue with heartbeat calculation being off on some machines causing pre-mature close of what it thought was a idle connection.Resolves #1315