polkadot process panicked at 'can not convert float seconds to Duration: value is negative' #817
Comments
Can you please show the entire stack trace, or more logs around the error?
I'm sorry, I don't have more info for now. I had a lot of problems on that particular node, so I've moved to other hardware. Since then I haven't seen the problem, so it could be hardware related. Will report back if it happens again.
I've got the same error running Docker images v0.9.26:
@polkalegos does this happen multiple times?
So far just once, @bkchr.
Very interesting... the std documents this error as only happening in
@polkalegos Just for reference, can you tell me which exact Docker image you were running?
@koute latest tag, version 0.9.26
@koute happening again on
@polkalegos Thank you for the backtrace! I'll take a look at this.
I've run into a similar issue once. The problem was that
This is a good point; however, AFAIK since Rust 1.60 it's supposed to saturate to zero and not panic anymore. The [...] Anyway, looking at the code, here's where the error's defined:
pub struct FromFloatSecsError {
    kind: FromFloatSecsErrorKind,
}
impl FromFloatSecsError {
    const fn description(&self) -> &'static str {
        match self.kind {
            FromFloatSecsErrorKind::Negative => {
                "can not convert float seconds to Duration: value is negative"
            }
            FromFloatSecsErrorKind::OverflowOrNan => {
                "can not convert float seconds to Duration: value is either too big or NaN"
            }
        }
    }
}
So we're getting hit by FromFloatSecsErrorKind::Negative. The only place it's used is in this macro:
macro_rules! try_from_secs {
    (
        secs = $secs: expr,
        mantissa_bits = $mant_bits: literal,
        exponent_bits = $exp_bits: literal,
        offset = $offset: literal,
        bits_ty = $bits_ty:ty,
        double_ty = $double_ty:ty,
    ) => {{
        // ...
        if $secs.is_sign_negative() {
            return Err(FromFloatSecsError { kind: FromFloatSecsErrorKind::Negative });
        }
So the [...]
impl Duration {
    // ...
    pub const fn try_from_secs_f32(secs: f32) -> Result<Duration, FromFloatSecsError> {
        try_from_secs!(
            secs = secs,
            mantissa_bits = 23,
            exponent_bits = 8,
            offset = 41,
            bits_ty = u32,
            double_ty = u64,
        )
    }
    // ...
    pub const fn try_from_secs_f64(secs: f64) -> Result<Duration, FromFloatSecsError> {
        try_from_secs!(
            secs = secs,
            mantissa_bits = 52,
            exponent_bits = 11,
            offset = 44,
            bits_ty = u64,
            double_ty = u128,
        )
    }
}
impl Duration {
    // ...
    pub const fn from_secs_f64(secs: f64) -> Duration {
        match Duration::try_from_secs_f64(secs) {
            Ok(v) => v,
            Err(e) => panic!("{}", e.description()),
        }
    }
    // ...
    pub const fn from_secs_f32(secs: f32) -> Duration {
        match Duration::try_from_secs_f32(secs) {
            Ok(v) => v,
            Err(e) => panic!("{}", e.description()),
        }
    }
    // ...
}
impl Duration {
    // ...
    pub const fn mul_f64(self, rhs: f64) -> Duration {
        Duration::from_secs_f64(rhs * self.as_secs_f64())
    }
    // ...
    pub const fn mul_f32(self, rhs: f32) -> Duration {
        Duration::from_secs_f32(rhs * self.as_secs_f32())
    }
    // ...
    pub const fn div_f64(self, rhs: f64) -> Duration {
        Duration::from_secs_f64(self.as_secs_f64() / rhs)
    }
    // ...
    pub const fn div_f32(self, rhs: f32) -> Duration {
        Duration::from_secs_f32(self.as_secs_f32() / rhs)
    }
}
So basically anywhere we either create a [...] Now let's look at the [...]
Box::new(move |who, intent, topic, mut data| {
    if let MessageIntent::PeriodicRebroadcast = intent {
        return do_rebroadcast
    }
    let peer = match inner.peers.peer(who) {
        None => return false,
        Some(x) => x,
    };
    // if the topic is not something we're keeping at the moment,
    // do not send.
    let (maybe_round, set_id) = match inner.live_topics.topic_info(topic) {
        None => return false,
        Some(x) => x,
    };
    if let MessageIntent::Broadcast = intent {
        if maybe_round.is_some() {
            if !inner.round_message_allowed(who) {
                // early return if the vote message isn't allowed at this stage.
                return false
            }
        } else if !inner.global_message_allowed(who) {
            // early return if the global message isn't allowed at this stage.
            return false
        }
    }
    // if the topic is not something the peer accepts, discard.
    if let Some(round) = maybe_round {
        return peer.view.consider_vote(round, set_id) == Consider::Accept
    }
    // global message.
    let local_view = match inner.local_view {
        Some(ref v) => v,
        None => return false, // cannot evaluate until we have a local view.
    };
    match GossipMessage::<Block>::decode(&mut data) {
        Err(_) => false,
        Ok(GossipMessage::Commit(full)) => {
            // we only broadcast commit messages if they're for the same
            // set the peer is in and if the commit is better than the
            // last received by peer, additionally we make sure to only
            // broadcast our best commit.
            peer.view.consider_global(set_id, full.message.target_number) ==
                Consider::Accept && Some(&full.message.target_number) ==
                local_view.last_commit_height()
        },
        Ok(GossipMessage::Neighbor(_)) => false,
        Ok(GossipMessage::CatchUpRequest(_)) => false,
        Ok(GossipMessage::CatchUp(_)) => false,
        Ok(GossipMessage::Vote(_)) => false, // should not be the case.
    }
})
Let's look at the round_message_allowed and global_message_allowed functions:
fn round_message_allowed(&self, who: &PeerId) -> bool {
    let round_duration = self.config.gossip_duration * ROUND_DURATION;
    let round_elapsed = match self.local_view {
        Some(ref local_view) => local_view.round_start.elapsed(),
        None => return false,
    };
    if round_elapsed < round_duration.mul_f32(PROPAGATION_SOME) {
        self.peers.first_stage_peers.contains(who)
    } else if round_elapsed < round_duration.mul_f32(PROPAGATION_ALL) {
        self.peers.first_stage_peers.contains(who) ||
            self.peers.second_stage_peers.contains(who)
    } else {
        self.peers.peer(who).map(|info| !info.roles.is_light()).unwrap_or(false)
    }
}
// ...
fn global_message_allowed(&self, who: &PeerId) -> bool {
    let round_duration = self.config.gossip_duration * ROUND_DURATION;
    let round_elapsed = match self.local_view {
        Some(ref local_view) => local_view.round_start.elapsed(),
        None => return false,
    };
    if round_elapsed < round_duration.mul_f32(PROPAGATION_ALL) {
        self.peers.first_stage_peers.contains(who) ||
            self.peers.second_stage_peers.contains(who) ||
            self.peers.lucky_light_peers.contains(who)
    } else {
        true
    }
}
Both use mul_f32. But now here's the weird part; let me list out all of the relevant constants and variables:
Everything here is positive. So this is really weird; unless I'm missing something here, this panic should be impossible to trigger, unless the [...]
@polkalegos Can you give us more details as to the system on which you've seen this problem?
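The panic path traced above can be reproduced in isolation. The following is a minimal sketch, not code from Substrate or Polkadot: the helper names `checked_mul_f32` and `mul_f32_panics` are hypothetical, and `Duration::try_from_secs_f32` assumes Rust 1.66 or newer. It illustrates that `Duration::mul_f32` panics when the product is negative (or NaN, or too large), while the fallible conversion reports the same condition as an error.

```rust
use std::panic;
use std::time::Duration;

/// Hypothetical helper: scales `base` by `factor` without panicking.
/// Returns `None` when the product is negative, NaN, or too large
/// to be represented as a `Duration`.
pub fn checked_mul_f32(base: Duration, factor: f32) -> Option<Duration> {
    Duration::try_from_secs_f32(base.as_secs_f32() * factor).ok()
}

/// Hypothetical helper: reports whether `Duration::mul_f32` panics
/// for the given inputs. The panic message is still printed to stderr
/// by the default panic hook.
pub fn mul_f32_panics(base: Duration, factor: f32) -> bool {
    panic::catch_unwind(|| base.mul_f32(factor)).is_err()
}
```

With a positive `Duration` and positive factors, as in the gossip code quoted above, neither helper should ever see a negative product, which is what makes the reported panic surprising.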
The original report was on a Hetzner VPS, which I think is very underperforming, so it could be related to hardware/memory problems. Since I moved to dedicated hardware I haven't seen the problem repeat.
(from Rust Doc)
Not sure what role Docker and the VPS play here. If it runs in QEMU or something similar, maybe that breaks the monotonic clock somehow. From looking at the code, I came to the same conclusion that only
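On the monotonic clock question: a difference between two `Instant` values is a `Duration`, which is unsigned, so a clock that steps backwards cannot by itself yield a negative `Duration`. A small sketch using the standard `checked_duration_since` and `saturating_duration_since` methods (the wrapper names `elapsed_checked` and `elapsed_saturating` are hypothetical):

```rust
use std::time::{Duration, Instant};

/// Hypothetical wrapper: time from `start` to `now`, or `None`
/// if `now` is earlier than `start`.
pub fn elapsed_checked(start: Instant, now: Instant) -> Option<Duration> {
    now.checked_duration_since(start)
}

/// Hypothetical wrapper: time from `start` to `now`, clamped to zero
/// if `now` is earlier than `start`.
pub fn elapsed_saturating(start: Instant, now: Instant) -> Duration {
    now.saturating_duration_since(start)
}
```

Either way the result is zero or positive, so a negative float can only come from the multiplication factor or from the conversion itself.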
While in the active validator set, the process crashed. It restarted and has been running and validating normally since.
Polkadot LOG
May 29 13:48:14 00.stakeworld.nl polkadot[6173]: Thread 'tokio-runtime-worker' panicked at 'can not convert float seconds to Duration: value is negative', /rustc/7737e0b5c4103216d6fd8cf94>
May 29 13:48:14 00.stakeworld.nl polkadot[6173]: This is a bug. Please report it at:
May 29 13:48:14 00.stakeworld.nl polkadot[6173]: https://github.com/paritytech/polkadot/issues/new
Dmesg output
[May29 13:48] tokio-runtime-w[6345]: segfault at 0 ip 0000000000000000 sp 00007f13ac76eba8 error 14 in polkadot[5621a0f89000+247000]
[ +0.000081] Code: Bad RIP value.