Tweak logs and attempt to avoid races around removing nodes #504
```diff
@@ -248,7 +248,7 @@ impl InnerLoop {
         }
 
         if let Err(e) = metered_tx.send(msg) {
-            log::error!("Cannot send message into aggregator: {}", e);
+            log::error!("Cannot send message into aggregator: {e}");
             break;
         }
     }
```
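For context on the change above: since Rust 1.58, `std::fmt` format strings (and therefore the `log` macros) can capture identifiers from the surrounding scope inline, so the separate argument list can be dropped. A minimal sketch, using `println!` so it runs without setting up a logger:

```rust
fn main() {
    let e = "channel closed";

    // Older style: positional placeholder with the value passed separately.
    println!("Cannot send message into aggregator: {}", e);

    // Inline format argument (Rust 1.58+): `e` is captured from scope.
    println!("Cannot send message into aggregator: {e}");

    // Format specifiers work the same way, e.g. Debug formatting:
    let pair = (1, 2);
    println!("pair is {pair:?}");
}
```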
```diff
@@ -386,10 +386,11 @@ impl InnerLoop {
         let node_id = match self.node_ids.remove_by_right(&(shard_conn_id, local_id)) {
             Some((node_id, _)) => node_id,
             None => {
-                log::error!(
-                    "Cannot find ID for node with shard/connectionId of {:?}/{:?}",
-                    shard_conn_id,
-                    local_id
+                // It's possible that some race between removing and disconnecting shards might lead to
+                // more than one remove message for the same node. This isn't really a problem, but we
+                // hope it won't happen so make a note if it does:
+                log::debug!(
+                    "Remove: Cannot find ID for node with shard/connectionId of {shard_conn_id:?}/{local_id:?}"
                 );
                 return;
             }
```

Review comment on the `log::debug!` line: Are we worried about the process consuming too much memory because we may never call […]? Would it be possible for a malicious party to force the shard to submit […]?

Reply: It shouldn't be possible for any party to have any control over any of the IDs; we assign them all internally. And if the ID doesn't exist for this remove message, it means the node has been removed already :) Remove calls are also entirely internal; an external person cannot cause one to happen except by disconnecting (or when messages with certain IDs go stale).
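To illustrate the behaviour discussed above: a second remove message for the same node finds no entry in the bimap and becomes a logged no-op, so a remove/disconnect race can't remove the wrong node. A minimal sketch with hypothetical names (`NodeIds` here is a stand-in, not the actual telemetry-core type, and it prints where the real code logs):

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the node-ID bimap; not the real telemetry-core type.
#[derive(Default)]
struct NodeIds {
    by_node: HashMap<u64, (u64, u64)>, // node_id -> (shard_conn_id, local_id)
    by_conn: HashMap<(u64, u64), u64>, // (shard_conn_id, local_id) -> node_id
}

impl NodeIds {
    fn insert(&mut self, node_id: u64, key: (u64, u64)) {
        self.by_node.insert(node_id, key);
        self.by_conn.insert(key, node_id);
    }

    /// Remove by the (shard, connection) side; `None` means "already gone".
    fn remove_by_right(&mut self, key: &(u64, u64)) -> Option<u64> {
        let node_id = self.by_conn.remove(key)?;
        self.by_node.remove(&node_id);
        Some(node_id)
    }
}

fn remove_node(ids: &mut NodeIds, shard_conn_id: u64, local_id: u64) {
    match ids.remove_by_right(&(shard_conn_id, local_id)) {
        Some(node_id) => println!("removed node {node_id}"),
        // A duplicate remove (e.g. a remove racing a shard disconnect) lands
        // here and is harmless; it's only worth a debug-level note:
        None => println!(
            "Remove: Cannot find ID for node with shard/connectionId of {shard_conn_id:?}/{local_id:?}"
        ),
    }
}

fn main() {
    let mut ids = NodeIds::default();
    ids.insert(7, (1, 42));
    remove_node(&mut ids, 1, 42); // removes node 7
    remove_node(&mut ids, 1, 42); // second remove: a logged no-op
}
```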
```diff
@@ -401,9 +402,7 @@ impl InnerLoop {
             Some(id) => *id,
             None => {
                 log::error!(
-                    "Cannot find ID for node with shard/connectionId of {:?}/{:?}",
-                    shard_conn_id,
-                    local_id
+                    "Update: Cannot find ID for node with shard/connectionId of {shard_conn_id:?}/{local_id:?}"
                 );
                 return;
             }
```
```diff
@@ -606,7 +605,7 @@ impl InnerLoop {
         let removed_details = match self.node_state.remove_node(node_id) {
             Some(remove_details) => remove_details,
             None => {
-                log::error!("Could not find node {:?}", node_id);
+                log::error!("Could not find node {node_id:?}");
                 return;
             }
         };
```
Review comment: Hmm, I don't understand. Before, we were resetting `this.id` to 0 in `BiMap::new()`, and multiple nodes/entities could end up with `id == 0` after a clear; now we just ignore this and the next ID will be `this.id + 1`? It looks like the `id` will wrap in release mode if `this.id + 1 > MAX`. Can you add an explicit panic or something, so we are sure that it doesn't wrap around and end up replacing an "old ID"? Or what will happen then?
Reply: Yup, I did this because I worried that "old" IDs might be reused in some sort of obscure race condition, leading to new nodes being affected by "old" messages or something like that. Indeed, eventually it'll overflow and panic, but since the ID is a 64-bit value it'll never realistically hit `usize::MAX`, so I wasn't worried about overflowing (I almost considered adding an explicit `wrapping_add`, but meh).
Reply: Since it was so easy to do, I made it a `wrapping_add` :)
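The idea under discussion, sketched with hypothetical names (this `BiMap` is illustrative, not the actual telemetry-core type): instead of resetting the counter, keep handing out fresh IDs so that a stale message carrying an old ID can never match a newly added node, and use `wrapping_add` so the increment stays well-defined even at the (practically unreachable) maximum:

```rust
use std::collections::HashMap;

/// Hypothetical sketch of an ID-assigning map like the one under discussion.
struct BiMap<V> {
    id: usize,                  // next ID to hand out; never reset, even on clear
    entries: HashMap<usize, V>,
}

impl<V> BiMap<V> {
    fn new() -> Self {
        BiMap { id: 0, entries: HashMap::new() }
    }

    /// Hand out a fresh ID. Because the counter only moves forward, IDs of
    /// removed entries are never reused, so a stale message carrying an old
    /// ID cannot alias a new entry. `wrapping_add` makes overflow
    /// well-defined instead of a debug-mode panic; a 64-bit counter won't
    /// realistically get there anyway.
    fn add(&mut self, value: V) -> usize {
        let id = self.id;
        self.id = self.id.wrapping_add(1);
        self.entries.insert(id, value);
        id
    }

    fn remove(&mut self, id: usize) -> Option<V> {
        self.entries.remove(&id)
    }
}

fn main() {
    let mut map = BiMap::new();
    let a = map.add("node-a");
    map.remove(a);
    let b = map.add("node-b");
    assert_ne!(a, b); // the old ID is not handed out again
    println!("a = {a}, b = {b}");
}
```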