Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tweak logs and attempt to avoid races around removing nodes #504

Merged
merged 2 commits into from
Oct 10, 2022

Conversation

jsdw
Copy link
Collaborator

@jsdw jsdw commented Oct 5, 2022

  • Updated some of the log messages to the nicer inline fmt style, and tweaked the odd message.
  • Don't queue up "remove node" messages from shard to core if disconnected; they won't make sense after a reconnect.
  • Try to avoid reusing IDs; this might cause races around disconnects if messages are queued or received with "old" IDs and new nodes are reusing said IDs.

// It's possible that some race between removing and disconnecting shards might lead to
// more than one remove message for the same node. This isn't really a problem, but we
// hope it won't happen so make a note if it does:
log::debug!(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we worried about the process consuming too much memory because we may never call remove_nodes_and_broadcast_result here? Leading to a potentially OOM kill?

Would it be possible for a malicious party to force the shard to submit Remove { random_id }?
And in that could we "mute" / disconnect the shard if it makes N wrong Remove calls?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It shouldn't be possible for any party to have any control over any of the ids; we assign them all internally.

And if the id doesn't exist for this remove message, it means the node has been removed already :)

Remove calls are also entirely internal; an external person cannot cause it to be called except by disconnecting (or when messages with certain ids go stale).

// Leave the `current_id` as-is. Why? To avoid reusing IDs and risking
// race conditions where old messages can accidentally screw with new nodes
// that have been assigned the same ID.
self.mapping = BiMap::new();
Copy link
Member

@niklasad1 niklasad1 Oct 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, I don't understand.

Before we were resetting this.id == 0 in BiMap::new() and multiple nodes to entities could do this an end-up with id == 0 after clear and now just ignore this id and the next one will be this.id +1?

It looks like the id which will wrap in release mode if this.id + 1 > MAX

Can you add an explicit panic or something so we are sure that it doesn't wrap around and ends up to replace an "old ID" or what will happen then?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, I did this because I worried that "old" IDs might be reused in some sorts of obscure race conditions, leading to new nodes being affected by "old" messages or something like that.

Indeed, eventually it'll overflow and panic, but since the ID is a 64 bit value it'll never realisically hit usize::MAX so I wasn't worried about overflowing (I almost considered adding an explicit wrapping_add but meh)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since it was so easy to do I did make the thing wrapping_add :)

@jsdw jsdw merged commit e7d15d0 into master Oct 10, 2022
@jsdw jsdw deleted the jsdw-logs-and-removing branch October 10, 2022 12:12
Rose2161 pushed a commit to Cryptob3auty/mtsubstrate-telemen that referenced this pull request Apr 15, 2024
…ch#504)

* Tweak logs and attempt to avoid races around removing nodes

* wrapping_add in assign_id
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants