This repository has been archived by the owner on Sep 26, 2019. It is now read-only.

Limit ibft msg queues #704

Merged 24 commits into PegaSysEng:master on Feb 6, 2019

Conversation

jframe (Contributor) commented Jan 30, 2019

PR description

Add a hard cap to the number of unprocessed messages in the event queue and to the number of seen messages tracked.
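
To illustrate the idea only (this is a sketch of the general technique, not the code in this PR; the class and names below are assumptions), a hard cap on unprocessed events can be enforced by backing the queue with a capacity-bounded BlockingQueue and dropping events that do not fit:

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Illustrative sketch only: a hard-capped event queue that drops new events
// once the configured limit is reached. Names are assumptions, not PR code.
public class BoundedEventQueue<E> {
  private final BlockingQueue<E> queue;

  public BoundedEventQueue(final int messageQueueLimit) {
    // ArrayBlockingQueue enforces the capacity, so the queue can never grow past it.
    this.queue = new ArrayBlockingQueue<>(messageQueueLimit);
  }

  // Enqueues the event if there is room; otherwise drops it and returns false.
  public boolean add(final E event) {
    return queue.offer(event);
  }

  public E poll() {
    return queue.poll();
  }
}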

Fixed Issue(s)

jframe requested review from CjHare and rain-on on January 30, 2019 02:51
CjHare previously approved these changes Jan 30, 2019

CjHare (Contributor) left a comment:

LGTM

@@ -26,6 +26,7 @@
private static final int EXPECTED_DEFAULT_EPOCH_LENGTH = 30_000;
private static final int EXPECTED_DEFAULT_BLOCK_PERIOD = 1;
private static final int EXPECTED_DEFAULT_REQUEST_TIMEOUT = 1;
private static final int EXPECTED_DEFAULT_MESSAGE_BUFFER_SIZE = 10_000;
Contributor:

Are these oddly named? I.e. what does "EXPECTED_" indicate?

jframe (Author):

They are a bit oddly named, I guess. The "EXPECTED_" prefix indicates these are the expected values for the assertions; they are used in the tests to assert that the default values are correct. It's existing test code, so I just followed the existing pattern.
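
For illustration only, this is roughly how the "EXPECTED_" constants tend to be used; the test name, the fromConfig helper, and the getMessageQueueLimit getter below are assumptions, not taken from the PR:

import static org.assertj.core.api.Assertions.assertThat;

import org.junit.Test;

public class IbftConfigDefaultsTest {
  // "EXPECTED_" constants hold the values the assertions expect the defaults to be.
  private static final int EXPECTED_DEFAULT_MESSAGE_BUFFER_SIZE = 10_000;

  @Test
  public void defaultMessageBufferSizeMatchesExpectedValue() {
    // fromConfig("{}") and getMessageQueueLimit() are assumed names for illustration;
    // the real tests follow this same assert-against-EXPECTED_ pattern.
    final IbftConfigOptions config = fromConfig("{}");
    assertThat(config.getMessageQueueLimit()).isEqualTo(EXPECTED_DEFAULT_MESSAGE_BUFFER_SIZE);
  }
}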

@@ -158,7 +159,8 @@ public TestContext build() {
// Use a stubbed version of the multicaster, to prevent creating PeerConnections etc.
final StubValidatorMulticaster multicaster = new StubValidatorMulticaster();

final UniqueMessageMulticaster uniqueMulticaster = new UniqueMessageMulticaster(multicaster);
final UniqueMessageMulticaster uniqueMulticaster =
new UniqueMessageMulticaster(multicaster, MESSAGE_BUFFER_SIZE);
Contributor:

Should Multicaster be updated as part of this change? Or is that a separate problem?
Is the size of the multicaster buffer related to the incoming event queue?

jframe (Author):

The multicaster should be updated to have a limited-size buffer. It would be better to have a separate config option for that, though.
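
For illustration (a sketch of the general technique, not this PR's UniqueMessageMulticaster), a limited-size "seen messages" buffer can be built from an insertion-ordered map that evicts its eldest entry once the limit is exceeded; all names below are assumptions:

import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Illustrative only: a hard-capped, insertion-ordered record of message hashes
// already gossiped. Names are assumptions, not the PR's code.
public class GossipedHistory<H> {
  private final Set<H> seenHashes;

  public GossipedHistory(final int gossipedHistoryLimit) {
    this.seenHashes =
        Collections.newSetFromMap(
            new LinkedHashMap<H, Boolean>() {
              @Override
              protected boolean removeEldestEntry(final Map.Entry<H, Boolean> eldest) {
                // Evict the oldest hash once the cap is exceeded.
                return size() > gossipedHistoryLimit;
              }
            });
  }

  // Returns true if the hash was not seen before (i.e. the message should be gossiped).
  public boolean recordIfNew(final H messageHash) {
    return seenHashes.add(messageHash);
  }
}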

@@ -64,7 +64,7 @@
private final BlockTimer blockTimer;
private final IbftMessageTransmitter transmitter;
private final MessageFactory messageFactory;
private final Map<Integer, RoundState> futureRoundStateBuffer = Maps.newHashMap();
private final Map<Integer, RoundState> futureRoundStateBuffer;
Contributor:

Is this related to the incoming event queue size?

@@ -39,4 +41,12 @@ public int getBlockPeriodSeconds() {
public int getRequestTimeoutSeconds() {
return ibftConfigRoot.getInteger("requesttimeoutseconds", DEFAULT_ROUND_EXPIRY_SECONDS);
}

public int getSeenMessageQueueSize() {
Contributor:

s/getSeenMessageQueueSize/getSeenMessageQueueLimit

QueueSize could be mistaken as being the current queue size

jframe (Author):

I like limit better. I still don't like these config names, though, regardless of a size or limit suffix. So if you have any suggestions...

Contributor:

SeenMessages is less obvious, I'll have to think on it.

CjHare (Contributor) left a comment:

Please don't use size when you mean limit. Consider common use of size in places such as the Collection classes

CjHare dismissed their stale review on February 1, 2019 05:09

PR changed the approved code

jframe (Contributor, Author) commented Feb 1, 2019

> Please don't use size when you mean limit. Consider common use of size in places such as the Collection classes

The intent was for it to be the max size.

return ibftConfigRoot.getInteger("seenmessagequeuesize", DEFAULT_SEEN_MESSAGES_QUEUE_SIZE);
}

public int getEventQueueSize() {
Contributor:

The more appropriate name for this variable would be messageQueueLimit, as that is pretty much the standard usage for a message queue. It should make sense and be easy to infer for enterprise users of the product.

jframe (Author):

I like that

jframe (Author):

Done. Changed eventQueueSize -> messageQueueLimit and seenMessageQueueSize -> gossipedHistoryLimit.
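
A rough sketch of what the renamed getters might look like, following the pattern of the existing getters shown above (the TOML key strings and the DEFAULT_MESSAGE_QUEUE_LIMIT constant are assumptions for illustration):

public int getMessageQueueLimit() {
  // Key name assumed for illustration, mirroring "requesttimeoutseconds" above.
  return ibftConfigRoot.getInteger("messagequeuelimit", DEFAULT_MESSAGE_QUEUE_LIMIT);
}

public int getGossipedHistoryLimit() {
  // Key name assumed for illustration.
  return ibftConfigRoot.getInteger("gossipedhistorylimit", DEFAULT_GOSSIPED_HISTORY_LIMIT);
}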

@@ -21,6 +21,8 @@
private static final long DEFAULT_EPOCH_LENGTH = 30_000;
private static final int DEFAULT_BLOCK_PERIOD_SECONDS = 1;
private static final int DEFAULT_ROUND_EXPIRY_SECONDS = 1;
private static final int DEFAULT_GOSSIPED_HISTORY_LIMIT = 10_000;
Contributor:

This FEELS like a huge number ... in a good system, should this not be about ... 100?

jframe (Author):

10,000 is probably larger than it needs to be. However, it will only take a small amount of memory, as we are only storing the hashes, so I don't think we need to be too exact and can be a little generous.
Perhaps 1000 is a more appropriate default? I think that would handle tracking the message hashes of a few rounds, whereas a size of 100 would only just cover the total number of messages for the current round if we are going to use 20 validators.

Contributor:

Now this may be disingenuous - in an ideal situation we only need to buffer enough to prevent re-gossiping already-gossiped packets. That probably means the last 'set' of packets, i.e. once we've received a few Prepares we SHOULDN'T receive any more gossiped Proposals (and prior-round msgs are discarded out of hand), so 100 may be plenty.
Happy to go as high as 1000 to handle spurious behaviours from peers, but I think we should really understand how this works in both normal and byzantine circumstances before we throw in a safe number.

Contributor:

It might be worth thinking about whether there are some metrics you could capture to help find the right value for this setting (even if you need to add some expensive ones temporarily), e.g. every time you find a message you've already gossiped, record how many messages were recorded after it in your history. That would likely have to be a temporary thing, because you'd need to use a less efficient structure to hold the history (worst case, a list you iterate from most recent to least recent), but it would give you some concrete data to drive decisions from.

jframe (Author):

We had some further discussion and settled on 1000 for now, as this will cover the current height with several rounds.
It would be great to get some metrics on this; that would also allow users to tune the setting if they wanted to.
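
As a sketch of the temporary instrumentation suggested above (all names are hypothetical, and this is not code from the PR), the history could be held as a most-recent-first list and the depth of each repeat hit recorded:

import java.util.LinkedList;

// Illustrative, temporary instrumentation only: records how far back in the gossip
// history a repeated message hash is found, to help pick a sensible limit.
public class GossipDepthProbe<H> {
  private final LinkedList<H> history = new LinkedList<>(); // most recent first

  // Returns the depth of a repeat hit (0 = most recent entry), or -1 if unseen.
  public int observe(final H messageHash) {
    final int depth = history.indexOf(messageHash);
    if (depth >= 0) {
      // A real probe would record 'depth' in a histogram metric here.
      return depth;
    }
    history.addFirst(messageHash);
    return -1;
  }
}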

private static final Logger LOG = LogManager.getLogger();
private final int maxQueueSize;
Contributor:

What happened to using queueLimit?

jframe (Author):

It's still there. I missed updating the existing field name in the IbftEventQueue.

jframe (Author):

Updated to make it consistent with the new config names.

jframe merged commit e4291e6 into PegaSysEng:master on Feb 6, 2019