Skip to content
This repository has been archived by the owner on Sep 26, 2019. It is now read-only.

Separate download state tracking from WorldStateDownloader #967

Merged
merged 14 commits into from
Feb 26, 2019

Conversation

ajsutton
Copy link
Contributor

PR description

The logic around tracking the state for a world state downloader is complex and hard to clearly analyse because it was mixed in with the details of performing actions in WorldStateDownloader. This PR splits the state tracking out to a separate class so it can be tested separately and the synchronization is clearer and more consistent.

Each download uses a new instance of of the state class which ensures that any events which occurs after the download is cancelled and restarted still see the cancelled future and abort correctly.

This revealed and fixes a couple of issues:

  • The unavailable node that triggered a download to be aborted as stalled would be marked as failed after the download was aborted, which added it as a node to download when the next retry happened (thus the next download also failed).
  • When no peers were available it was possible for a lot of threads to wind up adding a WaitForNewPeer task. Only one thread should do that to limit the number of timers required.


public abstract class NodeDataRequest {
private static final Logger LOG = LogManager.getLogger();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see LOG used, can this file be taken out of the PR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep good spot.

});
}

private synchronized void doCancel(final Void result, final Throwable error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doShutdown? doCancel implies a failure state to me instead of a finished state.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

&& !waitingForNewPeer;
}

public CompletableFuture<Void> getFuture() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

either the method name or the field name should be renamed. I would have expected to get the internal future instead of tearDownFuture or I would have expected to call getTearDownFuture - a field rename would involve both fields being renamed. future -> downloadingFuture and getFuture -> getTearDownFuture?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that did wind up weird...

*/
package tech.pegasys.pantheon.ethereum.eth.sync.worldstate;

enum WorldStateDownloadStatus {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another unused file? Looks to have initially been factored out of WorldStateDownloader but it is no longer needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, fixed.

assertThat(pendingRequests.isEmpty()).isTrue();
}

private Runnable mustNotBeCalled() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Putting on my paranoid tin foil hat, do we have a test where mustNotBeCalled is called so we can assert the failure escapes the lambdas? Or maybe I'm being too paranoid.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was going to say you were paranoid but Meredith had the same question. I know it works because I always check my tests fail but worth adding an explicit test.

Copy link
Contributor

@mbaxter mbaxter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a couple minor comments, but otherwise looks good


@Test
public void shouldStoreRootNodeBeforeReturnedFutureCompletes() {
future.thenAccept(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to check the result of this future - throwing inside the accept lambda won't trigger a failure.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch.

*/
package tech.pegasys.pantheon.ethereum.eth.sync.worldstate;

enum WorldStateDownloadStatus {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this is unused now

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

}
}

public void whileAdditionalRequestCanBeSent(final Runnable action) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
public void whileAdditionalRequestCanBeSent(final Runnable action) {
public void whileAdditionalRequestsCanBeSent(final Runnable action) {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

});
}

private synchronized void doCancel(final Void result, final Throwable error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe rename this to onShutdown or cleanup or something since it always runs?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, went with cleanup.


public synchronized void markAsFailed(final Task<NodeDataRequest> task) {
if (!future.isDone()) {
task.markFailed();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just an fyi - if task.markFailed() is called after the queue is cleared, it doesn't requeue the failed task.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, that's true. I know I had a test failing when I added it but it definitely seems fine now and all the code checks out. It's much nicer without it so removing...

@ajsutton ajsutton merged commit 8a4ca7d into PegaSysEng:master Feb 26, 2019
@ajsutton ajsutton deleted the separate-state2 branch February 26, 2019 22:41
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants