[Zen2] Fix test failures in diff-based publishing #35684

Merged

DaveCTurner
Contributor

`testIncompatibleDiffResendsFullState` sometimes makes a 2-node cluster and
then partitions one of the nodes from the leader, which makes the leader stand
down. Then when the partition is removed the cluster re-forms but does so by
sending full cluster states, not diffs, causing the test to fail.

Additionally `testDiffBasedPublishing` sometimes fails if a publication is
delivered out-of-order, wiping out a fresher last-received cluster state with a
less-fresh one. This change adds a freshness check to avoid this.
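
The freshness check described above could look roughly like the following. This is only an illustrative sketch, not the code from this PR: `StateStamp`, `LastReceivedCache`, `isFresher` and `offer` are hypothetical names, and the ordering on `(term, version)` is an assumption about what "fresher" means here.

```java
/** Hypothetical stand-in for a cluster state: only the fields used to judge freshness. */
final class StateStamp {
    final long term;
    final long version;

    StateStamp(long term, long version) {
        this.term = term;
        this.version = version;
    }
}

/** Hypothetical cache of the last-received state, guarded by a freshness check. */
final class LastReceivedCache {
    private StateStamp lastReceived; // null until the first state arrives

    /** Assumed ordering: a higher term wins; within a term, a higher version wins. */
    static boolean isFresher(StateStamp candidate, StateStamp current) {
        return candidate.term > current.term
            || (candidate.term == current.term && candidate.version > current.version);
    }

    /** Caches the incoming state only if it is fresher; returns whether it was cached. */
    boolean offer(StateStamp incoming) {
        if (lastReceived != null && !isFresher(incoming, lastReceived)) {
            return false; // stale or duplicate delivery: keep the fresher cached state
        }
        lastReceived = incoming;
        return true;
    }

    StateStamp get() {
        return lastReceived;
    }
}
```

With this guard, a publication that arrives late (say version 4 after version 5 in the same term) is ignored by the cache instead of overwriting the fresher state that later diffs would be applied to.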

@DaveCTurner DaveCTurner added the `>test`, `v7.0.0` and `:Distributed Coordination/Cluster Coordination` labels Nov 19, 2018
@DaveCTurner DaveCTurner requested a review from ywelsch November 19, 2018 06:34
@elasticmachine
Collaborator

Pinging @elastic/es-distributed

Contributor

@ywelsch ywelsch left a comment

I would like to avoid throwing a `CoordinationStateRejectedException` in `PublicationTransportHandler`. The validation logic should be left to `CoordinationState`. We can and should improve the logic on when to cache the incoming cluster state though, to prevent old cluster states from poisoning the cache. I wonder if the simplest way to do this would be to only store the new state in `lastSeenClusterState` after it passes the call to `handlePublishRequest.apply`. WDYT?
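
A rough sketch of that suggestion, under stated assumptions: the names `lastSeenClusterState` and `handlePublishRequest` come from the comment above, but `PublishedState`, `ToyValidator`, `PublishReceiver` and `onIncoming` are hypothetical, the handler is modelled as a plain `java.util.function.Function`, and rejection is modelled with `IllegalStateException` rather than the real exception type. It is not the actual Elasticsearch code.

```java
import java.util.function.Function;

/** Hypothetical stand-in for a published cluster state. */
final class PublishedState {
    final long term;
    final long version;

    PublishedState(long term, long version) {
        this.term = term;
        this.version = version;
    }
}

/** Toy validator standing in for the coordinator's checks; it rejects by throwing. */
final class ToyValidator {
    private PublishedState lastAccepted;

    String handle(PublishedState incoming) {
        boolean stale = lastAccepted != null
            && (incoming.term < lastAccepted.term
                || (incoming.term == lastAccepted.term && incoming.version <= lastAccepted.version));
        if (stale) {
            throw new IllegalStateException("stale publication rejected");
        }
        lastAccepted = incoming;
        return "accepted term=" + incoming.term + " version=" + incoming.version;
    }
}

/** Hypothetical receiver: it does no validation itself and caches only accepted states. */
final class PublishReceiver {
    private final Function<PublishedState, String> handlePublishRequest;
    private PublishedState lastSeenClusterState;

    PublishReceiver(Function<PublishedState, String> handlePublishRequest) {
        this.handlePublishRequest = handlePublishRequest;
    }

    String onIncoming(PublishedState incoming) {
        // If the handler throws, the assignment below is skipped and the cache is untouched.
        String response = handlePublishRequest.apply(incoming);
        lastSeenClusterState = incoming;
        return response;
    }

    PublishedState lastSeenClusterState() {
        return lastSeenClusterState;
    }
}
```

The point of this ordering is that the transport-level class needs no freshness logic of its own: whatever the handler rejects never reaches the cache.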

@ywelsch ywelsch mentioned this pull request Nov 20, 2018
61 tasks
@DaveCTurner
Contributor Author

> I wonder if the simplest way to do this would be to only store the new state in lastSeenClusterState after it passes the call to handlePublishRequest.apply. WDYT?

D'oh of course that makes much more sense. Somehow I missed that we were passing it to the coordinator here. I've done that instead.

@DaveCTurner DaveCTurner requested a review from ywelsch November 21, 2018 16:52
Contributor

@ywelsch ywelsch left a comment

LGTM

@DaveCTurner DaveCTurner merged commit cfdf666 into elastic:zen2 Nov 22, 2018
@DaveCTurner DaveCTurner deleted the 2018-11-19-fix-diff-based-publishing branch November 22, 2018 09:08
Labels
:Distributed Coordination/Cluster Coordination Cluster formation and cluster state publication, including cluster membership and fault detection. >test Issues or PRs that are addressing/adding tests v7.0.0-beta1
4 participants