-
Notifications
You must be signed in to change notification settings - Fork 25k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Zen2] Fix test failures in diff-based publishing #35684
[Zen2] Fix test failures in diff-based publishing #35684
Conversation
`testIncompatibleDiffResendsFullState` sometimes makes a 2-node cluster and then partitions one of the nodes from the leader, which makes the leader stand down. Then when the partition is removed the cluster re-forms but does so by sending full cluster states, not diffs, causing the test to fail. Additionally `testDiffBasedPublishing` sometimes fails if a publication is delivered out-of-order, wiping out a fresher last-received cluster state with a less-fresh one. This change adds a freshness check to avoid this.
Pinging @elastic/es-distributed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like to avoid throwing a CoordinationStateRejectedException in PublicationTransportHandler. The validation logic should be left to CoordinationState. We can and should improve the logic on when to cache the incoming cluster state though, to avoid old cluster states from poisoning the cache. I wonder if the simplest way to do this would be to only store the new state in lastSeenClusterState
after it passes the call to handlePublishRequest.apply
. WDYT?
D'oh of course that makes much more sense. Somehow I missed that we were passing it to the coordinator here. I've done that instead. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
testIncompatibleDiffResendsFullState
sometimes makes a 2-node cluster andthen partitions one of the nodes from the leader, which makes the leader stand
down. Then when the partition is removed the cluster re-forms but does so by
sending full cluster states, not diffs, causing the test to fail.
Additionally
testDiffBasedPublishing
sometimes fails if a publication isdelivered out-of-order, wiping out a fresher last-received cluster state with a
less-fresh one. This change adds a freshness check to avoid this.