-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Coordinator balancer move then drop fix #5528
Coordinator balancer move then drop fix #5528
Conversation
@clintropolis thanks for working on this. This PR contains some changes to use lambda and I think it makes me difficult to figure out which parts are bug fixes. I would say it's better to split this PR into two sub PRs for bug fix only and remaining. |
Sorry for mixing stuff, this was already part of a split up chain of commits that originally also included the stuff in #5529. Main bug fix is here and additions to test to test the original bug are some additions here and the test itself I can split this stuff out if you prefer, I was just lazily updating stuff as I came across it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @clintropolis. Left some comments.
sourceLoadQueueChildrenCache.start(); | ||
destinationLoadQueueChildrenCache.start(); | ||
|
||
Thread.sleep(100); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might cause test failures in Travis. If this is to wait for childrenCaches to be ready, some kind of real check (by polling or using a latch) should be used instead. Same for other sleeps.
tearDownServerAndCurator(); | ||
} | ||
|
||
@Test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a timeout to avoid too long waits even when this test becomes broken in the future.
loadManagementPeons.put("localhost:1", loadQueuePeon); | ||
loadManagementPeons.put("localhost:2", loadQueuePeon2); | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unnecessary line. Please remove other unnecessary lines too.
* apache#5521 part 1 * formatting * oops * less magic tests
* apache#5521 part 1 * formatting * oops * less magic tests
* apache#5521 part 1 * formatting * oops * less magic tests
* apache#5521 part 1 * formatting * oops * less magic tests
Fixes issue where coordinator balancer will move a segment to the destination server but never actually drop the segment from the source server, one of the issues described in #5521.
With this bug in place, the effective behavior of segment movement was that the balancer would copy the segment to where it thought it needed to be, and the load rule would then decide from where to drop. Because the drop here was never previously working, this may be worth discussion of any implications.