Fix racy use of ConcurrentHashMap #36603

@DaveCTurner DaveCTurner commented Dec 13, 2018

ConcurrentHashMap does not always behave as expected when elements are removed
while another thread concurrently checks whether it is empty. Work around this
in PeerFinder by protecting all usages with a mutex (only one usage was not
already protected by it), at which point we don't need a ConcurrentHashMap at all.
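
As a rough illustration of the workaround described above, here is a minimal sketch of a plain `HashMap` whose every access goes through one mutex. The class name `MutexGuardedPeers`, its methods, and the `String` key and value types are hypothetical stand-ins chosen for the example; this is not the actual PeerFinder code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: not the real PeerFinder, only the locking pattern.
final class MutexGuardedPeers {
    private final Object mutex = new Object();
    // A plain HashMap suffices because every access below holds the mutex.
    private final Map<String, String> peersByAddress = new HashMap<>();

    void add(String address, String peer) {
        synchronized (mutex) {
            peersByAddress.put(address, peer);
        }
    }

    // Removal and the emptiness check happen under the same lock, so no other
    // thread can observe the map between the two steps.
    boolean removeAndCheckEmpty(String address) {
        synchronized (mutex) {
            peersByAddress.remove(address);
            return peersByAddress.isEmpty();
        }
    }
}
```

With this shape, the emptiness check always reflects every removal that completed before the lock was acquired.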

@DaveCTurner DaveCTurner added >test Issues or PRs that are addressing/adding tests v7.0.0 :Distributed Coordination/Cluster Coordination Cluster formation and cluster state publication, including cluster membership and fault detection. labels Dec 13, 2018
@DaveCTurner DaveCTurner requested a review from ywelsch December 13, 2018 15:52
@elasticmachine
Collaborator

Pinging @elastic/es-distributed

Contributor Author

DaveCTurner commented Dec 13, 2018

The racy use of ConcurrentHashMap is demonstrated by the failure of the following test:

public void testConcurrentMapIsEmptyOnConcurrentRemovals() throws InterruptedException {
    final Map<String, String> map = newConcurrentMap();
    final Thread[] threads = new Thread[2];
    // The barrier action runs once per round, before the threads are released,
    // so each round starts with exactly one entry in the map.
    final CyclicBarrier cyclicBarrier = new CyclicBarrier(threads.length, () -> map.put("foo", "bar"));

    for (int threadIndex = 0; threadIndex < threads.length; threadIndex++) {
        threads[threadIndex] = new Thread(() -> {
            for (int iteration = 0; iteration < 100000; iteration++) {
                try {
                    cyclicBarrier.await(1, TimeUnit.SECONDS);
                } catch (InterruptedException | BrokenBarrierException | TimeoutException e) {
                    throw new AssertionError("unexpected exception at " + iteration, e);
                }
                // Both threads race to remove the same key. Once remove() has
                // returned the key is gone, so the map is expected to be empty.
                map.remove("foo");
                assertTrue("expected empty at " + iteration, map.isEmpty());
            }
        }, "remover-" + threadIndex);
    }

    for (Thread thread : threads) {
        thread.start();
    }

    for (Thread thread : threads) {
        thread.join();
    }
}

for (final Peer peer : peersByAddress.values()) {
peersRemoved = peer.handleWakeUp() || peersRemoved; // care: avoid short-circuiting, each peer needs waking up
final List<TransportAddress> peersAddressesToRemove = new ArrayList<>();
for (final Entry<TransportAddress, Peer> addressAndPeer : peersByAddress.entrySet()) {
Contributor

maybe simpler to just write

final boolean peersRemoved = peersByAddress.values().removeIf(Peer::handleWakeUp);

and then further below just

return peersRemoved;
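
To make the suggestion above concrete, here is a small sketch showing that `removeIf` on a map's `values()` view applies the predicate to every element (so nothing is skipped by short-circuiting) and returns whether anything was removed. `FakePeer`, its `wakeUps` counter, and `wakeUpAll` are hypothetical names invented for this example, and it assumes, as in the original loop, that `handleWakeUp()` returns `true` when the peer should be removed.

```java
import java.util.HashMap;
import java.util.Map;

final class RemoveIfSketch {
    // Hypothetical stand-in for Peer: counts wake-ups and reports whether
    // it wants to be removed.
    static final class FakePeer {
        final boolean shouldBeRemoved;
        int wakeUps = 0;

        FakePeer(boolean shouldBeRemoved) {
            this.shouldBeRemoved = shouldBeRemoved;
        }

        boolean handleWakeUp() {
            wakeUps++;
            return shouldBeRemoved;
        }
    }

    // Wakes every peer, drops those that ask to be removed, and reports
    // whether at least one entry was dropped from the backing map.
    static boolean wakeUpAll(Map<String, FakePeer> peersByAddress) {
        return peersByAddress.values().removeIf(FakePeer::handleWakeUp);
    }

    public static void main(String[] args) {
        Map<String, FakePeer> peers = new HashMap<>();
        peers.put("a", new FakePeer(true));
        peers.put("b", new FakePeer(false));
        System.out.println(wakeUpAll(peers) + " " + peers.keySet());
    }
}
```

Because `values()` is a view of the map, removing through it also removes the corresponding keys.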

Contributor Author

Very slick. I pushed 96b02b0.

@DaveCTurner DaveCTurner requested a review from ywelsch December 14, 2018 10:02
Contributor

@ywelsch ywelsch left a comment

LGTM

@DaveCTurner DaveCTurner merged commit 44ba9ab into elastic:master Dec 14, 2018
@DaveCTurner DaveCTurner deleted the 2018-12-13-fix-racy-use-of-concurrenthashmap branch December 14, 2018 12:15
Labels
:Distributed Coordination/Cluster Coordination Cluster formation and cluster state publication, including cluster membership and fault detection. >test Issues or PRs that are addressing/adding tests v7.0.0-beta1