Deadlock while acquiring and releasing lock in HazelcastClusterManager in vertx 3.3.2 #41

Closed
pkadam1989 opened this issue Aug 4, 2016 · 2 comments

pkadam1989 commented Aug 4, 2016

import io.vertx.core.Context;
import io.vertx.core.Vertx;
import io.vertx.core.VertxOptions;

public class Test {

    public static void main(String[] args) {
        Vertx.clusteredVertx(new VertxOptions(), result -> testFunction(result.result()));
    }

    private static void testFunction(Vertx vertx) {
        Context context = vertx.getOrCreateContext();
        // Three requests for the same lock, all submitted on the same context
        context.runOnContext(v -> testFunctionOnContext(vertx, "r1"));
        context.runOnContext(v -> testFunctionOnContext(vertx, "r2"));
        context.runOnContext(v -> vertx.setTimer(20000L,
                event -> context.runOnContext(v1 -> testFunctionOnContext(vertx, "r3"))));
    }

    private static void testFunctionOnContext(Vertx vertx, String req) {
        System.out.println("************ TRY TO GET LOCK " + req + " ************");
        // Wait up to 15 seconds for the clustered lock "abc"
        vertx.sharedData().getLockWithTimeout("abc", 15000L, lockResult -> {
            if (lockResult.succeeded()) {
                System.out.println("************ " + req + " GOT LOCK ************");
                // Simulate 10 seconds of work, then release
                vertx.setTimer(10000L, event -> {
                    System.out.println("************ " + req + " TRYING TO RELEASE LOCK ************");
                    lockResult.result().release();
                });
            } else {
                lockResult.cause().printStackTrace();
            }
        });
    }
}

Using vertx 3.3.2

In the above case, getLockWithTimeout and the release call on the lock both go through executeBlocking, with ordered = true by default.
We are running on a single context.

So:
When r1 tries to get the lock, it acquires the lock for "abc" and starts its work; assume the work takes 10 seconds.

When r2 tries to get the lock before r1 has finished, r2 cannot acquire it and waits up to 15 seconds for r1 to release it.

But r1's release is queued behind r2's acquisition, because executeBlocking runs with ordered = true.

So r2 never gets the lock and fails with a timeout exception, even though r1 finished its work after 10 seconds. Only after r2 fails with the timeout does r1 release the lock, because the release task was waiting for r2's executeBlocking task to finish.

This would work if executeBlocking were called with ordered = false. Why is ordered true by default, which makes the execution of executeBlocking sequential?
As it stands, I cannot use a lock on a single context when multiple requests arrive for the same resource.
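
For context, here is a minimal sketch (not part of the original report; the class name, task durations and print statements are illustrative) of how ordered executeBlocking serializes blocking tasks submitted from the same context. This is the mechanism behind the timeout described above: task B cannot start until task A completes, just as the lock release cannot run until the pending acquisition times out.

import io.vertx.core.Vertx;

public class OrderedBlockingSketch {

    public static void main(String[] args) {
        Vertx vertx = Vertx.vertx();
        vertx.runOnContext(v -> {
            // Both blocking tasks are submitted from the same context.
            // With ordered = true, task B does not start until task A completes.
            vertx.<Void>executeBlocking(fut -> {
                sleep(5000); // task A blocks a worker thread for 5 seconds
                fut.complete();
            }, true, res -> System.out.println("task A done"));

            vertx.<Void>executeBlocking(fut -> {
                fut.complete(); // task B is trivial...
            }, true, res -> System.out.println("task B done")); // ...but still waits for task A
        });
    }

    private static void sleep(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}

With ordered = false the two tasks could run concurrently on the worker pool and "task B done" would print immediately.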

pkadam1989 (Author) commented:

This issue was not seen in Vertx version 3.2.1

pkadam1989 pushed a commit to pkadam1989/vertx-hazelcast that referenced this issue Aug 23, 2016
…d=true by default, which enables sequential execution of executeBlocking. Changed the ordering to false, since ordered execution causes a race condition when the lock is acquired and released on the same event loop.

Details have been provided in the issue, Deadlock while acquiring and releasing lock in HazelcastClusterManager in vertx 3.3.2 vert-x3#41
vietj (Contributor) commented Sep 4, 2016

Note on what using executeBlocking with ordered=false might improve: #43

tsegismont added a commit to tsegismont/vert.x that referenced this issue Dec 5, 2016
The existing clustered lock tests always involve two different Vert.x instances.
In vert-x3/vertx-hazelcast#41, it has been reported that concurrent requests for a clustered lock on the same instance can lead to a deadlock.
This happens because locks are acquired with ordered executeBlocking, and released the same way.

The fix consists in applying the technique used in JDBC client connections: a dedicated worker executor is created for each lock instance; the lock is acquired in order but released freely. Consequently, we never find ourselves in a situation where a lock release task is waiting for a lock acquire task to complete.

Pull requests for the cluster managers are ready and will be pushed soon.

Signed-off-by: Thomas Segismont <[email protected]>
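
For illustration, here is a sketch of the pattern the commit message describes (this is not the actual patch; the class name, method names and executor name are hypothetical, and the pattern is shown with the public WorkerExecutor API): a dedicated worker executor per lock, with ordered acquisition and unordered release.

import io.vertx.core.Vertx;
import io.vertx.core.WorkerExecutor;

public class LockExecutorSketch {

    private final WorkerExecutor lockExecutor;

    public LockExecutorSketch(Vertx vertx) {
        // Dedicated executor for this lock's operations (name is illustrative).
        this.lockExecutor = vertx.createSharedWorkerExecutor("lock-ops");
    }

    // Acquisitions use ordered = true: acquisitions submitted from the same
    // context run serially, in submission order.
    public void acquire(Runnable blockingAcquire, Runnable onAcquired) {
        lockExecutor.<Void>executeBlocking(fut -> {
            blockingAcquire.run(); // stands in for the blocking cluster-manager lock call
            fut.complete();
        }, true, res -> onAcquired.run());
    }

    // Releases use ordered = false: a release is never queued behind a pending
    // ordered task, so it can run even while an acquisition is still blocked.
    public void release(Runnable blockingRelease) {
        lockExecutor.<Void>executeBlocking(fut -> {
            blockingRelease.run();
            fut.complete();
        }, false, res -> { });
    }
}

With this split, a release task never waits for an acquire task to complete, which is exactly the situation the commit message above sets out to avoid.
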
tsegismont added a commit to tsegismont/vertx-hazelcast that referenced this issue Dec 5, 2016
…castClusterManager

The expected behavior is tested with new tests introduced in eclipse-vertx/vert.x#1732
tsegismont self-assigned this Dec 5, 2016
tsegismont added a commit that referenced this issue Jan 3, 2017
Fixes #41 Deadlock while acquiring and releasing lock in HazelcastClusterManager