Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

vertx don't instant sends messages while processing handlers! #1607

Closed
tetv opened this issue Sep 4, 2016 · 6 comments
Closed

vertx don't instant sends messages while processing handlers! #1607

tetv opened this issue Sep 4, 2016 · 6 comments

Comments

@tetv
Copy link

tetv commented Sep 4, 2016

It seems that vertx only sends cluster messages when is idle (nothing to process).
That means while a vertx is very busy/active there is no messages dispatched to eventBus until the vertx becomes completely idle, that could be a big limitation if there is a need to send instant messages.

What is the solution to force vertx to instant send a message without waiting the need to be idle?

In order to demonstrate this issue, I have created two very simple verticles:

  • Verticles ping: sends 60 instant messages to pong and prints the reply answers.
  • Verticle pong: emulates a 50ms processing time and reply the message

Scenarios: (cluster mode)

  • Test1: pong is a worker verticle:
  • Test2: pong is a multithread worker verticle:
java -cp 'target/*' io.vertx.core.Launcher start my.pacakge.Pong -id pong -worker -cluster
java -cp 'target/*' io.vertx.core.Launcher start my.pacakge.Ping -id ping -worker -cluster

For these tests ping verticle is a worker verticle ([w-##] means vert.x-worker-thread-##), however it's not revenant if it's a standard verticle because both configurations are similar. Both ping and pong suffers from the reported issue and therefore the messages were stuck on eventBus until idle (nothing to process).

Note that If the ping used a multithread worker then the 1 second extra stuff wasn't noticed and the entire process was 1 second quicker (see ping code), however the pong continues suffering from the reported issue.

ping verticle code:

public class Ping extends AbstractVerticle {
    private final static Logger LOG = LoggerFactory.getLogger(Ping.class);

    @Override
    public void start() {
        for (int i = 1; i <= 60; i++) {
            String ball = String.format("ball%02d", i);
            getVertx().eventBus().send("table", ball, response -> {
                if (response.succeeded()) {
                    LOG.info("ping> recv " + response.result().body());
                } else {
                    LOG.error("ping> fail " + response.cause());
                }
            });
        }
        LOG.info("ping> send ball01 to ball60");
        doExtraStuff(); // 1 second wait
    }

    private static void doExtraStuff() {
        try {
            Thread.sleep(1000);
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }
}

pong verticle code:

public class Pong extends AbstractVerticle {
    private final static Logger LOG = LoggerFactory.getLogger(Pong.class);

    @Override
    public void start() {
        getVertx().eventBus().consumer("table", message -> {
            String ball = (String)message.body();
            LOG.info("pong> recv " + ball);
            process(); // 50ms wait
            LOG.info("pong> send " + ball);
            message.reply(ball);
        });
    }

    private static void process() {
        try {
            Thread.sleep(50);
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }
}
  • Test1 results: ping and pong are worker verticle:
[2016-09-04 02:12:23.169] [w-02] ping> send ball01 to ball60 

# -> (ping is doing extra sfuff for 1 second and the 60 messages aren't dispatched to pong)

[2016-09-04 02:12:24.294]                              [w-03] pong> recv ball01 
[2016-09-04 02:12:24.349]                              [w-03] pong> send ball01 
[2016-09-04 02:12:24.364]                              [w-03] pong> recv ball02 
[2016-09-04 02:12:24.415]                              [w-03] pong> send ball02 
#... (2x57 lines - recv and send for ball03 to ball59 - time between recv and send is ~50ms)
[2016-09-04 02:12:27.392]                              [w-03] pong> recv ball60 
[2016-09-04 02:12:27.443]                              [w-03] pong> send ball60 

# -> (pong is idle and only now the 60 messages are dispatched to ping)
# -> (ping didn't receive answers for at least 3 seconds; pong processing time is 50ms!)

[2016-09-04 02:12:27.463] [w-04] ping> recv ball01 
[2016-09-04 02:12:27.465] [w-04] ping> recv ball02 
#... (57 lines - recv for ball03 to ball59)
[2016-09-04 02:12:27.504] [w-04] ping> recv ball60 

# -> (ping received the 60 reply messages practically in one go!)
  • Test2 results: ping is a worker verticle and pong is a multithread worker verticle:
[2016-09-04 02:12:37.623] [w-02] ping> send ball01 to ball60 

# -> (ping is doing extra sfuff for 1 second and the 60 messages aren't dispatched to pong)

[2016-09-04 02:12:38.754]                              [w-12] pong> ball01 recv 
[2016-09-04 02:12:38.754]                              [w-13] pong> ball02 recv 
#... (17 lines - recv for random bal01 to bal20 except ball01, ball02 and ball18)
[2016-09-04 02:12:38.760]                              [w-09] pong> ball18 recv 

# -> (pong doesn't have more idle threads - all 20 pool threads are processing for 50ms)

[2016-09-04 02:12:38.805]                              [w-13] pong> ball02 send 
[2016-09-04 02:12:38.805]                              [w-14] pong> ball03 send 
[2016-09-04 02:12:38.805]                              [w-13] pong> ball03 recv
#... (2x18 lines - with random send/recv bal01 to bal20 except already displayed balls)
[2016-09-04 02:12:38.835]                              [w-12] pong> ball40 recv 

# -> (pong doesn't have more idle threads - all 20 pool threads are processing for 50ms)

[2016-09-04 02:12:38.872]                              [w-16] pong> ball26 send 
[2016-09-04 02:12:38.872]                              [w-06] pong> ball27 send 
[2016-09-04 02:12:38.873]                              [w-16] pong> ball41 recv 
#... (2x17 lines - with random send/recv bal21 to bal40 except already displayed balls)
[2016-09-04 02:12:38.882]                              [w-02] pong> ball59 recv 
[2016-09-04 02:12:38.887]                              [w-14] pong> ball40 send 
[2016-09-04 02:12:38.888]                              [w-14] pong> ball60 recv 

# -> (pong doesn't have more idle threads - all 20 pool threads are processing for 50ms)

[2016-09-04 02:12:38.925]                              [w-16] pong> ball41 send 
[2016-09-04 02:12:38.926]                              [w-19] pong> ball44 send 
#... (17 lines - send for random bal41 to bal60 except ball41, ball44 and ball60)
[2016-09-04 02:12:38.938]                              [w-14] pong> ball60 send 

# -> (pong has idle threads and only the 60 messages are dispatched to ping)
# -> (ping didn't receive answers for at least 150ms seconds; pong processing time is 50ms!)

[2016-09-04 02:12:38.945] [w-04] ping> recv ball09 
[2016-09-04 02:12:38.959] [w-04] ping> recv ball04 
#... (57 lines - recv for random bal01 to bal60 except ball09, ball04 and ball44)
[2016-09-04 02:12:39.004] [w-04] ping> recv ball44 

# -> (ping received the 60 reply messages practically in one go!)

What is the solution to force vertx to instant send a message without waiting the need to be idle?

@tetv tetv changed the title vertx don't instant sends messages while processing handles! vertx don't instant sends messages while processing handlers! Sep 4, 2016
@vietj
Copy link
Member

vietj commented Sep 4, 2016

Hi,

with a worker it happens because Hazelcast uses an executeBlocking method that cannot be executed until the current task (the worker) has been finished (as it uses ordering).

can you rather open this issue in vertx-hazelcast project ?

@tetv
Copy link
Author

tetv commented Sep 4, 2016

Done, opened.
However it happens as well if using a standard verticle instead of a worker.

@vietj
Copy link
Member

vietj commented Sep 4, 2016

In a standard verticle you must never block, that's the rule.

@vietj vietj closed this as completed Sep 4, 2016
@tetv
Copy link
Author

tetv commented Sep 4, 2016

Is 50ms processing time considered a block?

@vietj
Copy link
Member

vietj commented Sep 4, 2016

if you block a task, other tasks on the same event loop cannot execute, among these tasks some will send the message through the socket, no matter how long it blocks, they simply cannot execute. when you send a message in event bus, tasks are scheduled on the same thread to write the message in the connection

@tetv
Copy link
Author

tetv commented Sep 4, 2016

Thank you @vietj, but it's not clear too me who is the responsibility of sending the messages to the eventBus, and how do you define never block a thread.

  1. Are always the eventloop threads that has the responsibility to dispatch the messages to the eventBus? (Even in case of a worker ´verticles`?).

2) Can we define never block like: [num-expected-requests] x [average-duration] < 99% of the time? Or should be 100%? Maybe would be better using 95% or 80%? (time in this case could be defined as any particular second).

I clear understand the diferences between the three types of verticle, however, it's not clear in the documentation what does means -effectively- and -quantitatively- block the thread (violation the golden rule) - this is why I am asking question 2).

In my understanding, we should promote a verticle to be worker when an handles is very busy/solicited (like several requests per second). The processing time (lets say < 100ms) is not so relevant unless there is a lot of requests coming in for that handler. Of course if the processing time of an handler is in seconds, by itself it's relevant, but being 50ms, 10ms or 1ms, it's not so relevant for a handler that is consumed occasionally. 3) Do you agree with that?

The documentation it answers in -some- way the questions 2) and 3):

If you have a single event loop, and you want to handle 10000 http requests per second, then it’s clear that each request can’t take more than 0.1 ms to process, so you can’t block for any more time than that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants