Bump default Mplex split_send_size to 64Kbyte #802

Closed

Conversation

@dvdplm (Contributor) commented Dec 20, 2018

The default value of 1Kbyte is rather small and may impact throughput adversely. This PR proposes 64Kbyte. For comparison, the Yamux default receive_window size is 256Kbyte.
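
For concreteness, a minimal sketch of the kind of change proposed; the constant name and its location inside libp2p-mplex are assumptions for illustration, not copied from this PR's diff:

```rust
// Hypothetical name for the default: the size at which outgoing data is
// split into individual mplex frames before being written to the connection.
const DEFAULT_SPLIT_SEND_SIZE: usize = 64 * 1024; // proposed; previously 1024
```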

@ghost assigned dvdplm Dec 20, 2018
@ghost added the in progress label Dec 20, 2018
@tomaka (Member) commented Dec 20, 2018

The idea behind splitting packets into small sizes is to potentially improve interleaving of substreams. In practice, though, interleaving will most likely not happen because this is a half-baked change.

However I'd honestly prefer to figure out why mplex is so slow first, before merging any PR that will make the slowness disappear.

@dvdplm (Contributor, Author) commented Dec 20, 2018

will most likely not happen because this is a half-baked change.

Can you elaborate on this? Not sure what you mean.

However I'd honestly prefer to figure out why mplex is so slow first

I'd like to dig deeper too, but I think the test I've used so far is lacking and we need some proper benchmarks. Note that the mplex slowness in reading data stands out in debug mode; in release mode it looks fine, and reads and writes are within the same order of magnitude of each other. To me it is hard to justify spending lots of time on a performance problem that only shows up in debug mode.

@tomaka (Member) commented Dec 20, 2018

Can you elaborate on this? Not sure what you mean.

Right now, if you send one packet of 1MB on substream A and one packet of 1MB on substream B, the multiplexer will send the entire 1MB of substream A followed by the 1MB packet of substream B.

By splitting packets into smaller packets of 1kB, the idea was to send the first kilobyte of substream A, followed by the first kilobyte of substream B, followed by the second kilobyte of substream A, the second kilobyte of substream B, and so on.

However, we only do the splitting, and nothing actually interleaves the packets because that part hasn't been implemented. In other words, we will send 1024 packets of substream A followed by 1024 packets of substream B.
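
A round-robin sketch of the interleaving described above, purely for illustration (this is not the libp2p-mplex implementation; today only the splitting happens, and the frames of one substream are still sent back to back):

```rust
// Round-robin interleaving of split frames from two substreams.
// Illustrative only; not taken from libp2p-mplex.
fn interleave<'a>(a: &'a [u8], b: &'a [u8], split: usize) -> Vec<(char, &'a [u8])> {
    let mut frames = Vec::new();
    let mut chunks_a = a.chunks(split);
    let mut chunks_b = b.chunks(split);
    loop {
        match (chunks_a.next(), chunks_b.next()) {
            (None, None) => break,
            (ca, cb) => {
                // Alternate: one frame of A, then one frame of B, while either remains.
                if let Some(c) = ca { frames.push(('A', c)); }
                if let Some(c) = cb { frames.push(('B', c)); }
            }
        }
    }
    frames
}
```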

To me it is hard to justify spending lots of time on a performance problem that only shows up in debug mode.

I don't remember the figures you gave me, but it was something like 15ms to transfer 1MB in release mode. This is way too much in my opinion.
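
As a rough sanity check on that figure, a back-of-envelope sketch using the numbers quoted above (about 15 ms for 1MB) and the current 1Kbyte split; the per-frame cost is just a division, not a measurement:

```rust
fn main() {
    // Figure quoted above: roughly 15 ms to transfer 1MB in release mode.
    let transfer_ms = 15.0_f64;
    let payload_bytes: usize = 1024 * 1024;
    let split_send_size: usize = 1024;

    // With a 1Kbyte split, a 1MB write turns into this many mplex frames.
    let frames = (payload_bytes + split_send_size - 1) / split_send_size;
    println!("frames sent: {}", frames); // 1024

    // Implied average cost per frame, if framing dominated the elapsed time.
    println!("~{:.1} µs per frame", transfer_ms * 1000.0 / frames as f64); // ~14.6
}
```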

@dvdplm (Contributor, Author) commented Dec 21, 2018

I don't remember the figures you gave me, but it was something like 15ms to transfer 1MB in release mode. This is way too much in my opinion.

Ok, that is good info – I honestly have no strong opinion on what "fast" or "slow" is.

Here are some new numbers after yesterday's debugging and fixing (using the as-yet-unreleased yamux), obtained using the elapsed crate's measure_time() around the reader/writer Future (a sketch of this measurement appears at the end of this comment). Here I send 7Mbyte (compiled in release mode):

Mplex, 1024byte chunks:

[test, writer] Running the writer future took 556.02 ms
[test, reader] Running the reader future took 563.98 ms

Mplex, 8Kbyte chunks:

[test, writer] Running the writer future took 80.27 ms
[test, reader] Running the reader future took 84.02 ms

Mplex, 64Kbyte chunks:

[test, writer] Running the writer future took 18.29 ms
[test, reader] Running the reader future took 18.68 ms

Mplex, 256Kbyte chunks:

[test, writer] Running the writer future took 10.75 ms
[test, reader] Running the reader future took 10.83 ms

Yamux, receive_window at 1Mbyte:

[test, writer] Running the writer future took 40.91 ms
[test, reader] Running the reader future took 44.38 ms

Yamux, receive_window at 256Kbyte (default):

[test, writer] Running the writer future took 45.53 ms
[test, reader] Running the reader future took 45.59 ms

It's interesting to see that the read/write asymmetry disappears in release mode; it is also somewhat surprising to me that Yamux seems to have a higher overhead.
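
For reference, a minimal sketch of the measurement described above with the elapsed crate's measure_time(); the future inside the closure is a trivial stand-in, since the real test harness is not part of this thread:

```rust
extern crate elapsed;
extern crate futures;

use futures::{future, Future};

fn main() {
    // The closure is a stand-in for driving the actual mplex writer future
    // to completion (futures 0.1-style blocking wait()).
    let (elapsed, _result) = elapsed::measure_time(|| {
        future::ok::<(), ()>(()).wait()
    });
    // `elapsed` Displays as e.g. "18.29 ms".
    println!("[test, writer] Running the writer future took {}", elapsed);
}
```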

@tomaka (Member) commented Dec 22, 2018

I don't think you're using secio, but another "obvious" problem that should be looked at is that secio does an encryption and HMAC round for every single chunk of data.

I'll investigate these performance issues when I'm back from vacation, if nobody does so before then.

@tomaka (Member) commented Dec 28, 2018

I added some benchmarks to mplex on my side.
On my machine, sending one kB of data takes around 160µs, sending one MB takes around 1.3ms, and sending two MB takes around 2.2ms.

Most notably, changing the split_send_size (even to 1MB) doesn't change anything.

@dvdplm (Contributor, Author) commented Dec 28, 2018

I added some benchmarks to mplex on my side.

Can you share the benchmark somewhere so I can check on my side too?

@dvdplm (Contributor, Author) commented Dec 28, 2018

I ran tomaka/benches-mplex on my side:

running 4 tests
test connect_and_send_hello  ... bench:     330,046 ns/iter (+/- 558,511)
test connect_and_send_one_kb ... bench:     321,593 ns/iter (+/- 217,815)
test connect_and_send_one_mb ... bench:  11,599,293 ns/iter (+/- 4,913,540)
test connect_and_send_two_mb ... bench:  46,662,473 ns/iter (+/- 8,849,963)

Does that jibe with what you're seeing? Do you also get the same kind of wild variation on your machine?

Increasing the split_send_size to 64Kbyte, I get this:

running 4 tests
test connect_and_send_hello  ... bench:   2,091,373 ns/iter (+/- 1,790,809)
test connect_and_send_one_kb ... bench:   2,552,111 ns/iter (+/- 2,601,211)
test connect_and_send_one_mb ... bench:   5,915,998 ns/iter (+/- 3,047,250)
test connect_and_send_two_mb ... bench:   7,749,807 ns/iter (+/- 1,694,218)

Again the variability is decidedly weird: for the smaller payloads I get numbers anywhere between 300,000 ns and 2 million, but the speed-up for larger payloads is consistent. Not what you're seeing, I take it?

If I switch to the multi-threaded tokio Runtime, still with 64Kbyte chunks, I get better values still and far less erratic error ranges (a sketch of the runtime swap follows these numbers):

test connect_and_send_hello  ... bench:     460,012 ns/iter (+/- 192,313)
test connect_and_send_one_kb ... bench:     440,164 ns/iter (+/- 693,928)
test connect_and_send_one_mb ... bench:   1,520,596 ns/iter (+/- 480,530)
test connect_and_send_two_mb ... bench:   3,481,443 ns/iter (+/- 1,380,928)
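
A hedged sketch of the runtime swap mentioned above, using the tokio 0.1-era APIs as I understand them; the benchmarked future is a trivial stand-in:

```rust
extern crate futures;
extern crate tokio;

use futures::future;

fn main() {
    // Single-threaded, current-thread executor.
    let mut single = tokio::runtime::current_thread::Runtime::new().unwrap();
    single.block_on(future::ok::<(), ()>(())).unwrap();

    // Multi-threaded, work-stealing executor (the switch described above).
    let mut multi = tokio::runtime::Runtime::new().unwrap();
    multi.block_on(future::ok::<(), ()>(())).unwrap();
}
```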

@romanb (Contributor) commented Nov 12, 2020

Superseded by #1834.

@romanb closed this Nov 12, 2020