Bump default Mplex split_send_size to 64Kbyte #802
Conversation
The idea behind splitting packets into small sizes is to potentially improve interleaving of substreams. In practice, though, interleaving will most likely not happen because this is a half-baked change. However, I'd honestly prefer to figure out why mplex is so slow first, before merging any PR that will make the slowness disappear.
Can you elaborate on this? Not sure what you mean.
I'd like to dig deeper too, but I think the test I've used so far is too lacking and we need some proper benchmarks. Note that the mplex slowness in reading data stands out in debug mode; in release mode it looks fine, and reads and writes are within the same order of magnitude of each other. To me it is hard to justify spending lots of time on a performance problem that only shows up in debug mode.
Right now, if you send one packet of 1MB on substream A and one packet of 1MB on substream B, the multiplexer will send the entire 1MB of substream A followed by the 1MB packet of substream B. By splitting packets into smaller packets of 1kB, the idea was to send the first kilobyte of substream A, followed by the first kilobyte of substream B, followed by the second kilobyte of substream A, the second kilobyte of substream B, and so on. However, we only do the splitting; nothing actually interleaves the packets because that part hasn't been implemented. In other words, we will send 1024 packets of substream A followed by 1024 packets of substream B.
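As an illustration (a minimal sketch, not the actual libp2p-mplex code; `split_into_frames` and its signature are made up), splitting alone turns one large write into many frames for the same substream, still queued back-to-back:

```rust
/// Illustrative only: split one substream's payload into
/// `split_send_size`-sized frames. Nothing here interleaves these frames
/// with frames belonging to other substreams.
fn split_into_frames(substream_id: u64, payload: &[u8], split_send_size: usize) -> Vec<(u64, Vec<u8>)> {
    payload
        .chunks(split_send_size)
        .map(|chunk| (substream_id, chunk.to_vec()))
        .collect()
}

fn main() {
    let one_mb = vec![0u8; 1024 * 1024];
    // With the 1 Kbyte default, a 1 MB write on substream A becomes 1024
    // frames that are still sent consecutively, before substream B's frames.
    let frames_a = split_into_frames(1, &one_mb, 1024);
    assert_eq!(frames_a.len(), 1024);
}
```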
I don't remember the figures you gave me, but it was something like 15ms to transfer 1MB in release mode. This is way too much in my opinion.
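(For reference, 1MB in 15ms works out to roughly 67 MB/s, or about 0.5 Gbit/s.)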
Ok, that is good info – I honestly have no strong opinion on what "fast" or "slow" is. Here are some new numbers after yesterday's debugging & fixing (using the yet-unreleased yamux), obtained with the following configurations:

Mplex, 1024 byte chunks:
Mplex, 8Kbyte:
Mplex, 64Kbyte:
Mplex, 256Kbyte:
Yamux, receive_window at 1Mbyte:
Yamux, receive_window at 256Kbyte (default):
It's interesting to see that the read/write asymmetry disappears in release mode; it is also somewhat surprising to me that Yamux seems to have a higher overhead.
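For context, a minimal sketch of the Yamux side of the comparison, assuming the `set_receive_window` setter the yamux crate exposed around this time (the exact API differs between versions, so treat this as illustrative):

```rust
// Sketch only: adjust to the yamux version actually in use.
fn yamux_config_1mb_window() -> yamux::Config {
    let mut cfg = yamux::Config::default();
    // The default receive window is 256 Kbyte; this raises it to 1 Mbyte,
    // matching one of the configurations benchmarked above.
    cfg.set_receive_window(1024 * 1024);
    cfg
}
```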
I don't think you're using secio, but there is another "obvious" problem that should be looked at. I'll investigate these performance issues when I'm back from vacation, if nobody does before.
I added some benchmarks to […]. Most notably, changing the […]
Can you share the benchmark somewhere so I can check on my side too?
I ran tomaka/benches-mplex on my side:
Does that jibe with what you're seeing too? Do you also have the same kind of wild variations on your machine? Increasing the […]
Again, the variability is decidedly weird; especially for the smaller payloads I get numbers between 300,000 ns and 2 million ns, but the speed-up for larger payloads is consistent. Not what you're seeing, I take it? If I switch to using the multi-threaded tokio Runtime and 64Kbyte chunks, I get better values still (and less crazy error ranges):
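For reference, a rough sketch of driving such a benchmark on a multi-threaded tokio runtime (tokio 1.x API shown for illustration; the actual transfer logic is omitted):

```rust
use std::time::Instant;

fn main() {
    // Multi-threaded runtime, as opposed to the current-thread one.
    let rt = tokio::runtime::Builder::new_multi_thread()
        .enable_all()
        .build()
        .expect("failed to build tokio runtime");

    let start = Instant::now();
    rt.block_on(async {
        // ... spawn the sender/receiver halves of the transfer here ...
    });
    println!("elapsed: {:?}", start.elapsed());
}
```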
Superseded by #1834.
The default value of 1Kbyte is rather small and may impact throughput adversely. This PR proposes 64Kbyte. As a comparison, the Yamux default `receive_window` size is 256Kbyte.
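As a sketch of what the proposed change amounts to on the configuration side (the setter name has varied across libp2p-mplex versions, e.g. `split_send_size` vs. `set_split_send_size`, so treat this as illustrative rather than version-exact):

```rust
use libp2p_mplex::MplexConfig;

fn mplex_config_64k() -> MplexConfig {
    let mut cfg = MplexConfig::new();
    // Raise the split size from the 1 Kbyte default to the proposed 64 Kbyte.
    cfg.set_split_send_size(64 * 1024);
    cfg
}
```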