Performance #10

M0dEx · 2023-07-08T21:26:45Z

The performance as of 0.1.6 is worse than expected.

Between two virtual machines on the same virtualized network (capable of about 30 Gbps of throughput), Quincy only manages:

~ 1.1 Gbps during Server -> Client data transfer
~ 1.3 Gbps during Client -> Server data transfer

with an MTU of 1400 bytes.

Server -> Client

Initial profiling did not yield anything suspicious, other than the fact that QuincyTunnel::process_inbound_traffic takes more time (had more samples) than QuincyTunnel::process_outbound_traffic during the Server -> Client data transfer, which is odd, as most of the data transfered should be going through QuincyTunnel::process_inbound_traffic.

The CPU usage on the Server virtual machine is also only 60 % balanced across all cores, which could mean either either too much IO, or that the Client is the bottleneck.

The CPU usage on the Client is much higher, in the 90s.

Server flamechart:

Client flamechart:

Client -> Server

Pretty much the same behaviour as above - QuincyClient::process_inbound_traffic takes more time than QuincyClient::process_outbound_traffic, which is, again, suspicious.

The CPU usage on the Server side is above 90 %, on the Client side only ~ 70 %.

Server flamechart:

Client flamechart:

Initial conclusions

It seems that the CPU usage on the receiving side is quite high, and that the receiving side spends more time in their respective process_inbound_traffic methods, which is highly suspicious (most of the data transfered should be handled by the respective process_outbound_traffic methods, at least that is my initial assumption).

Further investigation will be needed as to where Quincy client and server spend too much time.

TODO

Test the same scenario with lower and higher MTU
Find the culprit behind suspicious ratio of CPU time spent between process_inbound_traffic and process_outbound_traffic

The text was updated successfully, but these errors were encountered:

M0dEx · 2023-07-08T21:49:10Z

https://tailscale.com/blog/throughput-improvements/

and

https://tailscale.com/blog/more-throughput/

might be useful in regards to optimizing TUN performance, which seems to be problem at the moment (a lot of time spent in poll_write for the TUN interface).

The changes Tailscale made to wireguard-go are available here:
https://github.com/WireGuard/wireguard-go/blob/master/tun/tcp_offload_linux.go

M0dEx · 2023-07-09T13:07:13Z

Different MTUs

With an MTU of 6000, the throughput nearly triples, to about ~ 3 Gbps regardless of data transfer direction.

From the flamecharts, it is clear that more CPU time is spent encrypting the packets, but most of the time is still spent in poll_write for the TUN interfaces.

The CPU usage also decreased to about 60 - 70 % on both Server and Client.

Server -> Client

Server flamechart:

Client flamechart:

Client -> Server

Server flamechart:

Client flamechart:

M0dEx · 2024-02-07T23:08:02Z

GSO/GRO support is work-in-progress: ssrlive/rust-tun#45

frankozland · 2024-08-04T22:27:39Z

related?
https://users.rust-lang.org/t/zero-copy-async-io-in-rust/106996/3

M0dEx · 2024-08-05T08:33:49Z

I can try if io_uring would improve the performance on Linux; it might require some nontrivial changes in the TUN library, but it could be worth a shot.

M0dEx added enhancement New feature or request help wanted Extra attention is needed labels Jul 8, 2023

M0dEx self-assigned this Jul 8, 2023

This was referenced Jul 9, 2023

GRO/GSO on Linux meh/rust-tun#62

Open

Unnecessary Bytes copy #9

Closed

M0dEx added the priority-medium Medium priority issue label Aug 13, 2023

M0dEx added the performance Throughput and/or latency issue label Feb 24, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Performance #10

Performance #10

M0dEx commented Jul 8, 2023 •

edited

Loading

M0dEx commented Jul 8, 2023 •

edited

Loading

M0dEx commented Jul 9, 2023 •

edited

Loading

M0dEx commented Feb 7, 2024

frankozland commented Aug 4, 2024

M0dEx commented Aug 5, 2024 •

edited

Loading

Performance #10

Performance #10

Comments

M0dEx commented Jul 8, 2023 • edited Loading

Server -> Client

Client -> Server

Initial conclusions

TODO

M0dEx commented Jul 8, 2023 • edited Loading

M0dEx commented Jul 9, 2023 • edited Loading

Different MTUs

Server -> Client

Client -> Server

M0dEx commented Feb 7, 2024

frankozland commented Aug 4, 2024

M0dEx commented Aug 5, 2024 • edited Loading

M0dEx commented Jul 8, 2023 •

edited

Loading

M0dEx commented Jul 8, 2023 •

edited

Loading

M0dEx commented Jul 9, 2023 •

edited

Loading

M0dEx commented Aug 5, 2024 •

edited

Loading