
io_uring performance #1

Closed

wegul opened this issue May 16, 2024 · 4 comments

wegul commented May 16, 2024

I ran vanilla "./netbench" in liburing-2.4 and found that epoll has better throughput and latency compared to io_uring. I then noticed this is discussed in issue #536, but I still hope there could be some explanation for these results, as io_uring is marketed as more efficient. For example, in some other tests, although epoll still outperforms io_uring, io_uring has a lot more idle time. I wonder if we could trade these idle cycles for better performance?

Here's the result:

$ ./netbench
[ 0.000] running epoll for io_uring cfg=
[ 2.083] io_uring port=11383: rps:128.13k Bps: 9.23M idle=0ms user=580ms system=4540ms wall=1030ms loops=16 overflows=16 read_per_loop: p10=0 p50=0 p90=8192 avg=1058.83
[ 3.087] io_uring port=11383: rps:130.55k Bps: 9.40M idle=0ms user=590ms system=4430ms wall=1000ms loops=16 overflows=16 read_per_loop: p10=8192 p50=8192 p90=8192 avg=8192.00
[ 4.093] io_uring port=11383: rps:130.29k Bps: 9.38M idle=0ms user=580ms system=4440ms wall=1010ms loops=16 overflows=16 read_per_loop: p10=8192 p50=8192 p90=8192 avg=8192.00
[ 5.066] ...done sender
[ 5.168] ...done receiver
[ 5.168] running epoll for epoll cfg=
[ 7.169] epoll port=11384: rps:139.65k Bps: 10.05M idle=0ms user=610ms system=4390ms wall=1000ms loops=4364 overflows=0 read_per_loop: p10=32 p50=32 p90=32 avg=31.75
[ 8.169] epoll port=11384: rps:138.40k Bps: 9.96M idle=0ms user=610ms system=4400ms wall=1000ms loops=4325 overflows=0 read_per_loop: p10=32 p50=32 p90=32 avg=32.00
[ 9.169] epoll port=11384: rps:138.78k Bps: 9.99M idle=0ms user=590ms system=4400ms wall=1000ms loops=4337 overflows=0 read_per_loop: p10=32 p50=32 p90=32 avg=32.00
[ 10.169] epoll port=11384: rps:138.43k Bps: 9.97M idle=0ms user=630ms system=4380ms wall=1000ms loops=4326 overflows=0 read_per_loop: p10=32 p50=32 p90=32 avg=32.00
[ 10.227] ...done sender
[ 10.228] ...done receiver
[ 10.228] tx:epoll rx:io_uring
[ 10.228] packetsPerSecond= 139k bytesPerSecond= 10M connectErrors=0 sendErrors=0 recvErrors=0 connects=256 latency={p95=1359us p50=924us avg=1014us p100=2465620us count=698889}
[ 10.228] tx:epoll rx:epoll
[ 10.228] packetsPerSecond= 143k bytesPerSecond= 10M connectErrors=0 sendErrors=0 recvErrors=0 connects=256 latency={p95=1036us p50=938us avg=943us p100=1575336us count=719454}

DylanZA (Owner) commented May 17, 2024

What kernel are you running on?

I think the default options might not be optimal, to be honest.

Things I think you want:

  • --defer_taskrun, critically
  • maybe a big --cqe_count, 16000 perhaps. You can see this is needed if overflows > 0

IIRC it definitely should be quicker than epoll, but perhaps the latest kernel versions require different tuning.

wegul (Author) commented May 17, 2024

Thank you. The options worked. My kernel is 6.5.0-28-generic.

$ ./netbench --rx "io_uring --cqe_count 16000 --defer_taskrun true" --rx "epoll" --time 10
...... (no overflows) ......
[ 20.141] tx:epoll rx:io_uring cqe_count=16000 defer_taskrun=1
[ 20.141] packetsPerSecond= 146k bytesPerSecond= 10M connectErrors=0 sendErrors=0 recvErrors=0 connects=256 latency={p95=952us p50=909us avg=985us p100=3840024us count=1.46646e+06}
[ 20.141] tx:epoll rx:epoll
[ 20.141] packetsPerSecond= 138k bytesPerSecond= 10M connectErrors=0 sendErrors=0 recvErrors=0 connects=256 latency={p95=1046us p50=950us avg=970us p100=1173115us count=1.38866e+06}

Could you explain a bit what happened here? Are defer_taskrun and napi_polling doing the same thing?

DylanZA (Owner) commented May 17, 2024

napi_polling I do not believe is even in the kernel yet - I added it as an experiment. I doubt it makes much difference

defer_taskrun makes small-packet networking with io_uring much more performant. See https://github.com/axboe/liburing/wiki/io_uring-and-networking-in-2023

DylanZA closed this as completed May 17, 2024
axboe (Collaborator) commented May 17, 2024

> napi_polling I do not believe is even in the kernel yet - I added it as an experiment. I doubt it makes much difference

It's in the 6.9 kernel, it did finally land. Not super relevant here, just wanted to mention it :-)
