-
Notifications
You must be signed in to change notification settings - Fork 103
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
X3: Catchup applications on missed superbufs upon wakeup #195
Conversation
18fc781
to
d761fc8
Compare
d761fc8
to
d2d2bb9
Compare
02e7a43
to
da804f7
Compare
I have pushed a new commit to this branch which should allow users to reload the onload kernel module without having to recompile their user space program. As such, there will be differing views of An alternative fix to this patch is to return |
2ec2eb6
to
01e3564
Compare
Superbufs are relinquished by userspace by posting them through a shared memory ringbuffer. When the ringbuffer is full, userspace now stores the unfreed superbuf ids in a linked list until there is space in the ringbuf.
Superbuffers previously were only able to be posted to apps if they lied between efhw_nic_efct_rxq->sbufs's added and removed pointers. These were typically only 4 apart meaning an app couldn't fall further than that without packet drops. Superbufs are now kept around longer and only removed once each app has received it or the nic is about to run out of donated superbufs. In the latter case, the sequence number of the apps will be advanced. When an application is destroyed, the superbufs it donated will be taken away and so another sleeping application will need to be skipped ahead. In the case where no buffers can be freed (all received buffers have been posted to the shared memory - particularly true for small rxqs) and a new one arrives through efct_buffer_start, ensure that on efct_buffer_end we free that same donated buffer so the nic doesn't starve.
Users should be able to reload the onload kernel module without having to recompile their user space program. As such, there will be differing views of `CI_EFCT_MAX_SUPERBUFS` from user and kernel space due to this patch. This case is now handled in the kernel module without returning an error.
01e3564
to
30e45ad
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I nearly approved of this. You might get away with fix-and-ship
The problem
There is a ringbuffer shared between kernel and userspace which is used to send superbufs (sbufs) from the kernel. This ringbuf is of size 16 and so if an app is paused it cannot take any sbufs from it and no more can be posted - this causes drops. Unfortunately, this ringbuf's size cannot be changed otherwise the ABI will be broken.
Similarly there is a free queue ringbuffer of the same size used to post buffers from the user side to the kernel. The difference being that if this one is filled completely, the application will abort.
Because of this, an application cannot fall asleep for long without either aborting or dropping a large number of packets.
Proposed solution
In general, the way around this is to hold on to the buffers until the shared memory queues are free to be pushed to.
Posting from userspace to kernelspace
The efct_rx_descriptor buffer contains an entry for each sbuf in an rxq and so a linked list of sbufs which are yet to be posted to the kernel can be maintained. The next time a superbuf is freed, it will try to post sbufs in this list also.
Posting from kernelspace to userspace
Superbufs are now kept around longer and only removed once each app has received it or the nic is about to run out of donated superbufs. In the latter case, the sequence number of the apps will be advanced.
When an application is destroyed, the superbufs it donated will be taken away and so another sleeping application will need to be skipped ahead.
In the case where no buffers can be freed (all received buffers have been posted to the shared memory - particularly true for small rxqs) and a new one arrives through efct_buffer_start, ensure that on efct_buffer_end we free that same donated buffer so the nic doesn't starve.
Testing
All tests verify that at their end, all sbufs have been freed with:
cat /proc/meminfo | grep HugePages_
. Tests with high throughput and multiple apps need to be run in release mode meaning onload is compiled withNDEBUG=1
otherwise there will be "no descriptor" drops on the NIC.It should be noted that applications can only fall back by at most the number of superbufs that have been donated to x3-net. The number of superbufs donated by each application is given by its rxq size. The number of superbufs it donates is given by
ceil((rxq_size * bytes_per_packet) / (bytes_per_superbuf))
wherebytes_per_packet=2KiB
andbytes_per_superbuf=2MiB
Note that an app is not able to buffer completely up to it's rxq size some of these same buffers need to be returned so the nic is not starved.
Well behaved apps
efsink enp1s0f1np3 udp:239.1.2.3:22000 &
efsink enp1s0f1np3 udp:239.1.2.3:22000
efsend enp1s0f1np3 239.1.2.3 22000 -n 200000 -m 1024
All packets received for both efsinks
Sleepy app
EF_VI_RXQ_SIZE=65536 efsink enp1s0f1np3 udp:239.1.2.3:22000
->Ctrl z
efsend enp1s0f1np3 239.1.2.3 22000 -n 200000 -m 1024
fg
Dut1 output:
Sleepy app - small rxq
EF_VI_RXQ_SIZE=512 efsink enp1s0f1np3 udp:239.1.2.3:22000
->Ctrl z
efsend enp1s0f1np3 239.1.2.3 22000 -n 200000 -m 1024
fg
Dut1 output:
Consume, fall behind and awake
EF_VI_RXQ_SIZE=65536 efsink enp1s0f1np3 udp:239.1.2.3:22000
efsend enp1s0f1np3 239.1.2.3 22000 -n 1000000 -m 1024
Ctrl-z
EF_VI_RXQ_SIZE=65536 efsink enp1s0f1np3 udp:239.1.2.3:22000
efsend enp1s0f1np3 239.1.2.3 22000 -n 1000000 -m 1024
Ctrl-c
,fg
The efsink that was put to sleep received
1,000,000
packets before it was put to sleep. When it was awoken at time6
It caught up on some of the buffers it missed leaving it with a total of 1061503 packets.Large catchup
EF_VI_RXQ_SIZE=65536 efsink enp1s0f1np3 udp:239.1.2.3:22000
and put them in the background withCtrl-Z
efsend enp1s0f1np3 239.1.2.3 22000 -n 1000000 -m 1024
for i in {1..15}; do bg; done
Each application catches up by
943,680
packets because a large number of superbufs were donated to the nic. The rate of catchup is slow at:279Mb/s
due tonapi_poll
not being triggered frequently enough as there are no incoming packets on the nic. If there are incoming packets during catchup, the buffered superbufs will be posted much faster due to a more frequentnapi_poll
. If the same as above is done but packets are continuously sent when the applications wake up, the catchup is much faster and a bandwidth of~17.5Gb/s
is observed. This is on a 10 Gigabit nic so the~7.5Gb/s
bandwidth increase is due to the buffered superbufs.