X3: Catchup applications on missed superbufs upon wakeup #195

ligallag-amd · 2023-12-13T11:38:33Z

The problem

There is a ringbuffer shared between kernel and userspace which is used to send superbufs (sbufs) from the kernel. This ringbuf is of size 16 and so if an app is paused it cannot take any sbufs from it and no more can be posted - this causes drops. Unfortunately, this ringbuf's size cannot be changed otherwise the ABI will be broken.

Similarly there is a free queue ringbuffer of the same size used to post buffers from the user side to the kernel. The difference being that if this one is filled completely, the application will abort.

Because of this, an application cannot fall asleep for long without either aborting or dropping a large number of packets.

Proposed solution

In general, the way around this is to hold on to the buffers until the shared memory queues are free to be pushed to.

Posting from userspace to kernelspace

The efct_rx_descriptor buffer contains an entry for each sbuf in an rxq and so a linked list of sbufs which are yet to be posted to the kernel can be maintained. The next time a superbuf is freed, it will try to post sbufs in this list also.

Posting from kernelspace to userspace

Superbufs are now kept around longer and only removed once each app has received it or the nic is about to run out of donated superbufs. In the latter case, the sequence number of the apps will be advanced.

When an application is destroyed, the superbufs it donated will be taken away and so another sleeping application will need to be skipped ahead.

In the case where no buffers can be freed (all received buffers have been posted to the shared memory - particularly true for small rxqs) and a new one arrives through efct_buffer_start, ensure that on efct_buffer_end we free that same donated buffer so the nic doesn't starve.

Testing

All tests verify that at their end, all sbufs have been freed with: cat /proc/meminfo | grep HugePages_. Tests with high throughput and multiple apps need to be run in release mode meaning onload is compiled with NDEBUG=1 otherwise there will be "no descriptor" drops on the NIC.

It should be noted that applications can only fall back by at most the number of superbufs that have been donated to x3-net. The number of superbufs donated by each application is given by its rxq size. The number of superbufs it donates is given by ceil((rxq_size * bytes_per_packet) / (bytes_per_superbuf)) where bytes_per_packet=2KiB and bytes_per_superbuf=2MiB

Note that an app is not able to buffer completely up to it's rxq size some of these same buffers need to be returned so the nic is not starved.

Well behaved apps

Order	DUT1	DUT2
1	`efsink enp1s0f1np3 udp:239.1.2.3:22000 &`
2	`efsink enp1s0f1np3 udp:239.1.2.3:22000`
3		`efsend enp1s0f1np3 239.1.2.3 22000 -n 200000 -m 1024`

All packets received for both efsinks

Sleepy app

Order	DUT1	DUT2
1	`EF_VI_RXQ_SIZE=65536 efsink enp1s0f1np3 udp:239.1.2.3:22000` -> `Ctrl z`
2		`efsend enp1s0f1np3 239.1.2.3 22000 -n 200000 -m 1024`
3	`fg`

Dut1 output:

$ fg
EF_VI_RXQ_SIZE=65536 efsink enp1s0f1np3 udp:239.1.2.3:22000
         0                0                0
     41470              353            41470
     22337              190            63807

Sleepy app - small rxq

Order	DUT1	DUT2
1	`EF_VI_RXQ_SIZE=512 efsink enp1s0f1np3 udp:239.1.2.3:22000` -> `Ctrl z`
2		`efsend enp1s0f1np3 239.1.2.3 22000 -n 200000 -m 1024`
3	`fg`

Dut1 output:

$ fg
EF_VI_RXQ_SIZE=512 efsink enp1s0f1np3 udp:239.1.2.3:22000
         0                0               16
      1017                8             1033

Consume, fall behind and awake

Order	DUT1	DUT2
1	`EF_VI_RXQ_SIZE=65536 efsink enp1s0f1np3 udp:239.1.2.3:22000`
2		`efsend enp1s0f1np3 239.1.2.3 22000 -n 1000000 -m 1024`
3	`Ctrl-z`
4	`EF_VI_RXQ_SIZE=65536 efsink enp1s0f1np3 udp:239.1.2.3:22000`
5		`efsend enp1s0f1np3 239.1.2.3 22000 -n 1000000 -m 1024`
6	`Ctrl-c` , `fg`

The efsink that was put to sleep received 1,000,000 packets before it was put to sleep. When it was awoken at time 6 It caught up on some of the buffers it missed leaving it with a total of 1061503 packets.

Large catchup

Order	DUT1	DUT2
1	Create 15 efsinks with: `EF_VI_RXQ_SIZE=65536 efsink enp1s0f1np3 udp:239.1.2.3:22000` and put them in the background with `Ctrl-Z`
2		`efsend enp1s0f1np3 239.1.2.3 22000 -n 1000000 -m 1024`
3	`for i in {1..15}; do bg; done`

Each application catches up by 943,680 packets because a large number of superbufs were donated to the nic. The rate of catchup is slow at: 279Mb/s due to napi_poll not being triggered frequently enough as there are no incoming packets on the nic. If there are incoming packets during catchup, the buffered superbufs will be posted much faster due to a more frequent napi_poll. If the same as above is done but packets are continuously sent when the applications wake up, the catchup is much faster and a bandwidth of ~17.5Gb/s is observed. This is on a 10 Gigabit nic so the ~7.5Gb/s bandwidth increase is due to the buffered superbufs.

ligallag-amd · 2024-01-09T10:54:59Z

I have pushed a new commit to this branch which should allow users to reload the onload kernel module without having to recompile their user space program. As such, there will be differing views of CI_EFCT_MAX_SUPERBUFS from user and kernel space due to this patch. This case is now handled in the kernel module without returning an error.

An alternative fix to this patch is to return CI_EFCT_MAX_SUPERBUFS to 512 and deal with the consequences of that. This is done on this branch if this is preferable for whatever reason. The obvious problem with the fix on this branch is that the user won't be able to use more than 512 sbufs.

src/lib/efrm/efrm_efct_rxq.c

Superbufs are relinquished by userspace by posting them through a shared memory ringbuffer. When the ringbuffer is full, userspace now stores the unfreed superbuf ids in a linked list until there is space in the ringbuf.

Superbuffers previously were only able to be posted to apps if they lied between efhw_nic_efct_rxq->sbufs's added and removed pointers. These were typically only 4 apart meaning an app couldn't fall further than that without packet drops. Superbufs are now kept around longer and only removed once each app has received it or the nic is about to run out of donated superbufs. In the latter case, the sequence number of the apps will be advanced. When an application is destroyed, the superbufs it donated will be taken away and so another sleeping application will need to be skipped ahead. In the case where no buffers can be freed (all received buffers have been posted to the shared memory - particularly true for small rxqs) and a new one arrives through efct_buffer_start, ensure that on efct_buffer_end we free that same donated buffer so the nic doesn't starve.

Users should be able to reload the onload kernel module without having to recompile their user space program. As such, there will be differing views of `CI_EFCT_MAX_SUPERBUFS` from user and kernel space due to this patch. This case is now handled in the kernel module without returning an error.

rhughes-xilinx

I nearly approved of this. You might get away with fix-and-ship

src/driver/linux_resource/efct_superbuf.c

src/lib/efhw/efct.c

src/include/ci/internal/seq.h

src/lib/efrm/efrm_efct_rxq.c

ligallag-amd · 2024-02-15T10:12:35Z

Commits have been merged into master so closing this PR:
See the following commits:
0c4ee89
7aae294
66cdbd6
38130d2
cb09fd9

ligallag-amd requested a review from a team as a code owner December 13, 2023 11:38

ligallag-amd force-pushed the efct_superbufs branch from 18fc781 to d761fc8 Compare December 18, 2023 13:43

ligallag-amd changed the title ~~X3: Catchup applications on missed superbufs upon wakeup when they have slept~~ X3: Catchup applications on missed superbufs upon wakeup Dec 18, 2023

ligallag-amd force-pushed the efct_superbufs branch from d761fc8 to d2d2bb9 Compare December 18, 2023 16:28

ligallag-amd force-pushed the efct_superbufs branch 2 times, most recently from 02e7a43 to da804f7 Compare January 9, 2024 10:45

rhughes-xilinx reviewed Jan 9, 2024

View reviewed changes

src/lib/efrm/efrm_efct_rxq.c Outdated Show resolved Hide resolved

src/lib/efrm/efrm_efct_rxq.c Show resolved Hide resolved

ligallag-amd force-pushed the efct_superbufs branch 2 times, most recently from 2ec2eb6 to 01e3564 Compare January 9, 2024 16:57

ligallag-amd and others added 5 commits February 8, 2024 10:13

ON-15198: Hold on to superbuffers in efct descriptors.

1a24a79

Superbufs are relinquished by userspace by posting them through a shared memory ringbuffer. When the ringbuffer is full, userspace now stores the unfreed superbuf ids in a linked list until there is space in the ringbuf.

ON-15198: avoid requesting too many hugepages when binding to an RXQ

d669941

ON-15198: increase the maximum allowed superbufs from 512 to 2048

75f6331

ligallag-amd force-pushed the efct_superbufs branch from 01e3564 to 30e45ad Compare February 8, 2024 10:46

rhughes-xilinx reviewed Feb 8, 2024

View reviewed changes

ligallag-amd closed this Feb 15, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

X3: Catchup applications on missed superbufs upon wakeup #195

X3: Catchup applications on missed superbufs upon wakeup #195

ligallag-amd commented Dec 13, 2023 •

edited

Loading

ligallag-amd commented Jan 9, 2024

rhughes-xilinx left a comment

ligallag-amd commented Feb 15, 2024 •

edited

Loading

X3: Catchup applications on missed superbufs upon wakeup #195

X3: Catchup applications on missed superbufs upon wakeup #195

Conversation

ligallag-amd commented Dec 13, 2023 • edited Loading

The problem

Proposed solution

Posting from userspace to kernelspace

Posting from kernelspace to userspace

Testing

Well behaved apps

Sleepy app

Sleepy app - small rxq

Consume, fall behind and awake

Large catchup

ligallag-amd commented Jan 9, 2024

rhughes-xilinx left a comment

Choose a reason for hiding this comment

ligallag-amd commented Feb 15, 2024 • edited Loading

ligallag-amd commented Dec 13, 2023 •

edited

Loading

ligallag-amd commented Feb 15, 2024 •

edited

Loading