USB failing after several hours - xhci_hcd 0000:01:00.0: Ring expansion failed #5088
Comments
This has been cross-posted to the forum here: https://forums.raspberrypi.com/viewtopic.php?t=336947
Some condition is causing the xhci driver to think that it needs to expand a transfer ring, which it should almost never need to do. This has been seen before (and fixed) in #3919.
Which kernel version is running when the out-of-memory errors occur? Also, please post the results of the commands in #3919 (comment). |
Thanks for coming back to me on this, I very much appreciate it. I've now been running stable for almost 4 days after switching to the 64-bit version of the kernel. I'm hoping that this has resolved the issue rather than just delayed it. When I find a moment I'll try to set up an additional device to test the original problem and follow the steps posted in the above link. It was certainly an interesting read, and I apologise for not being able to find it myself! I've removed the gpu_mem setting and my applications are running as expected on my test unit, so that's great, thanks. It must have been a relic from the past! Linux rpi 5.15.32-v7l+ is the version I was having problems with; I had the same issues with a 5.10 version as well. 4.19.97 had been working fine for years running the same application. |
As an update, I've run the suggested trbs command on the 64-bit build that is currently running, and here's the result:
512 /sys/kernel/debug/usb/xhci/0000:01:00.0/devices/03/ep00/trbs
Looking at this in comparison to the linked thread, the 65536 seems like it's bad and has expanded from the 512 that it should be. At this point I'm assuming the error will reappear in the next day or two. Following in the footsteps of the previous solution:
If you need any further information, I'm happy to provide. |
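The `512 /sys/kernel/debug/...` result quoted above has the shape of a line count, so the same check can be sketched in Python as counting the lines of an endpoint's `trbs` debugfs file. This is only a minimal sketch under that assumption (one TRB per line): the helper name `count_trbs` is hypothetical, the path is the one quoted in the comment above, and reading debugfs normally requires root.

```python
# Path quoted earlier in this thread; the device and endpoint numbers will
# differ on other setups.
TRBS_PATH = "/sys/kernel/debug/usb/xhci/0000:01:00.0/devices/03/ep00/trbs"


def count_trbs(path=TRBS_PATH):
    """Return the number of lines in a trbs debugfs file.

    Assumption: the file lists one TRB per line, so the line count is the
    current size of the transfer ring.
    """
    with open(path, "r") as f:
        return sum(1 for _ in f)


if __name__ == "__main__":
    try:
        print(count_trbs(), TRBS_PATH)
    except OSError as exc:
        # debugfs may not be mounted, or the script may lack permissions.
        print(f"could not read {TRBS_PATH}: {exc}")
```

A count that stays at its initial value across many open/close cycles suggests the ring is not expanding; a count that steps upwards over time matches the behaviour described in this thread.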
Do you have a minimal python script that will run on a basic Pi OS install that stimulates the ring expansion? |
So after a hefty bit of playing around, it looks as though for the trbs to expand as expected I need the modbus device I'm using actually connected to the USB serial adapter, otherwise I can't seem to replicate the issue. I think if I'm going to stand a chance of getting this fixed I'll have to send the modbus device that I'm using and the USB converter in your direction. Is that at all possible? I haven't been able to test on a basic Pi OS just yet, but my build has very minimal surface-level changes. It may be possible to use the below script to replicate this if you have a modbus device that you can talk to via RS485; unfortunately I don't have enough hardware available to test alternatives at the moment. This script seems to do the trick to expand the buffer though, just requires:
It should blow through the 512 mark quite rapidly. It loops through 100 reads, checks the trbs for a change in size, then prints the size and the time the change was detected.
For the meantime, as a temporary fix, I've thrown together a script that just checks the trbs size and, if it exceeds X, uses usbutils to reset the device, which also resets the trbs. |
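The temporary workaround described above (check the trbs size, reset the device when it passes a limit) could look roughly like the following. This is a hedged sketch, not the poster's actual script: the function names, the default limit of 512 and the example device ID are all assumptions, and it relies on the `usbreset` tool from usbutils being installed and run with sufficient privileges.

```python
import subprocess


def count_trbs(path):
    """Count lines in a trbs debugfs file (assumed: one TRB per line)."""
    with open(path) as f:
        return sum(1 for _ in f)


def reset_usb_device(device_id):
    """Reset a USB device with usbutils' `usbreset`.

    device_id is whatever form your usbreset accepts, for example a
    vendor:product pair. The value used by the caller is an assumption.
    """
    return subprocess.run(["usbreset", device_id], check=False).returncode


def check_once(path, device_id, limit=512,
               count=count_trbs, reset=reset_usb_device):
    """Reset the device if the ring has grown past `limit`.

    Returns True when a reset was requested, False otherwise.
    """
    if count(path) > limit:
        reset(device_id)
        return True
    return False
```

A caller would run `check_once(...)` periodically, for example from a timer or a loop with `time.sleep()`. Passing `count` and `reset` as parameters keeps the decision logic testable without real hardware.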
Modbus RTU is just a framed serial protocol. I doubt the actual messages matter, just the fact that bytes are being received by the UART. If you have two rs-485 adapters, can you generate messages on a different Pi and cause the ring expansion on the board running the test script? |
We have also bumped into this problem using the Modbus protocol and an FTDI serial adapter. Other than the proposed USB reset, which we are experimenting with right now (hopefully it will work!), does anyone know of any workarounds or adjustments to kernel/driver parameters that might help? We have several application-based ideas (such as reducing message frequency), but I'd be interested in knowing if anyone has any other ideas. |
It must be something specific in how the FTDI driver is manipulating its URBs. RS232-mode FTDI serial converters don't have this issue, or at least nobody has reported that they do, and these are far more widely used than RS485 converters. |
I wonder if extending the fix from df30883 to support this device does anything. If I get some time I might try it out, though I doubt it works. |
The patch is entirely driver-agnostic (and mentioned earlier in the thread). This doesn't discount the possibility that there might be some other ring condition that triggers expansion that the patch doesn't cater for. |
I saw it mentioned earlier in the thread. However, I thought that because of this:
it might be specific for that device. But if you say it is driver agnostic, I believe you. |
We are facing this issue with an 8-port RS232 FTDI serial converter (StarTech ICUSB2328I).
Example dmesg error log after arbitrary uptime:
All ports seem to fail simultaneously. I am happy to provide any further information as needed. |
How are the ports being used? Do you get the out-of-memory condition if a single port is used? What userspace program is accessing the ports? |
A Python app spawns a thread per port with a loop structured like this to check if a serial device is connected:

    import serial
    import time

    # interface and baudrate are set per thread elsewhere in the app
    while True:
        try:
            with serial.Serial(interface, baudrate, timeout=1) as ser:
                ser.write(b"PING")  # pyserial expects bytes on Python 3
                res = ser.readline()
        except Exception:
            pass
        time.sleep(2)

I get the OOM condition even if no device is ever connected. The ports just stop working after an arbitrary amount of time. Everything works fine until then, and devices can connect, communicate and disconnect at will. I have not yet tried with only one thread for a single port. |
The python fragment appears to be incomplete. Can you post a self-contained testcase that you know will stimulate the bug?
|
Of course.

    import os
    import time

    import serial

    while True:
        try:
            with serial.Serial('/dev/ttyUSB0', 38400, timeout=1) as dev:
                dev.write('PING\n'.encode('utf-8'))
                response = dev.readline()
        except serial.SerialException as e:
            # str.find() returns -1 (truthy) when the text is absent,
            # so test for membership instead.
            if 'Cannot allocate memory' in str(e):
                print('OOM error has occurred. Exiting')
                os._exit(1)
        time.sleep(0.6)

The error you got from your fragment looks like a naming collision between module "serial" and your filename "serial.py". But to be sure, this is my environment:
(after 12-18 hours). Like @Jorl17 this does lots of opens and closes.
I have no expertise regarding what might cause this error in either of the USB drivers (or hardware), but stumbled upon this and couldn't tell from the current implementation if this reset is present. |
For reference, this is also our scenario. We have a single serial port (bar the ones we have for an LTE modem) where we are running commands very frequently, and, as they are run, the trbs size just keeps on increasing. Lowering the frequency of updates has a very clear impact. We changed it from 0.2s to 1s and the trbs is not growing so fast -- though it is certainly growing. One additional situation we have is that our serial port is being opened and closed very fast, instead of being kept open all the time. Every time we send or receive data, we're opening it and closing it. |
No. The device context needs to be destroyed and reallocated for the leak to be reset. I can replicate the leak with the script posted above (note: github doesn't ping email notifications on comment edits). The first expansion happens in about 2-3 minutes, the second after 10+, so it's either random or the requisite interval doubles each time. |
The bug is effectively random, but only occurs immediately after a Link TRB is traversed. I get this odd condition in room_on_ring():
Edit: the link TRB appears to be coincidental. There's a steady leak of num_free_trbs each time we stop and restart the bulk IN endpoint, which eventually breaks the condition check. |
Spent several hours looking at trace logs and got mightily confused until I realised there are two "bugs" happening simultaneously. usb-serial submits multiple URBs for the IN endpoint and this is the primary cause of the leak. In addition to this, there's a race between URB dequeue and resubmission: while the port is being closed, there's nothing stopping the callback from resubmitting the next URB in sequence, but this doesn't result in a leak when the next URB gets dequeued from the transfer ring.
When the port is shut down, the first dequeue request must stop the endpoint ring because the IN URB has been claimed by the xhci hardware. This process happens asynchronously, so other URBs can sneak in afterwards (and ring the endpoint doorbell, undoing the stop-endpoint command). Eventually, after 1-2 Stop Endpoint commands, the state ends up consistent with no more resubmissions.
Somewhere between the first call to Stop Endpoint and Set TR Dequeue Pointer is where the leak happens. It doesn't take into account the fact that if the dequeue pointer is moved past all of the TDs on the ring (on to the first TRB with a cycle state bit != the consumer cycle state), it needs to return those TRBs in the process.
If I change the number of URBs here: https://github.com/raspberrypi/linux/blob/rpi-5.15.y/include/linux/usb/serial.h#L92
Oh, and the bug isn't specific to the VL805. The dwc3 xhci controller on the Pi's USB-C port is affected as well. |
One obvious bug was in xhci_move_dequeue_past_td() where the ring state was advanced but not accounted for. With this fixed, I still get occasional drops across device open/close which makes me think there's a genuine race condition happening. In particular, I'm suspicious of td->cancel_status which attempts to keep track of whether the xHC has read (and cached) the TD. It has several states, touched in many places in the driver. |
I think I've found the cause - there's a large number of gotchas when a URB is removed that is ahead of the ring's current dequeue pointer, i.e. the xHC hasn't yet processed the TD associated with it. This frequently happens if you have a mostly-idle endpoint such as a serial port. Removing an un-processed TD invalidates it by turning it into a no-op instead of moving the dequeue pointer (which was the case above), but doesn't account for the TRBs that got binned. Strictly speaking, the space isn't "free" because if you subsequently moved the dequeue pointer to the end of the current cycle, or added more TDs before restarting the ring, the free count vs the actual pointer position would diverge. |
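The accounting problem described above can be illustrated with a toy model. This is a deliberately simplified sketch and not the xhci driver's logic: the class `ToyRing`, the one-TRB-per-TD assumption, the two pending URBs and the doubling growth policy are all assumptions made for illustration. It only shows how a free-space counter that is never credited for cancelled TDs eventually forces an expansion, while a counter that is credited stays level.

```python
class ToyRing:
    """Toy transfer ring that tracks only a free-TRB counter."""

    def __init__(self, size=512, credit_cancelled=False):
        self.size = size
        self.num_trbs_free = size
        self.credit_cancelled = credit_cancelled
        self.expansions = 0

    def enqueue(self, n=1):
        # Stand-in for ring expansion: grow when the counter says there is
        # no room. The growth policy (doubling) is arbitrary here.
        if self.num_trbs_free < n:
            self.num_trbs_free += self.size
            self.size *= 2
            self.expansions += 1
        self.num_trbs_free -= n

    def complete(self, n=1):
        # A TD processed by the controller: its space is credited back.
        self.num_trbs_free += n

    def cancel(self, n=1):
        # A TD cancelled before the controller reached it becomes a no-op.
        # The leaky variant forgets to credit the space back.
        if self.credit_cancelled:
            self.num_trbs_free += n


def open_read_close(ring, cycles, pending=2):
    """Simulate a port that is opened, read once, then closed, repeatedly."""
    for _ in range(cycles):
        ring.enqueue(pending)       # several IN URBs queued on open
        ring.complete(1)            # one read completes
        ring.cancel(pending - 1)    # the rest are cancelled on close


if __name__ == "__main__":
    leaky, fixed = ToyRing(), ToyRing(credit_cancelled=True)
    open_read_close(leaky, 5000)
    open_read_close(fixed, 5000)
    print("leaky:", leaky.size, leaky.expansions)
    print("fixed:", fixed.size, fixed.expansions)
```

In this model a lower polling frequency only slows the leak, which is consistent with the reports earlier in the thread that reducing the update rate made the ring grow more slowly but did not stop it.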
@Jorl17 @mhartoft can you try compiling from this branch: https://github.com/P33M/linux/tree/trb_leak_fix and testing with your hardware? Here, on either the vl805 or dwc3 controllers on a Pi 4, I no longer get stray num_trbs_free counts across 1, 2, 8 or 56 usb-serial bulk URBs pending. |
Do you know if I can get away with just compiling some specific modules so as to not fully recompile the kernel? From an operational perspective it will be difficult |
xhci_hcd is built-in, so /boot/kernelN.img has to be updated, and the machine has to be rebooted. It goes without saying that you shouldn't test this on a board you care about accessing remotely. |
Yes, I am on it |
There is no leak detectable in shorter timeframes where it would definitely show without the patch. Will leave it running and continually monitor over the weekend. The ports are functional, and serial communication is flowing 🎉 |
@P33M Thank you very much for your hard work! From what you said it seems like this affected all kinds of serial ports, right? While we noticed this with the FTDI adapter, we also see it (albeit much slower) with a serial associated to an LTE modem. Thanks again! |
The bug is exercised by a driver indirectly manipulating endpoint ring state, and is not limited to a particular class of driver. For example one that frequently issues queued commands over a bulk or control pipe that end in stall responses would also hit the bug. |
I'm on bad hardware to test Yocto (need to set up a VM I suspect) but recursively cloning trunk resolves the kernel Thought I'd also share this related-looking thread: https://bugzilla.kernel.org/show_bug.cgi?id=217242 |
Here are the most relevant portions of that bug:
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -2214,6 +2214,7 @@ static int finish_td(struct xhci_hcd *xhci, struct xhci_virt_ep *ep,
 		     u32 trb_comp_code)
 {
 	struct xhci_ep_ctx *ep_ctx;
+	int trbs_freed;
 
 	ep_ctx = xhci_get_ep_ctx(xhci, ep->vdev->out_ctx, ep->ep_index);
 
@@ -2283,9 +2284,12 @@ static int finish_td(struct xhci_hcd *xhci, struct xhci_virt_ep *ep,
 	}
 
 	/* Update ring dequeue pointer */
+	trbs_freed = xhci_num_trbs_to(ep_ring->deq_seg, ep_ring->dequeue,
+				      td->last_trb_seg, td->last_trb,
+				      ep_ring->num_segs);
+	ep_ring->num_trbs_free += trbs_freed;
 	ep_ring->dequeue = td->last_trb;
 	ep_ring->deq_seg = td->last_trb_seg;
-	ep_ring->num_trbs_free += td->num_trbs - 1;
 	inc_deq(xhci, ep_ring);
 
 	return xhci_td_cleanup(xhci, td, ep_ring, td->status); |
If a ring has a number of TDs enqueued past the dequeue pointer, and the URBs corresponding to these TDs are dequeued, then num_trbs_free isn't updated to show that these TDs have been converted to no-ops and effectively "freed". This means that num_trbs_free creeps downwards until the count is exhausted, which then triggers xhci_ring_expansion() and effectively leaks memory by infinitely growing the transfer ring.

This is commonly encountered through the use of a usb-serial port where the port is repeatedly opened, read, then closed.

Move the num_trbs_free crediting out of the Set TR Dequeue Pointer handling and into xhci_invalidate_cancelled_tds(). There is a potential for overestimating the actual space on the ring if the ring is nearly full and TDs are arbitrarily enqueued by a device driver while it is dequeueing them, but dequeues are usually batched during device close/shutdown or endpoint error recovery.

See #5088

Signed-off-by: Jonathan Bell <[email protected]>
Having found this upstream discussion, which we believe is a reproduction of our issue, we're interested to see it was patched recently here. It was backported to 5.15.117, but now we're stuck until raspberrypi-linux integrates it so we can use it in a downstream OS. What's the usual pattern/timeline for integrating the latest changes from upstream into 5.15.y? |
I have the same problem, LTE ttyUSB serial port trb leaks, LTE module --> vl805 --> CM4. |
We aren't expecting to build any kernels from rpi-5.15.y, so it is no longer being updated as a matter of course. A quick check has shown there are several merge conflicts, so it's more than just merge and push. Have you considered moving to 6.1? |
If a ring has a number of TDs enqueued past the dequeue pointer, and the URBs corresponding to these TDs are dequeued, then num_trbs_free isn't updated to show that these TDs have been converted to no-ops and effectively "freed". This means that num_trbs_free creeps downwards until the count is exhausted, which then triggers xhci_ring_expansion() and effectively leaks memory by infinitely growing the transfer ring. This is commonly encounted through the use of a usb-serial port where the port is repeatedly opened, read, then closed. Move the num_trbs_free crediting out of the Set TR Dequeue Pointer handling and into xhci_invalidate_cancelled_tds(). There is a potential for overestimating the actual space on the ring if the ring is nearly full and TDs are arbitrarily enqueued by a device driver while it is dequeueing them, but dequeues are usually batched during device close/shutdown or endpoint error recovery. See #5088 Signed-off-by: Jonathan Bell <[email protected]>
If a ring has a number of TDs enqueued past the dequeue pointer, and the URBs corresponding to these TDs are dequeued, then num_trbs_free isn't updated to show that these TDs have been converted to no-ops and effectively "freed". This means that num_trbs_free creeps downwards until the count is exhausted, which then triggers xhci_ring_expansion() and effectively leaks memory by infinitely growing the transfer ring. This is commonly encounted through the use of a usb-serial port where the port is repeatedly opened, read, then closed. Move the num_trbs_free crediting out of the Set TR Dequeue Pointer handling and into xhci_invalidate_cancelled_tds(). There is a potential for overestimating the actual space on the ring if the ring is nearly full and TDs are arbitrarily enqueued by a device driver while it is dequeueing them, but dequeues are usually batched during device close/shutdown or endpoint error recovery. See #5088 Signed-off-by: Jonathan Bell <[email protected]>
If a ring has a number of TDs enqueued past the dequeue pointer, and the URBs corresponding to these TDs are dequeued, then num_trbs_free isn't updated to show that these TDs have been converted to no-ops and effectively "freed". This means that num_trbs_free creeps downwards until the count is exhausted, which then triggers xhci_ring_expansion() and effectively leaks memory by infinitely growing the transfer ring. This is commonly encounted through the use of a usb-serial port where the port is repeatedly opened, read, then closed. Move the num_trbs_free crediting out of the Set TR Dequeue Pointer handling and into xhci_invalidate_cancelled_tds(). There is a potential for overestimating the actual space on the ring if the ring is nearly full and TDs are arbitrarily enqueued by a device driver while it is dequeueing them, but dequeues are usually batched during device close/shutdown or endpoint error recovery. See #5088 Signed-off-by: Jonathan Bell <[email protected]>
If a ring has a number of TDs enqueued past the dequeue pointer, and the URBs corresponding to these TDs are dequeued, then num_trbs_free isn't updated to show that these TDs have been converted to no-ops and effectively "freed". This means that num_trbs_free creeps downwards until the count is exhausted, which then triggers xhci_ring_expansion() and effectively leaks memory by infinitely growing the transfer ring. This is commonly encounted through the use of a usb-serial port where the port is repeatedly opened, read, then closed. Move the num_trbs_free crediting out of the Set TR Dequeue Pointer handling and into xhci_invalidate_cancelled_tds(). There is a potential for overestimating the actual space on the ring if the ring is nearly full and TDs are arbitrarily enqueued by a device driver while it is dequeueing them, but dequeues are usually batched during device close/shutdown or endpoint error recovery. See #5088 Signed-off-by: Jonathan Bell <[email protected]>
Describe the bug
Hi, I was hoping someone here may be able to help me.
I've been using a disk image (Buster) that was working fine with the RPI4 R1.4.
However, since the RPI4 R1.5 boards have started to filter through, I've had to update the system to get the disk image to start up.
This seems to have caused an issue with my USB FTDI serial adapter dropping off after roughly 8 to 14 hours, requiring either a hardware reboot or an unplug and re-plug of the serial adapter.
Running dmesg after the USB has stopped shows that the system is being flooded by clock change errors, e.g.:
From what I've read, the -12 seems to relate to a memory issue. However, I'm running a 16GB disk image with 53% free, and memory usage is around 444MB/3.33GB, so I can't see that being the issue.
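As a quick check on the -12, and assuming it is a negated Linux errno value (the usual convention for kernel return codes), the number can be looked up from Python's standard library. The helper name `describe_errno` is made up for this sketch.

```python
import errno
import os


def describe_errno(code):
    """Return (symbolic name, message) for an errno, accepting negated values."""
    number = abs(code)
    return errno.errorcode.get(number, "UNKNOWN"), os.strerror(number)


if __name__ == "__main__":
    name, message = describe_errno(-12)
    print(name, "-", message)
```

On Linux, errno 12 is ENOMEM. It reports that an allocation failed inside the kernel, which is not necessarily the same thing as running out of disk space or of the RAM shown by userspace tools.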
Here's a few things I've attempted to get it working:
Tried kernels 5.10 and 5.15; both produce the same issue.
Set coherent_pool=4M in the config file, mirroring the older firmware I used, to see if that helps. It still happens in the same timeframe.
Updated the bootloader.
Swapped the Pi for another Rev1.5 in case it was a board issue.
Tried an older Pi Rev1.4 to see if it was a newer-board issue. With my updated disk image, the same issue occurs.
Tried copying start4.elf and the other boot files (not including the kernel) from the later disk image to my older disk image. I might not have done this one correctly, but I'm assuming the mismatched files don't play nicely together (it broke things).
Rebuilt on a new OS (Bullseye) with the latest updates/upgrades. It lasts around 8 hours longer before it drops off again.
This is the closest similar issue I could find, which doesn't seem to reach any real conclusion:
#3479
Oddly enough, when trying to run a test script with some NeoPixel LEDs, I get the following message:
raise RuntimeError('ws2811_init failed with code {0} ({1})'.format(resp, str_resp))
RuntimeError: ws2811_init failed with code -2 (Out of memory)
In addition to this I've discovered that if I run the following:
as an attempt to get the USB host to reset, it seems to clear whatever memory issue was previously there. It allowed me to start the pixel test script and also stopped the error spam about the clock frequency change. However, it did not redetect the FTDI serial adapter that was connected, which dropped off completely.
My latest attempt at resolving this is to use the 64-bit kernel, and I will report back on how this goes.
I would appreciate any help, or things that I could try, to get this resolved or at the very least narrow down why it's happening in the first place.
Many thanks in advance.
Steps to reproduce the behaviour
Unknown. I plug in a USB RS-232 serial adapter, read from it via a Python script, and wait 4-48 hours.
USB will stop reading and dmesg gets spammed with errors.
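A minimal stress loop along these lines might shorten the wait. It is a sketch, not the script used above: it assumes the adapter appears as `/dev/ttyUSB0`, it does not configure baud rate or line settings, and the function name `open_read_close` is made up. Whether it triggers the failure at all will depend on the kernel in use.

```python
import os


def open_read_close(path, cycles, nbytes=64):
    """Open, read from, and close `path` repeatedly; return total bytes read."""
    total = 0
    for _ in range(cycles):
        # Non-blocking so an idle port does not stall the loop.
        fd = os.open(path, os.O_RDONLY | os.O_NONBLOCK | os.O_NOCTTY)
        try:
            try:
                total += len(os.read(fd, nbytes))
            except BlockingIOError:
                pass  # nothing waiting on the port this time round
        finally:
            os.close(fd)
    return total


if __name__ == "__main__":
    # Assumed device path; adjust to match the adapter on the test system.
    print(open_read_close("/dev/ttyUSB0", cycles=10000))
```

Running it while watching dmesg should show whether repeated open and close cycles alone are enough to reproduce the problem.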
Device(s)
Raspberry Pi 4 Mod. B
System
Previous Working Kernel:
Latest with Issue:
Logs
Additional context
Edit: Formatting