Very rare but possible glitching on PRU signal generation that can cause unexpected flashes #49
Digging in further on this, I think the ultimate source of the jitter is the fact that the PRU code accesses the GPIO pins through the ARM address space rather than directly via r30. This can cause stalls when there is contention, and I think these stalls are the root problem. With this in mind, I think the best solution might be to rewrite the PRU stuff to go direct to the pins and get absolutely deterministic timing. This would not be as much work as it might seem now that the PRU C compiler is getting mature, but I am hesitant to do it if I am really the only one who has ever been affected by this issue.... |
Hey, thanks for looking into this, and I'm sorry I didn't get back to you sooner. I am aware of the issue, and have done several things to mitigate it on various branches. The simplest one, that I think is on master, is simply to check how many cycles have passed since the zero write started, and if it's too long, we abort the entire frame. This has the effect of only showing one white pixel rather than corrupting the rest of the frame. Secondly, you can rewrite the PRU code to not go back to DRAM for every bit, but rather load data in entire RGB chunks into the registers and write that. I have an experimental branch, spi-cape-support, that supports this along with several other improvements. The main change is that a custom PRU program is built at runtime for the specific number of LEDs and driver type. The problem with using r30 is that you're quite limited in the number of channels you can output. Something like 12. That's far too few for my use cases, but I can certainly see the value for some projects. Combining that with loading all data into the registers should be pretty foolproof, though you still might have to drop a partial frame if it takes too long to get the next pixel of data. Using PRU RAM might fix this, though I had some issues with it when I tried. I'd be happy to talk with you via phone about your investigation and my ideas and work. Feel free to email me at [email protected] if you're interested. I have some time tonight. |
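The cycle-count check described above can be modelled on the host as a rough sketch. This is not the code on master: the 450 ns budget is an assumption borrowed from the scope trigger used elsewhere in this thread, the names are hypothetical, and the counter is assumed to be a free-running 32-bit value that wraps (the real PRU counter may behave differently at overflow). The only hardware fact relied on is the 200 MHz PRU clock, i.e. 5 ns per cycle.

```python
PRU_CLOCK_HZ = 200_000_000                        # AM335x PRU core clock
NS_PER_CYCLE = 1_000_000_000 // PRU_CLOCK_HZ      # 5 ns per cycle

# Assumed budget: the longest zero-bit high time tolerated before giving up.
MAX_T0H_NS = 450


def elapsed_cycles(start: int, now: int) -> int:
    """Cycles between two reads of a wrapping 32-bit counter (assumption)."""
    return (now - start) & 0xFFFFFFFF


def should_abort_frame(start: int, now: int, max_ns: int = MAX_T0H_NS) -> bool:
    """True when the zero-bit high phase has already overrun the budget."""
    return elapsed_cycles(start, now) * NS_PER_CYCLE > max_ns
```

With these numbers a 70-cycle (350 ns) high phase passes, while a 108-cycle (540 ns) one would trigger the abort, leaving one bad pixel instead of a corrupted remainder of the frame.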
Your traces look very similar to mine (except you have a nicer scope :) ).
The fact that they are quantized to 10ns steps definitely suggests some wait states getting thrown in during T0H.
I am like 75% sure that this is happening due to contention on the L3/L4 interconnects when the PRU is accessing the memory for the GPIO pins through the ARM address space. If so, then I see a few solutions:
1) Make a new PRU driver that talks directly to the pins through R30. This is straightforward and guaranteed to work, but has the downside that we would be limited to only 24 pins (and therefore 24 strings) because that is the maximum number of PRU pins that are available on the BBB headers. For many applications (including mine) the limited number of pins would be ok. With this solution it would also be pretty easy to do some cool stuff like moving the temporal dithering into the PRU code, which could save some load on the main Linux processor and possibly improve frame rates slightly.
2) Dig in deep on the OMAP L3/L4 interconnects and try to figure out a way to make our PRU to GPIO accesses more deterministic. This would let us continue to use all available GPIO pins, but has the downside that I don’t know anything about this stuff so would have to try and learn it. There appear to be registers that control this stuff, but after much googling I cannot find any good documentation on how it all works.
3) Switch to a DMA-based signal generation scheme instead of the PRU. This again would let us continue to use all available GPIO pins and have low load on the main CPU, but has the downside that (a) the DMA channels might end up suffering from the same non-determinism as the current scheme, and (b) it kinda changes what the whole LEDscape project is about.
I’ll probably attack the solutions in the order listed above when I have time in the next month or so, but could be convinced to start with #2 if anyone has any pointers to better info on the interconnects that could give me some encouragement. There must be some people somewhere who understand this stuff, but I don’t know how to find them.
…-josh
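For option 1, the data has to be rearranged so that one register write updates every string at once. A minimal host-side sketch of that bit-slicing step follows. It is not LEDscape's actual code, and it assumes channel c simply maps to bit c of the output word, which does not reflect the real pin mapping on the BBB headers; the function name is hypothetical.

```python
from typing import List, Sequence


def slice_bits(channel_bytes: Sequence[int], bits: int = 8) -> List[int]:
    """Turn one data byte per channel into `bits` output words, MSB first.

    Bit c of each returned word is the level channel c should send for that
    bit position, so a single register write can drive all channels together.
    """
    if len(channel_bytes) > 32:
        raise ValueError("a 32-bit output word holds at most 32 channels")
    words = []
    for pos in range(bits - 1, -1, -1):          # WS281x data goes out MSB first
        word = 0
        for ch, value in enumerate(channel_bytes):
            word |= ((value >> pos) & 1) << ch   # place this channel's bit
        words.append(word)
    return words
```

For example, three channels holding 0xFF, 0x00 and 0x80 give a first word with bits 0 and 2 set, followed by seven words with only bit 0 set.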
From: Mark Renouf [mailto:[email protected]]
Sent: Friday, December 23, 2016 8:02 PM
To: Yona-Appletree/LEDscape <[email protected]>
Cc: Josh Levine <[email protected]>; Author <[email protected]>
Subject: Re: [Yona-Appletree/LEDscape] Very rare but possible glitching on PRU signal generation that can cause unexpected flashes (#49)
To add to what bigjosh reported, here's a clear picture of the issue. It's not uncommon at all, it's very easy to see the glitch by running 'black' demo with a length of 1, so each pin outputs a single 24bit sequence, using a pulse trigger to sync on the frame start.
This is a capture of the trailing edge of the first pulse, with persistence set to infinity, which clearly shows a jitter of between 10 and 80ns. It seems fairly random, but across a long string the additive error can become quite large.
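To put a rough number on how the per-bit error could add up, here is a small sketch. It assumes the errors simply accumulate and that every bit is stretched by the same amount, which is a worst-case bound rather than a measurement, since the observed jitter looks random. The function name and the 300-pixel strip length are illustrative assumptions.

```python
BITS_PER_PIXEL = 24  # one 24-bit sequence per pixel, as in the capture setup


def added_frame_time_ns(pixels: int, jitter_ns: int) -> int:
    """Upper bound on extra frame time if every bit is late by jitter_ns."""
    return pixels * BITS_PER_PIXEL * jitter_ns


# Assumed 300-pixel strip, using the 10 ns and 80 ns extremes reported above.
best_case_ns = added_frame_time_ns(300, 10)
worst_case_ns = added_frame_time_ns(300, 80)
```

Under those assumptions the bound ranges from 72 µs to 576 µs of extra time per frame.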
It sounds like you have a cause and some ideas of how to fix it... far beyond my capabilities right now but I wanted to chime in and let you know it's very common -- though I'm not sure I saw actual glitching when testing with a 5m string of WS2812B's (maybe mine are better spec'd?).
<https://cloud.githubusercontent.com/assets/52987/21464361/e5e75f1a-c948-11e6-8bb2-dbf153ace887.png>
Zoom on trailing edge of previous, with timing cursors:
<https://cloud.githubusercontent.com/assets/52987/21464362/ecaed7b0-c948-11e6-9083-c63417438b88.png>
|
I am pretty sure this is the root of the problem...
http://processors.wiki.ti.com/index.php/AM335x_PRU_Read_Latencies I think it will take some deep digging in the bowels of the on-chip LAN to reduce these jitters. |
It's worth noting, however, that it's the writes that directly affect the GPIO jitter:
The problem with this is that we can't even tell how long it took for the write to get to the GPIO register, so we can't account for the jitter. With the version of the code on master, we are reading all the data for every bit, which could cause issues due to long reads (and there are checks to abort the strip write in this case). I suspect the only real solution is to use r30. I have a prototype of an r30-based ws281x driver working on the |
Ah, yes. Humbly corrected: the jitter on the write side is totally different (and invisible!), but I still think it ultimately depends on the interconnect fabric priorities. R30 is a great solution for me since I never need that many pins. Any feel on how important more pins are in general? |
Good question! I honestly don't have a good idea of who is using LEDscape right now (other than you!), and what their needs are :) That would be nice info to have, though. |
Personally, I use LEDScape to drive 23 separate strips from the BBB, have disabled HDMI, and don’t use any other GPIO lines. |
Well, if you could bring that down to 22, you could use all r30 pins. Technically you can also disable the eMMC, but that's a little harder to deal with. |
Hi Yona! Big fan of your work! I went from one LED for fun several months ago to custom produced rigid board matrices for displaying lots of dynamic data on industrial machines. I went through all the initial things of playing with simple Arduino stuff, not knowing how to even solder, and now I'm playing with oscilloscopes:) Big Josh and his great work brought me to your fork here actually:) This flicker issue is the last piece in my puzzle. And that finally brought me here to this thread. My LED type is SK6812, which has slightly different timing, namely the 1’s in question are shorter, therefore I believe this flicker problem is a lot more pronounced in my case. I don’t use any cape or level shifter, I brought the voltage down to 4.3 (saving power and reducing brightness which is a plus in my case) and the signals register just fine altogether. I will take a look at the prototype driver in the branch you mentioned tonight. Would love to contribute further somehow in case you wanted to merge that into master. Or if there is any news regarding this, I would be glad if you let us know. Would shortening of the time help here (according to the SK6812 specs and their timing thresholds)? I tried to look at the templates but the machine code is too low level and therefore at this stage beyond my comprehension:) Cheers |
Thanks for kind words, and I’m glad things are (mostly) working for you.
How many strips are you driving? If you can get by with 22 outputs, you can easily use my rewrite to use the direct PRU GPIO access. This helps substantially with the flicker.
At this point, the easiest thing to do is just give you one of my pre-built linux images that has everything set up correctly.
~ Yona
… On Mar 5, 2018, at 13:11, orangemelon69 ***@***.***> wrote:
|
Talk about lightning fast replies:) I am driving 7 strips (up to about 600px each but normally around 300-500). 7 outs is the max I will need for this use. If you’d be so kind that would be just great! Cheers |
Sorry this one isn’t so fast! Send me an email to [email protected] so we can arrange that.
… On Mar 5, 2018, at 13:19, orangemelon69 ***@***.***> wrote:
|
While the vast, vast majority of 0 bits coming out of the PRU are 300ns-370ns wide, I am seeing a very rare case where a 0 bit can be as wide as 540ns, which is wide enough to be seen as a 1 by some WS2812B chips.
When the problem happens, it seems to stretch all output bits being transmitted at that moment, although there is only material impact on 0 bits since 1 bits just become slightly longer 1 bits.
Outwardly, this appears as a row of pixels flashing for a single frame. It is especially noticeable when running strings in demo mode "black" when all bits should be 0. It is possible this is only visible on WS2812B chips with a shorter-than-spec T1H minimum time.
I verified the problem by attaching a scope to an output and setting it to trigger on a minimum pulse width of 450ns. Then I ran the "black" demo mode. In this mode, all bits should be 0 so I should never see a pulse wider than 450ns. Yet I was (rarely) able to capture pulses as wide as 540ns.
The stretched bits seem to happen more frequently when the ARM is under heavy memory stress so I think this might be caused by a worst-case series of cache misses when the PRU accesses the data in ARM RAM.
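The scope check above can also be applied offline to a list of captured high-pulse widths. The following is only a sketch: the list of widths is assumed to come from some capture export, the function name is hypothetical, and the 450 ns default mirrors the trigger setting used on the scope.

```python
from typing import List, Sequence, Tuple


def find_stretched_pulses(
    widths_ns: Sequence[int], trigger_ns: int = 450
) -> List[Tuple[int, int]]:
    """Return (index, width) for every high pulse wider than the trigger."""
    return [(i, w) for i, w in enumerate(widths_ns) if w > trigger_ns]
```

In "black" mode every bit is a 0, so any entry this returns would be a stretched bit of the kind described above.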
The current approach of timing the bit phases uses the cycle counter. Is it possible that the cycle counter does not count cycles where the PRU is stalled because it is waiting for a cache miss when reading external RAM? The STALL COUNT register possibly indicates this...
Possible solutions might include...
… T0H phase of the bits. This would still add jitter to the time between bits when cache misses occur, but as long as this time is less than RESET, then the only impact should be (very) slightly diminished performance rather than bad data.
I can try to tackle either of these approaches, but just want a sanity check before doing the work. Has anyone else ever seen these wide bits (or the flashes they produce)?