-
-
Notifications
You must be signed in to change notification settings - Fork 3.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improved framerate control code - strip.show(), strip.service() #4244
Conversation
* separate fps calculation (strip.show) from framerate control (strio.service) * improved condition for early exit in strip.show * make MIN_SHOW_DELAY depend on target fps * strip.show consideres complete time for effect calculation + show; old code wrongly used the time between completion of last show and start of next effect drawing, causing unexpected slowdown * add "unlimited FPS mode" for testing * increase warning limits for "slow strip" and "slow effects"
The below info is from a 32x32 matrix on 4 output with target FPS set to 120, I only tested 2d effects and I used ws but also verified some from the info page (give a small margin for errors due to ws which I kept unedited ) . Visually all effects seems faster with the new code ( too fast on smaller 32x16) . Some of the effects are very hard to notice a difference and might be more apparent on larger matrix then 32x32 but for sure it is better now . |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did not intend to review but changed my mind as I think there are flaws that need addressing.
I do not know the reasons for MIN_SHOW_DELAY
so that must be explored separately.
wled00/FX_fcn.cpp
Outdated
@@ -1349,7 +1361,8 @@ void WS2812FX::service() { | |||
// overwritten by later effect. To enable seamless blending for every effect, additional LED buffer | |||
// would need to be allocated for each effect and then blended together for each pixel. | |||
[[maybe_unused]] uint8_t tmpMode = seg.currentMode(); // this will return old mode while in transition | |||
delay = (*_mode[seg.mode])(); // run new/current mode | |||
frameDelay = (*_mode[seg.mode])(); // run new/current mode | |||
if (frameDelay < speedLimit) frameDelay = FRAMETIME; // limit effects that want to go faster than target FPS |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should involve _frametime
not speedLimit.
_frametime
reflects desired FPS (rounded).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
speed limit is actually 85% of _frametime. I've added it to prevent that effects that return a custom FRAMETIME exceed user defined target FPS. 85% to have some margin for "acceptable" overspeed that can be handled.
@blazoncek thanks for your review 👍 The whole logic was quite hard to "crack", so happy to get your thoughts 😃 |
Overall comment is that I think there's a number of otherwise independent changes all glommed together here, and it does make it a little tough to review all at once.
I'm not super thrilled with the conditional logic for "fast enough" chips -- it feels to me like there's a generally safe algorithm somewhere in here, struggling to get out. In some ways, high FPS is perfectly fine for fast FX even on slow chips, since they'll be servicing the wifi stack in between every one of those fast frames. The tougher challenge is delivering stable FPS, since servicing the wifi can itself be slow. I think the trick is to balance time spent computing and publishing frames with slack time left for other tasks, so that when one of those tasks wakes up, we don't have to see an FPS glitch. I think a good algorithm for that estimation will be applicable in all environments. Finally, maybe we want to go past |
Is more precision required? 1/10th of a frame at 100FPS seems plenty accurate. or which part are you referring to that needs better precision? |
* keep FRAMETIME_FIXED as a fixed value * remove WLED_FPS_SLOW and FRAMETIME_FIXED_SLOW * explicit test "(_targetFps != FPS_UNLIMITED)" for debug messages * don't modify _lastServiceShow in show() * test for "fps == FPS_UNLIMITED" explicitly, so we could pick a different magic number later
@willmmiles some updates
this should be gone now. I've pushed a separate commit into 0_15.
Maybe I should have explained better - |
I'm not sure either. In fact as we approach the higher framerates, the "air gets thin" with milliseconds time
|
True! While the metrics fix could be done as its own commit, having those accurate metrics is a prerequisite for scheduling improvements. |
@willmmiles @DedeHai The fps calculation in strip.show() could be changed to use The effect timing code would stay as-is. For measuring FPS, rollover would not be a problem because we can simply skip measurements where last_timestamp > current_timestamp. The MoonModules fork is using What do you think? |
Personally I strongly prefer solutions that work on all supported platforms, hence the suggestion of |
I like Will's suggestion, if only relative comprisons are needed, than using getCycleCount() is the better solution. It's very easy to write a just FYI regarding performance measurements and FPS: I have modified code I use to increase the low-pass filter time constant of |
Yes @DedeHai @willmmiles what you say makes sense. I can try work out a PR during the week. @DedeHai I'd love to see how you modified the FPS low-pass filtering. Actually the current FPS filter makes measurements very inaccurate, so any improvement is welcome 😃 |
@willmmiles you might be right - I'm also not super happy with my implementation, however it solves some problems with the old code:
The effect scheduling current works in a synchronous way, where all effects are updated at the same time (guaranteed by FRAMETIME). So all effects are "marching in lockstep". However effects that request a different frametime may be lucky (if requested time is a multiple of the scheduler timestep) or unlucky if they are completely off sync. Luckily almost all effects have "return FRAMETIME" so they fit into the schedule. If we want to faithfully schedule effects with different timing needs, it could not be done in an "all effects draw if needed, then strip.show updates all LEDs" way. With different frametimes per effect, you would let some effects draw (if their time has passed), while skipping the ones which are not due yet - the current algo does this, too. But: the next scheduling cycle must be dynamic. Its the minimum of all frame delays, i.e. min(segment.next_time - strip.now). This means additional draw cycles in the main loop, plus strip.show() would be called much more frequently. Maybe there is a better way, I'm open for any ideas and suggestions. |
* _frametime ensures that effects are not serviced too often * MIN_SHOW_DELAY is the minimum allowed FRAMETIME that can be requested by effects
I've simplified the scheduler logic, and removed all estimates for "lower limits". The logic is much cleaner now
The logic is now very much in-line with the original function from WS2812FX side note Line 811 in 70323b9
imho the effect logic of "andriod" needs optimization - actually variable framerates are not needed, if the main effect code is put into a loop that handles several pixels (number depending on speed setting and SEGLEN). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not know what are the reasons for running (certain) effects faster than 120 FPS to grant the change in MIN_SHOW_DELAY or additional logic to prematurely exit strip.service()
call.
If this is for purely developer's assistance then perhaps adding conditional code might be better option than "unlimited FPS" (which is false as it will still be limited to 333 FPS in best case).
IMO fixing MIN_SHOW_DELAY to 8 for ESP32 and 16 for ESP8266 would be a good option.
@@ -1309,7 +1309,14 @@ void WS2812FX::finalizeInit() { | |||
void WS2812FX::service() { | |||
unsigned long nowUp = millis(); // Be aware, millis() rolls over every 49 days | |||
now = nowUp + timebase; | |||
if (nowUp - _lastShow < MIN_SHOW_DELAY || _suspend) return; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering if removing this line was a good move. MIN_SHOW_DELAY
is equal to 3 which is 1 ms more than what's used currently on ESP32 (see line 1315 below).
The more I look into this the more I have the feeling that MIN_SHOW_DELAY
should be respected at all cost. But the value of it is debatable (though I think that a values of 8 AKA 125 FPS or 16 AKA 62.5 FPS are good values for ESP32 and ESP8266 respectively).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, using MIN_SHOW_DELAY here means that user can get up to 62 fps, or 125 fps, but nothing else. Why should we limit users this way? Do our users need a nanny?
Technicially the new code is better, because it achieves any FPS chosen by the user, its efficient on CPU time, and it's safe enough on the slower chips to keep wifi running well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why should we limit users this way? Do our users need a nanny?
IMO yes, you should limit because it makes code more/easily predictable. And yes, most users "need a nanny". Just look at the comments on Discord or forum. But that's just my opinion. Do as you please.
BTW I did not comment code efficiency. I was merely wondering why changing MIN_SHOW_DELAY and not using that in comparisons throughout the code (instead you use "magic" numbers as @netmindz once said to me).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instead you use "magic" numbers as @netmindz once said to me).
In this case, I think the origin and purpose of "2" and "3" is well explained in the PR discussion - there might not be a solid proof of the need, but this is not always necessary imho. Also there are hints in the code comments about the "intend" where these numbers are used. If you have a good idea for names of constants that contain "2" and "3", I'm all ears.
The previous "8" and "16" are legacy values, and afaik nobody here could explain any more where they came from - we guessed that 8 and 16 might help slower CPUs like 8266. WS2812FX uses "2" (both for esp32 and esp8266) and "10" (for AVR arduino).
Let's wait until 0.15.0 is release before merging this PR, so we can get feedback from beta testers. I'll be here to take care of problems that might be related to the change. We can always fine-tune parameters. I think that measurements from @dosipod (see above ) clearly show that this PR has a value for users.
About the topic of users needing guidance on consequences of higher framerates: We have added a warning (see screenshot) in the MoonModules fork, that appears when setting FPS above 70. @blazoncek @willmmiles @DedeHai @netmindz Should we add a similar warning in AC, too? |
the topic is well-explained now in the discussion
@blazoncek to be honest, I'm tired of justifying my work against your vague concerns. Did you ever bother to compile and test this PR? To all maintainers: I want to know if the changes I proposed in the PR are something you want in AC WLED (0.15.1) or not - as is or with improvements that we can still discuss. If nobody except myself supports this PR, I'll simply close it, and go on optimizing WLED in the MoonModules fork. |
just by looking at the FPS improvements, there clearly is something wrong with the current code, so I appreciate the efforts you are putting into this! I do however not understand the code or the issue enough to comment on the implementation of the improvements. |
You requested my review, so I gave it. Feel free to do whatever you choose. I am no longer participating in WLED development. |
regarding the "unstable warning": while a warning is ok (keep it as short as possible to not waste flash memory) having a high FPS setting potentially corrupt files is not acceptable IMHO. Unstable as in 'it may occasionally crash' is ok, as long as it does not happen on simple setups. Users tend to ignore all warings and then go complain on discord / github. edit: |
Hi, I haven't seen any real crashes due to high FPS in over a year (with a similar patch in the MM fork), but I'm mainly using esp32 and -S3, so don't know for sure if 8266 or -C3 might get into trouble when cranking up the FPS. |
I run my particle system Tests on a C3 with FPS set to 99 all the time. I get eventual crashes when adding lots of segments or switching FX very fast but if I just use it 'normally' then it works. I assume the crashes are mostly due to heap exhaustion. It is a simple setup, no alexa or HA but sometimes with AR enabled. Edit: should I run the changes from this PR on a C3 for testing? What would be worst case scenarios for crashes? |
@DedeHai How about this one - more compact, and less scary. "backup" is a link to sec settings page. |
yes please - behaviour on -C3 should be almost the same as before. Worst case might be that saving presets is not working, or UI reacts very slowly. For a test - just to check if C3 can do it - change |
* removed "use 0 for unlimited" * added "high FPS mode is experimental" warning * added "backup first!" warning * added anchor #backup to sec page
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is shaping up well! I think once we get the intent of MIN_SHOW_DELAY
sorted, we'll be ready to go.
* MIN_SHOW_DELAY -> MIN_FRAME_DELAY * allow up to 250 for target FPS * minor cleanup * added specific MIN_FRAME_DELAY for -S2
@willmmiles The new logic would also cause very frequent updates of analog PWM outputs, if users set target FPS above 60. I don't believe it will be a problem for esp32, however I'm not sure for 8266 (its more your area 😉). I'm thinking about the code in BusPwm::show(): Line 596 in 271a07a
|
Hmm, yes, I have an active PR in that area: #4165 For non-phase-aligned channels, it's basically just writing the new duty cycle to some shared memory. For phase aligned channels there's a short wait (at most 1 cycle at the analog frequency) per PWM channel updated. So I expect it'll be fine. I'll validate with a scope over the weekend. Strictly speaking the old code could theoretically update at that rate too, if tickled the right way. |
Glad I ran the test! The overall logic has no issues with high frame rates, but I found a case where the phase locking can break down (resulting in flickering) if the first channel is at fully on or fully off. |
Finally got around to test this PR. Got my upvote! |
@DedeHai thanks for the testing and feedback 😃 |
Are you happy for me to merge @DedeHai , now I've tagged 0.15.0.rc.1 ? |
I tested it briefly on C3 and it worked, I do not understand the code enough to say if its safe to merge without beta testing. |
@netmindz I think this is safe to merge - we have similar code in MM since a long time. The only "but" I can find is the android effect - it might run a bit slower than before, when the speed slider is set to high values. That's actually due crazy coding of this one effect. |
Reworked framerate control loop, to allow more fluent animations and support higher framerates
the problem (in a nutshell)
solution
As 8266 and esp32-c3 might suffer from higher CPU load, these chips use the previous MIN_SHOW_DELAY logic that causes more idle loops.
Comparison, Using a 64x64 "network DDP" output (esp32_wrover)