Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a failsafe for cascading FixedUpdate time accumulation #8544

Closed

Conversation

alice-i-cecile
Copy link
Member

@alice-i-cecile alice-i-cecile commented May 4, 2023

Objective

Solution

  • Adds a max_ticks_per_frame field to FixedTime. If this is set to Some, the fixed update schedule will give up after that many runs of the schedule in a single pass of the main schedule.
  • This defaults to 3.

Changelog

The FixedUpdate schedule will no longer fall further and further behind when too much work is requested.
Instead, time will begin to accumulate and a warning will be emitted.

You can configure this behavior by setting the max_ticks_per_frame field of the FixedTime resource.

@alice-i-cecile alice-i-cecile added C-Bug An unexpected or incorrect behavior A-Time Involves time keeping and reporting labels May 4, 2023
@alice-i-cecile alice-i-cecile requested a review from maniwani May 4, 2023 17:27
@alice-i-cecile alice-i-cecile added this to the 0.11 milestone May 4, 2023
@alice-i-cecile
Copy link
Member Author

I've tested out an inelegant end user version of this fix in Leafwing-Studios/Emergence#829 and it completely resolves the issue for me in my game.

Because it's not networked, I don't care about FixedUpdate falling behind.

@JoJoJet
Copy link
Member

JoJoJet commented May 4, 2023

This definitely seems like a necessary fix, but I feel that 3 ticks per frame is very low as a default. I wonder if it would be beneficial/feasible to dynamically compute the maximum number of steps based on the length of the previous frame: that way if we're in the middle of a lag spike we won't compound it by running FixedUpdate a hundred times, but once we've recovered from a lag spike the FixedUpdate schedule would be able to run enough times to let it catch up.

@alice-i-cecile
Copy link
Member Author

Yeah, I wanted to keep this simple for now with conservative defaults. For my game I want to have a fixed update period that's substantially less than the frame rate (as it's a relaxed factory builder), but I'm definitely open to other defaults or schemes.

Copy link
Member

@JoJoJet JoJoJet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that we can leave the dynamic maximum idea for the future. Although, I think the more conservative approach would be to have a high maximum number of ticks per frame -- a max of 3 could be a footgun. Let's say a user sets their fixed update to run 1000 times per second (potentially due to a fast-forwarded simulation), but their game is only running at 60 FPS due to VSync. Even if their computer is powerful enough to run FixedUpdate that often, bevy is going to throttle it and say it's doing too much work.

I'm not exactly sure what a good default would be, but I feel that something in the range of 20-ish would be less likely to get in the way. The default maximum only needs to be low enough to prevent a spiral of doom IMO, users can tune the settings themselves if they want to iron out all of their frame hiccups.

Otherwise, LGTM.

crates/bevy_time/src/fixed_timestep.rs Outdated Show resolved Hide resolved
@alice-i-cecile
Copy link
Member Author

Bumped to 20; it's easy to tweak in individual games.

@maniwani
Copy link
Contributor

maniwani commented May 5, 2023

Limiting the number of ticks per frame is a good change.

And yeah, it's only effective if the spike is temporary and you're just weathering it by stretching it out over a few frames. The player's machine still needs to be capable of simulating at the requested rate most of the time.

It'd be awesome if we can follow this up with either the "have two Time resources, one for FixedUpdate, one for everything else" idea or the "make FixedTime wrap Time" idea since we currently lack a built-in way to measure "time elapsed" in FixedUpdate.

@maniwani
Copy link
Contributor

maniwani commented May 5, 2023

I'd make the limit a usize though. I know 255 already sounds pretty impossible, but future hardware could go crazy and we don't really win anything by saving a few bytes here.

@alice-i-cecile
Copy link
Member Author

It'd be awesome if we can follow this up with either the "have two Time resources, one for FixedUpdate, one for everything else" idea or the "make FixedTime wrap Time" idea since we currently lack a built-in way to measure "time elapsed" in FixedUpdate.

Agreed! I can tackle that on Monday if you'd like.

@alice-i-cecile alice-i-cecile added the S-Ready-For-Final-Review This PR has been approved by the community. It's ready for a maintainer to consider merging it label May 5, 2023
@mockersf
Copy link
Member

mockersf commented May 5, 2023

This feel like a surprise bug for people who expect a fixed time update when using the fixed time update.

Instead of limiting the number of ticks to an arbitrary 20, which would be a perfectly reasonable number of iterations to hit depending on your use case, shouldn't this look at the time it actually took and compare it to the period?

The warn message could also be a lot more explicit on what to do. I think an error code/page with more details would be better, with example code on what to do. It should also mention increasing the maximum number of ticks as a solution which can be valid. As it is now, there's also a chance that it will be displayed very often, which we want to avoid.

@mockersf mockersf added the X-Controversial There is active debate or serious implications around merging this PR label May 5, 2023
@JMS55
Copy link
Contributor

JMS55 commented May 5, 2023

Unity seems to cap the amount of time spent during fixed timesteps, and not the amount of timesteps https://docs.unity3d.com/Manual/class-TimeManager.html

@maniwani
Copy link
Contributor

maniwani commented May 5, 2023

Unity seems to cap the amount of time spent during fixed timesteps, and not the amount of timesteps https://docs.unity3d.com/Manual/class-TimeManager.html

Unity's "Maximum Allowed Timestep" only caps the amount of time that can accumulate towards new simulation steps each frame.

It is not related how long FixedUpdate takes to run. If the setting is 50ms and a frame takes 100ms, only 50ms will be accumulated. That's just an indirect way of limiting the number of steps per frame. I think a setting to directly limit that number is better. (edit: There is a subtle difference. Unity's setting works to slow down the simulation, whereas this PR simply load balances steps over multiple frames.)

Shouldn't this look at the time it actually took and compare it to the period?

Ideally, we'd collect statistics about how long one FixedUpdate takes to run and how long the rest of the frame takes, then calculate:

budget = (1.0 / target FPS) - (rest of frame)
max steps = budget / one fixed update

But we can do that later. A manual setting will work well enough to prevent a temporary hiccup from turning into a death spiral. 20 steps per frame is very conservative. By default, steps only accumulate at a rate of 60 per second. You won't hit the limit without having some real performance hiccups.

@alice-i-cecile
Copy link
Member Author

It is not related how long FixedUpdate takes to run. If the setting is 50ms and a frame takes 100ms, only 50ms will be accumulated. That's just an indirect way of limiting the number of steps per frame. I think a setting to directly limit that number is better. (edit: There is a subtle difference. Unity's setting works to slow down the simulation, whereas this PR simply load balances steps over multiple frames.)

Agreed: I started out with a "max time per frame", but realized that was a pointless level of indirection.

I think that slowing down the simulation is substantially more dangerous than load-balancing in most applications.

Instead of limiting the number of ticks to an arbitrary 20, which would be a perfectly reasonable number of iterations to hit depending on your use case, shouldn't this look at the time it actually took and compare it to the period?

I'm not opposed to that, but I wanted to keep this PR relatively simple and then iterate.

This feel like a surprise bug for people who expect a fixed time update when using the fixed time update.

The existing behavior is very bad: it takes a "slow but playable" game state and turns it into a complete hang, with no indication as to what's going on. The critical fixed time behavior is preserved (the time steps are always exactly the same length), under this behavior the simulation just slows down (equivalent to Unity's behavior, but more robust since we don't start leaking time).

The warn message could also be a lot more explicit on what to do. I think an error code/page with more details would be better, with example code on what to do. It should also mention increasing the maximum number of ticks as a solution which can be valid. As it is now, there's also a chance that it will be displayed very often, which we want to avoid.

Good call: I'll try to improve this and reduce its spam.

@alice-i-cecile alice-i-cecile removed the S-Ready-For-Final-Review This PR has been approved by the community. It's ready for a maintainer to consider merging it label May 6, 2023
Copy link
Contributor

@nakedible nakedible left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what the policy is on warnings, so this pull may be okay. It's fine otherwise, so I'm tending on the side that this improves things from the current behaviour.

// and needs to run more and more times per frame to catch up.
if let Some(max_ticks) = maybe_max_ticks {
if ticks_this_frame > max_ticks {
warn!("The fixed update schedule is falling behind. Consider increasing the period or decreasing the amount of work done in the fixed update schedule.");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the game enters the "death spiral", this warning will print out three times per second, constantly. I'm not sure what the policy on warning spam is, so this may be okay.

Also, if the game freezes for a while for any reason (OS freeze for 5 seconds, memory swapping, big blocking loading operation inside game, whatever), these warnings will be printed out for a while (freeze duration seconds * 3 times) until the situation resolves itself. In this case, saying that update schedule is falling behind is incorrect, as it is rather catching up to realtime after a freeze.

And if there is a longer freeze, like OS suspend, that lasts for hours or days, then after resuming we'll be stuck doing maximum amount of fixed updates for a very long time as the game catches up to realtime with 20x acceleration but no faster – and the warnings are printed out for the entire duration it is catching up. However, fixing this is out of scope for this pull, so we don't need to care about the behaviour in this case.

@alice-i-cecile alice-i-cecile removed this from the 0.11 milestone Jun 21, 2023
@alice-i-cecile
Copy link
Member Author

Closing in favor of a more thoughtful solution in #8915.

Actually (nth edition)... My proposal is that we add maximum_delta to Time, and do not add anything max ticks to FixedTime. I'm going to try to explain why. There are three problem cases to resolve.

First case is that game freezes for a few seconds due to some external cause, or possibly a few hours or few days due to suspend. This is fully handled by the maximum_delta option, as we just get one 0.333 second delta update and that's it. Basically means that by default game time doesn't go forward if it isn't regularily being iterated, and the 0.333 limit means that any FPS above 3 is automatically handled without any problems in elapsed time calculation. Fixed time will be iterated at maximum 20 times in a row if there is a freeze (with the default settings).

Second case is that we temporarily run into a situation in the game where processing and simulating a frame takes longer than the real time it is simulating. In this case, the maximum amount of delta that will be added per single update is 0.333 seconds, which corresponds to 20 frames in the default fixed step. The game will slow down to at most 3 FPS, and may slow down even further, depending on the ratio of how much more time does it need to simulate the game compared to realtime. For example, if it takes 1 second to simulate 0.333 seconds, then we'd be looking at pretty much 1 FPS, and elapsed would progress a third of the speed of real time – but the game wouldn't freeze. As soon as simulation performance improves, it would return to normal operation, as there's no accumulator buildup as we always run the accumulator empty.

Third case is that game constantly takes more time to simulate than real time. In this case, FPS is always low, and elapsed ticks slower than real time, and we do 20 fixed steps per frame. But the game doesn't freeze or do the death spiral. Fixed step accumulator will always be run empty. If the maximum_delta is reduced, then FPS will improve, but obviously the real fix is to make sure the game runs faster than realtime.

So, all of these cases are sufficiently handled merely by adding maximum_delta, and we don't need another mechanism for fixing this at all. And FixedTime will never desync from elapsed(). (There is another problem with FixedTime with regard of getting the current elapsed time, but that's unrelated issue to this one.)

Do you agree? So do not merge pull #8544, and merge pull #8915 (once it has been completed, tested and accepted).

@alice-i-cecile
Copy link
Member Author

Closing in favor of working out a more comprehensive solution in #8915.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-Time Involves time keeping and reporting C-Bug An unexpected or incorrect behavior X-Controversial There is active debate or serious implications around merging this PR
Projects
None yet
Development

Successfully merging this pull request may close these issues.

FixedUpdate schedules that take too long enter a cascading failure mode
6 participants