Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Very high CPU usage on MacOS #10261

Open
hmans opened this issue Oct 25, 2023 · 35 comments
Open

Very high CPU usage on MacOS #10261

hmans opened this issue Oct 25, 2023 · 35 comments
Labels
A-Rendering Drawing game state to the screen C-Bug An unexpected or incorrect behavior C-Performance A change motivated by improving speed, memory usage or compile times O-MacOS Specific to the MacOS (Apple) desktop operating system S-Needs-Investigation This issue requires detective work to figure out what's going wrong

Comments

@hmans
Copy link

hmans commented Oct 25, 2023

Summary

On MacOS, even simple Bevy apps cause very high CPU load; for example, even a blank app that just loads DefaultPlugins to render a blank window sits at ~50% CPU usage on an Apple Silicon M1. Add a couple of rendered meshes and postprocessing and it'll quickly go up to about ~90% CPU.

On Windows and Linux, the same app will comfortable sit below 1% (even with some light rendering.)

Additional notes:

  • A MRP has been provided here.
  • It uses the opt-level = 3 etc. trick from the docs.
  • Compiling a --release build doesn't make a difference.
  • In addition, when hiding the app, CPU load doubles, but I understand there have already been issues filed about this particular problem.

Bevy version

0.11.3 (but I also tried 66f72dd, the newest commit on main at the time.)

Relevant system information

  • Rust 1.73.0
  • MacOS Sonoma 14.0
AdapterInfo { name: "Apple M1", vendor: 0, device: 0, device_type: IntegratedGpu, driver: "", driver_info: "", backend: Metal }
SystemInfo { os: "MacOS 14.0 ", kernel: "23.0.0", cpu: "Apple M1", core_count: "8", memory: "16.0 GiB" }

What you did

What went wrong

Expected behavior: the app should cause CPU load in the single-digit percentage range, like it does on Windows (<1% CPU, on a Ryzen 7.)

Actual behavior: the app caused ~50% CPU load.

@hmans hmans added C-Bug An unexpected or incorrect behavior S-Needs-Triage This issue needs to be labelled labels Oct 25, 2023
@alice-i-cecile alice-i-cecile added C-Performance A change motivated by improving speed, memory usage or compile times S-Needs-Investigation This issue requires detective work to figure out what's going wrong O-MacOS Specific to the MacOS (Apple) desktop operating system and removed S-Needs-Triage This issue needs to be labelled labels Oct 25, 2023
@alice-i-cecile
Copy link
Member

This is a long-standing bug, but it's nice to see it seemingly isolated to Mac. Could we see a tracing log? https://github.com/bevyengine/bevy/blob/main/docs/profiling.md

I'd also like to try stripping out even more things from the DefaultPlugins, and see if we can reduce this to one or two offending plugins. Unfortunately I don't have access to a Mac to test this myself, sorry.

@hmans
Copy link
Author

hmans commented Oct 25, 2023

I've created a tracing log using cargo run --release --features bevy/trace_chrome, letting run the app run for a handful of seconds, but it's already 442 MB, so I can't upload it here. :(

Is there any way I can send it to you, via Perfetto or otherwise?

@hmans
Copy link
Author

hmans commented Oct 25, 2023

Assuming that this represents a (randomly picked) single frame, maybe this is of use to you:

image

@alice-i-cecile
Copy link
Member

Maybe a GDrive link? Feel free to delete it after a couple days.

@superdump it looks like we're getting hit by excessively long prepare systems again 🤔

@hmans
Copy link
Author

hmans commented Oct 25, 2023

@alice-i-cecile - here's a Google Drive link for the trace, I hope it works for you:

https://drive.google.com/file/d/1DxPcWasjVDRJDlL1_PSQh-5qhpxEeA3p/view?usp=sharing

@alice-i-cecile alice-i-cecile added the A-Rendering Drawing game state to the screen label Oct 25, 2023
@alice-i-cecile
Copy link
Member

Trace downloaded and open locally, thank you very much!

Pulling this up in Perfetto, I'm still just seeing extremely long Prepare systems. I wonder if the way that wgpu polls is different on Mac? Perhaps it's spinning a thread to try and vsync automatically?

@hmans
Copy link
Author

hmans commented Oct 25, 2023

Pulling this up in Perfetto, I'm still just seeing extremely long Prepare systems. I wonder if the way that wgpu polls is different on Mac? Perhaps it's spinning a thread to try and vsync automatically?

Not sure how helpful this is, but I did try a couple of the more simple wgpu examples (from their latest main) and didn't observe any noticeable performance issues that looked like the one I was seeing with Bevy.

But please keep in mind that I'm new to this entire ecosystem (including Rust itself), so YMMV...

@alice-i-cecile
Copy link
Member

That's very helpful feedback.

This might be related to #9964.

@UkoeHB
Copy link
Contributor

UkoeHB commented Oct 25, 2023

If you reduce frame rate significantly, e.g. with WinitSettings, then CPU usage drops. That's the only 'solution' I have found. Framepacing does nothing.

@superdump
Copy link
Contributor

Reduce frame rate to what?

@UkoeHB
Copy link
Contributor

UkoeHB commented Oct 25, 2023

Reduce frame rate to what?

Like 1-10 FPS...

@Vrixyz
Copy link
Member

Vrixyz commented Oct 29, 2023

I'm trying to reduce the amount of code necessary to reproduce out on hmans/bevy-mrp-macos-performance#1

after more looking, aren't we hitting the warning from the profiling docs (https://github.com/bevyengine/bevy/blob/main/docs/profiling.md#runtime):

When your app is bottlenecked by the GPU, you may encounter frames that have multiple prepare-set systems all taking an unusually long time to complete, and all finishing at about the same time.

?

If that's the case, I'd love a dedicated issue to that.
Would adding some span help with pinpointing the exact issue ? these prepare_* functions are quite long and it's not clear to me what's the expensive operation 🤔

@Vrixyz
Copy link
Member

Vrixyz commented Oct 30, 2023

Seems related to #5713

@matevz-ap
Copy link

matevz-ap commented Nov 11, 2023

I noticed that my MacBook was running hot, and I found this issue. Upon running the starting example with DefaultPlugins, I noticed an unusually high CPU usage of 170%. Through a process of elimination, I determined that the culprit behind the issue is the RenderPlugin. The example runs smoothly with normal CPU usage when this particular plugin is excluded. While I intend to delve further into this matter, my limited experience with Bevy, having used it for only a day, may limit the extent of my assistance.

@dmlary
Copy link
Contributor

dmlary commented Nov 21, 2023

So I dug into this about 5 months ago, and found the same problem somewhere in the render pipeline. The discussion & profiling results are in the discord thread below.

It looked like it may have been a contention issue in wgpu. That was about where I stopped because needed to make a wgpu-only poc to see if it was a wgpu-on-mac issue. Also this research was done on an Intel Mac.

https://discord.com/channels/691052431525675048/743663924229963868/1119076766405886004

from the discord:

I was using apple's instruments to do the profiling, and have used the builtin macos sampling. They all report that the thread is blocked in the [gpu] write path. Sampling can't really detect busy loop vs wait unless the function name gives it away. I haven't seen anything that gives away what's going on.

@hmans
Copy link
Author

hmans commented Jun 7, 2024

I've just checked the current latest and this still seems to be an issue (2 full cores used on a M2 Pro with even the trivial examples.) Is there any outlook on this eventually improving?

@alice-i-cecile
Copy link
Member

We are interested in this improving! But this needs further investigation to pinpoint: it's still not at all clear where in the stack this is occurring. Does this occur with wgpu's examples?

@adals
Copy link

adals commented Jun 7, 2024

@hmans
Copy link
Author

hmans commented Jun 7, 2024

We are interested in this improving! But this needs further investigation to pinpoint: it's still not at all clear where in the stack this is occurring. Does this occur with wgpu's examples?

I just tried exactly this today (current HEAD there) and it seems fine. I'm happy to provide further testing or any other input that might help. Sadly I am not familiar with either codebase so I can't really provide any input there.

@alice-i-cecile
Copy link
Member

alice-i-cecile commented Jun 7, 2024

Great :) I would double-check at https://github.com/gfx-rs/wgpu/releases/tag/v0.19.4 which is the version Bevy is currently on, but I would suspect that there seems to be something wrong with Bevy's implementation then.

It looks like the various Prepare steps are at fault. I'll ask the rendering experts if they might have insight there. I suspect we're spin locking while waiting to coordinate between the various steps somewhere, which is manifesting as high CPU usage without actually doing useful work.

Are you able to achieve reasonable results (more than 50k entities) on the Bevymark example?

@IceSentry
Copy link
Contributor

@hmans could you try forcing a scale_factor of 1.0 for the window and see if it makes a difference. For some reason macos reports a really high scale factor by default.

@superdump
Copy link
Contributor

superdump commented Jun 8, 2024

I tested with 0.11.3 and 0.14.0-rc.2 and observe the same. Using scale_factor of 1.0 makes no difference. It hovers around 40-50% CPU in Activity Monitor, top, htop.

I looked at a GPU frame trace in Xcode and it's only running the no_camera_clear_pass and using microseconds of time. Oddly it said that pass was using 10800 vertices. No idea why.

I looked at a CPU profile using Instruments and that shows no significant CPU usage. Running for 20s put CPU cycles spent for frame updates (at 120Hz) at about the same as plugin initialisation.

2024-06-08 - bevy macos default plugins cpu.trace.zip

@superdump
Copy link
Contributor

bevy-mrp-macos-performance.gputrace.zip
Here's the GPU trace with scale factor 1.0.

@superdump
Copy link
Contributor

And just to note that I think tracy traces showing waiting on prepare set systems makes sense as they are blocked by waiting on vsync for a new swapchain texture.

@hmans
Copy link
Author

hmans commented Jul 5, 2024

I've tried 0.14 and the issue is still persisting.

Since wgpu itself does not seem to have this issue, from the perspective of a naive outsider who's not familiar with either code bases, my best guess would be that wgpu uses vsync out of the box (idling until it's time to render the next frame) and Bevy doesn't (at least on macOS); however, even the "fps_overlay" example that renders literally nothing uses ~100% CPU while displaying an FPS count of 100 (on my 100Hz screen) when focused, and ~60 FPS when unfocused. So it's probably not vsync vs. unlocked.

A couple of folks on Reddit pointed me towards the (new?) WinitSettings::desktop_app(), but that isn't the solution to this issue because all it does is prevent the rendering of new frames if there is no user input. Once it does render frames, things are as described above.

@hmans
Copy link
Author

hmans commented Jul 6, 2024

Great :) I would double-check at https://github.com/gfx-rs/wgpu/releases/tag/v0.19.4 which is the version Bevy is currently on, but I would suspect that there seems to be something wrong with Bevy's implementation then.

I know I've taken my good time with doing this (apologies), but I now got around to checking that specific version, and its cube example uses a relaxed ~19% CPU time for rendering at 120 FPS (I'm on a M2 Pro now.)

Bevy 0.14 appears to use wgpu 0.20, which exhibits the same performance.

19% still feel a little too high for what the example is doing, but it's nowhere near the 50%-200% that I'm seeing with simple Bevy examples.

As mentioned on Reddit earlier today, I'm (stupidly and naively, and very likely unsuccessfully) trying to track this issue down myself now. Just wanted to post these numbers here for reference.

@katopz
Copy link

katopz commented Sep 9, 2024

I can reproduce this by run Sprite Sheet example on m3max, it took 66-75% CPU and 34% CPU for opt-level = 1
image

Not looking good for just 1 sprite animation.

@IceSentry
Copy link
Contributor

Can you try with opt-level=3 or just --release? opt-level=1 is barely any optimization and isn't a good indicator of real performance.

@katopz
Copy link

katopz commented Sep 10, 2024

Can you try with opt-level=3 or just --release? opt-level=1 is barely any optimization and isn't a good indicator of real performance.

opt-level=1 👉 34% CPU
opt-level=3 👉 36% CPU

i didn't bother --release because i on dev and my m3 fans are spinning 🔥

[profile.dev]
opt-level = 3

[profile.release]
lto = true
opt-level = 3
codegen-units = 1
incremental = false
debug = false

And the result with [profile.release] is
--release 👉 34% CPU

just --release 👉 36% CPU

Actually it look same all over setting. Something really weird here.

@destravous
Copy link

destravous commented Oct 14, 2024

What are the chances this issue is related to how bevy is calling winit?
On the second to last known regression mentioned here: #11052

Bevy's entire update loop is currently running in the about_to_wait() wininit callback; there seems to be a whole bunch of workarounds / bevy ecs events passed to make this functional.
The wininit docs mention that about_to_wait() is not supposed to be used for main update loops like this, and WindowEvent::RedrawRequested should be preferred instead.

See ApplicationHandler in bevy_winit::state.rs

@Kees-van-Beilen
Copy link
Contributor

What are the chances this issue is related to how bevy is calling winit?
On the second to last known regression mentioned here: #11052

Bevy's entire update loop is currently running in the about_to_wait() wininit callback; there seems to be a whole bunch of workarounds / bevy ecs events passed to make this functional.
The wininit docs mention that about_to_wait() is not supposed to be used for main update loops like this, and WindowEvent::RedrawRequested should be preferred instead.

See ApplicationHandler in bevy_winit::state.rs

Hey I was just reading the issue, that's is highly likely to be the issue. It should however not appear when the app is compiled in release mode. If anyone can provide a recent MPC example of this problem I would gladly help pinpoint the exact issue.

I've worked closely with the macOS windowing api's so if the problem is located in there, it should be easy to spot

@Kees-van-Beilen
Copy link
Contributor

I've done some basic profiling and it does look like the cpu usage is completely bevy sided, in release mode the time spend on the main update loop divided by 16,7 roughly equals the cpu usage. The high cpu usage that i've found is more likely due to blocking operations, either due to multi-core multi-threading overhead, or async overhead.

@Zajozor
Copy link

Zajozor commented Dec 7, 2024

If anyone can provide a recent MPC example of this problem I would gladly help pinpoint the exact issue.

If this still helps - I'm here with a M1 chip (Macbook Pro 2020 M1 16GB RAM).

  1. git clone [email protected]:bevyengine/bevy.git
  2. cargo run --features="bevy_dev_tools" --example fps_overlay runs at 60FPS & ~80% CPU (interestingly, when the window is not focused, it jumps to 85-90% CPU fairly consistently)
  3. cargo run --release --features="bevy_dev_tools" --example fps_overlay runs at 60FPS & ~45% CPU

One interesting thing I noticed is that the CPU jumps considerably when you start moving the mouse over the window while it's not focused. Here's a short example:

CleanShot.2024-12-07.at.20.39.31.mp4

(not sure if it's relevant, just thought it might be)

Note: A very comparable result happens with several examples from the crate that I tried.

Note 2:

[profile.release]
lto = true
opt-level = 3
codegen-units = 1
incremental = false
debug = false

Does not make a difference compared to just --release without additional tuning.

lmk if I can help reproducing anything, or trying out a fix 🙏🏻

@wentao
Copy link

wentao commented Dec 10, 2024

I ran into similar issue on iOS (not Mac though). A simple example like https://bevyengine.org/examples/3d-rendering/3d-scene/ running on iOS (with bevy 0.14.2) has 100%+ CPU usage. I have not tried the --release flag yet.

@voneiden
Copy link

voneiden commented Jan 1, 2025

I was looking at a similar issue on Linux/Wayland and figured that UpdateMode makes a really big difference in my case. With Continuous (default) I'm getting something like 60-70% CPU usage and with Reactive I'm getting about 2%.

Just throwing it out here, maybe it's of some use to you folks. Considering this issue doesn't seem to affect everyone I'm guessing that Continuous is not a CPU hog for most users. Perhaps some OS/WM level thing that limits redraw frequency? rust-windowing/winit#3933 while not related to bevy, gives me the impression that in some cases request_redraw (which bevy uses) might end up drawing the window at a really high frequency - way higher than what bevy is rendering at. In the linked issue the reporter observed 76000 redraws per second.

Edit: nevermind... I was being a bit too hasty here, reactive hit's continuous level CPU if there's mouse motion. Makes sense, in hindsight.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-Rendering Drawing game state to the screen C-Bug An unexpected or incorrect behavior C-Performance A change motivated by improving speed, memory usage or compile times O-MacOS Specific to the MacOS (Apple) desktop operating system S-Needs-Investigation This issue requires detective work to figure out what's going wrong
Projects
None yet
Development

No branches or pull requests