Optimize extract_clusters and prepare_clusters systems #10633
Avoid cloning the `VisiblePointLights` struct in `extract_clusters`. Depending on the scene, the clone may copy thousands of non-empty vectors, which causes many heap allocations and a lot of memory copying every frame. Instead, allocate a single vector and fill it with the data from all source vectors, which is much faster.
Can you run this on some of our more intensive stress test examples? In this particular case, `many_lights` might be a better test than `3d_scene`.
I have hit this as well multiple times, so it would be great to get it fixed.
I read this pretty thoroughly and it looks correct and like a good approach to me.
good find!
CI failure appears to be a network error; retrying the merge.
# Objective

While developing my game I realized that the `extract_clusters` and `prepare_clusters` systems were taking a lot of time even though I had created very few lights. Reducing the number of clusters from the default 4096 to 2048 or less greatly improved performance and stabilized FPS (~300 -> 1000+).

I debugged it and found that the main cause is cloning `VisiblePointLights` in the `extract_clusters` system. This struct contains light entities grouped by the clusters they affect. The problem is that we clone 4096 vectors (assuming the default cluster configuration) every frame. If many of them happen to be non-empty, this becomes a bottleneck because each non-empty vector needs its own heap allocation. It wouldn't be a problem if we reused those vectors in following frames, but we don't.

## Solution

Avoid cloning multiple vectors and instead build a single vector containing the data for all clusters.

I've recorded a trace in the `3d_scene` example with v-sync disabled, before and after the change. Mean FPS went from 424 to 990. Mean time for the `extract_clusters` system was reduced from 210 us to 24 us, and for `prepare_clusters` from 189 us to 87 us.

![image](https://github.com/bevyengine/bevy/assets/160391/ab66aa9d-1fa7-4993-9827-8be76b530972)

---

## Changelog

- Improved performance of the `extract_clusters` and `prepare_clusters` systems for scenes where lights affect a large part of the scene.
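
To illustrate the idea of replacing many per-cluster vectors with a single one, here is a minimal sketch. It is not the code from this PR: `LightId`, `PerClusterLights`, `FlatClusterLights` and `flatten` are hypothetical names, and the `(offset, count)` range layout is just one assumed way to keep per-cluster boundaries in a flat buffer.

```rust
// Hypothetical stand-in for an entity id; this is NOT Bevy's `Entity` type.
type LightId = u32;

/// Original layout (assumed for illustration): one `Vec` per cluster.
/// Cloning this clones every inner vector, so each non-empty cluster
/// costs a separate heap allocation.
struct PerClusterLights {
    clusters: Vec<Vec<LightId>>,
}

/// Flattened layout (assumed for illustration): all light ids live in one
/// buffer, and each cluster is described by an `(offset, count)` pair.
struct FlatClusterLights {
    lights: Vec<LightId>,
    ranges: Vec<(u32, u32)>,
}

/// Copies all per-cluster data into a single buffer using two allocations
/// in total, regardless of how many clusters are non-empty.
fn flatten(src: &PerClusterLights) -> FlatClusterLights {
    let total: usize = src.clusters.iter().map(Vec::len).sum();
    let mut lights = Vec::with_capacity(total);
    let mut ranges = Vec::with_capacity(src.clusters.len());
    for cluster in &src.clusters {
        // Record where this cluster's lights start and how many there are.
        ranges.push((lights.len() as u32, cluster.len() as u32));
        lights.extend_from_slice(cluster);
    }
    FlatClusterLights { lights, ranges }
}

impl FlatClusterLights {
    /// Returns the lights affecting the cluster at `index` as a slice
    /// into the shared buffer.
    fn cluster(&self, index: usize) -> &[LightId] {
        let (offset, count) = self.ranges[index];
        &self.lights[offset as usize..(offset + count) as usize]
    }
}
```

With this layout, the number of allocations per frame no longer depends on how many clusters are non-empty, which is the property the Solution section relies on. The per-cluster data can still be read back through `cluster(index)` without copying.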