-
-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
QueryParIter::map_collect and similar operations #11249
base: main
Are you sure you want to change the base?
Conversation
68398e6
to
c902a13
Compare
c902a13
to
53ca955
Compare
This can be quite performance sensitive code. Some form of profiling on our existing stress tests would be nice. I'm also hesitant to promote a option that allocates on every call as well, particular if it's a Vec being recreated and regrown every tick of the app. |
@james7132 suggestions how to benchmarks? Overall, I don't believe, it will be bad. Malloc is cheap. Moreover, it is quite possible, it will actually increase performance, because it gives users an instrument to parallelize code, which was too hard to parallelize before. |
I was helping some who needed this yesterday. This allows us to, for example, buffer events parallel and then dispatch them in sequence while preserving order. Seems super useful to me. Many useful features can degrade performance if misapplied. If the docs are clear about the performance characteristics, I think allocation shouldn't be a blocking issue. |
I did lame benchmark. I ran
and patched
By default on my laptop it outputs about 18.5 FPS. If I replace If I rewrite the function with Also tried running the same test with To get precise perf difference, much better benchmark is needed. So my conclusion is that if |
# Objective Issue #10243: rendering multiple triangles in the same place results in flickering. ## Solution Considered these alternatives: - `depth_bias` may not work, because of high number of entities, so creating a material per entity is practically not possible - rendering at slightly different positions does not work, because when camera is far, float rounding causes the same issues (edit: assuming we have to use the same `depth_bias`) - considered implementing deterministic operation like `query.par_iter().flat_map(...).collect()` to be used in `check_visibility` system (which would solve the issue since query is deterministic), and could not figure out how to make it as cheap as current approach with thread-local collectors (#11249) So adding an option to sort entities after `check_visibility` system run. Should not be too bad, because after visibility check, only a handful entities remain. This is probably not the only source of non-determinism in Bevy, but this is one I could find so far. At least it fixes the repro example. ## Changelog - `DeterministicRenderingConfig` option to enable deterministic rendering ## Test <img width="1392" alt="image" src="https://github.com/bevyengine/bevy/assets/28969/c735bce1-3a71-44cd-8677-c19f6c0ee6bd"> --------- Co-authored-by: Alice Cecile <[email protected]>
There is one bench that tests parallel iteration. You can run it with |
Ok, I tried adding another fake benchmark there.
The output is
In this artificial setup, parallel Which is exactly what is expected: code is doing what it is supposed to do. More correct benchmark would be comparison with |
can you compare For checking with |
I can compare against
bevy/crates/bevy_tasks/src/task_pool.rs Line 287 in 06bf928
|
OK, benchmark against main. On main. cargo bench -p benches --bench ecs -- heavy_compute --save-baseline main on this branch: cargo bench -p benches --bench ecs -- heavy_compute --baseline main
|
Objective
par_iter
is convenient, unless you need to collect the results into aVec
orHashMap
.Bevy commonly uses thread-local collectors, and aggregates them, for example, here:
bevy/crates/bevy_render/src/view/visibility/mod.rs
Lines 452 to 454 in 101037d
it is not convenient because:
Solution
Add
QueryParIter
functions:flat_map_collect
(similar to.flat_map(...).collect()
filter_map_collect
map_collect
These functions might be not as universally applicable as
for_each
with thread-local aggregators, because more expensive, but:Changelog
QueryParIter
has new functions:flat_map_collect
,filter_map_collect
,map_collect
.