Implement `Drop` for `CommandQueue` #10746

bushrat011899 · 2023-11-26T08:20:12Z

Objective

Fixes Dropping a CommandQueue leaks all contained Commands #10676, preventing a possible memory leak for commands which owned resources.

Solution

Implemented Drop for CommandQueue. This has been done entirely in the private API of CommandQueue, ensuring no breaking changes. Also added a unit test, test_command_queue_inner_drop_early, based on the reproduction steps as outlined in #10676.

Notes

I believe this can be applied to 0.12.1 as well, but I am uncertain of the process to make that kind of change. Please let me know if there's anything I can do to help with the back-porting of this change.

bushrat011899

I'm leaving a review of my own changes just to provide some additional context to what might be some confusing choices I've made.

bushrat011899 · 2023-11-26T08:21:29Z

crates/bevy_ecs/src/system/commands/command_queue.rs

@@ -9,8 +9,11 @@ struct CommandMeta {
    /// SAFETY: The `value` must point to a value of type `T: Command`,
    /// where `T` is some specific type that was used to produce this metadata.
    ///
+    /// `world` is optional to allow this one function pointer to perform double-duty as a drop.


I've done this to ensure CommandMeta is still exactly 8 bytes in size. Since the inner Drop implementation for C is specific to C, this is the only way to capture it without making CommandMeta at least 16 bytes large.

bushrat011899 · 2023-11-26T08:22:14Z

crates/bevy_ecs/src/system/commands/command_queue.rs

    /// Returns the size of `T` in bytes.
-    apply_command_and_get_size: unsafe fn(value: OwningPtr<Unaligned>, world: &mut World) -> usize,
+    consume_command_and_get_size:
+        unsafe fn(value: OwningPtr<Unaligned>, world: &mut Option<&mut World>) -> usize,


Using &mut Option<&mut World> to allow for the return of the Option<&mut World> back to the CommandQueue after consume_command_and_get_size is done with it.

You can avoid the extra &mut by reborrowing the &mut World before calling consume_command_and_get_size, for example by passing world.as_deref_mut()

I've reduced the usage of the &mut Option<&mut World>, but I can't fully eliminate it. CommandQueue itself is only passed a mutable reference to World, and I don't see a way to convert &mut World into &mut Option<World>. In fact, I think such a transform is nonsensical.

bushrat011899 · 2023-11-26T08:22:38Z

crates/bevy_ecs/src/system/commands/command_queue.rs

    /// Returns the size of `T` in bytes.
-    apply_command_and_get_size: unsafe fn(value: OwningPtr<Unaligned>, world: &mut World) -> usize,
+    consume_command_and_get_size:


I've renamed this field for clarity.

bushrat011899 · 2023-11-26T08:23:20Z

crates/bevy_ecs/src/system/commands/command_queue.rs

+                    // Apply command to the provided world...
+                    Some(world) => command.apply(world),
+                    // ...or discard it.
+                    None => drop(command),


An explicit drop call is not required, since it will be implicitly called once the scope of this closure ends. I've added it here purely for clarity of purpose.

james7132 · 2023-11-26T08:57:59Z

The extra branch here on each command is in the hotpath for command application. Please run the command microbenchmarks and/or the stress tests to see if this has a tangible impact on that perf.

bushrat011899 · 2023-11-26T13:11:11Z

Ok so I tried comparing this change to the main branch using the ECS benchmarks, and I'm not really confident in drawing a conclusion any which way. I had pretty extreme run-to-run variance running the bench repeatedly, which is probably just a problem with my machine (it is my general-purpose Window box, so who knows what's happening back there)

Bench Output

> cargo bench --bench ecs -- --load-baseline commandqueue_drop --baseline main .*_command
    Finished bench [optimized] target(s) in 0.20s
     Running benches/bevy_ecs/benches.rs (target\release\deps\ecs-7d4169dd95b56a01.exe)
Gnuplot not found, using plotters backend
empty_commands/0_entities
                        time:   [3.2537 ns 3.2847 ns 3.3315 ns]
                        change: [-0.0206% +0.9463% +1.8464%] (p = 0.05 > 0.05)
                        No change in performance detected.
Found 15 outliers among 100 measurements (15.00%)
  7 (7.00%) high mild
  8 (8.00%) high severe  
spawn_commands/2000_entities
                        time:   [149.76 µs 151.23 µs 152.78 µs]
                        change: [-1.7627% -0.1753% +1.3376%] (p = 0.83 > 0.05)
                        No change in performance detected.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild
spawn_commands/4000_entities
                        time:   [303.65 µs 306.56 µs 309.66 µs]
                        change: [-2.2620% -0.6257% +1.1928%] (p = 0.47 > 0.05)
                        No change in performance detected.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild
spawn_commands/6000_entities
                        time:   [456.51 µs 461.40 µs 466.27 µs]
                        change: [-0.9910% +0.9277% +2.8949%] (p = 0.37 > 0.05)
                        No change in performance detected.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild
spawn_commands/8000_entities
                        time:   [629.32 µs 636.60 µs 644.87 µs]
                        change: [-0.7069% +0.9323% +2.6418%] (p = 0.28 > 0.05)
                        No change in performance detected.
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) high mild
  2 (2.00%) high severe  
insert_commands/insert  time:   [307.34 µs 309.23 µs 312.06 µs]
                        change: [+6.1567% +8.6578% +11.358%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 12 outliers among 100 measurements (12.00%)
  3 (3.00%) high mild
  9 (9.00%) high severe
insert_commands/insert_batch
                        time:   [215.72 µs 217.29 µs 219.28 µs]
                        change: [+0.5404% +2.1128% +3.6292%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) high mild
  2 (2.00%) high severe  
fake_commands/2000_commands
                        time:   [6.6523 µs 6.7584 µs 6.8651 µs]
                        change: [+23.412% +24.754% +26.305%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 22 outliers among 100 measurements (22.00%)
  13 (13.00%) low mild
  6 (6.00%) high mild
  3 (3.00%) high severe
fake_commands/4000_commands
                        time:   [12.901 µs 13.079 µs 13.262 µs]
                        change: [+12.367% +14.289% +16.331%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild
fake_commands/6000_commands
                        time:   [19.175 µs 19.344 µs 19.536 µs]
                        change: [+17.373% +18.577% +19.704%] (p = 0.00 < 0.05)
                        Performance has regressed.
fake_commands/8000_commands
                        time:   [25.453 µs 25.731 µs 26.018 µs]
                        change: [+14.211% +15.489% +16.744%] (p = 0.00 < 0.05)
                        Performance has regressed.  
sized_commands_0_bytes/2000_commands
                        time:   [3.1955 µs 3.2128 µs 3.2332 µs]
                        change: [+4.3901% +5.7755% +7.1056%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
  2 (2.00%) high mild
  4 (4.00%) high severe
sized_commands_0_bytes/4000_commands
                        time:   [6.3384 µs 6.3899 µs 6.4597 µs]
                        change: [+3.5901% +4.8468% +5.8916%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 5 outliers among 100 measurements (5.00%)
  4 (4.00%) high mild
  1 (1.00%) high severe
sized_commands_0_bytes/6000_commands
                        time:   [9.4952 µs 9.5870 µs 9.7428 µs]
                        change: [+0.8685% +2.0121% +3.1798%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 10 outliers among 100 measurements (10.00%)
  4 (4.00%) high mild
  6 (6.00%) high severe
sized_commands_0_bytes/8000_commands
                        time:   [12.823 µs 12.917 µs 13.092 µs]
                        change: [+1.5821% +3.5291% +5.2057%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 15 outliers among 100 measurements (15.00%)
  2 (2.00%) high mild
  13 (13.00%) high severe  
sized_commands_12_bytes/2000_commands
                        time:   [4.4384 µs 4.4833 µs 4.5537 µs]
                        change: [+7.1673% +9.5000% +11.495%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 5 outliers among 100 measurements (5.00%)
  2 (2.00%) high mild
  3 (3.00%) high severe
sized_commands_12_bytes/4000_commands
                        time:   [8.8486 µs 8.9161 µs 9.0135 µs]
                        change: [+10.094% +11.156% +12.235%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
  3 (3.00%) high mild
  3 (3.00%) high severe
sized_commands_12_bytes/6000_commands
                        time:   [13.104 µs 13.202 µs 13.340 µs]
                        change: [+7.4410% +8.7096% +10.010%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 18 outliers among 100 measurements (18.00%)
  2 (2.00%) high mild
  16 (16.00%) high severe
sized_commands_12_bytes/8000_commands
                        time:   [17.701 µs 17.827 µs 17.999 µs]
                        change: [+9.4137% +10.645% +11.941%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 5 outliers among 100 measurements (5.00%)
  3 (3.00%) high mild
  2 (2.00%) high severe  
sized_commands_512_bytes/2000_commands
                        time:   [45.140 µs 45.600 µs 46.118 µs]
                        change: [+1.1514% +2.6101% +4.5087%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
  4 (4.00%) high mild
  6 (6.00%) high severe
sized_commands_512_bytes/4000_commands
                        time:   [89.803 µs 90.349 µs 91.052 µs]
                        change: [-4.7781% -1.6346% +1.3122%] (p = 0.31 > 0.05)
                        No change in performance detected.
Found 12 outliers among 100 measurements (12.00%)
  3 (3.00%) high mild
  9 (9.00%) high severe
sized_commands_512_bytes/6000_commands
                        time:   [135.34 µs 135.93 µs 136.68 µs]
                        change: [-2.2616% +1.9043% +6.5246%] (p = 0.42 > 0.05)
                        No change in performance detected.
Found 10 outliers among 100 measurements (10.00%)
  2 (2.00%) high mild
  8 (8.00%) high severe
sized_commands_512_bytes/8000_commands
                        time:   [179.64 µs 180.09 µs 180.65 µs]
                        change: [-3.6188% +0.1834% +4.1426%] (p = 0.93 > 0.05)
                        No change in performance detected.
Found 9 outliers among 100 measurements (9.00%)
  3 (3.00%) high mild
  6 (6.00%) high severe

To me, it looks like fake_commands and sized_commands are the two benches that might have regressed, but I saw some benchmarks swing by plus or minus 50% after repeated runs, so I don't know if those are genuine, or just the Windows scheduler getting in the way of a good benchmark. But just looking at the worst one (fake_commands/2000), it does appear to be pretty consistently bad:

It appears that the added branch is being falsely predicted, and the check does add an increase in execution time. If anyone else can replicate this behaviour, I'll mark this as Draft and work on an alternate solution that avoids any changes on the hot path.

If anyone has some more specific benchmarking instructions I could follow I'd really appreciate it. I'm still fairly new to Criterion, and this is my first time benchmarking Bevy as well.

bushrat011899 · 2023-11-26T13:21:43Z

Looking into the fake_commands and sized_commands benchmarks, they are both effectively no-op commands, so it makes sense that they'd show the largest relative change in performance in the command queue clearing process. Since the other commands actually perform work on the World, they barely register a change in performance, since the command queue is already such a small component of the performance bottleneck.

bushrat011899 · 2023-11-26T21:44:01Z

Running just the fake_commands benches on my Intel laptop this morning, the performance difference appears far closer to acceptable in my opinion. I'll have another look on my AMD machine tonight after cleaning up a few things to see if it's more in-line with these measurements.

Bench Output

> cargo bench --bench ecs -- fake_commands --load-baseline impldrop --baseline main
    Finished bench [optimized] target(s) in 0.09s
     Running benches/bevy_ecs/benches.rs (target\release\deps\ecs-cc899ba4903ad805.exe)
Gnuplot not found, using plotters backend
fake_commands/2000_commands
                        time:   [3.9578 µs 3.9911 µs 4.0243 µs]
                        change: [+5.8616% +7.3377% +8.8263%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 23 outliers among 100 measurements (23.00%)
  8 (8.00%) low mild
  4 (4.00%) high mild
  11 (11.00%) high severe
fake_commands/4000_commands
                        time:   [8.2784 µs 8.3384 µs 8.3945 µs]
                        change: [+1.7114% +3.5701% +5.2884%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 5 outliers among 100 measurements (5.00%)
  3 (3.00%) high mild
  2 (2.00%) high severe
fake_commands/6000_commands
                        time:   [14.614 µs 14.835 µs 15.037 µs]
                        change: [-3.9313% -1.3772% +1.1835%] (p = 0.30 > 0.05)
                        No change in performance detected.
fake_commands/8000_commands
                        time:   [19.141 µs 19.408 µs 19.686 µs]
                        change: [-7.0809% -4.7707% -2.5649%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild

fake_commands/2000

fake_commands/8000

If this is still an unacceptable level of regression, one alternative is to increase the size of CommandMeta and include a dedicated drop function pointer. This would add 8 bytes (usize) to every Command stored in the CommandQueue however, which is why I initially avoided that solution.

bushrat011899 · 2023-11-26T22:48:39Z

Ok adding an extra function pointer to CommandMeta is an absolute no-go for the fake_commands benchmark.

A near 100% regression

> cargo bench --bench ecs -- fake_commands --load-baseline impldrop --baseline main
    Finished bench [optimized] target(s) in 0.10s
Gnuplot not found, using plotters backend
fake_commands/2000_commands
                        time:   [7.3787 µs 7.8152 µs 8.2519 µs]
                        change: [+89.014% +97.409% +106.18%] (p = 0.00 < 0.05)
                        Performance has regressed.
fake_commands/4000_commands
                        time:   [15.151 µs 15.906 µs 16.725 µs]
                        change: [+80.068% +88.055% +96.419%] (p = 0.00 < 0.05)
                        Performance has regressed.
fake_commands/6000_commands
                        time:   [21.302 µs 22.041 µs 22.863 µs]
                        change: [+51.893% +58.609% +65.993%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 5 outliers among 100 measurements (5.00%)
  5 (5.00%) high mild
fake_commands/8000_commands
                        time:   [29.114 µs 30.216 µs 31.431 µs]
                        change: [+53.369% +59.470% +64.795%] (p = 0.00 < 0.05)
                        Performance has regressed.

It makes sense why adding it tanks performance, since most of that benchmark is just reading and writing the CommandMeta to the byte array, so doubling the bytes being shuffled around (fake_command is a ZST) should pretty drastically reduce performance.

james7132

Thanks for running the benchmarks in both cases, really helps cover our bases with the options for fixing this bug. LGTM.

…re possible Also removed one branch from `apply`. Revert "Updated `CommandQueue` Changes to avoid `&mut Option<&mut World>` where possible"

bushrat011899 · 2023-11-27T22:20:22Z

crates/bevy_ecs/src/system/commands/command_queue.rs

@@ -105,13 +105,14 @@ impl CommandQueue {
        // flush the previously queued entities
        world.flush();

-        self.apply_maybe_world(Some(world));
+        self.apply_or_drop(Some(world));


Fair enough, that does read a lot better!

cart

Looks good. I renamed apply_maybe_world to apply_or_drop for clarity and made the docs a bit more explicit. The old named seemed dangerous as callers of apply_maybe_world might assume they can pass None with no side effects.

cart · 2023-11-27T22:21:53Z

Renamed again to apply_or_drop_queued for maximum clarity.

# Objective - Fixes #10676, preventing a possible memory leak for commands which owned resources. ## Solution Implemented `Drop` for `CommandQueue`. This has been done entirely in the private API of `CommandQueue`, ensuring no breaking changes. Also added a unit test, `test_command_queue_inner_drop_early`, based on the reproduction steps as outlined in #10676. ## Notes I believe this can be applied to `0.12.1` as well, but I am uncertain of the process to make that kind of change. Please let me know if there's anything I can do to help with the back-porting of this change. --------- Co-authored-by: Carter Anderson <[email protected]>

Implemented Drop for CommandQueue

cd4257c

bushrat011899 commented Nov 26, 2023

View reviewed changes

ItsDoot added C-Bug An unexpected or incorrect behavior A-ECS Entities, components, systems, and events C-Performance A change motivated by improving speed, memory usage or compile times labels Nov 26, 2023

james7132 added the S-Needs-Benchmarking This set of changes needs performance benchmarking to double-check that they help label Nov 26, 2023

alice-i-cecile added this to the 0.12.1 milestone Nov 26, 2023

james7132 removed the S-Needs-Benchmarking This set of changes needs performance benchmarking to double-check that they help label Nov 26, 2023

james7132 approved these changes Nov 26, 2023

View reviewed changes

james7132 requested a review from maniwani November 26, 2023 23:05

Updated CommandQueue Changes to avoid &mut Option<&mut World> whe…

74de7b2

…re possible Also removed one branch from `apply`. Revert "Updated `CommandQueue` Changes to avoid `&mut Option<&mut World>` where possible"

bushrat011899 force-pushed the ImplementDropForCommandQueue branch from baa628f to 74de7b2 Compare November 26, 2023 23:07

alice-i-cecile approved these changes Nov 27, 2023

View reviewed changes

alice-i-cecile added the S-Ready-For-Final-Review This PR has been approved by the community. It's ready for a maintainer to consider merging it label Nov 27, 2023

JMS55 approved these changes Nov 27, 2023

View reviewed changes

Rename to apply_or_drop

6814232

bushrat011899 commented Nov 27, 2023

View reviewed changes

cart approved these changes Nov 27, 2023

View reviewed changes

Even more clarity

2484639

cart enabled auto-merge November 27, 2023 22:22

cart added this pull request to the merge queue Nov 27, 2023

Merged via the queue into bevyengine:main with commit 6e871ab Nov 27, 2023
22 checks passed

cart mentioned this pull request Nov 30, 2023

Remove a ptr-to-int cast in CommandQueue::apply #10475

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implement `Drop` for `CommandQueue` #10746

Implement `Drop` for `CommandQueue` #10746

bushrat011899 commented Nov 26, 2023

bushrat011899 left a comment

bushrat011899 Nov 26, 2023

bushrat011899 Nov 26, 2023

SkiFire13 Nov 26, 2023

bushrat011899 Nov 26, 2023

bushrat011899 Nov 26, 2023

bushrat011899 Nov 26, 2023

james7132 commented Nov 26, 2023

bushrat011899 commented Nov 26, 2023

bushrat011899 commented Nov 26, 2023

bushrat011899 commented Nov 26, 2023

bushrat011899 commented Nov 26, 2023

james7132 left a comment

bushrat011899 Nov 27, 2023

cart left a comment

cart commented Nov 27, 2023

Implement Drop for CommandQueue #10746

Implement Drop for CommandQueue #10746

Conversation

bushrat011899 commented Nov 26, 2023

Objective

Solution

Notes

bushrat011899 left a comment

Choose a reason for hiding this comment

bushrat011899 Nov 26, 2023

Choose a reason for hiding this comment

bushrat011899 Nov 26, 2023

Choose a reason for hiding this comment

SkiFire13 Nov 26, 2023

Choose a reason for hiding this comment

bushrat011899 Nov 26, 2023

Choose a reason for hiding this comment

bushrat011899 Nov 26, 2023

Choose a reason for hiding this comment

bushrat011899 Nov 26, 2023

Choose a reason for hiding this comment

james7132 commented Nov 26, 2023

bushrat011899 commented Nov 26, 2023

bushrat011899 commented Nov 26, 2023

bushrat011899 commented Nov 26, 2023

bushrat011899 commented Nov 26, 2023

james7132 left a comment

Choose a reason for hiding this comment

bushrat011899 Nov 27, 2023

Choose a reason for hiding this comment

cart left a comment

Choose a reason for hiding this comment

cart commented Nov 27, 2023

Implement `Drop` for `CommandQueue` #10746

Implement `Drop` for `CommandQueue` #10746