[DisplayList] Pre-Compute saveLayer bounds in DLBuilder #50935

flar · 2024-02-24T01:27:50Z

Previously the DisplayListBuilder would only pass along bounds for a saveLayer when they were supplied by the caller that was building the DisplayList. This would require Impeller to use post-processing of the EntityPass lists to compute them on its own.

DisplayList can now compute those bounds as it builds the DisplayList to save dispatch clients from having to do so on their own. It will also provide an indicator in the case when the caller supplied bounds that ended up being too small to capture all of the content, causing clipping by the layer render target.

flar · 2024-02-24T01:53:12Z

Some preliminary data from the DLBuilder benchmark shows a big impact when the bounds are not provided. Note that almost all operations are within 1-2% of the original speed, but the 4 "WithSaveLayer" benchmarks are much slower. Examining the code shows that the SaveLayer operations being tested there are not supplying bounds.

So, the minimal performance impact demonstrates that the bounds computation only comes into play for a SaveLayer without bounds, but it does have an impact when the bounds need to be provided. One complication is that while the DisplayListBuilder is already computing the bounds of the DL operations, it is doing so in the "output space" of the DisplayList. When computing the bounds for a given SaveLayer, though, it needs to compute them in the transform space of the SaveLayer which may not match the transform of the entire DL.

Performance impact of pre-computing SaveLayer bounds

At some point since that solution was prototyped, it became clear that the DL needed to also vet the bounds supplied by the caller so that Impeller could trust them - which means the overhead is now for all SaveLayer calls, even if the caller supplied some bounds. Now the bounds delivered to a DlOpReceiver can be fully trusted to be reasonably tight and will cover the contents unless the "clipped" flag is set.

Here is an expanded benchmark run on the previous and latest version of the fix with a new benchmark added to measure the impact on each rendering op inside the SaveLayer (the GlobalSaveLayer variants of the benchmarks).

Performance impact of previous solution and final solution on pre-computing SaveLayer bounds

The latest solution has slightly more overhead for each SaveLayer/Restore pair, but much less overhead on a per-rendering-op basis for the contents of those SL/Restore calls.

flar · 2024-02-28T07:10:57Z

In looking through the Impeller code to see what happens when all SaveLayer calls now come through with a bounds, I've found the following 2 cases where it seems to matter:

engine/impeller/aiks/paint_pass_delegate.cc

Line 75 in 632f9d7

if (entity_pass->GetBoundsLimit().has_value()) {

The opacity peephole code looks like it won't even try to collapse passes if they have an active bounds limit.

engine/impeller/entity/entity_pass.cc

Line 221 in 632f9d7

if (!subpass.bounds_limit_.has_value()) {

Entity passes will still go through their children looking for content bounds. They used to treat the saveLayer bounds limit as a clip, but did not trust it for purposes of GetSubpassCoverage() so the new behavior will just cause more work to be done here (an intersection operation).

jonahwilliams · 2024-02-28T16:53:49Z

I think we'd need to distinguish the case of explicit bounds being set vs the DL computing bounds. For 1. We can't apply the peephole if there is a user-specified bounds limit since I believe it can change the expected rendering. Same for 2, we need to know that the DL computing the bounds limit so we can trust it.

flar · 2024-02-28T17:44:58Z

I think we'd need to distinguish the case of explicit bounds being set vs the DL computing bounds.

The change does provide that information, but we'd need to record it somewhere.

For 1. We can't apply the peephole if there is a user-specified bounds limit since I believe it can change the expected rendering.

DL has similar logic in it when it dispatches to Skia. I think the intent is that saveLayer doesn't specifically clip and so eliminating it would lose some behavior. Although, saveLayer does implicitly clip as it generates the temporary render target for the layer. The layer could be replaced with a clip and the final outcome could be a net savings, but the behavior would be affected in very minor ways (mainly that if we are in a rotated coordinate system you can tell whether the layer was allocated to the rotated bounds or whether a clip was directly enforced).

DL could provide a couple of other hints as well:

Did the supplied bounds encompass the computed bounds? Unfortunately this would require the extra overhead of computing the bounds even in cases where the caller supplied them. But, if this condition is true, then the opacity optimization could be applied even though the caller supplied the layer bounds.
The DL already provides an opacity compatibility hint in the SaveLayerOptions, but I don't think it takes the bounds into account as mentioned above. Eventually we want to expand the role of that hint, though, so the need for Impeller to compute that condition will go away in time.
In either case, the suspicion of the bounds could be handled by a clip operation in the opacity peephole code.

Same for 2, we need to know that the DL computing the bounds limit so we can trust it.

Trust it for what? If it is too large then that is on the caller for having a bad estimate. If it is too small then Impeller clips to it anyway so the bounds accumulation was wasted. Do we have any evidence that a caller supplying overly large bounds for their saveLayer calls is a problem that requires Impeller to do its own backup computations to optimize?

jonahwilliams · 2024-03-09T00:29:00Z

This is on my plate, so far so good :)

I do wonder if we can/should make the save layer bounds size/promise required so that we could delete all of the bounds computation code in aiks, though that probably should be a part of a different patch.

flar · 2024-03-09T01:08:30Z

This is on my plate, so far so good :)

I do wonder if we can/should make the save layer bounds size/promise required so that we could delete all of the bounds computation code in aiks, though that probably should be a part of a different patch.

The only reason I left it in was that unit tests still call SaveLayer without "promised" bounds - and in fact there is a unit test that wants to make sure that the resulting entities are properly computed. I could get rid of the tests, but do we really want to put this burden on internal code that uses aiks?

bdero · 2024-03-09T05:19:58Z

I'm all for flipping precomputed bounds on for Flutter now, but I don't think we should get into the business of actively degrading the Aiks interface by removing functionality or tests yet. At least not until DisplayList is in a state where we can get AAOS switched over to it.

That being said, I suspect DL isn't that far off from this. Maybe we can discuss what would be required for this in detail sometime next week? Basically we'd just need to be able to build the DisplayList/builders with no Skia dependencies.

So for stuff like text we'd need a DlTextBlob with Impeller and Skia implementations like we have for DlImage. And for stuff like DlImage, we can remove the Impeller/Skia-specific vtable methods and rely on contextual downcasting to the Skia/Impeller-specific DlImage classes instead.
Contextual downcasting of the generic HAL types is how we avoid needing to compile all of Impeller's graphics API backends against every Flutter platform target (which would be impossible, of course).

jonahwilliams · 2024-03-11T17:38:14Z

We don't need to remove it now, but we do need to plan to remove it as long term having two quasi public APIs for Impeller isn't great. Lets discuss it during the weekly?

jonahwilliams · 2024-03-11T19:12:03Z

display_list/dl_builder.h

    bool cannot_inherit_opacity_ = false;
    bool has_compatible_op_ = false;
    std::shared_ptr<const DlImageFilter> filter_;
    bool is_unbounded_ = false;
    bool has_deferred_save_op_ = false;
    bool is_nop_ = false;
    bool affects_transparent_layer_ = false;
+    std::shared_ptr<BoundsAccumulator> layer_accumulator_;


Can we avoid heap allocating another structure here? Could this be a struct on the DLBuilder?

For this specific object, it can be null for Save entries, it will be non-null for SaveLayer entries (has_layer) and for Save entries inside a SaveLayer, they need to forward the existing accumulator, so it needs to be shareable. I could do some finagling to try to do some forwarding without having to do a shared pointer, but how much would it save?

There are already a lot of heap allocated structures created during recording/replay of the dl and layer trees, for example all of the LSS allocations show up in profiles. Though the overall time is small, it will accumulate and I'm worried that will eventually become too difficult to fix without a large refactor.

I've been accumulating ideas for simplification of a lot of this while doing this PR. The changes don't directly contribute towards eliminating the complexity/existance of the entity structures, but they will be important steps going forward. These would include:

MatrixClipTracker (DlMCT) has 2 variants for SkMatrix and SkM44 for faster performance for 99% of use cases. I'd like to move to just one transform that is both capable of a full 4x4 transform and has optimizations for the common cases. Actually, just simplifying to SkM44 may be enough in the short term because using it with asM33() to do bounds transforms in order to leverage the optimizations in SkMatrix has a reasonable overhead compared to the performance of impeller::Matrix which does a pessimal operation every time it does a bounds transform. But, moving to impeller::Matrix would be more forward-thinking in terms of eventually weaning the DL off of Skia sources. That's an ongoing background investigation at this point. It's half of the reason why I want to create an impeller::Transform which wraps an impeller::Matrix and tracks the state for optimizations. The state tracking only has minor improvement for things like scale/translate/transform on the matrix itself, but the bounds transform operation using Matrix is way slower than what SkMatrix does.

If DlMCT is greatly simplified, then it can be split into a singular MatrixClipObject which just stores the matrix and clip and does not provide stack ops, and DlMCT becomes simply a stack of MCOs. Then I can absorb that new object into the existing stack data within DLBuilder and reduce the complexity of needing multiple objects for Save/Restore operations and inline their allocations. DlMCT would still be available for other code uses that don't already have their own stack structures.

LSS could be re-architected into something stack allocated during Preroll, it would take some work, and not be as "one stop" as the existing code. I originally had something like that, but the complexities of maintaining the linkage between stack allocated structures became tricky for the *Layer classes to manage and hiding it inside a single object with save/restore semantics was simpler to use in the Layers. If it is really important, though, I could move back to a stack allocated variant. The complexities are that LSS supports dynamically changing its "target" (i.e. delegate) on the fly and also to dynamically produce a set of Mutation objects for the Platform Views - all of which involve resolving/replaying the list of outstanding states. Sure, I can have them back-chain through the stack-allocated objects, but it gets tricky.

This seems like a reasonable strategy, and moving towards impeller::Matrix long term is a good idea. In general though, I'm skeptical that we'll get much juice out of differentiating between affine and 4x4 matrices - keeping one path might be simpler. And if necessary, more amenable to SIMD accelleration instead.

flow/raster_cache_util.cc

jonahwilliams · 2024-03-11T19:13:30Z

impeller/entity/entity_pass.cc

 }

 std::optional<Rect> EntityPass::GetBoundsLimit() const {
  return bounds_limit_;
 }

+void EntityPass::SetBoundsMightClipContent(bool clips) {


I have a preference for keeping the state as an enum instead of creating two different bools.

Maybe I can reuse the same enum that Aiks Canvas uses.

Consolidated Canvas and EntityPass into a single ContentBoundsPromise enum for consistency in the last commit.

display_list/effects/dl_image_filter.h

flar · 2024-03-11T22:41:12Z

Waiting for the pre-cursor PRs to land so I can rebase this on top of their changes.

jonahwilliams

LGTM!

…utter/engine#50935)

…145226) flutter/engine@d107f65...c2fd533 2024-03-15 [email protected] [canvaskit] Fix backdrop filter with platform views as children (flutter/engine#51442) 2024-03-15 [email protected] Roll Skia from 54ecc23acf31 to 11b1a958c2ac (1 revision) (flutter/engine#51440) 2024-03-15 [email protected] [DisplayList] Pre-Compute saveLayer bounds in DLBuilder (flutter/engine#50935) 2024-03-15 [email protected] Roll Fuchsia Linux SDK from da-siZ7wPDA0Z0wXJ... to nTth7xxzAfENAJjMH... (flutter/engine#51446) Also rolling transitive DEPS: fuchsia/sdk/core/linux-amd64 from da-siZ7wPDA0 to nTth7xxzAfEN If this roll has caused a breakage, revert this CL and stop the roller using the controls here: https://autoroll.skia.org/r/flutter-engine-flutter-autoroll Please CC [email protected],[email protected],[email protected] on the revert to ensure that a human is aware of the problem. To file a bug in Flutter: https://github.com/flutter/flutter/issues/new/choose To report a problem with the AutoRoller itself, please file a bug: https://issues.skia.org/issues/new?component=1389291&template=1850622 Documentation for the AutoRoller is here: https://skia.googlesource.com/buildbot/+doc/main/autoroll/README.md

github-actions bot added the e: impeller label Feb 24, 2024

flar force-pushed the display-list-compute-save-layer-bounds branch 2 times, most recently from 9e80ae7 to 0252dc3 Compare March 4, 2024 07:51

flar force-pushed the display-list-compute-save-layer-bounds branch from 0252dc3 to f5f6cf7 Compare March 8, 2024 22:36

flar changed the title ~~[DisplayList] Compute saveLayer bounds in DLBuilder when not provided~~ [DisplayList] Pre-Compute saveLayer bounds in DLBuilder Mar 8, 2024

flar requested review from bdero and jonahwilliams March 8, 2024 23:32

flar force-pushed the display-list-compute-save-layer-bounds branch from f5f6cf7 to 8e41511 Compare March 9, 2024 00:19

jonahwilliams reviewed Mar 11, 2024

View reviewed changes

flar commented Mar 11, 2024

View reviewed changes

display_list/effects/dl_image_filter.h Outdated Show resolved Hide resolved

This was referenced Mar 11, 2024

Avoid unnecessary transform resets for pixel snapping #51337

Merged

Fix null filter NOP case in DlLocalMatrixImageFilter #51340

Merged

[DisplayList] Compute saveLayer bounds in DLBuilder when not provided

f2ee790

flar force-pushed the display-list-compute-save-layer-bounds branch from 8e41511 to f2ee790 Compare March 14, 2024 19:10

consolidate EntityPass and Canvas into a common bounds state enum

c0d3361

flar requested a review from jonahwilliams March 14, 2024 20:54

jonahwilliams approved these changes Mar 14, 2024

View reviewed changes

flar added the autosubmit Merge PR when tree becomes green via auto submit App label Mar 15, 2024

auto-submit bot merged commit 4b8218d into flutter:main Mar 15, 2024
26 checks passed

engine-flutter-autoroll mentioned this pull request Mar 15, 2024

Roll Flutter Engine from d107f65a8ee6 to c2fd5333a5d2 (4 revisions) flutter/flutter#145226

Merged

engine-flutter-autoroll added a commit to engine-flutter-autoroll/flutter that referenced this pull request Mar 15, 2024

4b8218d66 [DisplayList] Pre-Compute saveLayer bounds in DLBuilder (fl…

de13bba

…utter/engine#50935)

flar mentioned this pull request Mar 26, 2024

Mark SaveLayer bounds computed by the layer tree as "trusted" flutter/flutter#145775

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[DisplayList] Pre-Compute saveLayer bounds in DLBuilder #50935

[DisplayList] Pre-Compute saveLayer bounds in DLBuilder #50935

flar commented Feb 24, 2024 •

edited

Loading

flar commented Feb 24, 2024 •

edited

Loading

flar commented Feb 28, 2024

jonahwilliams commented Feb 28, 2024

flar commented Feb 28, 2024 •

edited

Loading

jonahwilliams commented Mar 9, 2024

flar commented Mar 9, 2024

bdero commented Mar 9, 2024

jonahwilliams commented Mar 11, 2024

jonahwilliams Mar 11, 2024

flar Mar 11, 2024

jonahwilliams Mar 11, 2024

flar Mar 11, 2024

jonahwilliams Mar 11, 2024

jonahwilliams Mar 11, 2024

flar Mar 11, 2024

flar Mar 14, 2024

flar commented Mar 11, 2024

jonahwilliams left a comment

[DisplayList] Pre-Compute saveLayer bounds in DLBuilder #50935

[DisplayList] Pre-Compute saveLayer bounds in DLBuilder #50935

Conversation

flar commented Feb 24, 2024 • edited Loading

flar commented Feb 24, 2024 • edited Loading

flar commented Feb 28, 2024

jonahwilliams commented Feb 28, 2024

flar commented Feb 28, 2024 • edited Loading

jonahwilliams commented Mar 9, 2024

flar commented Mar 9, 2024

bdero commented Mar 9, 2024

jonahwilliams commented Mar 11, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

flar commented Mar 11, 2024

jonahwilliams left a comment

Choose a reason for hiding this comment

flar commented Feb 24, 2024 •

edited

Loading

flar commented Feb 24, 2024 •

edited

Loading

flar commented Feb 28, 2024 •

edited

Loading