Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DisplayList] Pre-Compute saveLayer bounds in DLBuilder #50935

Merged
merged 2 commits into from
Mar 15, 2024

Conversation

flar
Copy link
Contributor

@flar flar commented Feb 24, 2024

Previously the DisplayListBuilder would only pass along bounds for a saveLayer when they were supplied by the caller that was building the DisplayList. This would require Impeller to use post-processing of the EntityPass lists to compute them on its own.

DisplayList can now compute those bounds as it builds the DisplayList to save dispatch clients from having to do so on their own. It will also provide an indicator in the case when the caller supplied bounds that ended up being too small to capture all of the content, causing clipping by the layer render target.

@flar
Copy link
Contributor Author

flar commented Feb 24, 2024

Some preliminary data from the DLBuilder benchmark shows a big impact when the bounds are not provided. Note that almost all operations are within 1-2% of the original speed, but the 4 "WithSaveLayer" benchmarks are much slower. Examining the code shows that the SaveLayer operations being tested there are not supplying bounds.

So, the minimal performance impact demonstrates that the bounds computation only comes into play for a SaveLayer without bounds, but it does have an impact when the bounds need to be provided. One complication is that while the DisplayListBuilder is already computing the bounds of the DL operations, it is doing so in the "output space" of the DisplayList. When computing the bounds for a given SaveLayer, though, it needs to compute them in the transform space of the SaveLayer which may not match the transform of the entire DL.

Performance impact of pre-computing SaveLayer bounds

Performance impact of pre-computing saveLayer bounds in DLBuilder

At some point since that solution was prototyped, it became clear that the DL needed to also vet the bounds supplied by the caller so that Impeller could trust them - which means the overhead is now for all SaveLayer calls, even if the caller supplied some bounds. Now the bounds delivered to a DlOpReceiver can be fully trusted to be reasonably tight and will cover the contents unless the "clipped" flag is set.

Here is an expanded benchmark run on the previous and latest version of the fix with a new benchmark added to measure the impact on each rendering op inside the SaveLayer (the GlobalSaveLayer variants of the benchmarks).

Performance impact of previous solution and final solution on pre-computing SaveLayer bounds

DisplayList impact of computing bounds for SaveLayer

The latest solution has slightly more overhead for each SaveLayer/Restore pair, but much less overhead on a per-rendering-op basis for the contents of those SL/Restore calls.

@flar
Copy link
Contributor Author

flar commented Feb 28, 2024

In looking through the Impeller code to see what happens when all SaveLayer calls now come through with a bounds, I've found the following 2 cases where it seems to matter:

if (entity_pass->GetBoundsLimit().has_value()) {

The opacity peephole code looks like it won't even try to collapse passes if they have an active bounds limit.

if (!subpass.bounds_limit_.has_value()) {

Entity passes will still go through their children looking for content bounds. They used to treat the saveLayer bounds limit as a clip, but did not trust it for purposes of GetSubpassCoverage() so the new behavior will just cause more work to be done here (an intersection operation).

@jonahwilliams
Copy link
Member

I think we'd need to distinguish the case of explicit bounds being set vs the DL computing bounds. For 1. We can't apply the peephole if there is a user-specified bounds limit since I believe it can change the expected rendering. Same for 2, we need to know that the DL computing the bounds limit so we can trust it.

@flar
Copy link
Contributor Author

flar commented Feb 28, 2024

I think we'd need to distinguish the case of explicit bounds being set vs the DL computing bounds.

The change does provide that information, but we'd need to record it somewhere.

For 1. We can't apply the peephole if there is a user-specified bounds limit since I believe it can change the expected rendering.

DL has similar logic in it when it dispatches to Skia. I think the intent is that saveLayer doesn't specifically clip and so eliminating it would lose some behavior. Although, saveLayer does implicitly clip as it generates the temporary render target for the layer. The layer could be replaced with a clip and the final outcome could be a net savings, but the behavior would be affected in very minor ways (mainly that if we are in a rotated coordinate system you can tell whether the layer was allocated to the rotated bounds or whether a clip was directly enforced).

DL could provide a couple of other hints as well:

  • Did the supplied bounds encompass the computed bounds? Unfortunately this would require the extra overhead of computing the bounds even in cases where the caller supplied them. But, if this condition is true, then the opacity optimization could be applied even though the caller supplied the layer bounds.
  • The DL already provides an opacity compatibility hint in the SaveLayerOptions, but I don't think it takes the bounds into account as mentioned above. Eventually we want to expand the role of that hint, though, so the need for Impeller to compute that condition will go away in time.
  • In either case, the suspicion of the bounds could be handled by a clip operation in the opacity peephole code.

Same for 2, we need to know that the DL computing the bounds limit so we can trust it.

Trust it for what? If it is too large then that is on the caller for having a bad estimate. If it is too small then Impeller clips to it anyway so the bounds accumulation was wasted. Do we have any evidence that a caller supplying overly large bounds for their saveLayer calls is a problem that requires Impeller to do its own backup computations to optimize?

@flar flar force-pushed the display-list-compute-save-layer-bounds branch 2 times, most recently from 9e80ae7 to 0252dc3 Compare March 4, 2024 07:51
@flar flar force-pushed the display-list-compute-save-layer-bounds branch from 0252dc3 to f5f6cf7 Compare March 8, 2024 22:36
@flar flar changed the title [DisplayList] Compute saveLayer bounds in DLBuilder when not provided [DisplayList] Pre-Compute saveLayer bounds in DLBuilder Mar 8, 2024
@flar flar requested review from bdero and jonahwilliams March 8, 2024 23:32
@flar flar force-pushed the display-list-compute-save-layer-bounds branch from f5f6cf7 to 8e41511 Compare March 9, 2024 00:19
@jonahwilliams
Copy link
Member

This is on my plate, so far so good :)

I do wonder if we can/should make the save layer bounds size/promise required so that we could delete all of the bounds computation code in aiks, though that probably should be a part of a different patch.

@flar
Copy link
Contributor Author

flar commented Mar 9, 2024

This is on my plate, so far so good :)

I do wonder if we can/should make the save layer bounds size/promise required so that we could delete all of the bounds computation code in aiks, though that probably should be a part of a different patch.

The only reason I left it in was that unit tests still call SaveLayer without "promised" bounds - and in fact there is a unit test that wants to make sure that the resulting entities are properly computed. I could get rid of the tests, but do we really want to put this burden on internal code that uses aiks?

@bdero
Copy link
Member

bdero commented Mar 9, 2024

I'm all for flipping precomputed bounds on for Flutter now, but I don't think we should get into the business of actively degrading the Aiks interface by removing functionality or tests yet. At least not until DisplayList is in a state where we can get AAOS switched over to it.

That being said, I suspect DL isn't that far off from this. Maybe we can discuss what would be required for this in detail sometime next week? Basically we'd just need to be able to build the DisplayList/builders with no Skia dependencies.

So for stuff like text we'd need a DlTextBlob with Impeller and Skia implementations like we have for DlImage. And for stuff like DlImage, we can remove the Impeller/Skia-specific vtable methods and rely on contextual downcasting to the Skia/Impeller-specific DlImage classes instead.
Contextual downcasting of the generic HAL types is how we avoid needing to compile all of Impeller's graphics API backends against every Flutter platform target (which would be impossible, of course).

@jonahwilliams
Copy link
Member

We don't need to remove it now, but we do need to plan to remove it as long term having two quasi public APIs for Impeller isn't great. Lets discuss it during the weekly?

bool cannot_inherit_opacity_ = false;
bool has_compatible_op_ = false;
std::shared_ptr<const DlImageFilter> filter_;
bool is_unbounded_ = false;
bool has_deferred_save_op_ = false;
bool is_nop_ = false;
bool affects_transparent_layer_ = false;
std::shared_ptr<BoundsAccumulator> layer_accumulator_;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we avoid heap allocating another structure here? Could this be a struct on the DLBuilder?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this specific object, it can be null for Save entries, it will be non-null for SaveLayer entries (has_layer) and for Save entries inside a SaveLayer, they need to forward the existing accumulator, so it needs to be shareable. I could do some finagling to try to do some forwarding without having to do a shared pointer, but how much would it save?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are already a lot of heap allocated structures created during recording/replay of the dl and layer trees, for example all of the LSS allocations show up in profiles. Though the overall time is small, it will accumulate and I'm worried that will eventually become too difficult to fix without a large refactor.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been accumulating ideas for simplification of a lot of this while doing this PR. The changes don't directly contribute towards eliminating the complexity/existance of the entity structures, but they will be important steps going forward. These would include:

  • MatrixClipTracker (DlMCT) has 2 variants for SkMatrix and SkM44 for faster performance for 99% of use cases. I'd like to move to just one transform that is both capable of a full 4x4 transform and has optimizations for the common cases. Actually, just simplifying to SkM44 may be enough in the short term because using it with asM33() to do bounds transforms in order to leverage the optimizations in SkMatrix has a reasonable overhead compared to the performance of impeller::Matrix which does a pessimal operation every time it does a bounds transform. But, moving to impeller::Matrix would be more forward-thinking in terms of eventually weaning the DL off of Skia sources. That's an ongoing background investigation at this point. It's half of the reason why I want to create an impeller::Transform which wraps an impeller::Matrix and tracks the state for optimizations. The state tracking only has minor improvement for things like scale/translate/transform on the matrix itself, but the bounds transform operation using Matrix is way slower than what SkMatrix does.
  • If DlMCT is greatly simplified, then it can be split into a singular MatrixClipObject which just stores the matrix and clip and does not provide stack ops, and DlMCT becomes simply a stack of MCOs. Then I can absorb that new object into the existing stack data within DLBuilder and reduce the complexity of needing multiple objects for Save/Restore operations and inline their allocations. DlMCT would still be available for other code uses that don't already have their own stack structures.
  • LSS could be re-architected into something stack allocated during Preroll, it would take some work, and not be as "one stop" as the existing code. I originally had something like that, but the complexities of maintaining the linkage between stack allocated structures became tricky for the *Layer classes to manage and hiding it inside a single object with save/restore semantics was simpler to use in the Layers. If it is really important, though, I could move back to a stack allocated variant. The complexities are that LSS supports dynamically changing its "target" (i.e. delegate) on the fly and also to dynamically produce a set of Mutation objects for the Platform Views - all of which involve resolving/replaying the list of outstanding states. Sure, I can have them back-chain through the stack-allocated objects, but it gets tricky.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a reasonable strategy, and moving towards impeller::Matrix long term is a good idea. In general though, I'm skeptical that we'll get much juice out of differentiating between affine and 4x4 matrices - keeping one path might be simpler. And if necessary, more amenable to SIMD accelleration instead.

flow/raster_cache_util.cc Outdated Show resolved Hide resolved
}

std::optional<Rect> EntityPass::GetBoundsLimit() const {
return bounds_limit_;
}

void EntityPass::SetBoundsMightClipContent(bool clips) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a preference for keeping the state as an enum instead of creating two different bools.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I can reuse the same enum that Aiks Canvas uses.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consolidated Canvas and EntityPass into a single ContentBoundsPromise enum for consistency in the last commit.

@flar
Copy link
Contributor Author

flar commented Mar 11, 2024

Waiting for the pre-cursor PRs to land so I can rebase this on top of their changes.

@flar flar force-pushed the display-list-compute-save-layer-bounds branch from 8e41511 to f2ee790 Compare March 14, 2024 19:10
@flar flar requested a review from jonahwilliams March 14, 2024 20:54
Copy link
Member

@jonahwilliams jonahwilliams left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@flar flar added the autosubmit Merge PR when tree becomes green via auto submit App label Mar 15, 2024
@auto-submit auto-submit bot merged commit 4b8218d into flutter:main Mar 15, 2024
26 checks passed
engine-flutter-autoroll added a commit to engine-flutter-autoroll/flutter that referenced this pull request Mar 15, 2024
auto-submit bot pushed a commit to flutter/flutter that referenced this pull request Mar 15, 2024
…145226)

flutter/engine@d107f65...c2fd533

2024-03-15 [email protected] [canvaskit] Fix backdrop filter with platform views as children (flutter/engine#51442)
2024-03-15 [email protected] Roll Skia from 54ecc23acf31 to 11b1a958c2ac (1 revision) (flutter/engine#51440)
2024-03-15 [email protected] [DisplayList] Pre-Compute saveLayer bounds in DLBuilder (flutter/engine#50935)
2024-03-15 [email protected] Roll Fuchsia Linux SDK from da-siZ7wPDA0Z0wXJ... to nTth7xxzAfENAJjMH... (flutter/engine#51446)

Also rolling transitive DEPS:
  fuchsia/sdk/core/linux-amd64 from da-siZ7wPDA0 to nTth7xxzAfEN

If this roll has caused a breakage, revert this CL and stop the roller
using the controls here:
https://autoroll.skia.org/r/flutter-engine-flutter-autoroll
Please CC [email protected],[email protected],[email protected] on the revert to ensure that a human
is aware of the problem.

To file a bug in Flutter: https://github.com/flutter/flutter/issues/new/choose

To report a problem with the AutoRoller itself, please file a bug:
https://issues.skia.org/issues/new?component=1389291&template=1850622

Documentation for the AutoRoller is here:
https://skia.googlesource.com/buildbot/+doc/main/autoroll/README.md
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
autosubmit Merge PR when tree becomes green via auto submit App e: impeller
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants