-
Notifications
You must be signed in to change notification settings - Fork 6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[DisplayList] Pre-Compute saveLayer bounds in DLBuilder #50935
[DisplayList] Pre-Compute saveLayer bounds in DLBuilder #50935
Conversation
Some preliminary data from the DLBuilder benchmark shows a big impact when the bounds are not provided. Note that almost all operations are within 1-2% of the original speed, but the 4 "WithSaveLayer" benchmarks are much slower. Examining the code shows that the SaveLayer operations being tested there are not supplying bounds. So, the minimal performance impact demonstrates that the bounds computation only comes into play for a SaveLayer without bounds, but it does have an impact when the bounds need to be provided. One complication is that while the DisplayListBuilder is already computing the bounds of the DL operations, it is doing so in the "output space" of the DisplayList. When computing the bounds for a given SaveLayer, though, it needs to compute them in the transform space of the SaveLayer which may not match the transform of the entire DL. At some point since that solution was prototyped, it became clear that the DL needed to also vet the bounds supplied by the caller so that Impeller could trust them - which means the overhead is now for all SaveLayer calls, even if the caller supplied some bounds. Now the bounds delivered to a DlOpReceiver can be fully trusted to be reasonably tight and will cover the contents unless the "clipped" flag is set. Here is an expanded benchmark run on the previous and latest version of the fix with a new benchmark added to measure the impact on each rendering op inside the SaveLayer (the The latest solution has slightly more overhead for each SaveLayer/Restore pair, but much less overhead on a per-rendering-op basis for the contents of those SL/Restore calls. |
In looking through the Impeller code to see what happens when all SaveLayer calls now come through with a bounds, I've found the following 2 cases where it seems to matter:
The opacity peephole code looks like it won't even try to collapse passes if they have an active bounds limit. engine/impeller/entity/entity_pass.cc Line 221 in 632f9d7
Entity passes will still go through their children looking for content bounds. They used to treat the saveLayer bounds limit as a clip, but did not trust it for purposes of GetSubpassCoverage() so the new behavior will just cause more work to be done here (an intersection operation).
|
I think we'd need to distinguish the case of explicit bounds being set vs the DL computing bounds. For 1. We can't apply the peephole if there is a user-specified bounds limit since I believe it can change the expected rendering. Same for 2, we need to know that the DL computing the bounds limit so we can trust it. |
The change does provide that information, but we'd need to record it somewhere.
DL has similar logic in it when it dispatches to Skia. I think the intent is that saveLayer doesn't specifically clip and so eliminating it would lose some behavior. Although, saveLayer does implicitly clip as it generates the temporary render target for the layer. The layer could be replaced with a clip and the final outcome could be a net savings, but the behavior would be affected in very minor ways (mainly that if we are in a rotated coordinate system you can tell whether the layer was allocated to the rotated bounds or whether a clip was directly enforced). DL could provide a couple of other hints as well:
Trust it for what? If it is too large then that is on the caller for having a bad estimate. If it is too small then Impeller clips to it anyway so the bounds accumulation was wasted. Do we have any evidence that a caller supplying overly large bounds for their saveLayer calls is a problem that requires Impeller to do its own backup computations to optimize? |
9e80ae7
to
0252dc3
Compare
0252dc3
to
f5f6cf7
Compare
f5f6cf7
to
8e41511
Compare
This is on my plate, so far so good :) I do wonder if we can/should make the save layer bounds size/promise required so that we could delete all of the bounds computation code in aiks, though that probably should be a part of a different patch. |
The only reason I left it in was that unit tests still call SaveLayer without "promised" bounds - and in fact there is a unit test that wants to make sure that the resulting entities are properly computed. I could get rid of the tests, but do we really want to put this burden on internal code that uses aiks? |
I'm all for flipping precomputed bounds on for Flutter now, but I don't think we should get into the business of actively degrading the Aiks interface by removing functionality or tests yet. At least not until DisplayList is in a state where we can get AAOS switched over to it. That being said, I suspect DL isn't that far off from this. Maybe we can discuss what would be required for this in detail sometime next week? Basically we'd just need to be able to build the DisplayList/builders with no Skia dependencies. So for stuff like text we'd need a DlTextBlob with Impeller and Skia implementations like we have for DlImage. And for stuff like DlImage, we can remove the Impeller/Skia-specific vtable methods and rely on contextual downcasting to the Skia/Impeller-specific DlImage classes instead. |
We don't need to remove it now, but we do need to plan to remove it as long term having two quasi public APIs for Impeller isn't great. Lets discuss it during the weekly? |
bool cannot_inherit_opacity_ = false; | ||
bool has_compatible_op_ = false; | ||
std::shared_ptr<const DlImageFilter> filter_; | ||
bool is_unbounded_ = false; | ||
bool has_deferred_save_op_ = false; | ||
bool is_nop_ = false; | ||
bool affects_transparent_layer_ = false; | ||
std::shared_ptr<BoundsAccumulator> layer_accumulator_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we avoid heap allocating another structure here? Could this be a struct on the DLBuilder?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For this specific object, it can be null for Save entries, it will be non-null for SaveLayer entries (has_layer) and for Save entries inside a SaveLayer, they need to forward the existing accumulator, so it needs to be shareable. I could do some finagling to try to do some forwarding without having to do a shared pointer, but how much would it save?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are already a lot of heap allocated structures created during recording/replay of the dl and layer trees, for example all of the LSS allocations show up in profiles. Though the overall time is small, it will accumulate and I'm worried that will eventually become too difficult to fix without a large refactor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've been accumulating ideas for simplification of a lot of this while doing this PR. The changes don't directly contribute towards eliminating the complexity/existance of the entity structures, but they will be important steps going forward. These would include:
- MatrixClipTracker (DlMCT) has 2 variants for SkMatrix and SkM44 for faster performance for 99% of use cases. I'd like to move to just one transform that is both capable of a full 4x4 transform and has optimizations for the common cases. Actually, just simplifying to SkM44 may be enough in the short term because using it with asM33() to do bounds transforms in order to leverage the optimizations in SkMatrix has a reasonable overhead compared to the performance of impeller::Matrix which does a pessimal operation every time it does a bounds transform. But, moving to impeller::Matrix would be more forward-thinking in terms of eventually weaning the DL off of Skia sources. That's an ongoing background investigation at this point. It's half of the reason why I want to create an impeller::Transform which wraps an impeller::Matrix and tracks the state for optimizations. The state tracking only has minor improvement for things like scale/translate/transform on the matrix itself, but the bounds transform operation using Matrix is way slower than what SkMatrix does.
- If DlMCT is greatly simplified, then it can be split into a singular MatrixClipObject which just stores the matrix and clip and does not provide stack ops, and DlMCT becomes simply a stack of MCOs. Then I can absorb that new object into the existing stack data within DLBuilder and reduce the complexity of needing multiple objects for Save/Restore operations and inline their allocations. DlMCT would still be available for other code uses that don't already have their own stack structures.
- LSS could be re-architected into something stack allocated during Preroll, it would take some work, and not be as "one stop" as the existing code. I originally had something like that, but the complexities of maintaining the linkage between stack allocated structures became tricky for the *Layer classes to manage and hiding it inside a single object with save/restore semantics was simpler to use in the Layers. If it is really important, though, I could move back to a stack allocated variant. The complexities are that LSS supports dynamically changing its "target" (i.e. delegate) on the fly and also to dynamically produce a set of Mutation objects for the Platform Views - all of which involve resolving/replaying the list of outstanding states. Sure, I can have them back-chain through the stack-allocated objects, but it gets tricky.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like a reasonable strategy, and moving towards impeller::Matrix long term is a good idea. In general though, I'm skeptical that we'll get much juice out of differentiating between affine and 4x4 matrices - keeping one path might be simpler. And if necessary, more amenable to SIMD accelleration instead.
impeller/entity/entity_pass.cc
Outdated
} | ||
|
||
std::optional<Rect> EntityPass::GetBoundsLimit() const { | ||
return bounds_limit_; | ||
} | ||
|
||
void EntityPass::SetBoundsMightClipContent(bool clips) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a preference for keeping the state as an enum instead of creating two different bools.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe I can reuse the same enum that Aiks Canvas uses.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consolidated Canvas and EntityPass into a single ContentBoundsPromise
enum for consistency in the last commit.
Waiting for the pre-cursor PRs to land so I can rebase this on top of their changes. |
8e41511
to
f2ee790
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
…145226) flutter/engine@d107f65...c2fd533 2024-03-15 [email protected] [canvaskit] Fix backdrop filter with platform views as children (flutter/engine#51442) 2024-03-15 [email protected] Roll Skia from 54ecc23acf31 to 11b1a958c2ac (1 revision) (flutter/engine#51440) 2024-03-15 [email protected] [DisplayList] Pre-Compute saveLayer bounds in DLBuilder (flutter/engine#50935) 2024-03-15 [email protected] Roll Fuchsia Linux SDK from da-siZ7wPDA0Z0wXJ... to nTth7xxzAfENAJjMH... (flutter/engine#51446) Also rolling transitive DEPS: fuchsia/sdk/core/linux-amd64 from da-siZ7wPDA0 to nTth7xxzAfEN If this roll has caused a breakage, revert this CL and stop the roller using the controls here: https://autoroll.skia.org/r/flutter-engine-flutter-autoroll Please CC [email protected],[email protected],[email protected] on the revert to ensure that a human is aware of the problem. To file a bug in Flutter: https://github.com/flutter/flutter/issues/new/choose To report a problem with the AutoRoller itself, please file a bug: https://issues.skia.org/issues/new?component=1389291&template=1850622 Documentation for the AutoRoller is here: https://skia.googlesource.com/buildbot/+doc/main/autoroll/README.md
Previously the DisplayListBuilder would only pass along bounds for a saveLayer when they were supplied by the caller that was building the DisplayList. This would require Impeller to use post-processing of the EntityPass lists to compute them on its own.
DisplayList can now compute those bounds as it builds the DisplayList to save dispatch clients from having to do so on their own. It will also provide an indicator in the case when the caller supplied bounds that ended up being too small to capture all of the content, causing clipping by the layer render target.