Clipping in piet-gpu #52

raphlinus · 2021-04-22T17:45:23Z

I actually have this one started, but it's been a bit stuck, partly because we don't yet have 100% clarity on two thorny questions. One is the best strategy for implementing the clip stack in fine rasterization in piet-gpu. Should we try to hold a window in registers, or always spill to memory? When we do spill to memory, should that be explicit read/writes to buffers, or relying on "scratch memory" managed by the shader compiler? See linebender/vello#77 and linebender/vello#83 for more discussion on that.

The other thorny issue is that the current implementation relies on computing bounding boxes CPU-side. I'd love to move that entirely to GPU, but it's not easy. linebender/vello#36 has more discussion of this issue. In addition, @eliasnaur has brought up the issue of tighter bboxes. Right now, the bbox for the clip is the union of the bboxes of the contents, but I think it's possible to use the intersection of the clip bbox and that union. Would that actually improve performance? It shouldn't affect fine rasterization, because all tiles outside the clip bbox should get an all 0 alpha mask and thus be optimized in coarse rasterization.

Even though these thorny issues exist, there are fun parts, and some of this works well in piet-gpu today. In particular, the coarse rasterizer is able to do a per-tile optimization based on the coverage of the clip mask in the tile. If it's all 0, then everything (the clip mask itself and the contents) can be skipped. If it's all 1, then the clip mask can be skipped, and the contents can be rendered directly to the rgba buffer. Only when the path intersects with the tile do we need to render the alpha for the clipping. That's represented by a push, the rendering of the contents, and a rendering of the clip mask combined with a pop.
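To make the three cases concrete, here is a minimal CPU-side sketch of the per-tile decision. It is not the actual coarse.comp code: the names `ClipCoverage`, `TileCmd`, and `tile_commands` are made up for illustration, and it assumes the coverage of the clip within the tile has already been classified.

```rust
/// Coverage of the clip mask within one tile (hypothetical type).
#[derive(Clone, Copy, Debug, PartialEq)]
enum ClipCoverage {
    /// Alpha is 0 everywhere in the tile.
    AllZero,
    /// Alpha is 1 everywhere in the tile.
    AllOne,
    /// The clip path crosses the tile.
    Partial,
}

/// Commands emitted to the per-tile command list (hypothetical type).
#[derive(Clone, Copy, Debug, PartialEq)]
enum TileCmd {
    /// Save the current pixel state on the clip stack.
    PushClip,
    /// Render the clipped contents.
    DrawContents,
    /// Render the clip mask and combine it with the pop.
    PopWithClipMask,
}

/// Decide what a tile needs, given the clip coverage in that tile.
fn tile_commands(coverage: ClipCoverage) -> Vec<TileCmd> {
    match coverage {
        // Fully clipped out: skip both the mask and the contents.
        ClipCoverage::AllZero => vec![],
        // Fully inside: skip the mask and draw contents directly.
        ClipCoverage::AllOne => vec![TileCmd::DrawContents],
        // Clip edge crosses the tile: push, draw, then mask and pop.
        ClipCoverage::Partial => vec![
            TileCmd::PushClip,
            TileCmd::DrawContents,
            TileCmd::PopWithClipMask,
        ],
    }
}
```

Only the `Partial` case costs anything extra in fine rasterization, which is why the larger bbox mostly shows up as coarse rasterization work.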

The clip stack is arbitrarily deep. There are actually a bunch of different ways to handle nested clips (with slightly different results because of conflation artifacts), but a large part of the motivation for using a stack with alpha values is that it should generalize quite well to arbitrary blend modes.

I'm not sure whether it's better to present the work-in-progress and talk about the future work to refine it, or hold it until it's a little more baked.

eliasnaur · 2021-04-23T07:56:49Z

> The other thorny issue is that the current implementation relies on computing bounding boxes CPU-side. I'd love to move that entirely to GPU, but it's not easy. linebender/piet-gpu#36 has more discussion of this issue. In addition, @eliasnaur has brought up the issue of tighter bboxes. Right now, the bbox for the clip is the union of the bboxes of the contents, but I think it's possible to use the intersection of the clip bbox and that union. Would that actually improve performance? It shouldn't affect fine rasterization, because all tiles outside the clip bbox should get an all 0 alpha mask and thus be optimized in coarse rasterization.

I agree that performance is probably not much improved with intersecting bounding boxes because coarse.comp deals with trivial clipping tiles.

However, I'm still interested in intersections because they seem to match element.comp's monoid better. In particular, intersecting bounding boxes propagates towards the end of the monoid expression, whereas computing the union propagates backwards. I haven't actually implemented intersections, so I could very well be wrong in some crucial detail.

raphlinus · 2021-04-23T14:34:19Z

I think maybe I'm not understanding what your proposal is, and on further thought, intersecting the clip bbox with the union of the contents isn't going to work either. The fundamental problem is that an empty clip (a clip where the entire tile has an alpha of 0) is not the same as no clip. To get the effect of the empty clip, the begin and end commands have to be visible to the coarse rasterizer logic in the input to the coarse rasterization step (binning and calculation of which tiles are affected). Then it will output nothing for those tiles.

If the bbox does not include those tiles, then it will not see the empty clip, and will output the content, which is incorrect.

But it's entirely possible I'm missing something here, and there's a cleaner way to do this I'm just not seeing.

eliasnaur · 2021-04-23T16:18:55Z

I propose to intersect the fills as well. Consider the scene

Line(0,0),BeginClip,Line,Line,Line,FillColor,EndClip

where an effectively empty clip is applied to a colored triangle.

In the current pipeline, the bounding boxes for BeginClip and EndClip need to be (1) equal and (2) contain the fill.

With my proposal, the fill and path segments will have their bounding boxes intersected with the empty clip, so all scene commands (the BeginClip, the FillColor, the EndClip, and the Lines) end up with empty bounding boxes.

What I don't know is whether the rest of the pipeline can tolerate path segments that are outside their bounding box.

raphlinus · 2021-05-01T05:25:37Z

It took me a couple tries (sometimes I'm dense), but I think I understand your proposal.

Assume for a second that element processing is sequential. Then your proposal is fine. The element state contains a stack of clip bboxes. A BeginClip pushes the current clip bbox, and intersects the clip's bbox with the clip bbox in the state. An EndClip pops the stack. All drawing elements (fill etc) have their bbox intersected with the clip. The rest of the pipeline should definitely be able to tolerate path segments outside the bbox, because this already happens when the path exceeds the viewport.
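The sequential version described above can be sketched roughly as follows. This is an illustration under assumptions, not piet-gpu code: `BBox`, `Element`, and `clipped_bboxes` are hypothetical names, and the choice to report the same clipped bbox for a BeginClip and its matching EndClip is an assumption based on the requirement that the two be equal.

```rust
/// Axis-aligned bounding box (hypothetical type for this sketch).
#[derive(Clone, Copy, Debug, PartialEq)]
struct BBox {
    x0: f32,
    y0: f32,
    x1: f32,
    y1: f32,
}

impl BBox {
    fn intersect(self, o: BBox) -> BBox {
        BBox {
            x0: self.x0.max(o.x0),
            y0: self.y0.max(o.y0),
            x1: self.x1.min(o.x1),
            y1: self.y1.min(o.y1),
        }
    }

    fn is_empty(self) -> bool {
        self.x0 >= self.x1 || self.y0 >= self.y1
    }
}

/// Simplified scene elements; each carries the bbox of its own path.
enum Element {
    BeginClip(BBox),
    EndClip,
    Fill(BBox),
}

/// Walk the scene in order and return one clipped bbox per element.
fn clipped_bboxes(elements: &[Element], viewport: BBox) -> Vec<BBox> {
    let mut stack: Vec<BBox> = Vec::new();
    let mut clip = viewport;
    let mut out = Vec::with_capacity(elements.len());
    for el in elements {
        match el {
            Element::BeginClip(b) => {
                // Push the current clip bbox, then narrow it.
                stack.push(clip);
                clip = clip.intersect(*b);
                out.push(clip);
            }
            Element::EndClip => {
                // Report the bbox of the clip being closed, then restore.
                out.push(clip);
                clip = stack.pop().unwrap_or(viewport);
            }
            // Drawing elements are intersected with the current clip.
            Element::Fill(b) => out.push(b.intersect(clip)),
        }
    }
    out
}
```

With an empty clip around a fill, every element between the BeginClip and EndClip (inclusive) ends up with an empty bbox, which is the behavior the proposal is after.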

Here's the problem though: the monoid needs the stack. That's possible (using the stack monoid ideas), but tricky and with possible performance implications.
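To show why this is tricky, here is a simplified sketch of a stack monoid. It is only one way to write the idea down, not the formulation in any piet-gpu shader: `StackDelta` and its methods are hypothetical, and the `u32` payload stands in for whatever per-clip state (such as a bbox) would actually be pushed. Each value summarizes a run of elements as "pop this many entries, then push these".

```rust
/// Net effect of a run of elements on the clip stack (hypothetical type).
#[derive(Clone, Debug, PartialEq, Default)]
struct StackDelta {
    /// Unmatched EndClips: entries popped from whatever came before.
    pops: usize,
    /// Unmatched BeginClips, bottom of stack first.
    pushes: Vec<u32>,
}

impl StackDelta {
    fn push(id: u32) -> Self {
        StackDelta { pops: 0, pushes: vec![id] }
    }

    fn pop() -> Self {
        StackDelta { pops: 1, pushes: vec![] }
    }

    /// Associative combine: apply `self`, then `other`.
    fn combine(&self, other: &StackDelta) -> StackDelta {
        if other.pops >= self.pushes.len() {
            // `other` pops everything `self` pushed, and possibly more.
            StackDelta {
                pops: self.pops + (other.pops - self.pushes.len()),
                pushes: other.pushes.clone(),
            }
        } else {
            // Some of `self`'s pushes survive underneath `other`'s.
            let keep = self.pushes.len() - other.pops;
            let mut pushes = self.pushes[..keep].to_vec();
            pushes.extend_from_slice(&other.pushes);
            StackDelta { pops: self.pops, pushes }
        }
    }
}
```

The combine is associative, so it can in principle be evaluated as a scan, but `pushes` can grow without bound. That unbounded state is the difference from a fixed-size element state.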

In the current state of things, there's a bit of a cost in coarse rasterization to consider the larger (union) bbox, but it's optimized out there and it doesn't make it to fine rasterization.

One thing is clearer to me now, though: if element processing gains the ability to access a stack rather than a fixed-size bit of element state, then it is better to switch the logic to intersections than to have the GPU faithfully emulate the CPU-side union processing that's done now.

raphlinus · 2021-11-08T15:14:00Z

Update: I finally came up with what I think is a good architecture, which I've written up in linebender/vello#119. I hope to implement that in the next week or two, then I'll definitely want to do a writeup.

raphlinus · 2022-02-07T17:07:53Z

More evidence I'm very bad at time estimation. Most of the new element processing pipeline got written, but extending it to clips slipped. Meanwhile, I did a very deep dive into the stack monoid, and that became a paper submission (#55). I'm in the middle of implementing that now, and also, concurrently, the blog post. More updates soon.

raphlinus added a commit that referenced this issue Feb 24, 2022

Post on piet-gpu clipping

2c8b4e6

Closes #52

raphlinus mentioned this issue Feb 24, 2022

Post on piet-gpu clipping #67

Merged

raphlinus closed this as completed in #67 Feb 25, 2022
