Let's add images #293

dfrg · 2023-03-09T22:40:57Z

This patch introduces initial support for rendering images.

The general process is as follows:

Capture image brushes as "patches" at encoding time.
During resolve, allocate all images into an atlas, iteratively increasing atlas size on failure until we reach maximum size (currently 8k). Patch draw data stream with atlas locations.
At render, create an atlas texture of the necessary dimensions and use Queue::write_texture to blit all images to their associated locations.
draw_leaf computes an inverse transform, same as radial gradients (7 words of draw info).
coarse writes the image parameters into the ptcl (4 words).
fine reads the image parameters, converts coords from screen to atlas space, clips, loads 4 pixels and performs bilinear filtering.

I haven't done any performance measurement, but the fine stage hurts a bit. This can be simplified considerably for images where the upper 2x2 matrix is identity and the translation is integral.

The alternative is to follow the original plan in #176 which calls for rasterizing pre-transformed images into the atlas. This makes fine cheaper at the cost of significantly more complexity at the setup stage and less efficient atlas allocation.

On the todo list: more robustness, retained caching, extend modes, ...

Atlas offset and image size were originally stored in the ptcl but are not tile dependent. Moving these to info saves 8 bytes per image tile.

DJMcNab · 2023-03-10T15:35:06Z

examples/scenes/src/test_scenes.rs

@@ -97,6 +101,11 @@ fn cardioid_and_friends(sb: &mut SceneBuilder, _: &mut SceneParams) {
 }

 fn animated_text(sb: &mut SceneBuilder, params: &mut SceneParams) {
+    let piet_logo = params
+        .images
+        .from_bytes(PIET_LOGO_IMAGE.as_ptr() as usize, PIET_LOGO_IMAGE)


From a strict provenance perspective, this should use pointer::addr (whereas as usize exposes the address, AIUI). That however is unstable.

This is just my expedient hacky solution to support image loading for examples. Being an example though, it probably does deserve a “don’t do this in real code!” comment.

Since the address is never converted back into a pointer, I don’t think provenance is an issue. Size likely would be on CHERI architectures though.

Yeah, to be clear, there's no soundness issue here.
This is just an observation that the semantics of pointer to integer casts are messy. The way I think this works is that the use of as usize "tells" the compiler that you might do a corresponding as *const _, even though there isn't. .addr doesn't have that "side effect", but is unstable.

This is not at all important, I just recognised this as an opportunity to brush up my understanding of strict aliasing

Fair enough! Pointer to int casts are always worth calling out in code review just to have some discussion logged for future reference.

DJMcNab · 2023-03-10T15:40:36Z

examples/scenes/src/test_scenes.rs

 use vello::peniko::*;
 use vello::*;

+const PIET_LOGO_IMAGE: &[u8] = include_bytes!("../../assets/piet-logo.png");


Not so sure about using the Piet logo here. Ideally, we'd have a Vello logo, or a Linebender logo.

I don't have a better suggestion for what we can use.

Linebender or vello logos would be great! This choice allowed me to sidestep licensing concerns and, as an additional benefit, has transparency which exposed a filtering bug.

What's the licensing situation for the druid logo from the screenshot you posted on xulip chat? I'm guessing Druid is a bit more relevant than Piet.

I'm supportive of a weekend art project for a vello logo :)

I assume we're free to use the druid logo as well but I thought piet might be slightly more closely related to vello given the lineage. For now, the priority is just to have some example that displays an image.

I can provide colorful... commentary on a logo project :) I don't think my artistic talents are up to the task!

armansito

Overall looks good. One thing that got me thinking is whether the ImageCache should be part of the encoding module or if it should be a separate utility (perhaps part of the scene API). It sounds generally useful and how the atlas is packed is an intrinsic part of the encoding.

When/if we move the encoding module into its own crate, should ImageCache be provided by that or should we try to provide a leaner API? I'm not strongly opinionated either way but I am thinking about how to keep the native (C/C++) bridging for clients simple and the library lightweight.

examples/scenes/src/images.rs

armansito · 2023-03-10T17:17:40Z

examples/scenes/src/test_scenes.rs

 use vello::peniko::*;
 use vello::*;

+const PIET_LOGO_IMAGE: &[u8] = include_bytes!("../../assets/piet-logo.png");


What's the licensing situation for the druid logo from the screenshot you posted on xulip chat? I'm guessing Druid is a bit more relevant than Piet.

I'm supportive of a weekend art project for a vello logo :)

armansito · 2023-03-10T17:25:10Z

shader/coarse.wgsl

@@ -377,6 +384,13 @@ fn main(
                            write_grad(CMD_RAD_GRAD, index, info_offset);
                        }
                    }
+                    // DRAWTAG_FILL_IMAGE
+                    case 0x248u: {


It's unfortunate that naga still doesn't seem to support non-literal const-expressions as a case selector (which is supported by the spec).

I had to change those values several times while playing with the encoding and missing one of these actually led to a TDR. So yeah, this is supremely annoying.

On the naga side this seems to tracked by gfx-rs/naga#2106. I'm putting a mention here and I'll ping the naga issue as well as this would be a huge quality of life improvement (I've wanted this in other projects as well).

Thanks for the link. This will hopefully make the cut for the next release.

armansito · 2023-03-10T17:30:40Z

shader/draw_leaf.wgsl

+            // DRAWTAG_FILL_IMAGE
+            case 0x248u: {
+                info[di] = bitcast<u32>(linewidth);
+                let inv_det = 1.0 / (matrx.x * matrx.w - matrx.y * matrx.z);


This is perfectly fine as is but if we were to use the built-in matrix types this could possible use the determinant intrinsic. Maybe worth exploring in the future (I also don't know if there are any potential compatibility issues with using the intrinsics over hand-rolling calculation yourself).

It would be great to do a pass through the code to make use of appropriate matrix types/ops where appropriate. For this PR, I'd like to avoid the extra churn though.

For sure, I didn't mean for that to happen in this CL. I'm happy to do a fixup pass later for matrices seeing as I've brought it up a couple times already and you seem to like the idea :)

src/engine.rs

dfrg · 2023-03-10T20:10:39Z

When/if we move the encoding module into its own crate, should ImageCache be provided by that or should we try to provide a leaner API? I'm not strongly opinionated either way but I am thinking about how to keep the native (C/C++) bridging for clients simple and the library lightweight.

Doing the final packing requires resolving the atlas layout in some way. This currently sits in the encoding module, but I can easily see breaking that one up into both encode and resolve with traits for pluggable caching mechanisms (both glyphs and images) on the resolve side. The current state is not very modular, but I think the boundaries are sharp enough to increase the modularity without too much trouble.

* Add comment about naughty ptr2int cast * Change Option to anyhow::Result in test scene ImageCache * Replace magic constant with pixel stride from selected format

armansito · 2023-03-10T21:00:21Z

When/if we move the encoding module into its own crate, should ImageCache be provided by that or should we try to provide a leaner API? I'm not strongly opinionated either way but I am thinking about how to keep the native (C/C++) bridging for clients simple and the library lightweight.

Doing the final packing requires resolving the atlas layout in some way. This currently sits in the encoding module, but I can easily see breaking that one up into both encode and resolve with traits for pluggable caching mechanisms (both glyphs and images) on the resolve side. The current state is not very modular, but I think the boundaries are sharp enough to increase the modularity without too much trouble.

OK, let's discuss this on Monday. I don't think we need to block this PR on this issue but I think let's also plan to talk about what the FFI bridge should look like. Once we pull encoding into its own separate crate we can measure the code size and see if it's worth modularizing.

armansito

I would say Raph should probably take a look as well and Daniel should approve following his comments but LGTM from my point of view.

I left two more comments around some unused imports. I tested the with_winit/headless examples on M1 Max and with_winit on Android and didn't run into any issues.

As you mentioned, there is a slight performance decrease that I noticed on Pixel 6 (framerate for the scene drops from ~80fps to ~70fps). I didn't see a performance drop on M1 Max.

It might be a good idea to add a test scene with many images just so we get an idea about performance when there are many atlas lookups.

armansito · 2023-03-10T20:51:27Z

shader/draw_leaf.wgsl

+            // DRAWTAG_FILL_IMAGE
+            case 0x248u: {
+                info[di] = bitcast<u32>(linewidth);
+                let inv_det = 1.0 / (matrx.x * matrx.w - matrx.y * matrx.z);


For sure, I didn't mean for that to happen in this CL. I'm happy to do a fixup pass later for matrices seeing as I've brought it up a couple times already and you seem to like the idea :)

examples/scenes/src/test_scenes.rs

armansito · 2023-03-10T20:53:32Z

examples/scenes/src/test_scenes.rs

 use crate::{ExampleScene, SceneConfig, SceneParams, SceneSet};
-use vello::kurbo::{Affine, BezPath, Ellipse, PathEl, Point, Rect};
+use vello::kurbo::{Affine, BezPath, Ellipse, PathEl, Point, Rect, Vec2};


I think the Vec2 import is unused?

Fixed! There are a few more issues with unused symbols. They've been lingering a while so I'll do another pass to fix those soon (and also make the codebase clippy clean)

dfrg · 2023-03-11T02:01:42Z

OK, let's discuss this on Monday. I don't think we need to block this PR on this issue but I think let's also plan to talk about what the FFI bridge should look like. Once we pull encoding into its own separate crate we can measure the code size and see if it's worth modularizing.

Looking forward to it!

raphlinus

Overall this looks good to me. I am not attached to scaling the image in a rasterization pipeline, though my gut feeling is that it would be higher performance in many cases. Ultimately that's a matter for very careful performance analysis, and the goal for this cycle is to get something that works. This approach also avoids the problems of really huge atlas entries when scaling way up.

Thanks, and excited to see this!

raphlinus · 2023-03-14T22:08:13Z

shader/fine.wgsl

+                    if all(atlas_uv < atlas_extents) {
+                        let uv_quad = vec4(max(floor(atlas_uv), image.atlas_offset), min(ceil(atlas_uv), atlas_extents));
+                        let uv_frac = fract(atlas_uv);
+                        let a = premul_alpha(textureLoad(image_atlas, vec2<i32>(uv_quad.xy), 0));


I'm curious why the by-hand bilinear interpolation here rather than using hardware. Going over the WGSL intrinsics, I think textureSampleLevel is the most appropriate version - unlike textureSample it doesn't rely on derivatives, and I think it's likely to be more performant than textureSampleGrad (which based on past piet-gpu experimentation was slow).

That could perhaps be followup work to this patch.

I hadn’t intended to do filtering at all in this PR so it was sort of tacked on as an afterthought. I’ll work on hardware sampling as a followup.

raphlinus · 2023-03-14T22:10:47Z

src/encoding/draw.rs

-    pub index: u32,
-    /// Packed image offset.
-    pub offset: u32,
+    /// Packed atlas coordinates.


Would [u16; 2] be a better choice here? It feels more semantic, but perhaps just using u32 leaves less slop around stuff like endianness.

Yeah, it was simply a desire to keep the shape of the data the same for both CPU and GPU. Also, when we do add descriptor indexing, the xy field (maybe renamed location) can be used as the array index in that mode.

raphlinus · 2023-03-14T22:21:39Z

shader/fine.wgsl

+                    let my_xy = vec2(xy.x + f32(i), xy.y);
+                    let atlas_uv = image.matrx.xy * my_xy.x + image.matrx.zw * my_xy.y + image.xlat + image.atlas_offset;
+                    // This currently clips to the image bounds. TODO: extend modes
+                    if all(atlas_uv < atlas_extents) {


My gut feeling is that performance might be improved considerably if this is also predicated on area[i] != 0.0, though that will almost certainly depend on the hardware. These loads are already conditional, and I think it can reduce texture sample bandwidth for the tile wastage case. Perhaps it's worth doing careful experiments and landing this with the simplest case.

We had discussed this and I agree that since we’re already branching, might as well add the additional check. Will do before merging.

renderdender · 2023-03-16T10:43:53Z

Overall looks good. One thing that got me thinking is whether the ImageCache should be part of the encoding module or if it should be a separate utility (perhaps part of the scene API). It sounds generally useful and how the atlas is packed is an intrinsic part of the encoding.

When/if we move the encoding module into its own crate, should ImageCache be provided by that or should we try to provide a leaner API? I'm not strongly opinionated either way but I am thinking about how to keep the native (C/C++) bridging for clients simple and the library lightweight.

I've been experimenting with SVG parser and realized indeed it would be better to place ImageCache to Scene/Encoding, otherwise there will be circular dependencies if you use 'ImageCache' from integrations/vello_svg/.
Just my humble opinion.

dfrg · 2023-03-16T15:33:31Z

Thanks for pointing this out. For higher level integrations like SVG and Lottie that make use of the scene API, the (resolve time) image cache is essentially an internal implementation detail and users shouldn’t have to worry about it.

dfrg added 3 commits March 9, 2023 17:18

Let's add images

165b3a0

move atlas rect to info

a858578

Atlas offset and image size were originally stored in the ptcl but are not tile dependent. Moving these to info saves 8 bytes per image tile.

premultiply alpha before filtering

d12b711

DJMcNab reviewed Mar 10, 2023

View reviewed changes

armansito reviewed Mar 10, 2023

View reviewed changes

Address initial review feedback

37f05b8

* Add comment about naughty ptr2int cast * Change Option to anyhow::Result in test scene ImageCache * Replace magic constant with pixel stride from selected format

armansito approved these changes Mar 10, 2023

View reviewed changes

Remove some unused imports

923c949

raphlinus approved these changes Mar 14, 2023

View reviewed changes

predicate image loads on non-zero mask

a6307a2

dfrg merged commit f3d45fc into main Mar 15, 2023

dfrg deleted the images branch March 15, 2023 12:40

bzm3r mentioned this pull request May 28, 2023

Image resources #176

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Let's add images #293

Let's add images #293

dfrg commented Mar 9, 2023

DJMcNab Mar 10, 2023

dfrg Mar 10, 2023

DJMcNab Mar 10, 2023

dfrg Mar 10, 2023

DJMcNab Mar 10, 2023

dfrg Mar 10, 2023

armansito Mar 10, 2023 •

edited

Loading

dfrg Mar 10, 2023

armansito left a comment

armansito Mar 10, 2023 •

edited

Loading

armansito Mar 10, 2023

dfrg Mar 10, 2023

armansito Mar 10, 2023

dfrg Mar 11, 2023

armansito Mar 10, 2023

dfrg Mar 10, 2023

armansito Mar 10, 2023

dfrg Mar 11, 2023

dfrg commented Mar 10, 2023

armansito commented Mar 10, 2023

armansito left a comment •

edited

Loading

armansito Mar 10, 2023

armansito Mar 10, 2023

dfrg Mar 11, 2023

dfrg commented Mar 11, 2023

raphlinus left a comment

raphlinus Mar 14, 2023

dfrg Mar 15, 2023

raphlinus Mar 14, 2023

dfrg Mar 15, 2023

raphlinus Mar 14, 2023

dfrg Mar 15, 2023

renderdender commented Mar 16, 2023

dfrg commented Mar 16, 2023

Let's add images #293

Let's add images #293

Conversation

dfrg commented Mar 9, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

armansito Mar 10, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

armansito left a comment

Choose a reason for hiding this comment

armansito Mar 10, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dfrg commented Mar 10, 2023

armansito commented Mar 10, 2023

armansito left a comment • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dfrg commented Mar 11, 2023

raphlinus left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

renderdender commented Mar 16, 2023

dfrg commented Mar 16, 2023

armansito Mar 10, 2023 •

edited

Loading

armansito Mar 10, 2023 •

edited

Loading

armansito left a comment •

edited

Loading