Encode images as a byte blob + pixel format + resolution; not as a tensor #6386

Closed
teh-cmc opened this issue May 20, 2024 · 11 comments · Fixed by #6942
Labels: 💬 discussion, 🍏 primitives (Relating to Rerun primitives)
teh-cmc (Member) commented May 20, 2024

  • The blob is untyped -- it's always a u8 blob, not a generic TensorData.
  • The stride should be a component, overridable in the UI and/or blueprint.
  • The pixel format should be a component, overridable in the UI and/or blueprint.

Examples of pixel formats:

  • RGBA8
  • RG32F
  • NV12
  • JPEG

Also:

  • "Swizzling" width and height can be handled using a 2D transform.
  • If someone creates an Image straight out of a numpy tensor, we still log the shape as a component; the viewer uses it to guess the stride and pixel format, unless those were logged explicitly by the user.
  • Some pixel formats might require secondary caching due to heavy intermediate representation changes (e.g. JPEG).
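The "guess the stride and pixel format from the shape" step could look roughly like this — a minimal sketch assuming tightly packed rows and a hypothetical naming scheme for the formats (not the viewer's actual rules):

```python
import numpy as np

def guess_format(shape: tuple, dtype: np.dtype) -> tuple[str, int]:
    """Guess (pixel_format, row_stride_bytes) from a numpy image shape.

    Illustrative heuristic only -- the viewer's actual rules may differ.
    """
    if len(shape) == 2:
        h, w = shape
        channels = 1
    elif len(shape) == 3 and shape[2] in (1, 3, 4):
        h, w, channels = shape
    else:
        raise ValueError(f"ambiguous image shape: {shape}")

    model = {1: "L", 3: "RGB", 4: "RGBA"}[channels]
    pixel_format = f"{model}{dtype.itemsize * 8}"
    stride = w * channels * dtype.itemsize  # tightly packed rows
    return pixel_format, stride

fmt, stride = guess_format((480, 640, 3), np.dtype(np.uint8))
# fmt == "RGB8", stride == 1920
```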

Update

Following live discussion, we introduce ImageEncoded as follows:

archetype Image {
  buffer: ImageBuffer,
  resolution: Resolution2D,
  pixel_format: PixelFormat,
  stride: Option<ImageStride>,
}

component ImageStride {
  bytes_per_row: u32,
  bytes_per_plane: Option<u32>,
}

/// e.g. JPEG, PNG, …
archetype ImageEncoded {
  blob: Blob,
  media_type: Option<MediaType>,
}
@teh-cmc teh-cmc added 🍏 primitives Relating to Rerun primitives 💬 discussion labels May 20, 2024
teh-cmc (Member, Author) commented May 20, 2024

TensorData stays typed -- but it is completely unrelated to rr.Image now

emilk (Member) commented May 20, 2024

One user story to consider when designing this is logging a batch of images as a single tensor (very common in the deep learning world).
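Such a batch tensor can be split into per-image views without copying — a quick numpy sketch of why this user story is cheap to support:

```python
import numpy as np

# A batch of 8 RGB images, NHWC layout -- common in deep learning.
batch = np.zeros((8, 64, 64, 3), dtype=np.uint8)

# Each element of `images` is a zero-copy view into the batch buffer,
# so a tensor-of-images can be presented as individual images cheaply.
images = [batch[i] for i in range(batch.shape[0])]
assert images[0].shape == (64, 64, 3)
assert images[0].base is batch  # a view, not a copy
```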

emilk (Member) commented May 27, 2024

Alternative version that still unifies tensors and images:

archetype Tensor {
    buffer: BufferU8,
    shape: TensorShape,
    element_type: TensorElementType,
}

archetype Image {
    buffer: BufferU8,
    shape: TensorShape,
    image_type: PixelFormat,
}

component BufferU8 {
    data: [u8],
}

enum TensorElementType {
    U8,
    U16,
    U32,
    U64,

    I8,
    I16,
    I32,
    I64,

    F16,
    F32,
    F64,
}

enum PixelFormat {
    /* Image formats */
    RGBA8_SRGB_22,
    RG32F,
    NV12,
    // ...

    /* Depth formats */
    F16,
    F32,
    F32_LINEAR_XXX,
    // ...

    /* Segmentation formats */
    U8,
    U16,
    // ...
}
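Under option B (a single u8 buffer plus an element-type enum), reinterpreting the buffer is a one-liner in numpy. A sketch — the mapping table and function name are illustrative, not the actual SDK API:

```python
import numpy as np

# TensorElementType -> numpy dtype, a 1:1 mapping (illustrative).
ELEMENT_TYPE_TO_DTYPE = {
    "U8": np.uint8, "U16": np.uint16, "U32": np.uint32, "U64": np.uint64,
    "I8": np.int8, "I16": np.int16, "I32": np.int32, "I64": np.int64,
    "F16": np.float16, "F32": np.float32, "F64": np.float64,
}

def tensor_from_u8_buffer(buffer: bytes, shape, element_type: str) -> np.ndarray:
    """Reinterpret a raw u8 buffer as a typed tensor (option B above)."""
    return np.frombuffer(buffer, dtype=ELEMENT_TYPE_TO_DTYPE[element_type]).reshape(shape)

t = tensor_from_u8_buffer(np.arange(6, dtype=np.float32).tobytes(), (2, 3), "F32")
```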

emilk (Member) commented May 27, 2024

We want to get rid of arrow unions.

Ultimately we want datatype conversions, so we can have a single TensorBuffer component be backed by multiple datatypes (BufferU8, BufferU16, …).
However datatype conversions are not here yet.

So what do we do in the meantime? I see two major suggestions:

A) We can split tensor into 11 different archetypes and components (TensorU8, TensorU16, etc) as suggested here: #6388

B) We can also use a single U8 buffer for all tensors and then an enum for its type as suggested in #6386 (comment)

A) has the advantage that it uses the correct arrow datatype. However, it has 11x the archetypes and 11x components.
B) has the advantage of type-erasure, i.e. "Get me a Tensor" instead of "Get me a Tensor, and it must be F32"


A second discussion: should tensors and images be completely different things, or unified?

jleibs (Member) commented May 27, 2024

One of the motivators for the tensor refactor work is to make the raw arrow APIs more approachable, esp for ML workflows that are tensor-heavy.

I don't think type erasure is a good thing in that case, as it adds more low-level complexity.

Another thing I remembered is that Arrow has an official variable-shape-tensor spec, which is very close to the new multi-archetype proposal: https://arrow.apache.org/docs/format/CanonicalExtensions.html#variable-shape-tensor. If we go down the multi-archetype path, I think we should consider aligning with the official spec, since it raises the probability that another library that outputs or consumes tensors from arrow will be able to return chunks already in this exact structure.

jleibs (Member) commented May 27, 2024

Ultimately we want datatype conversions, so we can have a single TensorBuffer component be backed by multiple datatypes (BufferU8, BufferU16, …).
However datatype conversions are not here yet.

This also aligns with the multi-archetype proposal. We can define the datatypes now; because we don't yet support datatype conversions, this adds more work on the visualizers to support one-of-many components. Later, we remap to a single component, but the arrow-encoded datatypes don't have to change. This will be far easier to manage from both a migration and a backwards-compatibility perspective.

emilk (Member) commented May 27, 2024

https://arrow.apache.org/docs/format/CanonicalExtensions.html#variable-shape-tensor

That reminds me: we should have separate shape: [u64] and dim_names: [str] components.
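A sketch of what those two components would carry for a typical batch tensor (the names are illustrative; dim_names is optional metadata and must not affect how the data is interpreted):

```python
import numpy as np

# Separate `shape: [u64]` and `dim_names: [str]` components, side by side.
batch = np.zeros((8, 3, 64, 64), dtype=np.float32)

shape = list(batch.shape)  # the required shape component
dim_names = ["batch", "channel", "height", "width"]  # optional naming
assert len(dim_names) == len(shape)
```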

@Wumpf Wumpf removed their assignment Jul 9, 2024
@emilk emilk self-assigned this Jul 10, 2024
@emilk emilk changed the title Images should be encoded as a blob + pixel format + stride Encode images as a byte blob + pixel format + resolution; not as a tensor Jul 10, 2024
emilk (Member) commented Jul 14, 2024

A concrete design

Problem

We want to support many precise and common pixel formats, like RGB565, NV12, etc.

We also want to easily be able to convert tensors (e.g. numpy arrays) into an Image without having a combinatorial explosion of weird pixel formats that will dilute the PixelFormat enum (BGR_i16 etc).

Solution

The Image archetype must have EITHER a PixelFormat enum with the most common pixel formats, OR the two ColorModel + ElementType components. PixelFormat always wins if it exists.

PixelFormat path

enum PixelFormat {
	RGB_565,
	RGBA_8888,
	NV12,
	YUY2,
}

Fully specifies the ColorModel and Planarity of an image.

See https://facebookresearch.github.io/ocean/docs/images/pixel_formats_and_plane_layout/ for prior art on naming these.

ColorModel and Datatype

If there is no PixelFormat, the viewer will look for two other components:

  • ColorModel: RGB, RGBA, BGR, BGRA, L, LA, A, …
  • ElementType: U4, U8, U10, U16, U32, F16, F32, F64, …

This spans a huge cartesian product space of possibilities, and makes it easy to convert a tensor or numpy array into an Image.
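The inference step this enables can be sketched as follows (the heuristics, tables, and function name are illustrative, not the actual SDK logic):

```python
import numpy as np

DTYPE_TO_ELEMENT_TYPE = {
    np.dtype(np.uint8): "U8", np.dtype(np.uint16): "U16",
    np.dtype(np.float16): "F16", np.dtype(np.float32): "F32",
    np.dtype(np.float64): "F64",
}
CHANNELS_TO_COLOR_MODEL = {1: "L", 2: "LA", 3: "RGB", 4: "RGBA"}

def infer_image_meta(img: np.ndarray) -> tuple[str, str]:
    """Infer (ColorModel, ElementType) from a numpy array (illustrative)."""
    channels = 1 if img.ndim == 2 else img.shape[2]
    return CHANNELS_TO_COLOR_MODEL[channels], DTYPE_TO_ELEMENT_TYPE[img.dtype]

meta = infer_image_meta(np.zeros((480, 640, 3), dtype=np.uint8))  # ("RGB", "U8")
```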

Having as two enum components means:

  • Two small enums instead of one huge one
  • Easier to add support for more ColorModels
  • The logging API becomes simpler, since the datatype can often be inferred automatically (e.g. from the numpy dtype), so only the ColorModel needs specifying (e.g. with rr.Image(rgb=numpy_image) or rr.Image(np_img, "RGB"))
  • It's easier to switch from BGR to RGB in the UI if there is a combobox with only a few choices

In the future we can also add a component here to specify Planarity: Are the RGB components interleaved or in different planes?
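For reference, interleaved vs planar layouts of the same RGB image differ only in memory order, which numpy can illustrate:

```python
import numpy as np

# Interleaved (H, W, 3) vs planar (3, H, W) storage of the same RGB image.
interleaved = np.arange(2 * 2 * 3, dtype=np.uint8).reshape(2, 2, 3)
planar = np.ascontiguousarray(interleaved.transpose(2, 0, 1))

# Same pixels, different memory layout:
assert planar[0, 1, 1] == interleaved[1, 1, 0]  # red channel of pixel (1, 1)
```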

Color spaces and gamma curves

More research needed, but at the start we only need to support two things:

enum ColorSpace { sRGB_Linear, sRGB_Encoded }

The linear vs encoded only applies to ColorModel+Datatype.
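For reference, the two variants correspond to the standard sRGB (IEC 61966-2-1) transfer functions:

```python
def srgb_encode(linear: float) -> float:
    """Linear -> sRGB-encoded, per the sRGB (IEC 61966-2-1) transfer curve."""
    if linear <= 0.0031308:
        return 12.92 * linear
    return 1.055 * linear ** (1 / 2.4) - 0.055

def srgb_decode(encoded: float) -> float:
    """sRGB-encoded -> linear (inverse of the above)."""
    if encoded <= 0.04045:
        return encoded / 12.92
    return ((encoded + 0.055) / 1.055) ** 2.4
```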

Summary

component enum PixelFormat {
	RGB_565,
	RGBA_8888,
	NV12,
	YUY2,
}

component enum ColorModel { RGB, RGBA, BGR, BGRA, L, LA, A }

component enum ElementType { U4, U8, U10, U16, U32, F16, F32, F64 }

archetype Image {
  // Required:
  buffer: ImageBuffer,
  resolution: Resolution2D,

  // If not specified, then `ColorModel` AND `ElementType` MUST be specified
  pixel_format: Option<PixelFormat>,

  color_model: Option<ColorModel>,
  element_type: Option<ElementType>,
  
  // Optional (and not in first version):
  color_space: Option<ColorSpace>,
  // Largest value, e.g. `1.0` or `255.0`, useful for `ElementType::F32`
  max: Option<f64>,
  stride: Option<ImageStride>,
}

archetype DepthImage {
  // Required:
  buffer: ImageBuffer,
  resolution: Resolution2D,
  element_type: ElementType,
  
  // Optional (and not in first version):
  stride: Option<ImageStride>,
}

archetype SegmentationImage {
  // Required:
  buffer: ImageBuffer,
  resolution: Resolution2D,
  element_type: ElementType,
  
  // Optional (and not in first version):
  stride: Option<ImageStride>,
}

archetype Tensor {
  // If set, interpret this tensor as an image.
  color_model: Option<ColorModel>,
}
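As a sanity check on the schema, the expected tightly-packed buffer size can be derived from either path. A sketch — the bits-per-pixel table covers only the formats listed above, and the function is illustrative:

```python
def expected_buffer_size(width, height, pixel_format=None,
                         color_model=None, element_type=None):
    """Bytes needed for a tightly-packed image, per the schema above.

    `PixelFormat` wins if set; otherwise ColorModel + ElementType are used.
    """
    if pixel_format is not None:
        bits_per_pixel = {
            "RGB_565": 16, "RGBA_8888": 32,
            "NV12": 12,  # 4:2:0 chroma subsampling -> 1.5 bytes/pixel
            "YUY2": 16,  # 4:2:2 -> 2 bytes/pixel
        }[pixel_format]
        return width * height * bits_per_pixel // 8
    channels = {"L": 1, "LA": 2, "RGB": 3, "RGBA": 4}[color_model]
    bytes_per_elem = {"U8": 1, "U16": 2, "F16": 2, "F32": 4, "F64": 8}[element_type]
    return width * height * channels * bytes_per_elem
```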

Discussion

https://rerunio.slack.com/archives/C041NHU952S/p1721031794678379

@emilk emilk mentioned this issue Jul 16, 2024
emilk added a commit that referenced this issue Jul 17, 2024
### What
* Part of #6386

This changes `DepthImage` from a tensor to a byte-blob + resolution +
data-type.
It adds new code for hovering images and uploading images to GPU, both
general enough to also work for color images and segmentation images.
This means those PRs will be a lot smaller.

Since we still need to support color and segmentation images encoded as
tensors, the code is quite complex, with duplicated paths here and
there. It will all get cleaned up in later PRs.

I'm reusing a lot of stuff from tensors, like `TensorMeaning`,
`TensorStats`, and `TensorElement`, but these will be renamed and improved
before #6386 is done.

### TODO
* [x] Update `depth_image_ext.cpp`
* [x] Update `depth_image_ext.py`
* [x] Fix hovering depth images in 3D view

### Checklist
* [x] I have read and agree to [Contributor
Guide](https://github.com/rerun-io/rerun/blob/main/CONTRIBUTING.md) and
the [Code of
Conduct](https://github.com/rerun-io/rerun/blob/main/CODE_OF_CONDUCT.md)
* [x] I've included a screenshot or gif (if applicable)
* [x] I have tested the web demo (if applicable):
* Using examples from latest `main` build:
[rerun.io/viewer](https://rerun.io/viewer/pr/6915?manifest_url=https://app.rerun.io/version/main/examples_manifest.json)
* Using full set of examples from `nightly` build:
[rerun.io/viewer](https://rerun.io/viewer/pr/6915?manifest_url=https://app.rerun.io/version/nightly/examples_manifest.json)
* [x] The PR title and labels are set such as to maximize their
usefulness for the next release's CHANGELOG
* [x] If applicable, add a new check to the [release
checklist](https://github.com/rerun-io/rerun/blob/main/tests/python/release_checklist)!
* [x] I have noted any breaking changes to the log API in
`CHANGELOG.md` and the migration guide
* [x] Test a few local examples

- [PR Build Summary](https://build.rerun.io/pr/6915)
- [Recent benchmark results](https://build.rerun.io/graphs/crates.html)
- [Wasm size tracking](https://build.rerun.io/graphs/sizes.html)

To run all checks from `main`, comment on the PR with `@rerun-bot
full-check`.
emilk added a commit that referenced this issue Jul 18, 2024
### What
* Part of #6386
* Part of #6844
* Very similar to #6915

### Checklist
* [x] I have read and agree to [Contributor
Guide](https://github.com/rerun-io/rerun/blob/main/CONTRIBUTING.md) and
the [Code of
Conduct](https://github.com/rerun-io/rerun/blob/main/CODE_OF_CONDUCT.md)
* [x] I've included a screenshot or gif (if applicable)
* [x] I have tested the web demo (if applicable):
* Using examples from latest `main` build:
[rerun.io/viewer](https://rerun.io/viewer/pr/6928?manifest_url=https://app.rerun.io/version/main/examples_manifest.json)
* Using full set of examples from `nightly` build:
[rerun.io/viewer](https://rerun.io/viewer/pr/6928?manifest_url=https://app.rerun.io/version/nightly/examples_manifest.json)
* [x] The PR title and labels are set such as to maximize their
usefulness for the next release's CHANGELOG
* [x] If applicable, add a new check to the [release
checklist](https://github.com/rerun-io/rerun/blob/main/tests/python/release_checklist)!
* [x] I have noted any breaking changes to the log API in
`CHANGELOG.md` and the migration guide

- [PR Build Summary](https://build.rerun.io/pr/6928)
- [Recent benchmark results](https://build.rerun.io/graphs/crates.html)
- [Wasm size tracking](https://build.rerun.io/graphs/sizes.html)

To run all checks from `main`, comment on the PR with `@rerun-bot
full-check`.
@emilk emilk mentioned this issue Jul 18, 2024
@jleibs jleibs assigned Wumpf and unassigned emilk Jul 22, 2024
@Wumpf Wumpf assigned jleibs and unassigned Wumpf Jul 30, 2024
jleibs (Member) commented Jul 30, 2024

The proposed design above introduces quite a bit of complexity:

archetype Image {
  // Required:
  buffer: ImageBuffer,
  resolution: Resolution2D,

  // If not specified, then `ColorModel` AND `ElementType` MUST be specified
  pixel_format: Option<PixelFormat>,

  color_model: Option<ColorModel>,
  element_type: Option<ElementType>,
}

Whereas we previously had a single self-describing TensorData component, we now need to query 5 different components and join them to interpret the contents of the ImageBuffer. There are many more edge cases now, allowing for misuse, partial updates, and invalid representations.

There are a few practical benefits to splitting ImageBuffer from the metadata describing the image shape. Examples:

  • Fix RGB <-> BGR mixup in the blueprint
  • Inspect Bayer-encoded data as a mono image

However, we can still get the vast majority of this utility by combining the assorted metadata into a single, required, ImageFormat component:

archetype Image {
  // Required:
  buffer: ImageBuffer,
  format: ImageFormat
}

// More noise from our datamodel.
component ImageFormat {
  datatype ImageFormat;
}

datatype ImageFormat {
  width: uint,
  height: uint,
  row_stride: Option<uint>,

  pixel_format: PixelFormat,

  color_model: Option<ColorModel>,   // used if pixel_format = ARRAY
  element_type: Option<ElementType>, // used if pixel_format = ARRAY
}

component enum PixelFormat {
	ARRAY,
	RGB_565,
	RGBA_8888,
	NV12,
	YUY2,
	…
}

This also simplifies the ImageFormat a bit by explicitly splitting out width and height, as well as making pixel_format always required, with a special value indicating the usage of the last 2 optional fields.

The only practical downside of this change is that you can't (yet) just override color_model.

  • On the viewer, this probably means some more UI code. Writing the redundant metadata is cheap enough that we shouldn't be worried about it.
  • On the blueprint API side, this means you have to specify the whole ImageFormat() component if adding an override, so you can't generically swap color model without knowing the rest of the format information about the image. Given how niche this is, I think it's a fair compromise.
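A sketch of the single self-describing component as a frozen dataclass: overriding one field means re-writing the whole format, but the result is always internally consistent. The field names mirror the IDL above; the Python itself is illustrative, not SDK code:

```python
from dataclasses import dataclass, replace
from typing import Optional

@dataclass(frozen=True)
class ImageFormat:
    """Single self-describing component (sketch of the datatype above)."""
    width: int
    height: int
    pixel_format: str = "ARRAY"
    row_stride: Optional[int] = None
    color_model: Optional[str] = None   # used if pixel_format == "ARRAY"
    element_type: Optional[str] = None  # used if pixel_format == "ARRAY"

# Fixing an RGB <-> BGR mixup means writing a whole new format value,
# which carries all the redundant-but-cheap metadata along with it:
fmt = ImageFormat(640, 480, color_model="BGR", element_type="U8")
fixed = replace(fmt, color_model="RGB")
```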

jleibs (Member) commented Jul 31, 2024

Since the above apparently wasn't clear enough in terms of the problems it was trying to address:

If Resolution, Stride, PixelFormat, ColorModel, and Datatype are all separate components, this means they can be updated totally asynchronously and that they all need to be joined together by either a range or latest-at query in order to have all the information to interpret an image buffer.

In turn, every image query must bottom out in a 5-way component join, which, in a pathological case, spans 5 chunks and has potentially different join results depending on the timeline, some of which yield corrupt data interpretation.

Here’s an example: let’s suppose someone first logs an image using nv12 pixel format and then logs an image using rgb color model. If they don’t realize they needed to explicitly clear pixel format (since it’s an optional component), it now joins into their query results implicitly via latest-at, yielding corrupted data.

If everyone only uses our archetype APIs and those APIs are carefully crafted and tested we can generally work around this issue. But this is back to saying our low-level arrow APIs aren’t approachable to users, which is the very problem we’re trying to solve.
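The hazard can be demonstrated with a toy latest-at query over separate components — a simulation for illustration, not Rerun's actual query engine:

```python
def latest_at(log: list, time: int) -> dict:
    """Toy latest-at query: for each component, keep the newest value <= time."""
    state: dict = {}
    for t, component, value in sorted(log, key=lambda e: e[0]):
        if t <= time:
            state[component] = value
    return state

# t=0: the user logs an NV12 image; t=1: they log an RGB image but forget
# to clear `pixel_format` (it is optional, so nothing overwrites it).
log = [
    (0, "buffer", b"nv12-bytes"),
    (0, "pixel_format", "NV12"),
    (1, "buffer", b"rgb-bytes"),
    (1, "color_model", "RGB"),
]
joined = latest_at(log, time=1)
# The stale "NV12" joins in and takes precedence over "RGB",
# so the RGB buffer would be misinterpreted as NV12:
assert joined["pixel_format"] == "NV12" and joined["color_model"] == "RGB"
```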
