Encode images as a byte blob + pixel format + resolution; not as a tensor #6386

Closed
teh-cmc opened this issue May 20, 2024 · 11 comments · Fixed by #6942
Labels: 💬 discussion, 🍏 primitives (Relating to Rerun primitives)
teh-cmc (Member) commented May 20, 2024

  • The blob is untyped -- it's always a u8 blob, not a generic TensorData.
  • The stride should be a component, overridable in the UI and/or blueprint.
  • The pixel format should be a component, overridable in the UI and/or blueprint.

Examples of pixel formats:

  • RGBA8
  • RG32F
  • NV12
  • JPEG

Also:

  • "Swizzling" width and height can be handled using a 2D transform.
  • If someone creates an Image straight out of a numpy tensor, we still log the shape as a component; the viewer uses it to guess the stride and pixel format, unless those were logged explicitly by the user.
  • Some pixel formats might require secondary caching due to heavy intermediate representation changes (e.g. JPEG).
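The "guess the stride and pixel format from the shape" step could look roughly like this — a minimal sketch assuming tightly packed rows and a hypothetical naming scheme for the formats (not the viewer's actual rules):

```python
import numpy as np

def guess_format(shape: tuple, dtype: np.dtype) -> tuple[str, int]:
    """Guess (pixel_format, row_stride_bytes) from a numpy image shape.

    Illustrative heuristic only -- the viewer's actual rules may differ.
    """
    if len(shape) == 2:
        h, w = shape
        channels = 1
    elif len(shape) == 3 and shape[2] in (1, 3, 4):
        h, w, channels = shape
    else:
        raise ValueError(f"ambiguous image shape: {shape}")

    model = {1: "L", 3: "RGB", 4: "RGBA"}[channels]
    pixel_format = f"{model}{dtype.itemsize * 8}"
    stride = w * channels * dtype.itemsize  # tightly packed rows
    return pixel_format, stride

fmt, stride = guess_format((480, 640, 3), np.dtype(np.uint8))
# fmt == "RGB8", stride == 1920
```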

Update

Following live discussion, we introduce ImageEncoded as follows:

archetype Image {
  buffer: ImageBuffer,
  resolution: Resolution2D,
  pixel_format: PixelFormat,
  stride: Option<ImageStride>,
}

component ImageStride {
  bytes_per_row: u32,
  bytes_per_plane: Option<u32>,
}

/// e.g. JPEG, PNG, …
archetype ImageEncoded {
  blob: Blob,
  media_type: Option<MediaType>,
}
@teh-cmc teh-cmc added 🍏 primitives Relating to Rerun primitives 💬 discussion labels May 20, 2024
teh-cmc (Member, Author) commented May 20, 2024

TensorData stays typed -- but it is completely unrelated to rr.Image now

emilk (Member) commented May 20, 2024

One user story to consider when designing this is logging a batch of images as a single tensor (very common in the deep learning world).
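Such a batch tensor can be split into per-image views without copying — a quick numpy sketch of why this user story is cheap to support:

```python
import numpy as np

# A batch of 8 RGB images, NHWC layout -- common in deep learning.
batch = np.zeros((8, 64, 64, 3), dtype=np.uint8)

# Each element of `images` is a zero-copy view into the batch buffer,
# so a tensor-of-images can be presented as individual images cheaply.
images = [batch[i] for i in range(batch.shape[0])]
assert images[0].shape == (64, 64, 3)
assert images[0].base is batch  # a view, not a copy
```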

emilk (Member) commented May 27, 2024

Alternative version that still unifies tensors and images:

archetype Tensor {
    buffer: BufferU8,
    shape: TensorShape,
    element_type: TensorElementType,
}

archetype Image {
    buffer: BufferU8,
    shape: TensorShape,
    image_type: PixelFormat,
}

component BufferU8 {
    data: [u8],
}

enum TensorElementType {
    U8,
    U16,
    U32,
    U64,

    I8,
    I16,
    I32,
    I64,

    F16,
    F32,
    F64,
}

enum PixelFormat {
    /* Image formats */
    RGBA8_SRGB_22,
    RG32F,
    NV12,
    // ...

    /* Depth formats */
    F16,
    F32,
    F32_LINEAR_XXX,
    // ...

    /* Segmentation formats */
    U8,
    U16,
    // ...
}
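Under option B (a single u8 buffer plus an element-type enum), reinterpreting the buffer is a one-liner in numpy. A sketch — the mapping table and function name are illustrative, not the actual SDK API:

```python
import numpy as np

# TensorElementType -> numpy dtype, a 1:1 mapping (illustrative).
ELEMENT_TYPE_TO_DTYPE = {
    "U8": np.uint8, "U16": np.uint16, "U32": np.uint32, "U64": np.uint64,
    "I8": np.int8, "I16": np.int16, "I32": np.int32, "I64": np.int64,
    "F16": np.float16, "F32": np.float32, "F64": np.float64,
}

def tensor_from_u8_buffer(buffer: bytes, shape, element_type: str) -> np.ndarray:
    """Reinterpret a raw u8 buffer as a typed tensor (option B above)."""
    return np.frombuffer(buffer, dtype=ELEMENT_TYPE_TO_DTYPE[element_type]).reshape(shape)

t = tensor_from_u8_buffer(np.arange(6, dtype=np.float32).tobytes(), (2, 3), "F32")
```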

emilk (Member) commented May 27, 2024

We want to get rid of arrow unions.

Ultimately we want datatype conversions, so we can have a single TensorBuffer component be backed by multiple datatypes (BufferU8, BufferU16, …).
However datatype conversions are not here yet.

So what do we do in the meantime? I see two major suggestions:

A) We can split tensor into 11 different archetypes and components (TensorU8, TensorU16, etc) as suggested here: #6388

B) We can also use a single U8 buffer for all tensors and then an enum for its type as suggested in #6386 (comment)

A) has the advantage that it uses the correct arrow datatype. However, it has 11x the archetypes and 11x components.
B) has the advantage of type-erasure, i.e. "Get me a Tensor" instead of "Get me a Tensor, and it must be F32"


A second discussion: should tensors and images be completely different things, or unified?

jleibs (Member) commented May 27, 2024

One of the motivators for the tensor refactor work is to make the raw arrow APIs more approachable, esp for ML workflows that are tensor-heavy.

I don't think type erasure is a good thing in that case, as it adds more low-level complexity.

Another thing I remembered is that Arrow has an official variable-shape-tensor spec, which is very close to the new multi-archetype proposal: https://arrow.apache.org/docs/format/CanonicalExtensions.html#variable-shape-tensor. If we go down the multi-archetype path, I think we should consider aligning with the official spec, since it raises the probability that another library that outputs or consumes tensors from arrow will be able to return chunks already in this exact structure.

jleibs (Member) commented May 27, 2024

Ultimately we want datatype conversions, so we can have a single TensorBuffer component be backed by multiple datatypes (BufferU8, BufferU16, …).
However datatype conversions are not here yet.

This also aligns with the multi-archetype proposal. We can define the datatypes now; because we don't yet support datatype conversions, this adds more work on the visualizers to support one-of-many components. Later, we remap to a single component, but the arrow-encoded datatypes don't have to change. This will be far easier to manage from both a migration and a backwards-compatibility perspective.

emilk (Member) commented May 27, 2024

https://arrow.apache.org/docs/format/CanonicalExtensions.html#variable-shape-tensor

That reminds me: we should have separate shape: [u64] and dim_names: [str] components.
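A sketch of what those two components would carry for a typical batch tensor (the names are illustrative; dim_names is optional metadata and must not affect how the data is interpreted):

```python
import numpy as np

# Separate `shape: [u64]` and `dim_names: [str]` components, side by side.
batch = np.zeros((8, 3, 64, 64), dtype=np.float32)

shape = list(batch.shape)  # the required shape component
dim_names = ["batch", "channel", "height", "width"]  # optional naming
assert len(dim_names) == len(shape)
```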

@Wumpf Wumpf removed their assignment Jul 9, 2024
@emilk emilk self-assigned this Jul 10, 2024
@emilk emilk changed the title Images should be encoded as a blob + pixel format + stride Encode images as a byte blob + pixel format + resolution; not as a tensor Jul 10, 2024
emilk (Member) commented Jul 14, 2024

A concrete design

Problem

We want to support many precise and common pixel formats, like RGB565, NV12, etc.

We also want to easily be able to convert tensors (e.g. numpy arrays) into an Image without having a combinatorial explosion of weird pixel formats that will dilute the PixelFormat enum (BGR_i16 etc).

Solution

The Image archetype must have EITHER a PixelFormat enum with the most common pixel formats, OR the two ColorModel + ElementType components. PixelFormat always wins if it exists.

PixelFormat path

enum PixelFormat {
	RGB_565,
	RGBA_8888,
	NV12,
	YUY2,
}

Fully specifies the ColorModel and Planarity of an image.

See https://facebookresearch.github.io/ocean/docs/images/pixel_formats_and_plane_layout/ for prior art on naming these.

ColorModel and Datatype

If there is no PixelFormat, the viewer will look for two other components:

  • ColorModel: RGB, RGBA, BGR, BGRA, L, LA, A, …
  • ElementType: U4, U8, U10, U16, U32, F16, F32, F64, …

This spans a huge cartesian product space of possibilities, and makes it easy to convert a tensor or numpy array into an Image.
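The inference step this enables can be sketched as follows (the heuristics, tables, and function name are illustrative, not the actual SDK logic):

```python
import numpy as np

DTYPE_TO_ELEMENT_TYPE = {
    np.dtype(np.uint8): "U8", np.dtype(np.uint16): "U16",
    np.dtype(np.float16): "F16", np.dtype(np.float32): "F32",
    np.dtype(np.float64): "F64",
}
CHANNELS_TO_COLOR_MODEL = {1: "L", 2: "LA", 3: "RGB", 4: "RGBA"}

def infer_image_meta(img: np.ndarray) -> tuple[str, str]:
    """Infer (ColorModel, ElementType) from a numpy array (illustrative)."""
    channels = 1 if img.ndim == 2 else img.shape[2]
    return CHANNELS_TO_COLOR_MODEL[channels], DTYPE_TO_ELEMENT_TYPE[img.dtype]

meta = infer_image_meta(np.zeros((480, 640, 3), dtype=np.uint8))  # ("RGB", "U8")
```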

Having as two enum components means:

  • Two small enums instead of one huge one
  • Easier to add support for more ColorModels
  • The logging API becomes simpler, since the datatype can often be inferred automatically (e.g. from the numpy dtype), so only the ColorModel needs specifying (e.g. with rr.Image(rgb=numpy_image) or rr.Image(np_img, "RGB"))
  • It's easier to switch from BGR to RGB in the UI if there is a combobox with only a few choices

In the future we can also add a component here to specify Planarity: Are the RGB components interleaved or in different planes?
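For reference, interleaved vs planar layouts of the same RGB image differ only in memory order, which numpy can illustrate:

```python
import numpy as np

# Interleaved (H, W, 3) vs planar (3, H, W) storage of the same RGB image.
interleaved = np.arange(2 * 2 * 3, dtype=np.uint8).reshape(2, 2, 3)
planar = np.ascontiguousarray(interleaved.transpose(2, 0, 1))

# Same pixels, different memory layout:
assert planar[0, 1, 1] == interleaved[1, 1, 0]  # red channel of pixel (1, 1)
```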

Color spaces and gamma curves

More research needed, but at the start we only need to support two things:

enum ColorSpace { sRGB_Linear, sRGB_Encoded }

The linear vs encoded only applies to ColorModel+Datatype.
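For reference, the two variants correspond to the standard sRGB (IEC 61966-2-1) transfer functions:

```python
def srgb_encode(linear: float) -> float:
    """Linear -> sRGB-encoded, per the sRGB (IEC 61966-2-1) transfer curve."""
    if linear <= 0.0031308:
        return 12.92 * linear
    return 1.055 * linear ** (1 / 2.4) - 0.055

def srgb_decode(encoded: float) -> float:
    """sRGB-encoded -> linear (inverse of the above)."""
    if encoded <= 0.04045:
        return encoded / 12.92
    return ((encoded + 0.055) / 1.055) ** 2.4
```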

Summary

component enum PixelFormat {
	RGB_565,
	RGBA_8888,
	NV12,
	YUY2,
}

component enum ColorModel { RGB, RGBA, BGR, BGRA, L, LA, A }

component enum ElementType { U4, U8, U10, U16, U32, F16, F32, F64 }

archetype Image {
  // Required:
  buffer: ImageBuffer,
  resolution: Resolution2D,

  // If not specified, then `ColorModel` AND `ElementType` MUST be specified
  pixel_format: Option<PixelFormat>,

  color_model: Option<ColorModel>,
  element_type: Option<ElementType>,
  
  // Optional (and not in first version):
  color_space: Option<ColorSpace>,
  // Largest value, e.g. `1.0` or `255.0`, useful for `ElementType::F32`
  max: Option<f64>,
  stride: Option<ImageStride>,
}

archetype DepthImage {
  // Required:
  buffer: ImageBuffer,
  resolution: Resolution2D,
  element_type: ElementType,
  
  // Optional (and not in first version):
  stride: Option<ImageStride>,
}

archetype SegmentationImage {
  // Required:
  buffer: ImageBuffer,
  resolution: Resolution2D,
  element_type: ElementType,
  
  // Optional (and not in first version):
  stride: Option<ImageStride>,
}

archetype Tensor {
  // If set, interpret this tensor as an image.
  color_model: Option<ColorModel>,
}
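As a sanity check on the schema, the expected tightly-packed buffer size can be derived from either path. A sketch — the bits-per-pixel table covers only the formats listed above, and the function is illustrative:

```python
def expected_buffer_size(width, height, pixel_format=None,
                         color_model=None, element_type=None):
    """Bytes needed for a tightly-packed image, per the schema above.

    `PixelFormat` wins if set; otherwise ColorModel + ElementType are used.
    """
    if pixel_format is not None:
        bits_per_pixel = {
            "RGB_565": 16, "RGBA_8888": 32,
            "NV12": 12,  # 4:2:0 chroma subsampling -> 1.5 bytes/pixel
            "YUY2": 16,  # 4:2:2 -> 2 bytes/pixel
        }[pixel_format]
        return width * height * bits_per_pixel // 8
    channels = {"L": 1, "LA": 2, "RGB": 3, "RGBA": 4}[color_model]
    bytes_per_elem = {"U8": 1, "U16": 2, "F16": 2, "F32": 4, "F64": 8}[element_type]
    return width * height * channels * bytes_per_elem
```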

Discussion

https://rerunio.slack.com/archives/C041NHU952S/p1721031794678379

@emilk emilk mentioned this issue Jul 16, 2024
emilk added a commit that referenced this issue Jul 17, 2024
### What
* Part of #6386

This changes `DepthImage` from a tensor to a byte-blob + resolution +
data-type.
It adds new code for hovering images and uploading images to GPU, both
general enough to also work for color images and segmentation images.
This means those PRs will be a lot smaller.

Since we still need to support color and segmentation images encoded as
tensors, the code is quite complex, with duplicated paths here and
there. It will all get cleaned up in later PRs.

I'm reusing a lot of stuff from tensors, like `TensorMeaning`,
`TensorStats`, and `TensorElement`, but these will be renamed and improved
before #6386 is done.

### TODO
* [x] Update `depth_image_ext.cpp`
* [x] Update `depth_image_ext.py`
* [x] Fix hovering depth images in 3D view

### Checklist
* [x] I have read and agree to [Contributor
Guide](https://github.com/rerun-io/rerun/blob/main/CONTRIBUTING.md) and
the [Code of
Conduct](https://github.com/rerun-io/rerun/blob/main/CODE_OF_CONDUCT.md)
* [x] I've included a screenshot or gif (if applicable)
* [x] I have tested the web demo (if applicable):
* Using examples from latest `main` build:
[rerun.io/viewer](https://rerun.io/viewer/pr/6915?manifest_url=https://app.rerun.io/version/main/examples_manifest.json)
* Using full set of examples from `nightly` build:
[rerun.io/viewer](https://rerun.io/viewer/pr/6915?manifest_url=https://app.rerun.io/version/nightly/examples_manifest.json)
* [x] The PR title and labels are set such as to maximize their
usefulness for the next release's CHANGELOG
* [x] If applicable, add a new check to the [release
checklist](https://github.com/rerun-io/rerun/blob/main/tests/python/release_checklist)!
* [x] I have noted any breaking changes to the log API in
`CHANGELOG.md` and the migration guide
* [x] Test a few local examples

- [PR Build Summary](https://build.rerun.io/pr/6915)
- [Recent benchmark results](https://build.rerun.io/graphs/crates.html)
- [Wasm size tracking](https://build.rerun.io/graphs/sizes.html)

To run all checks from `main`, comment on the PR with `@rerun-bot
full-check`.
emilk added a commit that referenced this issue Jul 18, 2024
### What
* Part of #6386
* Part of #6844
* Very similar to #6915

### Checklist
* [x] I have read and agree to [Contributor
Guide](https://github.com/rerun-io/rerun/blob/main/CONTRIBUTING.md) and
the [Code of
Conduct](https://github.com/rerun-io/rerun/blob/main/CODE_OF_CONDUCT.md)
* [x] I've included a screenshot or gif (if applicable)
* [x] I have tested the web demo (if applicable):
* Using examples from latest `main` build:
[rerun.io/viewer](https://rerun.io/viewer/pr/6928?manifest_url=https://app.rerun.io/version/main/examples_manifest.json)
* Using full set of examples from `nightly` build:
[rerun.io/viewer](https://rerun.io/viewer/pr/6928?manifest_url=https://app.rerun.io/version/nightly/examples_manifest.json)
* [x] The PR title and labels are set such as to maximize their
usefulness for the next release's CHANGELOG
* [x] If applicable, add a new check to the [release
checklist](https://github.com/rerun-io/rerun/blob/main/tests/python/release_checklist)!
* [x] I have noted any breaking changes to the log API in
`CHANGELOG.md` and the migration guide

- [PR Build Summary](https://build.rerun.io/pr/6928)
- [Recent benchmark results](https://build.rerun.io/graphs/crates.html)
- [Wasm size tracking](https://build.rerun.io/graphs/sizes.html)

To run all checks from `main`, comment on the PR with `@rerun-bot
full-check`.
@emilk emilk mentioned this issue Jul 18, 2024
@jleibs jleibs assigned Wumpf and unassigned emilk Jul 22, 2024
@Wumpf Wumpf assigned jleibs and unassigned Wumpf Jul 30, 2024
jleibs (Member) commented Jul 30, 2024

The proposed design above introduces quite a bit of complexity:

archetype Image {
  // Required:
  buffer: ImageBuffer,
  resolution: Resolution2D,

  // If not specified, then `ColorModel` AND `ElementType` MUST be specified
  pixel_format: Option<PixelFormat>,

  color_model: Option<ColorModel>,
  element_type: Option<ElementType>,
}

Whereas we previously had a single self-describing TensorData component, we now need to query 5 different components and join them to interpret the contents of the ImageBuffer. There are many more edge cases now, allowing for misuse, partial updates, and invalid representations.

There are a few practical benefits to splitting ImageBuffer from the metadata describing the image shape. Examples:

  • Fix RGB <-> BGR mixup in the blueprint
  • Inspect Bayer-encoded data as a mono image

However, we can still get the vast majority of this utility by combining the assorted metadata into a single, required, ImageFormat component:

archetype Image {
  // Required:
  buffer: ImageBuffer,
  format: ImageFormat
}

// More noise from our datamodel.
component ImageFormat {
  datatype ImageFormat;
}

datatype ImageFormat {
  width: uint,
  height: uint,
  row_stride: Option<uint>,

  pixel_format: PixelFormat,

  color_model: Option<ColorModel>,   // used if pixel_format = ARRAY
  element_type: Option<ElementType>, // used if pixel_format = ARRAY
}

component enum PixelFormat {
	ARRAY,
	RGB_565,
	RGBA_8888,
	NV12,
	YUY2,
	…
}

This also simplifies the ImageFormat a bit by explicitly splitting out width and height, as well as making pixel_format always required, with a special value indicating the usage of the last 2 optional fields.

The only practical downside of this change is that you can't (yet) just override color_model.

  • On the viewer, this probably means some more UI code. Writing the redundant metadata is cheap enough that we shouldn't be worried about it.
  • On the blueprint API side, this means you have to specify the whole ImageFormat() component if adding an override, so you can't generically swap color model without knowing the rest of the format information about the image. Given how niche this is, I think it's a fair compromise.
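A sketch of the single self-describing component as a frozen dataclass: overriding one field means re-writing the whole format, but the result is always internally consistent. The field names mirror the IDL above; the Python itself is illustrative, not SDK code:

```python
from dataclasses import dataclass, replace
from typing import Optional

@dataclass(frozen=True)
class ImageFormat:
    """Single self-describing component (sketch of the datatype above)."""
    width: int
    height: int
    pixel_format: str = "ARRAY"
    row_stride: Optional[int] = None
    color_model: Optional[str] = None   # used if pixel_format == "ARRAY"
    element_type: Optional[str] = None  # used if pixel_format == "ARRAY"

# Fixing an RGB <-> BGR mixup means writing a whole new format value,
# which carries all the redundant-but-cheap metadata along with it:
fmt = ImageFormat(640, 480, color_model="BGR", element_type="U8")
fixed = replace(fmt, color_model="RGB")
```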

jleibs (Member) commented Jul 31, 2024

Since the above apparently wasn't clear enough in terms of the problems it was trying to address:

If Resolution, Stride, PixelFormat, ColorModel, and Datatype are all separate components, this means they can be updated totally asynchronously and that they all need to be joined together by either a range or latest-at query in order to have all the information to interpret an image buffer.

In turn, every image query must bottom out in a 5-way component join, which, in a pathological case, spans 5 chunks and has potentially different join results depending on the timeline, some of which yield corrupt data interpretation.

Here’s an example: let’s suppose someone first logs an image using nv12 pixel format and then logs an image using rgb color model. If they don’t realize they needed to explicitly clear pixel format (since it’s an optional component), it now joins into their query results implicitly via latest-at, yielding corrupted data.

If everyone only uses our archetype APIs and those APIs are carefully crafted and tested we can generally work around this issue. But this is back to saying our low-level arrow APIs aren’t approachable to users, which is the very problem we’re trying to solve.
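The hazard can be demonstrated with a toy latest-at query over separate components — a simulation for illustration, not Rerun's actual query engine:

```python
def latest_at(log: list, time: int) -> dict:
    """Toy latest-at query: for each component, keep the newest value <= time."""
    state: dict = {}
    for t, component, value in sorted(log, key=lambda e: e[0]):
        if t <= time:
            state[component] = value
    return state

# t=0: the user logs an NV12 image; t=1: they log an RGB image but forget
# to clear `pixel_format` (it is optional, so nothing overwrites it).
log = [
    (0, "buffer", b"nv12-bytes"),
    (0, "pixel_format", "NV12"),
    (1, "buffer", b"rgb-bytes"),
    (1, "color_model", "RGB"),
]
joined = latest_at(log, time=1)
# The stale "NV12" joins in and takes precedence over "RGB",
# so the RGB buffer would be misinterpreted as NV12:
assert joined["pixel_format"] == "NV12" and joined["color_model"] == "RGB"
```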
