# Encode images as a byte blob + pixel format + resolution; not as a tensor #6386

## Comments
One user story to consider when designing this is logging a batch of images as a single tensor (very common in the deep learning world).
Alternative version that still unifies tensors and images:

```
archetype Tensor {
    buffer: BufferU8,
    shape: TensorShape,
    element_type: TensorElementType,
}

archetype Image {
    buffer: BufferU8,
    shape: TensorShape,
    image_type: PixelFormat,
}

component BufferU8 {
    data: [u8],
}

enum TensorElementType {
    U8,
    U16,
    U32,
    U64,
    I8,
    I16,
    I32,
    I64,
    F16,
    F32,
    F64,
}

enum PixelFormat {
    /* Image formats */
    RGBA8_SRGB_22,
    RG32F,
    NV12,
    // ...

    /* Depth formats */
    F16,
    F32,
    F32_LINEAR_XXX,
    // ...

    /* Segmentation formats */
    U8,
    U16,
    // ...
}
```
We want to get rid of arrow unions. Ultimately we want datatype conversions, so we can have a single component. So what do we do in the meantime? I see two major suggestions:

A) We can split tensor into 11 different archetypes and components (one per element type).

B) We can use a single U8 buffer for all tensors and then an enum for its type, as suggested in #6386 (comment); a sketch of this option follows below.

A) has the advantage that it uses the correct arrow datatype. However, it has 11x the archetypes and 11x the components.

A second discussion: should tensors and images be completely different things, or unified?
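As a rough illustration of option B (all type and field names here are hypothetical, not an actual Rerun schema), the buffer is stored type-erased as raw bytes, and every consumer has to consult the element-type enum before reinterpreting them:

```rust
/// Hypothetical sketch of option B: one type-erased byte buffer plus an
/// enum describing how to reinterpret it. Not the actual Rerun schema.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TensorElementType {
    U8, U16, U32, U64, I8, I16, I32, I64, F16, F32, F64,
}

struct TensorBuffer {
    /// Raw bytes, regardless of the logical element type.
    data: Vec<u8>,
    element_type: TensorElementType,
}

impl TensorBuffer {
    /// Size of one element in bytes.
    fn element_size(&self) -> usize {
        use TensorElementType::*;
        match self.element_type {
            U8 | I8 => 1,
            U16 | I16 | F16 => 2,
            U32 | I32 | F32 => 4,
            U64 | I64 | F64 => 8,
        }
    }

    /// Reinterpret the bytes as f32, but only if they claim to be f32.
    fn as_f32(&self) -> Option<Vec<f32>> {
        if self.element_type != TensorElementType::F32 {
            return None; // the type erasure forces this check on every consumer
        }
        Some(
            self.data
                .chunks_exact(4)
                .map(|b| f32::from_le_bytes([b[0], b[1], b[2], b[3]]))
                .collect(),
        )
    }
}

fn main() {
    let buffer = TensorBuffer {
        data: 1.5f32.to_le_bytes().to_vec(),
        element_type: TensorElementType::F32,
    };
    assert_eq!(buffer.data.len() / buffer.element_size(), 1);
    assert_eq!(buffer.as_f32(), Some(vec![1.5]));
}
```

This keeps the schema down to a single buffer component, but it is exactly the kind of type erasure the next comment argues adds low-level complexity for consumers of the raw arrow data.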
One of the motivators for the tensor refactor work is to make the raw arrow APIs more approachable, especially for ML workflows that are tensor-heavy. I don't think type erasure is a good thing in that case, as it adds more low-level complexity.

Another thing I remembered is that Arrow has an official variable-shape-tensor spec. It's very close to the new multi-archetype proposal: https://arrow.apache.org/docs/format/CanonicalExtensions.html#variable-shape-tensor

If we go down the multi-archetype path, I think we should consider aligning with the official spec, since it raises the probability that another library that outputs or consumes tensors from arrow will be able to return chunks already in this exact structure.
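For reference, here is a rough Rust mirror of one logical row of that extension's storage layout, as I read the spec (a variable-length `data` list plus a per-row `shape`); this is an illustrative paraphrase, not the actual arrow-rs API:

```rust
/// Illustrative paraphrase of one row of Arrow's `arrow.variable_shape_tensor`
/// canonical extension: the storage is a struct of a variable-length `data`
/// list and a `shape` list with one entry per dimension.
struct VariableShapeTensorRow<T> {
    /// Flattened tensor elements (row-major by default).
    data: Vec<T>,
    /// Extent of each dimension; the product must equal `data.len()`.
    shape: Vec<i32>,
}

fn is_consistent<T>(row: &VariableShapeTensorRow<T>) -> bool {
    row.shape.iter().map(|&d| d as usize).product::<usize>() == row.data.len()
}

fn main() {
    // A 2x3 u8 tensor in one row; a 4x4 one could sit in the next row.
    let row = VariableShapeTensorRow { data: vec![0u8; 6], shape: vec![2, 3] };
    assert!(is_consistent(&row));
}
```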
This also aligns with the multi-archetype proposal. We can define the datatypes now, and because we don't support datatype conversions yet, it adds more work on the visualizers to support one-of-many components. Later, we remap to a single component, but the arrow-encoded datatypes don't have to change. This will be far easier to manage from both a migration and a backwards-compatibility perspective.
That reminds me: we should have separate …
### A concrete design

**Problem**

We want to support many precise and common pixel formats, like RGB565, NV12, etc. We also want to easily be able to convert tensors (e.g. numpy arrays) into an `Image`.

**Solution**

The image data is stored as a `u8` blob, not a generic `TensorData`.

Examples of pixel formats:

- `RGBA8`
- `RG32F`
- `NV12`
- `JPEG`

Also: when creating an `Image` straight out of a numpy tensor, we still log the shape as a component, which the viewer will use to guess the stride and pixel format, unless they have been logged explicitly by the user (`JPEG` …).

**Update**

Following live discussion, we introduce `ImageEncoded`, as such: …
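As a rough sketch of what an `ImageEncoded` archetype could look like (field names are guesses for illustration, not the actual schema): an opaque byte blob holding an encoded image file, plus a media type telling the viewer how to decode it.

```rust
/// Hypothetical sketch of an `ImageEncoded` archetype: the raw bytes of an
/// already-encoded image file plus a MIME type. Field names are guesses.
struct ImageEncoded {
    /// The contents of the encoded file (e.g. a JPEG or PNG).
    blob: Vec<u8>,
    /// Tells the viewer which decoder to use, e.g. "image/jpeg".
    media_type: String,
}

fn main() {
    // Log-side usage: read the file verbatim and let the viewer decode it.
    let image = ImageEncoded {
        blob: std::fs::read("photo.jpg").unwrap_or_default(),
        media_type: "image/jpeg".to_owned(),
    };
    println!("{} bytes of {}", image.blob.len(), image.media_type);
}
```

The appeal of such a shape, presumably, is that formats like JPEG never need a resolution, stride, or pixel format at log time; all of that is recovered by decoding.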
### What

* Part of #6386

This changes `DepthImage` from a tensor to a byte-blob + resolution + data-type.

It adds new code for hovering images and uploading images to GPU, both general enough to also work for color images and segmentation images. This means those PRs will be a lot smaller.

Since we still need to support color and segmentation images encoded as tensors, the code is quite complex, with duplicated paths here and there. It will all get cleaned up in later PRs.

I'm reusing a lot of stuff from tensors, like `TensorMeaning`, `TensorStats`, and `TensorElement`, but these will be renamed and improved before #6386 is done.

### TODO

* [x] Update `depth_image_ext.cpp`
* [x] Update `depth_image_ext.py`
* [x] Fix hovering depth images in 3D view

### Checklist

* [x] I have read and agree to [Contributor Guide](https://github.com/rerun-io/rerun/blob/main/CONTRIBUTING.md) and the [Code of Conduct](https://github.com/rerun-io/rerun/blob/main/CODE_OF_CONDUCT.md)
* [x] I've included a screenshot or gif (if applicable)
* [x] I have tested the web demo (if applicable):
  * Using examples from latest `main` build: [rerun.io/viewer](https://rerun.io/viewer/pr/6915?manifest_url=https://app.rerun.io/version/main/examples_manifest.json)
  * Using full set of examples from `nightly` build: [rerun.io/viewer](https://rerun.io/viewer/pr/6915?manifest_url=https://app.rerun.io/version/nightly/examples_manifest.json)
* [x] The PR title and labels are set such as to maximize their usefulness for the next release's CHANGELOG
* [x] If applicable, add a new check to the [release checklist](https://github.com/rerun-io/rerun/blob/main/tests/python/release_checklist)!
* [x] I have noted any breaking changes to the log API in `CHANGELOG.md` and the migration guide
* [x] Test a few local examples

- [PR Build Summary](https://build.rerun.io/pr/6915)
- [Recent benchmark results](https://build.rerun.io/graphs/crates.html)
- [Wasm size tracking](https://build.rerun.io/graphs/sizes.html)

To run all checks from `main`, comment on the PR with `@rerun-bot full-check`.
### What

* Part of #6386
* Part of #6844
* Very similar to #6915

### Checklist

* [x] I have read and agree to [Contributor Guide](https://github.com/rerun-io/rerun/blob/main/CONTRIBUTING.md) and the [Code of Conduct](https://github.com/rerun-io/rerun/blob/main/CODE_OF_CONDUCT.md)
* [x] I've included a screenshot or gif (if applicable)
* [x] I have tested the web demo (if applicable):
  * Using examples from latest `main` build: [rerun.io/viewer](https://rerun.io/viewer/pr/6928?manifest_url=https://app.rerun.io/version/main/examples_manifest.json)
  * Using full set of examples from `nightly` build: [rerun.io/viewer](https://rerun.io/viewer/pr/6928?manifest_url=https://app.rerun.io/version/nightly/examples_manifest.json)
* [x] The PR title and labels are set such as to maximize their usefulness for the next release's CHANGELOG
* [x] If applicable, add a new check to the [release checklist](https://github.com/rerun-io/rerun/blob/main/tests/python/release_checklist)!
* [x] I have noted any breaking changes to the log API in `CHANGELOG.md` and the migration guide

- [PR Build Summary](https://build.rerun.io/pr/6928)
- [Recent benchmark results](https://build.rerun.io/graphs/crates.html)
- [Wasm size tracking](https://build.rerun.io/graphs/sizes.html)

To run all checks from `main`, comment on the PR with `@rerun-bot full-check`.
The proposed design above introduces quite a bit of complexity:
Whereas we previously had a single self-describing `TensorData` component, we now need to query 5 different components and join them to interpret the contents of the `ImageBuffer`. There are many more edge cases now, allowing for misuse, partial updates, and invalid representations.

There are a few practical benefits to splitting `ImageBuffer` from the metadata describing the image shape. Examples: …
However, we can still get the vast majority of this utility by combining the assorted metadata into a single, required `ImageFormat` component; a sketch follows below.
This also simplifies the `ImageFormat` a bit by explicitly splitting out width and height, as well as making `pixel_format` always required, with a special value indicating the usage of the last 2 optional fields. The only practical downside of this change is that you can't (yet) just override `color_model`; you have to override the entirety of the image format.
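A sketch of what that combined component could look like, following the description above (field and variant names are illustrative, not the final schema):

```rust
/// Illustrative sketch of a single, required `ImageFormat` component.
/// Names are placeholders, not the final schema.
struct ImageFormat {
    width: u32,
    height: u32,
    /// Always present. The special `Generic` value means "interpret the
    /// buffer via the two optional fields below"; any other value (NV12, ...)
    /// fully determines the buffer layout by itself.
    pixel_format: PixelFormat,
    /// Only consulted when `pixel_format` is `Generic`.
    color_model: Option<ColorModel>,
    /// Only consulted when `pixel_format` is `Generic`.
    datatype: Option<ChannelDatatype>,
}

enum PixelFormat {
    Generic, // use `color_model` + `datatype`
    Nv12,
    // ...
}

enum ColorModel { L, Rgb, Rgba }

enum ChannelDatatype { U8, U16, F16, F32 }

fn main() {
    // A plain 640x480 RGB u8 image, described in one self-consistent value:
    let _format = ImageFormat {
        width: 640,
        height: 480,
        pixel_format: PixelFormat::Generic,
        color_model: Some(ColorModel::Rgb),
        datatype: Some(ChannelDatatype::U8),
    };
}
```

Since the whole format arrives as one required component, a latest-at query either sees a complete, internally consistent description of the buffer or nothing at all.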
Since the above apparently wasn't clear enough in terms of the problems it was trying to address: if `Resolution`, `Stride`, `PixelFormat`, `ColorModel`, and `Datatype` are all separate components, this means they can be updated totally asynchronously, and they all need to be joined together by either a range or latest-at query in order to have all the information to interpret an image buffer. In turn, every image query must bottom out in a 5-way component join, which, in a pathological case, spans 5 chunks and has potentially different join results depending on the timeline, some of which yield corrupt data interpretation.

Here's an example: let's suppose someone first logs an image using the NV12 pixel format and then logs an image using the RGB color model. If they don't realize they needed to explicitly clear the pixel format (since it's an optional component), it now joins into their query results implicitly via latest-at, yielding corrupted data.

If everyone only uses our archetype APIs, and those APIs are carefully crafted and tested, we can generally work around this issue. But this is back to saying our low-level arrow APIs aren't approachable to users, which is the very problem we're trying to solve.
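To make the hazard concrete, here is a self-contained toy model of per-component latest-at joins (all names invented for illustration); it reproduces the NV12-then-RGB scenario described above:

```rust
use std::collections::BTreeMap;

/// Toy model: each component has its own (time -> value) store, and a
/// latest-at query independently picks the newest value per component.
#[derive(Default)]
struct Store {
    per_component: BTreeMap<&'static str, BTreeMap<u64, &'static str>>,
}

impl Store {
    fn log(&mut self, time: u64, component: &'static str, value: &'static str) {
        self.per_component.entry(component).or_default().insert(time, value);
    }

    /// Latest-at semantics: the newest value at or before `time`.
    fn latest_at(&self, time: u64, component: &str) -> Option<&'static str> {
        self.per_component
            .get(component)?
            .range(..=time)
            .next_back()
            .map(|(_, v)| *v)
    }
}

fn main() {
    let mut store = Store::default();

    // t=1: an NV12 image is logged.
    store.log(1, "buffer", "<nv12 bytes>");
    store.log(1, "pixel_format", "NV12");

    // t=2: an RGB image is logged, but `pixel_format` is never cleared.
    store.log(2, "buffer", "<rgb bytes>");
    store.log(2, "color_model", "RGB");

    // The join at t=2 picks up the *stale* NV12 pixel format alongside the
    // new RGB color model -> the buffer gets misinterpreted.
    println!(
        "buffer={:?}, pixel_format={:?}, color_model={:?}",
        store.latest_at(2, "buffer"),
        store.latest_at(2, "pixel_format"),
        store.latest_at(2, "color_model"),
    );
}
```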