Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix python's send_columns failing to convert correctly from ComponentBatch to ComponentColumn in some cases #7155

Merged
merged 7 commits into from
Aug 13, 2024

Conversation

Wumpf
Copy link
Member

@Wumpf Wumpf commented Aug 12, 2024

What

Makes it a lot easier to log batches of images in python:

from __future__ import annotations

import numpy as np
import rerun as rr

rr.init("rerun_example_send_columns", spawn=True)

COUNT = 64
WIDTH = 100
HEIGHT = 50
CHANNELS = 3

# Create our time
times = np.arange(0, COUNT)

# Create a batch of images
rng = np.random.default_rng(12345)
image_batch = rng.uniform(0, 255, size=[COUNT, HEIGHT, WIDTH, CHANNELS]).astype(
    dtype=np.uint8
)

# Log the ImageFormat and indicator once, as static.
format_static = rr.components.ImageFormat(
    width=WIDTH, height=HEIGHT, color_model="RGB", channel_datatype="U8"
)
rr.log("images", [format_static, rr.Image.indicator()], static=True)

# Reshape the images so `ImageBufferBatch` can tell that this is several blobs.
rr.send_columns(
    "images",
    times=[rr.TimeSequenceColumn("step", times)],
    components=[rr.components.ImageBufferBatch(image_batch.reshape(COUNT, -1))],
)

cc: @Famok

Related things that went into this PR:

  • add a note to tensors not supporting batching right now
  • support arrays of bytes for blob arrays

Manually tested the snippets for send_columns to make sure nothing else broke.
Considered adding the above as a snippet, but I'm still not sure if we want to advertise this prominently. There's some situations where this is useful, but generally we don't have a great usecase for it. Not sure about this 🤔

Checklist

  • I have read and agree to Contributor Guide and the Code of Conduct
  • I've included a screenshot or gif (if applicable)
  • I have tested the web demo (if applicable):
  • The PR title and labels are set such as to maximize their usefulness for the next release's CHANGELOG
  • If applicable, add a new check to the release checklist!
  • If have noted any breaking changes to the log API in CHANGELOG.md and the migration guide

To run all checks from main, comment on the PR with @rerun-bot full-check.

@Wumpf Wumpf added 🐍 Python API Python logging API exclude from changelog PRs with this won't show up in CHANGELOG.md labels Aug 12, 2024
@Wumpf Wumpf mentioned this pull request Aug 12, 2024
6 tasks
@Wumpf
Copy link
Member Author

Wumpf commented Aug 12, 2024

@rerun-bot full-check

Copy link

@teh-cmc teh-cmc self-requested a review August 13, 2024 07:15
@Famok
Copy link

Famok commented Aug 13, 2024

You guys are amazing, thanks for trying to make this work!

I'm on version:
rerun_py 0.18.0-alpha.1+dev [rustc 1.76.0 (07dca489a 2024-02-04), LLVM 17.0.6] x86_64-pc-windows-msvc main 1dfb41c, built 2024-08-12T13:00:38Z

I'm getting further this time, but stumbled onto two errors:
First is in _baseclases.py, where a "|" type is used, which is incompatible with python 3.9 (have to use that due to another lib).
--> Fixed that using: ComponentColumnLike = Union[ComponentBatchLike, ComponentColumn]

But the other error I'm unable to fix:

RerunWarning: send_columns: RuntimeError(Detected malformed Chunk: The outer array in chunked component batch must be a sparse list, got List(Field { name: "item", data_type: UInt8, is_nullable: false, metadata: {} }))
  rr.send_columns(

@teh-cmc
Copy link
Member

teh-cmc commented Aug 13, 2024

Considered adding the above as a snippet, but I'm still not sure if we want to advertise this prominently. There's some situations where this is useful, but generally we don't have a great usecase for it.

I definitely would -- it works, it's useful to people, and it isn't even hackish.

@Wumpf
Copy link
Member Author

Wumpf commented Aug 13, 2024

I definitely would -- it works, it's useful to people, and it isn't even hackish.

I'll bring up another pr today with snippets that integrate into Scalar and Image types at least!

@Wumpf
Copy link
Member Author

Wumpf commented Aug 13, 2024

@Famok thank you so much again for testing the pre-release out! I'll fix the Union issue right away on this PR!
The other error you're getting that is using which code exactly? The snippet I posted in here ofc only works with the changes in this PR (to be landed today), but the snippet @jleibs posted earlier should work both before and after 🤔

@Famok
Copy link

Famok commented Aug 13, 2024

@Wumpf testing this is fun, since rerun is evolving exactly into what I need to replace my current solution.

Btw.: is there a way to use "|" in 3.9? I hoped there was some way like from future import |

I'm using the example from the first post above.

from __future__ import annotations

import numpy as np
import rerun as rr

rr.init("rerun_example_send_columns", spawn=True)

COUNT = 64
WIDTH = 100
HEIGHT = 50
CHANNELS = 3

# Create our time
times = np.arange(0, COUNT)

# Create a batch of images
rng = np.random.default_rng(12345)
image_batch = rng.uniform(0, 255, size=[COUNT, HEIGHT, WIDTH, CHANNELS]).astype(
    dtype=np.uint8
)

# Log the ImageFormat and indicator once, as static.
format_static = rr.components.ImageFormat(
    width=WIDTH, height=HEIGHT, color_model="RGB", channel_datatype="U8"
)
rr.log("images", [format_static, rr.Image.indicator()], static=True)

# Reshape the images so `ImageBufferBatch` can tell that this is several blobs.
rr.send_columns(
    "images",
    times=[rr.TimeSequenceColumn("step", times)],
    components=[rr.components.ImageBufferBatch(image_batch.reshape(COUNT, -1))],
)

@Wumpf
Copy link
Member Author

Wumpf commented Aug 13, 2024

Hmmm no the pipe for type concatentation is a feature they added in later Python versions. We usually don't use this for compatibility, but it slipped me through here.

Phew, if you're using the snippet from above then the other error makes sense as well, this is addressed in here but hasn't landed yet 🙂. For trying out that you'd need to checkout this branch and do your own build of the sdk/wheel.
But the "nightly" builder should put out a pre-release version in a few hours (it does so daily bar any issues)

@nikolausWest
Copy link
Member

testing this is fun, since rerun is evolving exactly into what I need to replace my current solution.

@Famok, awesome to hear that, thanks! Would you mind sharing what it is you're working on now that needs where Rerun is going?

@Famok
Copy link

Famok commented Aug 13, 2024

@Wumpf I'll wait for the nightly and then report again

@nikolausWest I'll contact you directly

@Wumpf Wumpf merged commit 781968d into main Aug 13, 2024
34 checks passed
@Wumpf Wumpf deleted the andreas/improve-image-batch-logging branch August 13, 2024 08:44
teh-cmc pushed a commit that referenced this pull request Aug 13, 2024
…entBatch` to `ComponentColumn` in some cases (#7155)

### What

* Fixes  #6592
* technically not for tensors which is covered by it's
union-avoidance-ticket, but this PR adds a note about this
    
Makes it a lot easier to log batches of images in python:
```python
from __future__ import annotations

import numpy as np
import rerun as rr

rr.init("rerun_example_send_columns", spawn=True)

COUNT = 64
WIDTH = 100
HEIGHT = 50
CHANNELS = 3

# Create our time
times = np.arange(0, COUNT)

# Create a batch of images
rng = np.random.default_rng(12345)
image_batch = rng.uniform(0, 255, size=[COUNT, HEIGHT, WIDTH, CHANNELS]).astype(
    dtype=np.uint8
)

# Log the ImageFormat and indicator once, as static.
format_static = rr.components.ImageFormat(
    width=WIDTH, height=HEIGHT, color_model="RGB", channel_datatype="U8"
)
rr.log("images", [format_static, rr.Image.indicator()], static=True)

# Reshape the images so `ImageBufferBatch` can tell that this is several blobs.
rr.send_columns(
    "images",
    times=[rr.TimeSequenceColumn("step", times)],
    components=[rr.components.ImageBufferBatch(image_batch.reshape(COUNT, -1))],
)
```
cc: @Famok


Related things that went into this PR:
* add a note to tensors not supporting batching right now
* support arrays of bytes for blob arrays

Manually tested the snippets for `send_columns` to make sure nothing
else broke.
Considered adding the above as a snippet, but I'm still not sure if we
want to advertise this prominently. There's some situations where this
is useful, but generally we don't have a great usecase for it. Not sure
about this 🤔

### Checklist
* [x] I have read and agree to [Contributor
Guide](https://github.com/rerun-io/rerun/blob/main/CONTRIBUTING.md) and
the [Code of
Conduct](https://github.com/rerun-io/rerun/blob/main/CODE_OF_CONDUCT.md)
* [x] I've included a screenshot or gif (if applicable)
* [x] I have tested the web demo (if applicable):
* Using examples from latest `main` build:
[rerun.io/viewer](https://rerun.io/viewer/pr/7155?manifest_url=https://app.rerun.io/version/main/examples_manifest.json)
* Using full set of examples from `nightly` build:
[rerun.io/viewer](https://rerun.io/viewer/pr/7155?manifest_url=https://app.rerun.io/version/nightly/examples_manifest.json)
* [x] The PR title and labels are set such as to maximize their
usefulness for the next release's CHANGELOG
* [x] If applicable, add a new check to the [release
checklist](https://github.com/rerun-io/rerun/blob/main/tests/python/release_checklist)!
* [x] If have noted any breaking changes to the log API in
`CHANGELOG.md` and the migration guide

- [PR Build Summary](https://build.rerun.io/pr/7155)
- [Recent benchmark results](https://build.rerun.io/graphs/crates.html)
- [Wasm size tracking](https://build.rerun.io/graphs/sizes.html)

To run all checks from `main`, comment on the PR with `@rerun-bot
full-check`.
@Famok
Copy link

Famok commented Aug 14, 2024

@Wumpf when will the nightly build land here or is it located somewhere else?

@Wumpf
Copy link
Member Author

Wumpf commented Aug 14, 2024

@Famok yes it should override the Development Release. Looks like it had issues yesterday - sometimes they kick off just when we introduced some error that only shows up on nightly. Quite optimistic it will pass today. As of writing it's almost done, see here https://github.com/rerun-io/rerun/actions/runs/10387724727
But you're in luck, there's something better still 😄: We're doing release candidate builds right now for 0.18. Check the links rerun-bot posts in here. There's a build from yesterday and there is another one in flight right now
These RC builds are uploaded to pypi as well, making it easier to grab them :)

@Famok
Copy link

Famok commented Aug 15, 2024

@Wumpf amazing! Imagebatches are working fine now. (Using the example from above)

@Famok
Copy link

Famok commented Aug 15, 2024

@Wumpf I've played with you code and noticed that the image is black when I try this with 16 bit images:

from __future__ import annotations

import numpy as np
import rerun as rr

rr.init("rerun_example_send_columns", spawn=True)

COUNT = 64
WIDTH = 100
HEIGHT = 50
CHANNELS = 1

# Create our time
times = np.arange(0, COUNT)

# Create a batch of images
rng = np.random.default_rng(12345)
image_batch = rng.uniform(0, 255, size=[COUNT, HEIGHT, WIDTH, CHANNELS]).astype(
    dtype=np.uint16
)

# Log the ImageFormat and indicator once, as static.
format_static = rr.components.ImageFormat(
    width=WIDTH, height=HEIGHT, color_model="L", channel_datatype="U16"
)
rr.log("images", [format_static, rr.Image.indicator()], static=True)

# Reshape the images so `ImageBufferBatch` can tell that this is several blobs.
rr.send_columns(
    "images",
    times=[rr.TimeSequenceColumn("step", times)],
    components=[rr.components.ImageBufferBatch(image_batch.reshape(COUNT, -1))],
)

@Wumpf
Copy link
Member Author

Wumpf commented Aug 15, 2024

@Famok I think you're hitting #1510

@Famok
Copy link

Famok commented Aug 15, 2024

@Wumpf sorry I dont get it. #1510 is about the extent of an image right?

@Wumpf
Copy link
Member Author

Wumpf commented Aug 15, 2024

Ah sorry, wrong ticket! This is the right one

in other words: Rerun doesn't allow specifying the data range yet and instead tries to guess it from the look of the datatype, this is done intransparently (https://github.com/rerun-io/rerun/blob/main/crates/viewer/re_viewer_context/src/gpu_bridge/image_to_gpu.rs#L160).

So but for your case everything is 0-255 so that thing should already guess correctly that this is the range.
... but if you run this and check the log you'll find this error:

[2024-08-15T17:11:16Z ERROR re_space_view_spatial::visualizers::utilities::textured_rect] Failed to create texture for "/images": Wrong number of bytes in the texture data buffer - expected 100x50x2=10000, got 5000

The reason why is a little bit hard to spot in the new send_columns api:

components=[rr.components.ImageBufferBatch(image_batch.reshape(COUNT, -1))],

Image buffers are just raw bytes. So this works fine for u8 images, but for anything else it should be something like this:

        rr.components.ImageBufferBatch(
            np.frombuffer(image_batch.tobytes(), dtype=np.uint8).reshape([COUNT, WIDTH * HEIGHT * CHANNELS * 2])
        )

Or more explicitely maybe:

image_size_in_bytes = WIDTH * HEIGHT * CHANNELS * 2
image_batch_bytes = image_batch.tobytes()
rr.send_columns(
    "images",
    times=[rr.TimeSequenceColumn("step", times)],
    components=[
        rr.components.ImageBufferBatch([
            image_batch_bytes[i : (i + image_size_in_bytes)]
            for i in range(0, len(image_batch_bytes), image_size_in_bytes)
        ])
    ],
)

I'll bring it up with the rest of the team what we should recommend (I don't know what's more efficient in python, afaik these slice operations don't copy though) and if we can make this more obvious somehow 🤔
The last snippet now almost looks like what it looks like in the C++ & Rust which I suppose is good and bad.

Unrelatedly: If possible it would be great if you could bring these questions up on Discord instead in the future. Posting on closed PRs & Issues has a very high risk of getting lost (github notification system is so-so and I get a lot of them) and it's not very visible overall. Thanks!

@Famok
Copy link

Famok commented Aug 16, 2024

I've opened a thread here: discord lets continue discussion there :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
exclude from changelog PRs with this won't show up in CHANGELOG.md 🐍 Python API Python logging API
Projects
None yet
Development

Successfully merging this pull request may close these issues.

All components should support Batch constructors.
4 participants