Commit: more movi documentation
Qwlouse committed Mar 23, 2022
1 parent 9f31170 commit 9dd4ed1
Showing 6 changed files with 320 additions and 20 deletions.
259 changes: 253 additions & 6 deletions challenges/movi/README.md
# Multi-Object Video (MOVi) datasets

The MOVi dataset is really a series of five datasets (MOVi-A to MOVi-E) with increasing complexity.
Each dataset consists of random scenes, each being a 2 second rigid body simulation with a few objects falling.
The variants differ along several dimensions, including the number and type of objects, the background, camera position/movement, and whether all objects are tossed or some remain static.

## Variants
### MOVi-A
![](images/movi_a_1.gif)
![](images/movi_a_2.gif)
![](images/movi_a_3.gif)



### MOVi-B
![](images/movi_b_1.gif)
![](images/movi_b_2.gif)
![](images/movi_b_3.gif)
"height": 256,
"width": 256
},
"background": str,
"camera": {
"field_of_view": 0.85755605,
"focal_length": 35.0,

</details>

### MOVi-D
![](images/movi_d_1.gif)
![](images/movi_d_2.gif)
![](images/movi_d_3.gif)

MOVi-D slightly increases scene complexity by using more objects, most of which start the scene lying still on the floor.
Concretely, each scene consists of 10-20 static objects and only 1-3 dynamic objects that are tossed into the scene as in the other datasets.
This variant is meant to test for larger numbers of objects, and the model's ability to handle non-moving objects.

Generate a single scene with the [movi_de_worker.py](movi_de_worker.py) script:
```shell
docker run --rm --interactive \
--user $(id -u):$(id -g) \
--volume "$(pwd):/kubric" \
kubricdockerhub/kubruntu \
/usr/bin/python3 challenges/movi/movi_de_worker.py \
--camera=fixed_random
```
See [movi_d.py](movi_d.py) for the TFDS definition / conversion.

``` python
ds = tfds.load("movi_d", data_dir="gs://kubric-public/tfds")
```

<details>
<summary>Sample format and shapes</summary>

``` python
{
"metadata": {
"video_name": int,
"depth_range": (2,),
"forward_flow_range": (2,),
"backward_flow_range": (2,),
"num_frames": 24,
"num_instances": int,
"height": 256,
"width": 256
},
"background": str,
"camera": {
"field_of_view": 0.85755605,
"focal_length": 35.0,
"positions": (24, 3),
"quaternions": (24, 4),
"sensor_width": 32.0
},
"instances": {
"angular_velocities": (nr_instances, 24, 3),
"bbox_frames": TensorShape([nr_instances, None]),
"bboxes": TensorShape([nr_instances, None, 4]),
"bboxes_3d": (nr_instances, 24, 8, 3),
"category": (nr_instances,),
"friction": (nr_instances,),
"image_positions": (nr_instances, 24, 2),
"is_dynamic": (nr_instances,),
"mass": (nr_instances,),
"positions": (nr_instances, 24, 3),
"quaternions": (nr_instances, 24, 4),
"restitution": (nr_instances,),
"scale": (nr_instances,),
"velocities": (nr_instances, 24, 3),
"visibility": (nr_instances, 24)
},

"events": {
"collisions": {
"contact_normal": (2778, 3),
"force": (2778,),
"frame": (2778,),
"image_position": (2778, 2),
"instances": (2778, 2),
"position": (2778, 3)
}
},
"depth": (24, 256, 256, 1),
"forward_flow": (24, 256, 256, 2),
"backward_flow": (24, 256, 256, 2),
"normal": (24, 256, 256, 3),
"object_coordinates": (24, 256, 256, 3),
"segmentations": (24, 256, 256, 1),
"video": (24, 256, 256, 3)
}
```
</details>

### MOVi-E
(coming soon)

## Annotations and Format
Each sample is a dictionary which contains the following video-format data
(`s: sequence length, h: height, w: width`):

- **"video"**: `(s, h, w, 3) [uint8]`
The RGB frames.
- **"segmentations"**: `(s, h, w, 1) [uint8]`
Instance segmentation as per-pixel object-id with `background=0`.
*Note: because of this the instance IDs used here are one higher than their
corresponding index in `sample["instances"]`.*
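A minimal sketch of recovering the pixels of one instance, using a toy NumPy array in place of `sample["segmentations"]` (the off-by-one is the only MOVi-specific detail):
```python
import numpy as np

# Toy segmentation frame standing in for sample["segmentations"]:
# 0 is background, k > 0 refers to sample["instances"] entry k - 1.
seg = np.array([[0, 1],
                [2, 1]], dtype=np.uint8)

def instance_mask(segmentations, instance_index):
    # Segmentation IDs are one higher than the instance index.
    return segmentations == instance_index + 1

mask = instance_mask(seg, 0)  # boolean mask of the pixels of instances[0]
```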
- **"depth"**: `(s, h, w, 1) [uint16]`
Distance of each pixel from the center of the camera.
*(Note this is different from the z-value sometimes used, which measures the
distance to the camera **plane**.)*
The values are stored as `uint16` and span the range specified in
`sample["metadata"]["depth_range"]`. To convert them back to world-units
use:
```python
minv, maxv = sample["metadata"]["depth_range"]
depth = sample["depth"] / 65535 * (maxv - minv) + minv
```
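Since these values are ray lengths rather than distances to the camera plane, converting to z-depth requires dividing by the per-pixel ray-length factor. A sketch for an idealized pinhole camera; `focal_px` (the focal length expressed in pixels) is a hypothetical stand-in that would be derived from `focal_length`, `sensor_width`, and the image width:
```python
import numpy as np

h, w = 4, 4
focal_px = 5.0  # hypothetical focal length in pixels

# Offset of each pixel center from the optical axis, in pixels.
u = (np.arange(w) + 0.5) - w / 2.0
v = (np.arange(h) + 0.5) - h / 2.0
uu, vv = np.meshgrid(u, v)

# Ratio of ray length to z-distance for each pixel (>= 1 everywhere).
ray_scale = np.sqrt(uu**2 + vv**2 + focal_px**2) / focal_px

ray_depth = np.ones((h, w))      # stand-in for the decoded depth above
z_depth = ray_depth / ray_scale  # distance to the camera *plane*
```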
- **"forward_flow"**: `(s, h, w, 2) [uint16]`
Forward optical flow in the form `(delta_row, delta_column)`.
The values are stored as `uint16` and span the range specified in
`sample["metadata"]["forward_flow_range"]`. To convert them back to pixels use:
```python
minv, maxv = sample["metadata"]["forward_flow_range"]
forward_flow = sample["forward_flow"] / 65535 * (maxv - minv) + minv
```
- **"backward_flow"**: `(s, h, w, 2) [uint16]`
Backward optical flow in the form `(delta_row, delta_column)`.
The values are stored as `uint16` and span the range specified in
`sample["metadata"]["backward_flow_range"]`. To convert them back to pixels use:
```python
minv, maxv = sample["metadata"]["backward_flow_range"]
backward_flow = sample["backward_flow"] / 65535 * (maxv - minv) + minv
```
- **"normal"**: `(s, h, w, 3) [uint16]`
Surface normals for each pixel in world coordinates.
- **"object_coordinates"**: `(s, h, w, 3) [uint16]`
  Object coordinates encode the position of each point relative to the object's
  bounding box (i.e. the back-left-top (X=Y=Z=1) corner is white,
  while the front-right-bottom (X=Y=Z=0) corner is black).
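Assuming the same linear uint16 encoding as the other channels, the stored values map back to normalized [0, 1] box coordinates like so (toy array in place of `sample["object_coordinates"]`):
```python
import numpy as np

# One toy uint16 pixel standing in for sample["object_coordinates"].
obj_coords_u16 = np.array([[[0, 32768, 65535]]], dtype=np.uint16)

# Linear map uint16 -> [0, 1]: 0 is the front-right-bottom corner (black),
# 65535 the back-left-top corner (white). (Assumed encoding.)
obj_coords = obj_coords_u16.astype(np.float32) / 65535.0
```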

Additionally, there is rich instance-level information within `sample["instances"]`.
*Note that the instances are sorted by their total visibility (descending),
i.e. the total number of pixels they occupy throughout the video.*
- **"mass"**: `[float32]`
Mass of the object used for simulation.
- **"friction"**: `[float32]`
Friction coefficient used for simulation.
- **"restitution"**: `[float32]`
Restitution coefficient (bounciness) used for simulation.
- **"positions"**: `(s, 3) [float32]`
Position of the object for each frame in world-coordinates.
- **"quaternions"**: `(s, 4) [float32]`
Rotation of the object for each frame as quaternions.
- **"velocities"**: `(s, 3) [float32]`
Velocity of the object for each frame.
- **"angular_velocities"**: `(s, 3) [float32]`
Angular velocity of the object for each frame.
- **"bboxes_3d"**: `(s, 8, 3) [float32]`
World-space corners of the 3D bounding box around the object.
- **"image_positions"**: `(s, 2) [float32]`
Normalized (0, 1) image-space (2D) coordinates of the center of mass of the
object for each frame.
- **"bboxes"**: `(None, 4) [float32]`
  The normalized image-space (2D) coordinates of the bounding box
  `[ymin, xmin, ymax, xmax]` for all the frames in which the object is visible
  (as specified in `bbox_frames`).
- **"bbox_frames"**: `(None,) [int]`
  A list of all the frames in which the object is visible.
- **"visibility"**: `(s,) [uint16]`
Visibility of the object in number of pixels for each frame (can be 0).
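Because `bboxes` and `bbox_frames` are ragged (one entry per *visible* frame, not per frame), looking up an instance's box at a given frame goes through `bbox_frames`; a minimal sketch with toy data:
```python
import numpy as np

# Toy ragged data for one instance: visible in frames 3 and 5 only.
bbox_frames = np.array([3, 5])            # sample["instances"]["bbox_frames"][k]
bboxes = np.array([[0.1, 0.2, 0.5, 0.6],  # sample["instances"]["bboxes"][k]
                   [0.1, 0.3, 0.5, 0.7]])

def bbox_at_frame(bbox_frames, bboxes, frame):
    """Return [ymin, xmin, ymax, xmax] at `frame`, or None if not visible."""
    hits = np.flatnonzero(bbox_frames == frame)
    return bboxes[hits[0]] if hits.size else None
```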

Information about the camera in `sample["camera"]`
(given for each frame even if the camera is static):

- **"focal_length"**: `[float32]`
Focal length of the camera in millimeters.
- **"sensor_width"**: `[float32]`
Width of the sensor of the camera in millimeters.
- **"field_of_view"**: `[float32]`
Horizontal field-of-view angle of the camera in radians.
- **"positions"**: `(s, 3) [float32]`
  Position of the camera for each frame in world-coordinates.
- **"quaternions"**: `(s, 4) [float32]`
  Rotation of the camera for each frame as quaternions.
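The stored `field_of_view` is consistent with the standard pinhole relation `fov = 2 * atan(sensor_width / (2 * focal_length))`; checking against the sample values shown earlier:
```python
import math

focal_length = 35.0  # mm, from sample["camera"]["focal_length"]
sensor_width = 32.0  # mm, from sample["camera"]["sensor_width"]

# Horizontal field of view of a pinhole camera, in radians.
fov = 2.0 * math.atan(sensor_width / (2.0 * focal_length))
# fov ≈ 0.8576, matching the stored field_of_view of 0.85755605
```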


And finally information about collision events in `sample["events"]["collisions"]`:

- **"instances"**: `(2,) [uint16]`
  Indices of the two instances between which the collision happened.
  Note that collisions with the floor/background are marked with `65535`.
- **"frame"**: `[int32]`
Frame in which the collision happened.
- **"force"**: `[float32]`
The force (strength) of the collision.
- **"position"**: `(3,) [float32]`
Position of the collision event in 3D world coordinates.
- **"image_position"**: `(2,) [float32]`
Position of the collision event projected onto normalized 2D image coordinates.
- **"contact_normal"**: `(3,) [float32]`
The normal-vector of the contact (direction of the force).
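Filtering out collisions with the floor/background (marked by the sentinel index 65535) is then a simple mask over the `instances` pairs; a toy-data sketch:
```python
import numpy as np

FLOOR = 65535  # sentinel index for the floor/background

# Toy pairs standing in for sample["events"]["collisions"]["instances"].
instances = np.array([[0, 1],
                      [2, FLOOR],
                      [1, 3]], dtype=np.uint16)

# Keep only object-object collisions: neither participant is the floor.
object_object = np.all(instances != FLOOR, axis=1)
pairs = instances[object_object]
```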


### Variant-Specific Additional Information
#### MOVi-A
For each instance, MOVi-A contains the following additional information:
- **"shape_label"**:
  One of `["cube", "cylinder", "sphere"]`
- **"size_label"**:
  One of `["small", "large"]`
- **"color"**: `(3,) [float32]`
Color of the object in RGB.
- **"color_label"**:
One of `["blue", "brown", "cyan", "gray", "green", "purple", "red", "yellow"]`
- **"material_label"**:
One of `["metal", "rubber"]`

#### MOVi-B
MOVi-B has an additional (top-level) entry:
- **"background_color"**: `(3,) [float32]`
Color of the background in RGB.

And for each instance MOVi-B contains the following additional information:
- **"shape_label"**:
One of `["cube", "cylinder", "sphere", "cone", "torus", "gear", "torus_knot", "sponge", "spot", "teapot", "suzanne"]`
- **"scale"**: `[float32]`
Scale of the object (between 0.7 and 1.4).
- **"color"**: `(3,) [float32]`
Color of the object in RGB.
- **"material_label"**:
One of `["metal", "rubber"]`

#### MOVi-C
MOVi-C has an additional (top-level) entry:
- **"background"**: `str`
Name of the background HDRI.

And for each instance MOVi-C contains the following additional information:
- **"asset_id"**: `str`
Asset id from Google Scanned Objects dataset.
- **"scale"**: `[float32]`
Scale of the object (between 0.75 and 3.0).
- **"category"**:
One of `["Action Figures", "Bag", "Board Games",
"Bottles and Cans and Cups", "Camera",
"Car Seat", "Consumer Goods", "Hat",
"Headphones", "Keyboard", "Legos",
"Media Cases", "Mouse", "None", "Shoe",
"Stuffed Toys", "Toys"]`


#### MOVi-D and MOVi-E
MOVi-D and E have the same information as MOVi-C, but for each instance there is an additional boolean:
- **"is_dynamic"**: `bool`
Indicating whether (at the start of the scene) the object
is sitting on the floor (`False`) or is being tossed (`True`).
Binary file added challenges/movi/images/movi_d_1.gif
Binary file added challenges/movi/images/movi_d_2.gif
Binary file added challenges/movi/images/movi_d_3.gif
17 changes: 15 additions & 2 deletions challenges/movi/movi_a.py
The normal-vector of the contact (direction of the force).
"""


_CITATION = """\
@inproceedings{greff2022kubric,
title = {Kubric: a scalable dataset generator},
author = {Klaus Greff and Francois Belletti and Lucas Beyer and Carl Doersch and
Yilun Du and Daniel Duckworth and David J Fleet and Dan Gnanapragasam and
Florian Golemo and Charles Herrmann and Thomas Kipf and Abhijit Kundu and
Dmitry Lagun and Issam Laradji and Hsueh-Ti (Derek) Liu and Henning Meyer and
Yishu Miao and Derek Nowrouzezahrai and Cengiz Oztireli and Etienne Pot and
Noha Radwan and Daniel Rebain and Sara Sabour and Mehdi S. M. Sajjadi and Matan Sela and
Vincent Sitzmann and Austin Stone and Deqing Sun and Suhani Vora and Ziyu Wang and
Tianhao Wu and Kwang Moo Yi and Fangcheng Zhong and Andrea Tagliasacchi},
booktitle = {{IEEE} Conference on Computer Vision and Pattern Recognition, {CVPR}},
year = {2022},
publisher = {Computer Vision Foundation / {IEEE}},
}"""

@dataclasses.dataclass
class MoviAConfig(tfds.core.BuilderConfig):