Commit: movi_c tfds definition
Qwlouse committed Mar 22, 2022
1 parent b9ae217 commit 9f31170
Showing 30 changed files with 1,516 additions and 663 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -47,7 +47,7 @@ ls output


## Datasets
-* [Multi-Object Video (MOVi) Dataset](docs/datasets/movi/README.md)
+* [Multi-Object Video (MOVi) Dataset](challenges/movi/README.md)
* [Texture-Structure in NeRF Dataset](https://github.com/google-research/kubric/issues/184)
* [Long-Term Tracking](https://github.com/google-research/kubric/issues/184)
* [Texture-Structure in NeRF](https://github.com/google-research/kubric/issues/184)
283 changes: 283 additions & 0 deletions challenges/movi/README.md
@@ -0,0 +1,283 @@
# Multi-Object Video (MOVi) datasets

The MOVi dataset is really a series of five datasets (MOVi-A to MOVi-E) of increasing complexity.
Each dataset consists of random scenes, each being a 2-second rigid-body simulation with a few objects falling.
The variants differ along several dimensions, including the number and type of objects, the background, the camera position/movement, and whether all objects are tossed or some remain static.

## MOVi-A
![](images/movi_a_1.gif)
![](images/movi_a_2.gif)
![](images/movi_a_3.gif)

MOVi-A is based on the CLEVR dataset.
The scene consists of a gray floor, four light sources, a camera, and between
3 and 10 random objects.
The camera position is randomly jittered in a small area around a fixed position
and always points at the origin.
Each object is randomly sampled to have:
- one of three shapes [cube, sphere, cylinder],
- one of two sizes [small, large],
- one of two materials [rubber, metal],
- and one of eight colors [blue, brown, cyan, gray, green, purple, red, yellow].
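Taken together, the attributes above give 3 × 2 × 2 × 8 = 96 distinct object configurations. A minimal sketch of uniform sampling over them (the authoritative sampling code is in [movi_ab_worker.py](movi_ab_worker.py); this is only an illustration):

```python
import itertools
import random

shapes = ["cube", "sphere", "cylinder"]
sizes = ["small", "large"]
materials = ["rubber", "metal"]
colors = ["blue", "brown", "cyan", "gray", "green", "purple", "red", "yellow"]

# All distinct object configurations: 3 * 2 * 2 * 8 = 96.
configs = list(itertools.product(shapes, sizes, materials, colors))
print(len(configs))  # 96

# Each object in a scene is an independent uniform draw over the attributes.
obj = {
    "shape": random.choice(shapes),
    "size": random.choice(sizes),
    "material": random.choice(materials),
    "color": random.choice(colors),
}
```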

Generate a single scene with the [movi_ab_worker.py](movi_ab_worker.py) script:
```shell
docker run --rm --interactive \
--user $(id -u):$(id -g) \
--volume "$(pwd):/kubric" \
kubricdockerhub/kubruntu \
/usr/bin/python3 challenges/movi/movi_ab_worker.py \
--objects_set=clevr \
--background=clevr \
--camera=clevr
```
See [movi_a.py](movi_a.py) for the TFDS definition / conversion.
``` python
ds = tfds.load("movi_a", data_dir="gs://kubric-public/tfds")
```
<details>
<summary>Sample format and shapes</summary>

``` python
{
"metadata": {
"video_name": int,
"depth_range": (2,),
"forward_flow_range": (2,),
"backward_flow_range": (2,),
"num_frames": 24,
"num_instances": int,
"height": 256,
"width": 256
},
"camera": {
"field_of_view": 0.85755605,
"focal_length": 35.0,
"positions": (24, 3),
"quaternions": (24, 4),
"sensor_width": 32.0
},
"instances": {
"angular_velocities": (nr_instances, 24, 3),
"bbox_frames": TensorShape([nr_instances, None]),
"bboxes": TensorShape([nr_instances, None, 4]),
"bboxes_3d": (nr_instances, 24, nr_instances, 3),
"color": (nr_instances, 3),
"color_label": (nr_instances,),
"friction": (nr_instances,),
"image_positions": (nr_instances, 24, 2),
"mass": (nr_instances,),
"material_label": (nr_instances,),
"positions": (nr_instances, 24, 3),
"quaternions": (nr_instances, 24, 4),
"restitution": (nr_instances,),
"shape_label": (nr_instances,),
"size_label": (nr_instances,),
"velocities": (nr_instances, 24, 3),
"visibility": (nr_instances, 24)
},
"events": {
"collisions": {
"contact_normal": (2778, 3),
"force": (2778,),
"frame": (2778,),
"image_position": (2778, 2),
"instances": (2778, 2),
"position": (2778, 3)
}
},
"depth": (24, 256, 256, 1),
"forward_flow": (24, 256, 256, 2),
"backward_flow": (24, 256, 256, 2),
"normal": (24, 256, 256, 3),
"object_coordinates": (24, 256, 256, 3),
"segmentations": (24, 256, 256, 1),
"video": (24, 256, 256, 3)
}
```

</details>
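The `depth_range` and `*_flow_range` metadata entries suggest that the depth and flow maps are stored quantized and must be rescaled to metric values. A hedged numpy sketch, assuming linear min/max quantization to uint16 (check the loader in [movi_a.py](movi_a.py) for the authoritative conversion):

```python
import numpy as np

def unquantize(quantized, value_range, dtype_max=65535.0):
    """Map a quantized uint16 array back to [min, max] given the stored range.

    Assumes linear quantization over the full uint16 range; this is a sketch,
    not the dataset's canonical decoder.
    """
    vmin, vmax = value_range
    return vmin + quantized.astype(np.float32) / dtype_max * (vmax - vmin)

# Toy example: a fake 2x2 "depth" map with depth_range = (1.0, 5.0).
depth_q = np.array([[0, 65535], [32768, 16384]], dtype=np.uint16)
depth = unquantize(depth_q, (1.0, 5.0))
print(depth[0, 0], depth[0, 1])  # 1.0 5.0
```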



## MOVi-B
![](images/movi_b_1.gif)
![](images/movi_b_2.gif)
![](images/movi_b_3.gif)


MOVi-B is a straightforward extension of MOVi-A that varies the following dimensions:
- 8 additional object shapes ["cone", "torus", "gear", "torus_knot", "sponge", "spot", "teapot", "suzanne"]
- camera is randomly placed in a half-sphere shell looking at the center of the scene
- hue of the objects is sampled randomly from a uniform distribution
- scale is sampled uniformly between 0.7 (small) and 1.4 (large)
- background has random color (uniformly sampled hue)

Generate a single scene with the [movi_ab_worker.py](movi_ab_worker.py) script:
```shell
docker run --rm --interactive \
--user $(id -u):$(id -g) \
--volume "$(pwd):/kubric" \
kubricdockerhub/kubruntu \
/usr/bin/python3 challenges/movi/movi_ab_worker.py \
--objects_set=kubasic \
--background=colored \
--camera=random
```
See [movi_b.py](movi_b.py) for the TFDS definition / conversion.
``` python
ds = tfds.load("movi_b", data_dir="gs://kubric-public/tfds")
```
<details>
<summary>Sample format and shapes</summary>

``` python
{
"metadata": {
"video_name": int,
"depth_range": (2,),
"forward_flow_range": (2,),
"backward_flow_range": (2,),
"num_frames": 24,
"num_instances": int,
"height": 256,
"width": 256,
"background_color": (3,)
},
"camera": {
"field_of_view": 0.85755605,
"focal_length": 35.0,
"positions": (24, 3),
"quaternions": (24, 4),
"sensor_width": 32.0
},
"instances": {
"angular_velocities": (nr_instances, 24, 3),
"bbox_frames": TensorShape([nr_instances, None]),
"bboxes": TensorShape([nr_instances, None, 4]),
"bboxes_3d": (nr_instances, 24, nr_instances, 3),
"color": (nr_instances, 3),
"friction": (nr_instances,),
"image_positions": (nr_instances, 24, 2),
"mass": (nr_instances,),
"material_label": (nr_instances,),
"positions": (nr_instances, 24, 3),
"quaternions": (nr_instances, 24, 4),
"restitution": (nr_instances,),
"shape_label": (nr_instances,),
"scale": (nr_instances,),
"velocities": (nr_instances, 24, 3),
"visibility": (nr_instances, 24)
},
"events": {
"collisions": {
"contact_normal": (2778, 3),
"force": (2778,),
"frame": (2778,),
"image_position": (2778, 2),
"instances": (2778, 2),
"position": (2778, 3)
}
},
"depth": (24, 256, 256, 1),
"forward_flow": (24, 256, 256, 2),
"backward_flow": (24, 256, 256, 2),
"normal": (24, 256, 256, 3),
"object_coordinates": (24, 256, 256, 3),
"segmentations": (24, 256, 256, 1),
"video": (24, 256, 256, 3)
}
```

</details>
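The ragged `bbox_frames` / `bboxes` pair above stores boxes only for frames in which an instance is on screen. A sketch of scattering them into a dense per-frame array, assuming `bbox_frames[i]` lists the frame indices that align one-to-one with `bboxes[i]` (see [movi_b.py](movi_b.py) for the authoritative layout):

```python
import numpy as np

def densify_bboxes(bbox_frames, bboxes, num_frames=24):
    """Scatter per-visible-frame boxes into a dense (num_frames, 4) array.

    Frames where the instance has no box are filled with NaN. The pairing of
    bbox_frames[i] with bboxes[i] is an assumption based on the shapes above.
    """
    dense = np.full((num_frames, 4), np.nan, dtype=np.float32)
    dense[np.asarray(bbox_frames)] = np.asarray(bboxes, dtype=np.float32)
    return dense

# Toy instance visible only in frames 0 and 3.
frames = [0, 3]
boxes = [[0.1, 0.2, 0.5, 0.6], [0.15, 0.25, 0.55, 0.65]]
dense = densify_bboxes(frames, boxes)
print(np.isnan(dense[1]).all())  # True
```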

## MOVi-C
![](images/movi_c_1.gif)
![](images/movi_c_2.gif)
![](images/movi_c_3.gif)

MOVi-C is a big step up in complexity.
Instead of simple uniformly colored shapes, it uses realistic, richly textured everyday objects from the [Google Scanned Objects (GSO)](https://app.ignitionrobotics.org/GoogleResearch/fuel/collections/Google%20Scanned%20Objects) dataset.
Furthermore, the background is replaced by a random HDRI from [Poly Haven](https://polyhaven.com/hdris) that is projected onto a dome and serves as floor, background and lighting simultaneously.


Generate a single scene with the [movi_c_worker.py](movi_c_worker.py) script:
```shell
docker run --rm --interactive \
--user $(id -u):$(id -g) \
--volume "$(pwd):/kubric" \
kubricdockerhub/kubruntu \
/usr/bin/python3 challenges/movi/movi_c_worker.py \
--camera=fixed_random
```
See [movi_c.py](movi_c.py) for the TFDS definition / conversion.

``` python
ds = tfds.load("movi_c", data_dir="gs://kubric-public/tfds")
```
<details>
<summary>Sample format and shapes</summary>

``` python
{
"metadata": {
"video_name": int,
"depth_range": (2,),
"forward_flow_range": (2,),
"backward_flow_range": (2,),
"num_frames": 24,
"num_instances": int,
"height": 256,
"width": 256
},
"camera": {
"field_of_view": 0.85755605,
"focal_length": 35.0,
"positions": (24, 3),
"quaternions": (24, 4),
"sensor_width": 32.0
},
"instances": {
"angular_velocities": (nr_instances, 24, 3),
"bbox_frames": TensorShape([nr_instances, None]),
"bboxes": TensorShape([nr_instances, None, 4]),
"bboxes_3d": (nr_instances, 24, 8, 3),
"category": (nr_instances,),
"friction": (nr_instances,),
"image_positions": (nr_instances, 24, 2),
"mass": (nr_instances,),
"positions": (nr_instances, 24, 3),
"quaternions": (nr_instances, 24, 4),
"restitution": (nr_instances,),
"scale": (nr_instances,),
"velocities": (nr_instances, 24, 3),
"visibility": (nr_instances, 24)
},

"events": {
"collisions": {
"contact_normal": (2778, 3),
"force": (2778,),
"frame": (2778,),
"image_position": (2778, 2),
"instances": (2778, 2),
"position": (2778, 3)
}
},
"depth": (24, 256, 256, 1),
"forward_flow": (24, 256, 256, 2),
"backward_flow": (24, 256, 256, 2),
"normal": (24, 256, 256, 3),
"object_coordinates": (24, 256, 256, 3),
"segmentations": (24, 256, 256, 1),
"video": (24, 256, 256, 3)
}
```

</details>
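A per-frame visibility signal like the `visibility` field above can be derived from the `segmentations` map by counting pixels per instance id. A toy sketch, assuming id 0 is background and instances are labeled 1..num_instances (an assumption about the label convention, not a documented guarantee):

```python
import numpy as np

def pixel_visibility(segmentations, num_instances):
    """Count pixels per instance in each frame of a (T, H, W, 1) id map.

    Returns an array of shape (num_instances, T), analogous to `visibility`.
    """
    seg = segmentations[..., 0]
    counts = np.stack(
        [(seg == i + 1).sum(axis=(1, 2)) for i in range(num_instances)],
        axis=0,
    )
    return counts

# Toy 2-frame, 2x2 segmentation with a single instance (id 1).
seg = np.array([[[[1], [1]], [[0], [0]]],
                [[[0], [0]], [[0], [1]]]])
print(pixel_visibility(seg, 1))  # [[2 1]]
```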

## MOVi-D
(coming soon)

## MOVi-E
(coming soon)
File renamed without changes.
2 changes: 1 addition & 1 deletion build_tfds.sh → challenges/movi/build_tfds.sh
@@ -1,7 +1,7 @@
#!/bin/bash -x

DATASET_NAME=${1} # has to be the same as the filename of DATASET_CONFIG
-DATASET_CONFIG="examples/movi/${DATASET_NAME}.py"
+DATASET_CONFIG="${DATASET_NAME}.py"
GCP_PROJECT=kubric-xgcp
GCS_BUCKET=gs://research-brain-kubric-xgcp
REGION=us-central1
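After the rename, the script resolves the TFDS config path relative to its own directory instead of `examples/movi/`. A minimal sketch of that expansion (the actual run additionally needs the Beam/GCP setup shown in the script):

```shell
#!/bin/bash
# Mirrors the renamed script's config-path logic: the first argument names
# both the dataset and its TFDS definition file.
DATASET_NAME="movi_a"             # passed as ${1} in build_tfds.sh
DATASET_CONFIG="${DATASET_NAME}.py"
echo "${DATASET_CONFIG}"          # movi_a.py
```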
File renamed without changes
File renamed without changes
File renamed without changes
File renamed without changes
File renamed without changes
File renamed without changes
Binary file added challenges/movi/images/movi_c_1.gif
Binary file added challenges/movi/images/movi_c_2.gif
Binary file added challenges/movi/images/movi_c_3.gif
10 changes: 4 additions & 6 deletions examples/movi/movi_a.py → challenges/movi/movi_a.py
@@ -154,7 +154,7 @@

@dataclasses.dataclass
class MoviAConfig(tfds.core.BuilderConfig):
-""""Configuration for Multi-Object Video (MOVid) dataset."""
+""""Configuration for Multi-Object Video (MOVi) dataset."""
height: int = 256
width: int = 256
num_frames: int = 24
@@ -180,7 +180,6 @@ class MoviA(tfds.core.BeamBasedBuilder):
# train_val_path="/usr/local/google/home/klausg/movi_tmp",
train_val_path="gs://research-brain-kubric-xgcp/jobs/movi_a_regen_10k/",
test_split_paths={
-# "test_all_same": "gs://research-brain-kubric-xgcp/jobs/movid_a_regen_all_same",
}
),
MoviAConfig(
@@ -192,7 +191,6 @@ class MoviA(tfds.core.BeamBasedBuilder):
# train_val_path="/usr/local/google/home/klausg/movi_tmp",
train_val_path="gs://research-brain-kubric-xgcp/jobs/movi_a_regen_10k/",
test_split_paths={
-# "test_all_same": "gs://research-brain-kubric-xgcp/jobs/movid_a_regen_all_same",
}
),
]
@@ -204,7 +202,7 @@ def _info(self) -> tfds.core.DatasetInfo:
w = self.builder_config.width
s = self.builder_config.num_frames

-def get_movid_a_instance_features(seq_length: int):
+def get_movi_a_instance_features(seq_length: int):
features = get_instance_features(seq_length)
features.update({
"shape_label": tfds.features.ClassLabel(
@@ -239,7 +237,7 @@ def get_movid_a_instance_features(seq_length: int):
dtype=tf.float32),
},
"instances": tfds.features.Sequence(
-feature=get_movid_a_instance_features(seq_length=s)),
+feature=get_movi_a_instance_features(seq_length=s)),
"camera": get_camera_features(s),
"events": get_events_features(),
# -----
@@ -307,7 +305,7 @@ def _generate_examples(self, directories: List[str]):
def _process_example(video_dir):
key, result, metadata = load_scene_directory(video_dir, target_size)

-# add MOVid-A specific instance information:
+# add MOVi-A specific instance information:
for i, obj in enumerate(result["instances"]):
obj["shape_label"] = metadata["instances"][i]["shape"]
obj["size_label"] = metadata["instances"][i]["size_label"]
File renamed without changes.