movi_c tfds definition

google-research · Mar 22, 2022 · 9f31170 · 9f31170
1 parent b9ae217
commit 9f31170
Show file tree

Hide file tree

Showing 30 changed files with 1,516 additions and 663 deletions.
diff --git a/README.md b/README.md
@@ -47,7 +47,7 @@ ls output
 
 
 ## Datasets
-* [Multi-Object Video (MOVi) Dataset](docs/datasets/movi/README.md)
+* [Multi-Object Video (MOVi) Dataset](challenges/movi/README.md)
 * [Texture-Structure in NeRF Dataset](https://github.com/google-research/kubric/issues/184)
 * [Long-Term Tracking](https://github.com/google-research/kubric/issues/184)
 * [Texture-Structure in NeRF](https://github.com/google-research/kubric/issues/184)

diff --git a/challenges/movi/README.md b/challenges/movi/README.md
@@ -0,0 +1,283 @@
+# Multi-Object Video (MOVi) datasets
+
+The MOVi dataset is really a series of six datasets (MOVi-A to MOVi-E) with increasing complexity.
+Each dataset consists of random scenes, each being a 2 second rigid body simulation with a few objects falling.
+The variants differ in various dimensions including the number and type of objects, background, camera position/movement, and wether all objects are tossed or if some remain static.
+
+## MOVi-A
+![](images/movi_a_1.gif)
+![](images/movi_a_2.gif)
+![](images/movi_a_3.gif)
+
+MOVi-A is based on the CLEVR dataset.
+The scene consists of a gray floor, four light sources, a camera, and between
+3 and 10 random objects.
+The camera position is randomly jittered in a small area around a fixed position
+and always points at the origin.
+The objects are randomly chosen from:
+- one three shapes [cube, sphere, cylinder],
+- scaled to one of two sizes [small, large],
+- have one of two materials [rubber, metal],
+- and one of eight colors [blue, brown, cyan, gray, green, purple, red, yellow]
+
+Generate single scene [movi_ab_worker.py](movi_ab_worker.py) script:
+```shell
+docker run --rm --interactive \
+  --user $(id -u):$(id -g)    \
+  --volume "$(pwd):/kubric"   \
+  kubricdockerhub/kubruntu    \
+  /usr/bin/python3 challenges/movi/movi_ab_worker.py \
+  --objects_set=clevr          \
+  --background=clevr          \
+  --camera=clevr
+```
+See [movi_a.py](movi_a.py) for the TFDS definition / conversion.
+``` python
+ds = tfds.load("movi_a", data_dir="gs://kubric-public/tfds") 
+```
+<details>
+  <summary>Sample format and shapes</summary>
+
+``` python
+{
+  "metadata": {
+    "video_name": int,
+    "depth_range": (2,),
+    "forward_flow_range": (2,),
+    "backward_flow_range": (2,),
+    "num_frames": 24,
+    "num_instances": int,
+    "height": 256,
+    "width": 256
+  },
+  "camera": {
+    "field_of_view": 0.85755605,
+    "focal_length": 35.0,
+    "positions": (24, 3),
+    "quaternions": (24, 4),
+    "sensor_width": 32.0
+  },
+  "instances": {
+    "angular_velocities": (nr_instances, 24, 3),
+    "bbox_frames": TensorShape([nr_instances, None]),
+    "bboxes": TensorShape([nr_instances, None, 4]),
+    "bboxes_3d": (nr_instances, 24, nr_instances, 3),
+    "color": (nr_instances, 3),
+    "color_label": (nr_instances,),
+    "friction": (nr_instances,),
+    "image_positions": (nr_instances, 24, 2),
+    "mass": (nr_instances,),
+    "material_label": (nr_instances,),
+    "positions": (nr_instances, 24, 3),
+    "quaternions": (nr_instances, 24, 4),
+    "restitution": (nr_instances,),
+    "shape_label": (nr_instances,),
+    "size_label": (nr_instances,),
+    "velocities": (nr_instances, 24, 3),
+    "visibility": (nr_instances, 24)
+  },
+  "events": {
+    "collisions": {
+      "contact_normal": (2778, 3),
+      "force": (2778,),
+      "frame": (2778,),
+      "image_position": (2778, 2),
+      "instances": (2778, 2),
+      "position": (2778, 3)
+    }
+  },
+  "depth": (24, 256, 256, 1),
+  "forward_flow": (24, 256, 256, 2),
+  "backward_flow": (24, 256, 256, 2),
+  "normal": (24, 256, 256, 3),
+  "object_coordinates": (24, 256, 256, 3),
+  "segmentations": (24, 256, 256, 1),
+  "video": (24, 256, 256, 3)
+}
+ ```
+
+</details>
+
+
+
+## MOVi-B
+![](images/movi_b_1.gif)
+![](images/movi_b_2.gif)
+![](images/movi_b_3.gif)
+
+
+MOVi-B is a straightforward extension of MOVi-A that varies the following dimensions:
+- 8 additional object shapes ["cone", "torus", "gear", "torus_knot", "sponge", "spot", "teapot", "suzanne"]
+- camera is randomly placed in a half-sphere shell looking at the center of the scene
+- hue of the objects is sampled randomly from a uniform distribution
+- scale is sampled uniformly between 0.7 (small) and 1.4 (large)
+- background has random color (uniformly sampled hue)
+
+Generate single scene with the [movi_ab_worker.py](movi_ab_worker.py) script:
+```shell
+docker run --rm --interactive \
+  --user $(id -u):$(id -g)    \
+  --volume "$(pwd):/kubric"   \
+  kubricdockerhub/kubruntu    \
+  /usr/bin/python3 challenges/movi/movi_ab_worker.py \
+  --objects_set=kubasic       \
+  --background=colored        \
+  --camera=random
+```
+See [movi_b.py](movi_b.py) for the TFDS definition / conversion.
+``` python
+ds = tfds.load("movi_b", data_dir="gs://kubric-public/tfds") 
+```
+<details>
+  <summary>Sample format and shapes</summary>
+
+``` python
+{
+  "metadata": {
+    "video_name": int,
+    "depth_range": (2,),
+    "forward_flow_range": (2,),
+    "backward_flow_range": (2,),
+    "num_frames": 24,
+    "num_instances": int,
+    "height": 256,
+    "width": 256,
+    "background_color": (3,)
+  },
+  "camera": {
+    "field_of_view": 0.85755605,
+    "focal_length": 35.0,
+    "positions": (24, 3),
+    "quaternions": (24, 4),
+    "sensor_width": 32.0
+  },
+  "instances": {
+    "angular_velocities": (nr_instances, 24, 3),
+    "bbox_frames": TensorShape([nr_instances, None]),
+    "bboxes": TensorShape([nr_instances, None, 4]),
+    "bboxes_3d": (nr_instances, 24, nr_instances, 3),
+    "color": (nr_instances, 3),
+    "friction": (nr_instances,),
+    "image_positions": (nr_instances, 24, 2),
+    "mass": (nr_instances,),
+    "material_label": (nr_instances,),
+    "positions": (nr_instances, 24, 3),
+    "quaternions": (nr_instances, 24, 4),
+    "restitution": (nr_instances,),
+    "shape_label": (nr_instances,),
+    "scale": (nr_instances,),
+    "velocities": (nr_instances, 24, 3),
+    "visibility": (nr_instances, 24)
+  },
+  "events": {
+    "collisions": {
+      "contact_normal": (2778, 3),
+      "force": (2778,),
+      "frame": (2778,),
+      "image_position": (2778, 2),
+      "instances": (2778, 2),
+      "position": (2778, 3)
+    }
+  },
+  "depth": (24, 256, 256, 1),
+  "forward_flow": (24, 256, 256, 2),
+  "backward_flow": (24, 256, 256, 2),
+  "normal": (24, 256, 256, 3),
+  "object_coordinates": (24, 256, 256, 3),
+  "segmentations": (24, 256, 256, 1),
+  "video": (24, 256, 256, 3)
+}
+ ```
+
+</details>
+
+## MOVi-C
+![](images/movi_c_1.gif)
+![](images/movi_c_2.gif)
+![](images/movi_c_3.gif)
+
+MOVi-C is a big step up in complexity. 
+Instead of simple uniformly colored shapes, it uses realistic, richly textured everyday objects from the [Google Scanned Objects (GSO)](https://app.ignitionrobotics.org/GoogleResearch/fuel/collections/Google%20Scanned%20Objects) dataset.
+Furthermore, the background is replaced by a random HDRI from [Poly Haven](https://polyhaven.com/hdris) that is projected onto a dome and serves as floor, background and lighting simultaneously.
+
+
+Generate single scene with the [movi_c_worker.py](movi_c_worker.py) script:
+```shell
+docker run --rm --interactive \
+  --user $(id -u):$(id -g)    \
+  --volume "$(pwd):/kubric"   \
+  kubricdockerhub/kubruntu    \
+  /usr/bin/python3 challenges/movi/movi_c_worker.py \
+  --camera=fixed_random
+```
+See [movi_c.py](movi_c.py) for the TFDS definition / conversion.
+
+``` python
+ds = tfds.load("movi_c", data_dir="gs://kubric-public/tfds") 
+```
+<details>
+  <summary>Sample format and shapes</summary>
+
+``` python
+{
+  "metadata": {
+    "video_name": int,
+    "depth_range": (2,),
+    "forward_flow_range": (2,),
+    "backward_flow_range": (2,),
+    "num_frames": 24,
+    "num_instances": int,
+    "height": 256,
+    "width": 256
+  },
+  "camera": {
+    "field_of_view": 0.85755605,
+    "focal_length": 35.0,
+    "positions": (24, 3),
+    "quaternions": (24, 4),
+    "sensor_width": 32.0
+  },
+  "instances": {
+    "angular_velocities": (nr_instances, 24, 3),
+    "bbox_frames": TensorShape([nr_instances, None]),
+    "bboxes": TensorShape([nr_instances, None, 4]),
+    "bboxes_3d": (nr_instances, 24, 8, 3),
+    "category": (nr_instances,),
+    "friction": (nr_instances,),
+    "image_positions": (nr_instances, 24, 2),
+    "mass": (nr_instances,),
+    "positions": (nr_instances, 24, 3),
+    "quaternions": (nr_instances, 24, 4),
+    "restitution": (nr_instances,),
+    "scale": (nr_instances,),
+    "velocities": (nr_instances, 24, 3),
+    "visibility": (nr_instances, 24)
+  },
+
+  "events": {
+    "collisions": {
+      "contact_normal": (2778, 3),
+      "force": (2778,),
+      "frame": (2778,),
+      "image_position": (2778, 2),
+      "instances": (2778, 2),
+      "position": (2778, 3)
+    }
+  },
+  "depth": (24, 256, 256, 1),
+  "forward_flow": (24, 256, 256, 2),
+  "backward_flow": (24, 256, 256, 2),
+  "normal": (24, 256, 256, 3),
+  "object_coordinates": (24, 256, 256, 3),
+  "segmentations": (24, 256, 256, 1),
+  "video": (24, 256, 256, 3)
+}
+ ```
+
+</details>
+
+## MOVi-D
+(coming soon)
+
+## MOVi-E
+(coming soon)
diff --git a/examples/movi/__init__.py → challenges/movi/__init__.py b/examples/movi/__init__.py → challenges/movi/__init__.py
diff --git a/build_tfds.sh → challenges/movi/build_tfds.sh b/build_tfds.sh → challenges/movi/build_tfds.sh
@@ -1,7 +1,7 @@
 #!/bin/bash -x
 
 DATASET_NAME=${1}  # has to be the same as the filename of DATASET_CONFIG
-DATASET_CONFIG="examples/movi/${DATASET_NAME}.py"
+DATASET_CONFIG="${DATASET_NAME}.py"
 GCP_PROJECT=kubric-xgcp
 GCS_BUCKET=gs://research-brain-kubric-xgcp
 REGION=us-central1

diff --git a/docs/datasets/movi/movi_a_1.gif → challenges/movi/images/movi_a_1.gif b/docs/datasets/movi/movi_a_1.gif → challenges/movi/images/movi_a_1.gif
diff --git a/docs/datasets/movi/movi_a_2.gif → challenges/movi/images/movi_a_2.gif b/docs/datasets/movi/movi_a_2.gif → challenges/movi/images/movi_a_2.gif
diff --git a/docs/datasets/movi/movi_a_3.gif → challenges/movi/images/movi_a_3.gif b/docs/datasets/movi/movi_a_3.gif → challenges/movi/images/movi_a_3.gif
diff --git a/docs/datasets/movi/movi_b_1.gif → challenges/movi/images/movi_b_1.gif b/docs/datasets/movi/movi_b_1.gif → challenges/movi/images/movi_b_1.gif
diff --git a/docs/datasets/movi/movi_b_2.gif → challenges/movi/images/movi_b_2.gif b/docs/datasets/movi/movi_b_2.gif → challenges/movi/images/movi_b_2.gif
diff --git a/docs/datasets/movi/movi_b_3.gif → challenges/movi/images/movi_b_3.gif b/docs/datasets/movi/movi_b_3.gif → challenges/movi/images/movi_b_3.gif
diff --git a/challenges/movi/images/movi_c_1.gif b/challenges/movi/images/movi_c_1.gif
diff --git a/challenges/movi/images/movi_c_2.gif b/challenges/movi/images/movi_c_2.gif
diff --git a/challenges/movi/images/movi_c_3.gif b/challenges/movi/images/movi_c_3.gif
diff --git a/examples/movi/movi_a.py → challenges/movi/movi_a.py b/examples/movi/movi_a.py → challenges/movi/movi_a.py
@@ -154,7 +154,7 @@
 
 @dataclasses.dataclass
 class MoviAConfig(tfds.core.BuilderConfig):
-  """"Configuration for Multi-Object Video (MOVid) dataset."""
+  """"Configuration for Multi-Object Video (MOVi) dataset."""
   height: int = 256
   width: int = 256
   num_frames: int = 24
@@ -180,7 +180,6 @@ class MoviA(tfds.core.BeamBasedBuilder):
           # train_val_path="/usr/local/google/home/klausg/movi_tmp",
           train_val_path="gs://research-brain-kubric-xgcp/jobs/movi_a_regen_10k/",
           test_split_paths={
-              # "test_all_same": "gs://research-brain-kubric-xgcp/jobs/movid_a_regen_all_same",
           }
       ),
       MoviAConfig(
@@ -192,7 +191,6 @@ class MoviA(tfds.core.BeamBasedBuilder):
           # train_val_path="/usr/local/google/home/klausg/movi_tmp",
           train_val_path="gs://research-brain-kubric-xgcp/jobs/movi_a_regen_10k/",
           test_split_paths={
-              # "test_all_same": "gs://research-brain-kubric-xgcp/jobs/movid_a_regen_all_same",
           }
       ),
   ]
@@ -204,7 +202,7 @@ def _info(self) -> tfds.core.DatasetInfo:
     w = self.builder_config.width
     s = self.builder_config.num_frames
 
-    def get_movid_a_instance_features(seq_length: int):
+    def get_movi_a_instance_features(seq_length: int):
       features = get_instance_features(seq_length)
       features.update({
           "shape_label": tfds.features.ClassLabel(
@@ -239,7 +237,7 @@ def get_movid_a_instance_features(seq_length: int):
                                                             dtype=tf.float32),
             },
             "instances": tfds.features.Sequence(
-                feature=get_movid_a_instance_features(seq_length=s)),
+                feature=get_movi_a_instance_features(seq_length=s)),
             "camera": get_camera_features(s),
             "events": get_events_features(),
             # -----
@@ -307,7 +305,7 @@ def _generate_examples(self, directories: List[str]):
     def _process_example(video_dir):
       key, result, metadata = load_scene_directory(video_dir, target_size)
 
-      # add MOVid-A specific instance information:
+      # add MOVi-A specific instance information:
       for i, obj in enumerate(result["instances"]):
         obj["shape_label"] = metadata["instances"][i]["shape"]
         obj["size_label"] = metadata["instances"][i]["size_label"]

diff --git a/examples/movi/movi_ab_worker.py → challenges/movi/movi_ab_worker.py b/examples/movi/movi_ab_worker.py → challenges/movi/movi_ab_worker.py