initial code release
lukasHoel committed Mar 21, 2023
1 parent 0facdcf commit 7fda36a
Showing 52 changed files with 5,777 additions and 0 deletions.
201 changes: 201 additions & 0 deletions README.md
# Text2Room
Text2Room generates textured 3D meshes from a given text prompt using 2D text-to-image models.

This is the official repository that contains source code for the arXiv paper [Text2Room](https://lukashoel.github.io/text-to-room/).

[[arXiv](https://arxiv.org/abs/2303.11989)] [[Project Page](https://lukashoel.github.io/text-to-room/)] [[Video](https://youtu.be/fjRnFL91EZc)]

![Teaser](docs/teaser.jpg "Text2Room")

If you find Text2Room useful for your work, please cite:
```
@article{hoellein2023text2room,
    title={Text2Room: Extracting Textured 3D Meshes from 2D Text-to-Image Models},
    author={H{\"o}llein, Lukas and Cao, Ang and Owens, Andrew and Johnson, Justin and Nie{\ss}ner, Matthias},
    journal={arXiv preprint},
    year={2023}
}
```

## Prepare Environment

Create a conda environment:

```
conda create -n text2room python=3.9
conda activate text2room
pip install -r requirements.txt
```

Then install PyTorch3D by following the [official instructions](https://github.com/facebookresearch/pytorch3d/blob/main/INSTALL.md).
For example, to install PyTorch3D on Linux (tested with PyTorch 1.13.1, CUDA 11.7, PyTorch3D 0.7.2):

```
conda install -c fvcore -c iopath -c conda-forge fvcore iopath
pip install "git+https://github.com/facebookresearch/pytorch3d.git@stable"
```
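
After installation, a quick sanity check can confirm that PyTorch, CUDA, and PyTorch3D are all visible from the environment (a minimal sketch; the printed versions will differ depending on your setup):

```python
import torch
import pytorch3d

# quick environment check
print("PyTorch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available(), "| CUDA version:", torch.version.cuda)
print("PyTorch3D:", pytorch3d.__version__)
```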

Download the pretrained model weights for the fixed depth inpainting model that we use:

- refer to the [official IronDepth implementation](https://github.com/baegwangbin/IronDepth) to download the files ```normal_scannet.pt``` and ```irondepth_scannet.pt```.
- place the files under ```text2room/checkpoints```

(Optional) Download the pretrained model weights for the text-to-image model:

- ```git clone https://huggingface.co/stabilityai/stable-diffusion-2-inpainting```
- ```git clone https://huggingface.co/stabilityai/stable-diffusion-2-1```
- ```ln -s <path/to/stable-diffusion-2-inpainting> checkpoints```
- ```ln -s <path/to/stable-diffusion-2-1> checkpoints```
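
If you downloaded the weights, a sketch like the following can verify that the inpainting checkpoint loads from the local path (this assumes the `diffusers` library and a CUDA GPU are available; the loading code inside Text2Room may differ):

```python
import torch
from diffusers import StableDiffusionInpaintPipeline

# example path: adjust to wherever you cloned/symlinked the weights
pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "checkpoints/stable-diffusion-2-inpainting",
    torch_dtype=torch.float16,
).to("cuda")
print("loaded inpainting pipeline, UNet sample size:", pipe.unet.config.sample_size)
```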

## Generate a Scene

By default, we generate a living room scene:

```python generate_scene.py```

Outputs are stored in ```text2room/output```.

### Generated outputs

We generate the following outputs per generated scene:

```
Mesh Files:
    <output_root>/fused_mesh/after_generation.ply: generated mesh after the first stage of our method
    <output_root>/fused_mesh/fused_final.ply: generated mesh after the second stage of our method
    <output_root>/fused_mesh/x_poisson_meshlab_depth_y.ply: result of applying Poisson surface reconstruction on mesh x with depth y
    <output_root>/fused_mesh/x_poisson_meshlab_depth_y_quadric_z.ply: result of applying Poisson surface reconstruction on mesh x with depth y and then decimating the mesh to at most z faces

Renderings:
    <output_root>/output_rendering/rendering_t.png: image from pose t, rendered from the final mesh
    <output_root>/output_rendering/rendering_noise_t.png: image from a slightly different (noised) pose t, rendered from the final mesh
    <output_root>/output_depth/depth_t.png: depth from pose t, rendered from the final mesh
    <output_root>/output_depth/depth_noise_t.png: depth from a slightly different (noised) pose t, rendered from the final mesh

Metadata:
    <output_root>/settings.json: all arguments used to generate the scene
    <output_root>/seen_poses.json: list of all poses in PyTorch3D convention used to render output_rendering (no noise)
    <output_root>/seen_poses_noise.json: list of all poses in PyTorch3D convention used to render output_rendering (with noise)
    <output_root>/transforms.json: a file in the standard NeRF convention (e.g., see NeRFStudio) that can be used to optimize a NeRF for the generated scene; it refers to the rendered images in output_rendering (no noise)
```
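
The generated ```.ply``` meshes can be inspected with any standard mesh library. For example, a minimal sketch using ```trimesh``` (not a dependency of this repository; the path below is a placeholder for your actual ```<output_root>```):

```python
import trimesh

# placeholder path: substitute your actual <output_root>
mesh = trimesh.load("<output_root>/fused_mesh/fused_final.ply")
print("vertices:", mesh.vertices.shape, "faces:", mesh.faces.shape)
print("vertex colors present:", mesh.visual.kind == "vertex")
```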

We also generate the following intermediate outputs during generation of the scene:

```
<output_root>/fused_mesh/fused_until_frame_t.ply: generated mesh using the content up to pose t
<output_root>/rendered/rendered_t.png: image from pose t, rendered from mesh_t
<output_root>/mask/mask_t.png: mask from pose t that signals unobserved regions
<output_root>/mask/mask_eroded_dilated_t.png: mask from pose t after applying erosion/dilation
<output_root>/rgb/rgb_t.png: image from pose t, inpainted with the text-to-image model
<output_root>/depth/rendered_depth_t.png: depth from pose t, rendered from mesh_t
<output_root>/depth/depth_t.png: depth from pose t, predicted and aligned from rgb_t and rendered_depth_t
<output_root>/rgbd/rgbd_t.png: rgb_t and depth_t placed next to each other
```
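
The erosion/dilation step cleans up the inpainting mask, e.g. removing thin slivers along the borders of observed regions. A minimal sketch of the idea using OpenCV (kernel size and iteration counts are illustrative; the actual values are configured in the code):

```python
import cv2
import numpy as np

# illustrative only: kernel size and iterations are placeholders
mask = cv2.imread("mask_t.png", cv2.IMREAD_GRAYSCALE)  # white = unobserved region
kernel = np.ones((5, 5), np.uint8)

eroded = cv2.erode(mask, kernel, iterations=1)      # remove thin mask slivers
dilated = cv2.dilate(eroded, kernel, iterations=2)  # grow the remaining region back

cv2.imwrite("mask_eroded_dilated_t.png", dilated)
```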

### Create a scene from a fixed start image

Already have an in-the-wild image from which you want to start the generation?
Specify it via ```--input_image_path``` and scene generation kicks off from there.

```python generate_scene.py --input_image_path sample_data/0.png```

### Create a scene from another room type

Generate indoor scenes of arbitrary room types by specifying another ```--trajectory_file``` as input:

```python generate_scene.py --trajectory_file model/trajectories/examples/bedroom.json```

We provide a bunch of [example rooms](model/trajectories/examples).

### Customize Generation

We provide a highly configurable method. See [opt.py](model/utils/opt.py) for a complete list of the configuration options.

### Get creative!

You can specify your own prompts and camera trajectories by simply creating your own ```trajectory.json``` file.

#### Trajectory Format

Each ```trajectory.json``` file should satisfy the following format:

```
[
    {
        "prompt": (str, optional) the prompt to use for this trajectory,
        "negative_prompt": (str, optional) the negative prompt to use for this trajectory,
        "n_images": (int, optional) how many images to render between start and end pose of this trajectory,
        "surface_normal_threshold": (float, optional) the surface_normal_threshold to use for this trajectory,
        "fn_name": (str, required) the name of a trajectory_function as specified in model/trajectories/trajectory_util.py,
        "fn_args": (dict, optional) {
            "a": value for an argument with name 'a' of fn_name,
            "b": value for an argument with name 'b' of fn_name
        },
        "adaptive": (list, optional) [
            {
                "arg": (str, required) name of an argument of fn_name that represents a float value,
                "delta": (float, required) delta value to add to the argument during adaptive pose search,
                "min": (float, optional) minimum value during search,
                "max": (float, optional) maximum value during search
            }
        ]
    },
    {... next trajectory with similar structure as above ...}
]
```
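
For concreteness, here is a minimal sketch that writes such a file from Python. The ```fn_name``` value and its arguments below are placeholders; the actual function names and parameters are defined in [trajectory_util.py](model/trajectories/trajectory_util.py), and the [example rooms](model/trajectories/examples) show real configurations:

```python
import json

# hypothetical trajectory: fn_name and fn_args are placeholders,
# look up real function names in model/trajectories/trajectory_util.py
trajectory = [
    {
        "prompt": "a cozy bedroom with wooden furniture",
        "n_images": 20,
        "fn_name": "example_fn",        # placeholder
        "fn_args": {"radius": 0.5},     # placeholder argument
        "adaptive": [
            {"arg": "radius", "delta": 0.1, "min": 0.0, "max": 1.0}
        ],
    },
]

with open("my_trajectory.json", "w") as f:
    json.dump(trajectory, f, indent=4)

# then: python generate_scene.py --trajectory_file my_trajectory.json
```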

#### Adding new trajectory functions

We provide a bunch of predefined trajectory functions in [trajectory_util.py](model/trajectories/trajectory_util.py).
Each ```trajectory.json``` file is a combination of the provided trajectory functions.
You can create custom trajectories by creating new combinations of existing functions.
You can also add custom trajectory functions in [trajectory_util.py](model/trajectories/trajectory_util.py).
For automatic integration with our codebase, custom trajectory functions should have the following pattern:

```
def custom_trajectory_fn(current_step, n_steps, **args):
    # n_steps: how many poses (including start and end pose) in this trajectory
    # current_step: index of the current pose in this trajectory
    # your custom trajectory function here...
    pass


def custom_trajectory(**args):
    return _config_fn(custom_trajectory_fn, **args)
```

This lets you reference ```custom_trajectory``` as ```fn_name``` in a ```trajectory.json``` file.

## Render an existing scene

We provide a script that renders images from a mesh at different poses:

```python render_cameras.py -m <path/to/mesh.ply> -c <path/to/cameras.json>```

where you can provide any cameras in the PyTorch3D convention via ```-c```.
For example, to re-render all poses used during generation and completion:

```
python render_cameras.py \
-m <output_root>/fused_mesh/fused_final_poisson_meshlab_depth_12.ply \
-c <output_root>/seen_poses.json
```

## Optimize a NeRF

We provide an easy way to train a NeRF from a generated scene.
We save a ```transforms.json``` file in the standard NeRF convention that refers to the rendered images in ```<output_root>/output_rendering```.
It can be used with standard NeRF frameworks like [Instant-NGP](https://github.com/NVlabs/instant-ngp) or [NeRFStudio](https://github.com/nerfstudio-project/nerfstudio) to optimize a NeRF for the generated scene.
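
As a quick sanity check, the exported file can be inspected directly (a minimal sketch; the key names assume the common NeRF ```transforms.json``` layout with a ```frames``` list containing ```file_path``` and a 4x4 ```transform_matrix```, so check the exported file for the exact fields):

```python
import json
import numpy as np

# placeholder path: substitute your actual <output_root>
with open("<output_root>/transforms.json", "r") as f:
    transforms = json.load(f)

frames = transforms["frames"]
print(f"{len(frames)} frames")
for frame in frames[:3]:
    pose = np.array(frame["transform_matrix"])  # camera-to-world pose, 4x4
    print(frame["file_path"], pose.shape)
```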

## Acknowledgements

Our work builds on top of amazing open-source networks and codebases.
We thank the authors for providing them.

- [IronDepth](https://github.com/baegwangbin/IronDepth) [1]: a method for monocular depth prediction that can also be used for depth inpainting.
- [StableDiffusion](https://huggingface.co/stabilityai/stable-diffusion-2-inpainting) [2]: a state-of-the-art text-to-image inpainting model with publicly released network weights.

[1] IronDepth: Iterative Refinement of Single-View Depth using Surface Normal and its Uncertainty, BMVC 2022, Gwangbin Bae, Ignas Budvytis, and Roberto Cipolla

[2] High-Resolution Image Synthesis with Latent Diffusion Models, CVPR 2022, Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer
Binary file added docs/teaser.jpg
69 changes: 69 additions & 0 deletions generate_scene.py
import os
import json
from PIL import Image

from model.text2room_pipeline import Text2RoomPipeline
from model.utils.opt import get_default_parser
from model.utils.utils import save_poisson_mesh, generate_first_image

import torch


@torch.no_grad()
def main(args):
    # load trajectories
    trajectories = json.load(open(args.trajectory_file, "r"))

    # check if there is a custom prompt in the first trajectory
    # would use it to generate start image, if we have to
    if "prompt" in trajectories[0]:
        args.prompt = trajectories[0]["prompt"]

    # get first image from text prompt or saved image folder
    if (not args.input_image_path) or (not os.path.isfile(args.input_image_path)):
        first_image_pil = generate_first_image(args)
    else:
        first_image_pil = Image.open(args.input_image_path)

    # load pipeline
    pipeline = Text2RoomPipeline(args, first_image_pil=first_image_pil)

    # generate using all trajectories
    offset = 1  # have the start image already
    for t in trajectories:
        pipeline.set_trajectory(t)
        offset = pipeline.generate_images(offset=offset)

    # save outputs before completion
    pipeline.clean_mesh()
    intermediate_mesh_path = pipeline.save_mesh("after_generation.ply")
    save_poisson_mesh(intermediate_mesh_path, depth=args.poisson_depth, max_faces=args.max_faces_for_poisson)

    # run completion
    pipeline.args.update_mask_after_improvement = True
    pipeline.complete_mesh(offset=offset)
    pipeline.clean_mesh()

    # now no longer need the models
    pipeline.remove_models()

    # save outputs after completion
    final_mesh_path = pipeline.save_mesh()

    # run poisson mesh reconstruction
    mesh_poisson_path = save_poisson_mesh(final_mesh_path, depth=args.poisson_depth, max_faces=args.max_faces_for_poisson)

    # save additional output
    pipeline.save_animations()
    pipeline.load_mesh(mesh_poisson_path)
    pipeline.save_seen_trajectory_renderings(apply_noise=False, add_to_nerf_images=True)
    pipeline.save_nerf_transforms()
    pipeline.save_seen_trajectory_renderings(apply_noise=True)

    print("Finished. Outputs stored in:", args.out_path)


if __name__ == "__main__":
    parser = get_default_parser()
    args = parser.parse_args()
    main(args)
Empty file added model/__init__.py
Empty file.
39 changes: 39 additions & 0 deletions model/depth_alignment/depth_alignment.py
import torch


def scale_shift_linear(rendered_depth, predicted_depth, mask, fuse=True):
    """
    Optimize a scale and shift parameter in the least squares sense, such that rendered_depth and predicted_depth match.
    Formally, solves the following objective:

        min  || (d * a + b) - d_hat ||
        a, b

    where d = 1 / predicted_depth, d_hat = 1 / rendered_depth

    :param rendered_depth: torch.Tensor (H, W)
    :param predicted_depth: torch.Tensor (H, W)
    :param mask: torch.Tensor (H, W) - 1: valid points of rendered_depth, 0: invalid points of rendered_depth (ignore)
    :param fuse: whether to fuse shifted/scaled predicted_depth with the rendered_depth

    :return: scale/shift corrected depth
    """
    if mask.sum() == 0:
        return predicted_depth

    rendered_disparity = 1 / rendered_depth[mask].unsqueeze(-1)
    predicted_disparity = 1 / predicted_depth[mask].unsqueeze(-1)

    X = torch.cat([predicted_disparity, torch.ones_like(predicted_disparity)], dim=1)
    XTX_inv = (X.T @ X).inverse()
    XTY = X.T @ rendered_disparity
    AB = XTX_inv @ XTY

    fixed_disparity = (1 / predicted_depth) * AB[0] + AB[1]
    fixed_depth = 1 / fixed_disparity

    if fuse:
        fused_depth = torch.where(mask, rendered_depth, fixed_depth)
        return fused_depth
    else:
        return fixed_depth
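
A small sanity check for the alignment above (not part of the original file): distort a depth map by a known scale/shift in disparity space and verify that ```scale_shift_linear``` recovers it.

```python
import torch

# synthetic check: rendered disparity = a * predicted disparity + b
H, W = 8, 8
predicted_depth = torch.rand(H, W) * 4 + 1            # depths in [1, 5]
a, b = 2.0, 0.1                                        # ground-truth scale/shift
rendered_depth = 1.0 / (a / predicted_depth + b)
mask = torch.ones(H, W, dtype=torch.bool)              # all pixels valid

aligned = scale_shift_linear(rendered_depth, predicted_depth, mask, fuse=False)
print(torch.allclose(aligned, rendered_depth, atol=1e-3))  # expected: True
```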
21 changes: 21 additions & 0 deletions model/iron_depth/LICENSE
MIT License

Copyright (c) 2022 Gwangbin Bae

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
6 changes: 6 additions & 0 deletions model/iron_depth/README.md
The following code is largely based on the original code provided under MIT license here:
https://github.com/baegwangbin/IronDepth

We modify the code slightly to perform depth inpainting, according to the proposed method in [1].

[1] IronDepth: Iterative Refinement of Single-View Depth using Surface Normal and its Uncertainty, BMVC 2022, Gwangbin Bae, Ignas Budvytis, and Roberto Cipolla
Empty file added model/iron_depth/__init__.py
Empty file.
