Showing 52 changed files with 5,777 additions and 0 deletions.

@@ -0,0 +1,201 @@

# Text2Room
Text2Room generates textured 3D meshes from a given text prompt using 2D text-to-image models.

This is the official repository that contains source code for the arXiv paper [Text2Room](https://lukashoel.github.io/text-to-room/).

[[arXiv](https://lukashoel.github.io/text-to-room/)] [[Project Page](https://lukashoel.github.io/text-to-room/)] [[Video](https://youtu.be/fjRnFL91EZc)]

![Teaser](docs/teaser.jpg "Text2Room")

If you find Text2Room useful for your work, please cite:
```
@preprint{hoellein2023text2room,
    title={Text2Room: Extracting Textured 3D Meshes from 2D Text-to-Image Models},
    author={H{\"o}llein, Lukas and Cao, Ang and Owens, Andrew and Johnson, Justin and Nie{\ss}ner, Matthias},
    journal={arXiv preprint},
    year={2023}
}
```

## Prepare Environment

Create a conda environment:

```
conda create -n text2room python=3.9
conda activate text2room
pip install -r requirements.txt
```

Then install PyTorch3D by following the [official instructions](https://github.com/facebookresearch/pytorch3d/blob/main/INSTALL.md).
For example, to install PyTorch3D on Linux (tested with PyTorch 1.13.1, CUDA 11.7, PyTorch3D 0.7.2):

```
conda install -c fvcore -c iopath -c conda-forge fvcore iopath
pip install "git+https://github.com/facebookresearch/pytorch3d.git@stable"
```
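
A quick way to check that everything is installed correctly is to import the core dependencies (a minimal sanity check, not part of the original setup steps):

```
# quick environment check (optional)
import torch
import pytorch3d

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("pytorch3d:", pytorch3d.__version__)
```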

Download the pretrained model weights for the fixed depth-inpainting model that we use:

- refer to the [official IronDepth implementation](https://github.com/baegwangbin/IronDepth) to download the files ```normal_scannet.pt``` and ```irondepth_scannet.pt```.
- place the files under ```text2room/checkpoints```

(Optional) Download the pretrained model weights for the text-to-image model:

- ```git clone https://huggingface.co/stabilityai/stable-diffusion-2-inpainting```
- ```git clone https://huggingface.co/stabilityai/stable-diffusion-2-1```
- ```ln -s <path/to/stable-diffusion-2-inpainting> checkpoints```
- ```ln -s <path/to/stable-diffusion-2-1> checkpoints```
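
To make sure the downloaded weights ended up where they are expected, a small check like the following can help (a sketch only; the Stable Diffusion folder names are assumed from the clone commands above):

```
# optional sanity check for the checkpoints folder (illustrative only)
import os

ckpt_dir = "checkpoints"
required = ["normal_scannet.pt", "irondepth_scannet.pt"]
# assumed folder names, taken from the clone commands above
optional = ["stable-diffusion-2-inpainting", "stable-diffusion-2-1"]

for name in required + optional:
    path = os.path.join(ckpt_dir, name)
    status = "found" if os.path.exists(path) else "missing"
    print(f"{path}: {status}")
```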

## Generate a Scene

By default, we generate a living room scene:

```python generate_scene.py```

Outputs are stored in ```text2room/output```.

### Generated outputs

We generate the following outputs per generated scene:

```
Mesh Files:
  <output_root>/fused_mesh/after_generation.ply: generated mesh after the first stage of our method
  <output_root>/fused_mesh/fused_final.ply: generated mesh after the second stage of our method
  <output_root>/fused_mesh/x_poisson_meshlab_depth_y.ply: result of applying poisson surface reconstruction on mesh x with depth y
  <output_root>/fused_mesh/x_poisson_meshlab_depth_y_quadric_z.ply: result of applying poisson surface reconstruction on mesh x with depth y and then decimating the mesh to at most z faces

Renderings:
  <output_root>/output_rendering/rendering_t.png: image from pose t, rendered from the final mesh
  <output_root>/output_rendering/rendering_noise_t.png: image from a slightly different/noised pose t, rendered from the final mesh
  <output_root>/output_depth/depth_t.png: depth from pose t, rendered from the final mesh
  <output_root>/output_depth/depth_noise_t.png: depth from a slightly different/noised pose t, rendered from the final mesh

Metadata:
  <output_root>/settings.json: all arguments used to generate the scene
  <output_root>/seen_poses.json: list of all poses in PyTorch3D convention used to render output_rendering (no noise)
  <output_root>/seen_poses_noise.json: list of all poses in PyTorch3D convention used to render output_rendering (with noise)
  <output_root>/transforms.json: a file in the standard NeRF convention (e.g. see NeRFStudio) that can be used to optimize a NeRF for the generated scene. It refers to the rendered images in output_rendering (no noise).
```
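
For a quick look at a finished run, the metadata files can be inspected directly; the sketch below only assumes that ```settings.json``` holds a dictionary of arguments and ```seen_poses.json``` a list of poses, as described above:

```
# inspect the metadata of a generated scene (illustrative only)
import json
import os

output_root = "output/<your_scene>"  # adjust to your run

with open(os.path.join(output_root, "settings.json")) as f:
    settings = json.load(f)
with open(os.path.join(output_root, "seen_poses.json")) as f:
    seen_poses = json.load(f)

print("number of output poses:", len(seen_poses))
print("some of the arguments used:", dict(list(settings.items())[:5]))
```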

We also generate the following intermediate outputs during scene generation:

```
<output_root>/fused_mesh/fused_until_frame_t.ply: generated mesh using the content up to pose t
<output_root>/rendered/rendered_t.png: image from pose t, rendered from mesh_t
<output_root>/mask/mask_t.png: mask from pose t that signals unobserved regions
<output_root>/mask/mask_eroded_dilated_t.png: mask from pose t, after applying erosion/dilation
<output_root>/rgb/rgb_t.png: image from pose t, inpainted with the text-to-image model
<output_root>/depth/rendered_depth_t.png: depth from pose t, rendered from mesh_t
<output_root>/depth/depth_t.png: depth from pose t, predicted/aligned from rgb_t and rendered_depth_t
<output_root>/rgbd/rgbd_t.png: rgb_t and depth_t placed next to each other
```

### Create a scene from a fixed start image

Already have an in-the-wild image from which you want to start the generation?
Specify it as ```--input_image_path``` and the generated scene kicks off from there.

```python generate_scene.py --input_image_path sample_data/0.png```

### Create a scene from another room type

Generate indoor scenes of arbitrary rooms by specifying another ```--trajectory_file``` as input:

```python generate_scene.py --trajectory_file model/trajectories/examples/bedroom.json```

We provide several [example rooms](model/trajectories/examples).

### Customize Generation

Our method is highly configurable. See [opt.py](model/utils/opt.py) for the complete list of configuration options.

### Get creative!

You can specify your own prompts and camera trajectories by creating your own ```trajectory.json``` file.

#### Trajectory Format

Each ```trajectory.json``` file should follow this format:

```
[
    {
        "prompt": (str, optional) the prompt to use for this trajectory,
        "negative_prompt": (str, optional) the negative prompt to use for this trajectory,
        "n_images": (int, optional) how many images to render between start and end pose of this trajectory,
        "surface_normal_threshold": (float, optional) the surface_normal_threshold to use for this trajectory,
        "fn_name": (str, required) the name of a trajectory_function as specified in model/trajectories/trajectory_util.py,
        "fn_args": (dict, optional) {
            "a": value for an argument with name 'a' of fn_name,
            "b": value for an argument with name 'b' of fn_name,
        },
        "adaptive": (list, optional) [
            {
                "arg": (str, required) name of an argument of fn_name that represents a float value,
                "delta": (float, required) delta value to add to the argument during adaptive pose search,
                "min": (float, optional) minimum value during search,
                "max": (float, optional) maximum value during search
            }
        ]
    },
    {... next trajectory with similar structure as above ...}
]
```
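
As a concrete illustration, the snippet below writes a small trajectory file; the ```fn_name``` value and its arguments are placeholders, so replace them with real function names and arguments from [trajectory_util.py](model/trajectories/trajectory_util.py) or the provided examples:

```
# write an illustrative trajectory.json (fn_name/fn_args below are placeholders)
import json

trajectory = [
    {
        "prompt": "a cozy bedroom with wooden furniture",
        "negative_prompt": "blurry, low quality",
        "n_images": 10,
        "fn_name": "example_trajectory_fn",  # replace with a real function name
        "fn_args": {"a": 0.5},               # arguments of that function
        "adaptive": [
            {"arg": "a", "delta": 0.1, "min": 0.0, "max": 1.0}
        ]
    },
    {
        "prompt": "a cozy bedroom with wooden furniture",
        "fn_name": "example_trajectory_fn"
    }
]

with open("my_trajectory.json", "w") as f:
    json.dump(trajectory, f, indent=2)
```

The resulting file can then be passed via ```--trajectory_file my_trajectory.json```.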

#### Adding new trajectory functions

We provide several predefined trajectory functions in [trajectory_util.py](model/trajectories/trajectory_util.py).
Each ```trajectory.json``` file is a combination of the provided trajectory functions.
You can create custom trajectories by combining existing functions in new ways.
You can also add custom trajectory functions to [trajectory_util.py](model/trajectories/trajectory_util.py).
For automatic integration with our codebase, custom trajectory functions should follow this pattern:

```
def custom_trajectory_fn(current_step, n_steps, **args):
    # n_steps: how many poses (including start and end pose) are in this trajectory
    # current_step: index of the current pose in this trajectory
    # your custom trajectory function here...

def custom_trajectory(**args):
    return _config_fn(custom_trajectory_fn, **args)
```

This lets you reference ```custom_trajectory``` as ```fn_name``` in a ```trajectory.json``` file.
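
For illustration, a custom function could look roughly like the sketch below; the pose computation itself is only indicated, because the exact return convention (and ```_config_fn```) should be taken from the existing functions in [trajectory_util.py](model/trajectories/trajectory_util.py):

```
# illustrative sketch only -- mirror an existing function in trajectory_util.py
# for the actual pose construction and return convention
import math

def circle_segment_fn(current_step, n_steps, radius=1.0, height=0.0):
    t = current_step / max(n_steps - 1, 1)  # interpolation parameter in [0, 1]
    angle = 2.0 * math.pi * t
    eye = (radius * math.cos(angle), height, radius * math.sin(angle))
    # ...build and return the camera pose for `eye` here...
    return eye  # placeholder

def circle_segment(**args):
    return _config_fn(circle_segment_fn, **args)
```

```circle_segment``` could then be referenced as ```fn_name``` in a ```trajectory.json``` file.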

## Render an existing scene

We provide a script that renders images from a mesh at different poses:

```python render_cameras.py -m <path/to/mesh.ply> -c <path/to/cameras.json>```

where you can provide any cameras in the PyTorch3D convention via ```-c```.
For example, to re-render all poses used during generation and completion:

```
python render_cameras.py \
-m <output_root>/fused_mesh/fused_final_poisson_meshlab_depth_12.ply \
-c <output_root>/seen_poses.json
```

## Optimize a NeRF

We provide an easy way to train a NeRF from our generated scene.
We save a ```transforms.json``` file in the standard NeRF convention that can be used to optimize a NeRF for the generated scene.
It refers to the rendered images in ```<output_root>/output_rendering```.
It can be used with standard NeRF frameworks like [Instant-NGP](https://github.com/NVlabs/instant-ngp) or [NeRFStudio](https://github.com/nerfstudio-project/nerfstudio).
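
As a quick check before training, the file can be loaded like any Instant-NGP/NeRFStudio-style transforms file (a sketch that only assumes the standard ```frames``` key of that convention):

```
# inspect the exported transforms.json (illustrative only)
import json
import os

output_root = "output/<your_scene>"  # adjust to your run
with open(os.path.join(output_root, "transforms.json")) as f:
    transforms = json.load(f)

frames = transforms.get("frames", [])
print("number of training images:", len(frames))
if frames:
    print("first image:", frames[0].get("file_path"))
```

With NeRFStudio, for example, pointing ```ns-train nerfacto --data <output_root>``` at the folder containing ```transforms.json``` should be enough to start training (check the NeRFStudio documentation for the exact invocation).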

## Acknowledgements

Our work builds on top of amazing open-source networks and codebases.
We thank the authors for providing them.

- [IronDepth](https://github.com/baegwangbin/IronDepth) [1]: a monocular depth prediction method that can be used for depth inpainting.
- [StableDiffusion](https://huggingface.co/stabilityai/stable-diffusion-2-inpainting) [2]: a state-of-the-art text-to-image inpainting model with publicly released network weights.

[1] IronDepth: Iterative Refinement of Single-View Depth using Surface Normal and its Uncertainty, BMVC 2022, Gwangbin Bae, Ignas Budvytis, and Roberto Cipolla

[2] High-Resolution Image Synthesis with Latent Diffusion Models, CVPR 2022, Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer

@@ -0,0 +1,69 @@

import os
import json
from PIL import Image

from model.text2room_pipeline import Text2RoomPipeline
from model.utils.opt import get_default_parser
from model.utils.utils import save_poisson_mesh, generate_first_image

import torch


@torch.no_grad()
def main(args):
    # load trajectories
    trajectories = json.load(open(args.trajectory_file, "r"))

    # check if there is a custom prompt in the first trajectory
    # use it to generate the start image if needed
    if "prompt" in trajectories[0]:
        args.prompt = trajectories[0]["prompt"]

    # get first image from text prompt or saved image folder
    if (not args.input_image_path) or (not os.path.isfile(args.input_image_path)):
        first_image_pil = generate_first_image(args)
    else:
        first_image_pil = Image.open(args.input_image_path)

    # load pipeline
    pipeline = Text2RoomPipeline(args, first_image_pil=first_image_pil)

    # generate using all trajectories
    offset = 1  # have the start image already
    for t in trajectories:
        pipeline.set_trajectory(t)
        offset = pipeline.generate_images(offset=offset)

    # save outputs before completion
    pipeline.clean_mesh()
    intermediate_mesh_path = pipeline.save_mesh("after_generation.ply")
    save_poisson_mesh(intermediate_mesh_path, depth=args.poisson_depth, max_faces=args.max_faces_for_poisson)

    # run completion
    pipeline.args.update_mask_after_improvement = True
    pipeline.complete_mesh(offset=offset)
    pipeline.clean_mesh()

    # the models are no longer needed
    pipeline.remove_models()

    # save outputs after completion
    final_mesh_path = pipeline.save_mesh()

    # run poisson mesh reconstruction
    mesh_poisson_path = save_poisson_mesh(final_mesh_path, depth=args.poisson_depth, max_faces=args.max_faces_for_poisson)

    # save additional outputs
    pipeline.save_animations()
    pipeline.load_mesh(mesh_poisson_path)
    pipeline.save_seen_trajectory_renderings(apply_noise=False, add_to_nerf_images=True)
    pipeline.save_nerf_transforms()
    pipeline.save_seen_trajectory_renderings(apply_noise=True)

    print("Finished. Outputs stored in:", args.out_path)


if __name__ == "__main__":
    parser = get_default_parser()
    args = parser.parse_args()
    main(args)

@@ -0,0 +1,39 @@

import torch


def scale_shift_linear(rendered_depth, predicted_depth, mask, fuse=True):
    """
    Optimize a scale and shift parameter in the least squares sense, such that rendered_depth and predicted_depth match.
    Formally, solves the following objective:

        min     || (d * a + b) - d_hat ||
        a, b

    where d = 1 / predicted_depth, d_hat = 1 / rendered_depth

    :param rendered_depth: torch.Tensor (H, W)
    :param predicted_depth: torch.Tensor (H, W)
    :param mask: torch.Tensor (H, W) - 1: valid points of rendered_depth, 0: invalid points of rendered_depth (ignore)
    :param fuse: whether to fuse shifted/scaled predicted_depth with the rendered_depth

    :return: scale/shift corrected depth
    """
    if mask.sum() == 0:
        return predicted_depth

    rendered_disparity = 1 / rendered_depth[mask].unsqueeze(-1)
    predicted_disparity = 1 / predicted_depth[mask].unsqueeze(-1)

    X = torch.cat([predicted_disparity, torch.ones_like(predicted_disparity)], dim=1)
    XTX_inv = (X.T @ X).inverse()
    XTY = X.T @ rendered_disparity
    AB = XTX_inv @ XTY

    fixed_disparity = (1 / predicted_depth) * AB[0] + AB[1]
    fixed_depth = 1 / fixed_disparity

    if fuse:
        fused_depth = torch.where(mask, rendered_depth, fixed_depth)
        return fused_depth
    else:
        return fixed_depth
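

# ---------------------------------------------------------------------------
# Illustrative usage sketch (not part of the original file): align a synthetic
# prediction whose disparity differs from the rendered depth by a known
# scale and shift, then fuse it with the rendered depth.
if __name__ == "__main__":
    H, W = 4, 6
    rendered_depth = torch.rand(H, W) + 0.5
    # 1 / predicted_depth = 0.5 * (1 / rendered_depth) + 0.2
    predicted_depth = 1.0 / (0.5 / rendered_depth + 0.2)
    mask = torch.zeros(H, W, dtype=torch.bool)
    mask[:, : W // 2] = True  # pretend only the left half of rendered_depth is valid

    aligned = scale_shift_linear(rendered_depth, predicted_depth, mask, fuse=True)
    # the corrected depth should closely match the rendered depth everywhere
    print(torch.allclose(aligned, rendered_depth, atol=1e-4))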

@@ -0,0 +1,21 @@

MIT License

Copyright (c) 2022 Gwangbin Bae

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

@@ -0,0 +1,6 @@

The following code is largely based on the original code provided under the MIT license here:
https://github.com/baegwangbin/IronDepth

We modify the code slightly to perform depth inpainting, following the method proposed in [1].

[1] IronDepth: Iterative Refinement of Single-View Depth using Surface Normal and its Uncertainty, BMVC 2022, Gwangbin Bae, Ignas Budvytis, and Roberto Cipolla