Effstabledreamfusion #492
For comparison, with the efficient sampling method described in the PR description, training the NeRF at 128x128 resolution (subsampled to 64x64) takes ~30 min. Without efficient sampling, training at 128x128 resolution takes ~41 min, keeping all other parameters the same.
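To put the two timings side by side, here is a small arithmetic sketch. It uses only the figures quoted in this comment (128x128, 64x64, ~41 min, ~30 min); the variable names are illustrative, and reading the gap as "some per-step cost does not shrink with the ray count" is an inference, not something measured in this thread.

```python
# Figures quoted in the comment; everything derived from them is plain arithmetic.
full_res = 128        # side length of the image used for the SDS loss
sub_res = 64          # side length of the subsampled ray budget
minutes_full = 41.0   # reported training time without efficient sampling
minutes_sub = 30.0    # reported training time with efficient sampling

rays_full = full_res * full_res          # rays rendered per view at full resolution
rays_sub = sub_res * sub_res             # rays rendered per view after subsampling
ray_reduction = rays_full / rays_sub     # how many times fewer rays are rendered
time_saved = 1.0 - minutes_sub / minutes_full  # fraction of wall-clock time saved

print(f"rays per view: {rays_full} -> {rays_sub} ({ray_reduction:.0f}x fewer)")
print(f"training time: {minutes_full:.0f} -> {minutes_sub:.0f} min ({time_saved:.0%} saved)")
```

Rendering 4x fewer rays yields roughly a 27% reduction in training time here, so the ray count is clearly not the only cost per step.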
Hi @jadevaibhav,
Great job! Thank you for contributing to threestudio. Could you provide examples of how to run efficient DreamFusion, along with some 3D rendering videos of the results? Then I'll be glad to merge these commits and add this feature to the README.
Hi @DSaurus, thanks for your approval! I have created a separate yaml config for this, so you just have to run:
Here are the videos I generated, although the quality is not good. I am still investigating what causes the quality issue, and whether this method can be extended to other generative systems.
it10000-test.mp4
Hi @jadevaibhav, perhaps you could try caching the rendered images without gradients first. Then you sample some rays of this complete rendered image and update the corresponding pixels before running the SDS process. I think this would be more robust for 3D generation.
@DSaurus, could you please explain what you mean here? Here's my sub-sampling code for clarity:
@jadevaibhav Sure, my idea is to reuse these cached images multiple times, and each time apply your sub-sampler to update them. If my understanding is correct, the current mask sub-sampler produces incomplete images. However, diffusion models like Stable Diffusion are not designed to recover such incomplete images. I think this is why the current mask sub-sampler leads to unstable results.
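A minimal sketch of the caching idea as I read it from this comment, not code from the PR. NumPy arrays stand in for PyTorch tensors, random values stand in for rendered colors, and `refresh_cached_pixels` is a hypothetical helper name. In PyTorch the cached image would be rendered once under `torch.no_grad()`, and only the refreshed pixels would carry gradients.

```python
import numpy as np

def refresh_cached_pixels(cached, fresh_colors, flat_idx):
    """Return a copy of `cached` with the pixels at `flat_idx` replaced by `fresh_colors`."""
    h, w, c = cached.shape
    out = cached.reshape(h * w, c).copy()   # copy so the input image is left untouched
    out[flat_idx] = fresh_colors            # overwrite only the re-rendered pixels
    return out.reshape(h, w, c)

rng = np.random.default_rng(0)
h = w = 128
n_rays = 64 * 64

# Stand-in for a complete render produced once without gradients.
cached = rng.random((h, w, 3), dtype=np.float32)

# Reuse the cache several times, refreshing a different random subset each time.
for step in range(3):
    flat_idx = rng.choice(h * w, size=n_rays, replace=False)
    fresh = rng.random((n_rays, 3), dtype=np.float32)  # stand-in for re-rendered rays
    cached = refresh_cached_pixels(cached, fresh, flat_idx)
    # `cached` is always a complete image here, so it can be passed to the guidance as-is.
```

The point of the design is that the diffusion model always receives a complete image, while only a quarter of the rays are re-rendered per step.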
@DSaurus the sub-sampler is applied to the generated ray directions, so we only pass the selected directions to NeRF. When calculating the SDS loss, I pass an image at the original resolution with the rendered colors filled in at the sampled indices and 0 elsewhere. I also believe that diffusion is unable to recover the incomplete image.
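The fill step described here can be sketched as follows. This is an illustration under assumptions, not the PR's code: NumPy replaces PyTorch, random values replace the NeRF output, and `scatter_to_full` is a hypothetical name.

```python
import numpy as np

def scatter_to_full(colors, flat_idx, height, width):
    """Place per-ray colors at their flat pixel indices; every other pixel stays 0."""
    full = np.zeros((height * width, colors.shape[-1]), dtype=colors.dtype)
    full[flat_idx] = colors
    return full.reshape(height, width, -1)

rng = np.random.default_rng(0)
height = width = 128
n_rays = 64 * 64

# Randomly chosen pixel positions, without repeats.
flat_idx = rng.choice(height * width, size=n_rays, replace=False)
colors = rng.random((n_rays, 3))          # stand-in for colors rendered by NeRF

image = scatter_to_full(colors, flat_idx, height, width)
print(image.shape)                        # full-resolution image, mostly zeros
```

With 64x64 samples in a 128x128 image, three quarters of the pixels are zero, which is the incomplete input both commenters suspect the diffusion model cannot handle.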
Hi @DSaurus, thanks for approving the PR! I don't have write access, so could you please merge? I looked into the "interpolation", but currently there is no way to do it with randomly sampled positions. I was looking into the grid_sample() method, but I can't define a transformation or mapping from the original-resolution coordinate system to the sampled grid coordinates. I am now experimenting with uniform subsampling, with a random offset for the top-left grid corner.
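A minimal sketch of uniform subsampling with a random top-left offset, assuming a fixed stride of 2 (128 to 64). NumPy stands in for PyTorch, the direction array is random filler, and `strided_grid_indices` is a hypothetical name rather than anything in the PR.

```python
import numpy as np

def strided_grid_indices(height, width, stride, rng):
    """Row and column indices of a regular grid whose top-left corner is randomly offset."""
    off_y = int(rng.integers(0, stride))   # offset in [0, stride)
    off_x = int(rng.integers(0, stride))
    ys = np.arange(off_y, height, stride)
    xs = np.arange(off_x, width, stride)
    return ys, xs

rng = np.random.default_rng(0)
height = width = 128
ys, xs = strided_grid_indices(height, width, stride=2, rng=rng)

directions = rng.standard_normal((height, width, 3))  # stand-in for per-pixel ray directions
sub_dirs = directions[np.ix_(ys, xs)]                  # regular 64x64 grid of directions
print(sub_dirs.shape)
```

Because the samples sit on a regular grid, the low-resolution render is itself a proper image, so standard image resizing becomes applicable, which random positions do not allow.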
@jadevaibhav LGTM! Could you please create a file named eff_dreamfusion.py in the system folder and put your current code into this file?
Sure!
Done! Please review @DSaurus
@jadevaibhav Thanks!
Thanks! I would like to contribute more. Are there any new papers/implementations we're looking at?
@jadevaibhav I think it would be great if you are interested in implementing Wonder3D and its follow-up papers, which could generate 3D objects in seconds.
Efficient training of DreamFusion-like systems on higher-resolution images
I am working on a feature for the DreamFusion system (which can be extended to others). The basic idea: to train with a higher-resolution image, we use a mask to subsample its pixels and render only those rays with NeRF. We then calculate the SDS loss on the image at its original resolution. The computational benefit comes from rendering fewer rays during NeRF training, while the diffusion model still works with the higher-resolution image (for a better visual model), at roughly the same compute cost.
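The masking step can be sketched roughly like this. It is an illustration under assumptions rather than the code in this PR: NumPy stands in for PyTorch, the directions are random filler, and `sample_ray_mask` is a hypothetical name.

```python
import numpy as np

def sample_ray_mask(height, width, n_samples, rng):
    """Boolean (height, width) mask with exactly `n_samples` randomly chosen pixels set."""
    mask = np.zeros(height * width, dtype=bool)
    mask[rng.choice(height * width, size=n_samples, replace=False)] = True
    return mask.reshape(height, width)

rng = np.random.default_rng(0)
height = width = 128                      # resolution seen by the SDS loss
n_samples = 64 * 64                       # ray budget actually rendered by NeRF

mask = sample_ray_mask(height, width, n_samples, rng)
directions = rng.standard_normal((height, width, 3))  # stand-in for per-pixel ray directions
selected = directions[mask]               # only these rays would be passed to the renderer
print(selected.shape)
```

Only the selected directions are rendered; the same mask then says where the resulting colors belong in the full-resolution image.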
Testing with the demo prompt, using 128x128 image resolution and 64x64 subsampling for NeRF training, I get the following result.
I would like any feedback on potential issues with this idea, and how to improve results. I am looking forward to hearing from this community! @DSaurus @voletiv @bennyguo @thuliu-yt16