Coordinate system of normal maps #40
Comments
Thank you for bringing this up! Yes, the normal maps are in camera space and not world space. I am under the impression that world-space normal maps would be meaningless without camera extrinsics, so I opted to extract/provide them in camera space.

I did the same as you and wrote a script to approximate the normal from the provided ground truth depth. The following GIF shows the input image next to the ground truth normal and the normal approximated from the ground truth depth. Looks good to me, but please correct me if there are any issues with it that I am not aware of.

You can find the script below. The script inverts the Y axis of the normal map since the coordinate system of Unreal differs from what I am using for the approximation. Furthermore, it sets the normal map for the sky (depth of 50k or more) to a well-defined value, since one would otherwise get the normal of the sphere/box in which the virtual environment resides. As for approximating the normal from the depth, the script uses the cross product of vectors between neighboring points in 3D space. Note that the depth is smoothed with a Gaussian filter to reduce noise; this is a very simple approximation after all.

#!/usr/bin/env python
import cv2
import json
import math
import moviepy
import moviepy.editor
import numpy
##########################################################
npyImages = []
for intSample in range(1, 20):
    npyImage = numpy.ascontiguousarray(cv2.imread(filename=str(intSample).zfill(5) + '-bl-image.png', flags=-1).astype(numpy.float32) * (1.0 / 255.0))
    npyDepth = numpy.ascontiguousarray(cv2.imread(filename=str(intSample).zfill(5) + '-bl-depth.exr', flags=-1)[:, :, None].astype(numpy.float32))
    npyNormal = numpy.ascontiguousarray(cv2.imread(filename=str(intSample).zfill(5) + '-bl-normal.exr', flags=-1).astype(numpy.float32))

    npyNormal[:, :, 1:2] *= -1.0
    npyNormal[:, :, 0:1][npyDepth >= 50000.0] = 0.0
    npyNormal[:, :, 1:2][npyDepth >= 50000.0] = 0.0
    npyNormal[:, :, 2:3][npyDepth >= 50000.0] = -1.0
    npyNormal /= numpy.linalg.norm(npyNormal, 2, 2, True).repeat(3, 2)

    intWidth = npyImage.shape[1]
    intHeight = npyImage.shape[0]

    fltFov = json.loads(open(str(intSample).zfill(5) + '-meta.json', 'r').read())['fltFov']
    fltFocal = 0.5 * max(intWidth, intHeight) * math.tan(math.radians(90.0) - (0.5 * math.radians(fltFov)))

    npyPinholeX = numpy.linspace((-0.5 * intWidth) + 0.5, (0.5 * intWidth) - 0.5, intWidth).reshape(1, intWidth).repeat(intHeight, 0).astype(numpy.float32)[:, :, None] * (1.0 / fltFocal)
    npyPinholeY = numpy.linspace((-0.5 * intHeight) + 0.5, (0.5 * intHeight) - 0.5, intHeight).reshape(intHeight, 1).repeat(intWidth, 1).astype(numpy.float32)[:, :, None] * (1.0 / fltFocal)
    npyPinholeZ = numpy.ones([intHeight, intWidth, 1], numpy.float32)

    npyPoints = cv2.GaussianBlur(src=npyDepth, ksize=(3, 3), sigmaX=0.0, sigmaY=0.0)[:, :, None]
    npyPoints = numpy.concatenate([npyPinholeX * npyPoints, npyPinholeY * npyPoints, npyPinholeZ * npyPoints], 2)

    npyDiffX = numpy.pad(npyPoints, [(0, 0), (1, 0), (0, 0)], 'constant')
    npyDiffX = npyDiffX[:, 1:, :] - npyDiffX[:, :-1, :]

    npyDiffY = numpy.pad(npyPoints, [(1, 0), (0, 0), (0, 0)], 'constant')
    npyDiffY = npyDiffY[1:, :, :] - npyDiffY[:-1, :, :]

    npyApprox = numpy.cross(npyDiffY, npyDiffX, 2)
    npyApprox /= numpy.linalg.norm(npyApprox, 2, 2, True).repeat(3, 2)

    npyImages.append(cv2.resize(src=(numpy.concatenate([npyImage, (npyNormal + 1.0) * 0.5, (npyApprox + 1.0) * 0.5], 1) * 255.0).clip(0.0, 255.0).astype(numpy.uint8), dsize=None, fx=0.5, fy=0.5, interpolation=cv2.INTER_AREA))
# end

moviepy.editor.ImageSequenceClip(sequence=[npyImage[:, :, ::-1] for npyImage in npyImages], fps=5).write_gif('normal.gif')

Thanks again for bringing this up! Closing this issue for now, please let me know in case something is still unclear or there are any issues with my script. Thanks!
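
For anyone who wants to sanity-check the cross-product approach without the dataset files, here is a minimal sketch on a synthetic tilted plane. It is not the script above: the function name `normals_from_depth`, the plane parameters, and the focal length are all made up for illustration. It also skips the Gaussian smoothing, works in float64, and crops the first row and column instead of zero-padding them. The cross product uses the same operand order (Y difference, then X difference), so a surface facing the camera gets a negative Z component.

```python
import numpy


def normals_from_depth(depth, focal):
    """Approximate camera-space normals from an (H, W) z-depth map.

    Hypothetical helper for illustration; returns an (H - 1, W - 1, 3) array
    because the first row and column have no backward neighbour.
    """
    height, width = depth.shape

    # Pixel-centre coordinates relative to the image centre, in units of focal length.
    xs = (numpy.arange(width) - 0.5 * width + 0.5) / focal
    ys = (numpy.arange(height) - 0.5 * height + 0.5) / focal
    grid_x, grid_y = numpy.meshgrid(xs, ys)

    # Back-project every pixel to a 3D point in camera space.
    points = numpy.stack([grid_x * depth, grid_y * depth, depth], axis=2)

    # Backward differences between neighbouring points along X and Y.
    diff_x = points[:, 1:, :] - points[:, :-1, :]
    diff_y = points[1:, :, :] - points[:-1, :, :]

    # Cross product of the two tangent vectors, cropped to a common shape.
    normal = numpy.cross(diff_y[:, 1:, :], diff_x[1:, :, :], axis=2)
    normal /= numpy.linalg.norm(normal, axis=2, keepdims=True)
    return normal


# Synthetic check: compute the depth of a tilted plane and recover its normal.
focal = 64.0
height, width = 48, 64
true_normal = numpy.array([0.3, -0.2, -1.0])
true_normal /= numpy.linalg.norm(true_normal)

u = (numpy.arange(width) - 0.5 * width + 0.5) / focal
v = (numpy.arange(height) - 0.5 * height + 0.5) / focal
grid_u, grid_v = numpy.meshgrid(u, v)

# Plane through (0, 0, 5): n . p = n_z * 5 with p = (u, v, 1) * z, solved for z.
plane_depth = (true_normal[2] * 5.0) / (
    true_normal[0] * grid_u + true_normal[1] * grid_v + true_normal[2]
)

estimated = normals_from_depth(plane_depth, focal)
max_error = numpy.abs(estimated - true_normal).max()
print(max_error)
```

On an exact plane the finite differences lie in the plane, so the recovered normal should match the true one up to floating-point error. If the sign or an axis comes out flipped against a ground truth map, that points at a coordinate-convention mismatch rather than at the approximation itself.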
In your synthetic dataset, could you confirm that the normal maps are in camera coordinates as opposed to world? If I compute a normal map from the depth maps using finite difference and the intrinsic camera parameters, I can't get something that looks close to the ground truth.
Also, I wonder if there is quite heavy quantisation in the normal/depth maps? I think they are stored with 16-bit depth - is that right?
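
To make the quantisation concern concrete, here is a small sketch of what 16-bit storage would do to depth values. It assumes half-precision floating point (`numpy.float16`), which is not confirmed anywhere in this thread, and the depth range around 1000 units is made up for illustration.

```python
import numpy

# Illustrative depth values between 1000 and 1001 units (made-up range).
depth = numpy.linspace(1000.0, 1001.0, 11, dtype=numpy.float32)

# Round-trip through half precision to simulate assumed 16-bit float storage.
quantised = depth.astype(numpy.float16).astype(numpy.float32)

# Gap between adjacent representable float16 values near 1000.
step = float(numpy.spacing(numpy.float16(1000.0)))

# Largest rounding error and the distinct values that survive the round trip.
max_error = float(numpy.abs(quantised - depth).max())
levels = numpy.unique(quantised)

print(step, max_error, levels)
```

Under that assumption, depth differences smaller than the step collapse to the same value, and finite differences between neighbouring pixels turn into stair steps, which would show up as banding in a normal map computed from the depth.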