Camera Parameters of the dataset #2

Open
Trainingzy opened this issue Dec 12, 2024 · 7 comments

@Trainingzy

Thanks for your great work and for open-sourcing the data!

Could you please share the camera intrinsics (focal length and principal point) and the distortion parameters of each rendered video? The distortion looks quite obvious to me.

@Trainingzy (Author)

In addition, what is the format of the extrinsics? Is it an OpenGL world-to-camera matrix?

@RuihanLu

Same question here. Thanks to the authors for their hard work, well done!

@JianhongBai (Collaborator)

@Trainingzy @RuihanLu Thank you for your interest in our work.

Camera Intrinsic
All cameras share the same intrinsic matrix $K$, as shown below:

$$ K = \begin{bmatrix} 545.45 & 0 & 360 \\ 0 & 545.45 & 240 \\ 0 & 0 & 1 \end{bmatrix} $$

Thanks for the reminder. We will update the dataset description accordingly.
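
In code, the shared intrinsics and the pinhole projection they define look like this (a minimal sketch; the 720x480 resolution is inferred from the principal point, not stated in the dataset):

```python
# Shared intrinsic matrix K; the principal point (360, 240) suggests
# a 720x480 rendering resolution (an inference, not a dataset statement).
import numpy as np

K = np.array([[545.45,   0.0, 360.0],
              [  0.0, 545.45, 240.0],
              [  0.0,   0.0,    1.0]])

# Pinhole projection of a point X in camera coordinates to pixel (u, v):
X = np.array([0.5, -0.2, 3.0])
x = K @ X
u, v = x[0] / x[2], x[1] / x[2]
```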

Distortion
Due to the spherical or cylindrical projection used for the HDR panoramic images, distortion may appear when the camera is positioned at some distance from the center of the scene. This distortion is not caused by the camera parameters; it is inherent to the scene itself.

@JianhongBai (Collaborator)

> In addition, what is the format of the extrinsics? Is it an OpenGL world-to-camera matrix?

The camera extrinsic parameters are not in the standard OpenGL format. Instead, they are computed from the camera's position and orientation. We provide a script for visualizing the cameras:

```python
import math

def calculate_rotation_matrix(yaw, pitch, roll):
    # Build a 3x3 rotation from yaw (Z), pitch (Y), and roll (X) angles in
    # degrees: R = Rz(yaw) @ Ry(pitch) @ Rx(roll).
    yaw_rad = math.radians(yaw)
    pitch_rad = math.radians(pitch)
    roll_rad = math.radians(roll)

    cy = math.cos(yaw_rad)
    sy = math.sin(yaw_rad)
    cp = math.cos(pitch_rad)
    sp = math.sin(pitch_rad)
    cr = math.cos(roll_rad)
    sr = math.sin(roll_rad)

    rotation_matrix = [
        [cy * cp, cy * sp * sr - sy * cr, cy * sp * cr + sy * sr],
        [sy * cp, sy * sp * sr + cy * cr, sy * sp * cr - cy * sr],
        [-sp,     cp * sr,               cp * cr]
    ]
    return rotation_matrix

def create_transform_matrix(rotation_matrix, translation):
    # Assemble a 4x4 row-major transform: the transposed rotation in the
    # upper-left 3x3 block (each row padded with a trailing 0) and the
    # translation in the bottom row.
    transposed_rotation = [[rotation_matrix[j][i] for j in range(3)] for i in range(3)]
    transform_matrix = [row + [0] for row in transposed_rotation]
    transform_matrix.append([translation.x, translation.y, translation.z, 1])
    return transform_matrix
```
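
A hypothetical usage of these helpers (the pose values below are made up for illustration; `SimpleNamespace` merely stands in for whatever object carries the camera position):

```python
# Hypothetical usage of the helpers above; the pose values are made up.
from types import SimpleNamespace

translation = SimpleNamespace(x=10.0, y=-5.0, z=1.5)  # any object with .x/.y/.z works
R = calculate_rotation_matrix(yaw=30.0, pitch=0.0, roll=0.0)
T = create_transform_matrix(R, translation)  # 4x4, translation in the bottom row
for row in T:
    print(row)
```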

@HyeonHo99 commented Dec 23, 2024

Hi @JianhongBai, thank you for the great work!
Regarding the camera extrinsic matrices provided in your dataset repo, I don't quite understand what the numbers represent. For example, in one scene we have video1 from view1 and video2 from view2, and I am trying to warp video1 into the perspective of view2, then compare the warped video with video2 (the ground-truth video).

In this case, I am using the camera extrinsics to do so, assuming the provided matrix is the (4, 4) extrinsic transformation matrix of the view: [R, t; 0, 1]. But this does not produce a plausible warped video. Also, the translation vectors contain quite large numbers.

I would really appreciate it if you could provide simple code or a hint on how to convert the given camera extrinsic matrices (in your dataset) to standard extrinsic matrices.

Thanks a lot!

@JianhongBai (Collaborator)

> I would really appreciate it if you could provide simple code or a hint on how to convert the given camera extrinsic matrices (in your dataset) to standard extrinsic matrices.

Hi @HyeonHo99, thank you for your interest. You could try converting the extrinsic matrix with these lines: https://github.com/KwaiVGI/SynCamMaster/blob/main/vis_cam.py#L111-L116.

Regarding the large values in the translation vectors, you can subtract the same offset from every camera in a scene (for example, choose a reference camera and subtract its translation vector from those of all the other cameras). This will not alter the relative positions between the cameras.
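
A minimal sketch of that normalization, assuming each extrinsic is stored as a 4x4 row-major matrix with the translation in the bottom row, as produced by `create_transform_matrix` above (the helper name here is hypothetical, not from the repository):

```python
# Hypothetical sketch: make per-scene translations relative to a reference
# camera. Assumes 4x4 row-major matrices with the translation in the bottom
# row, matching create_transform_matrix above.
import numpy as np

def normalize_translations(extrinsics, ref_idx=0):
    mats = [np.asarray(E, dtype=float).copy() for E in extrinsics]
    ref_t = mats[ref_idx][3, :3].copy()  # reference camera position
    for E in mats:
        E[3, :3] -= ref_t  # shifts every camera equally; relative poses unchanged
    return mats
```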

@emjay73 commented Jan 17, 2025

I am also trying to do something similar to what @HyeonHo99 described.

[Image: the top left is the input image, the bottom left is the expected output image, and the right is the projection result.]

I ran monocular depth estimation on the input image, scaled the depth manually, and projected it using the camera poses normalized as in vis_cam.py (https://github.com/KwaiVGI/SynCamMaster/blob/main/vis_cam.py#L121). Although I tried several scale values, it was not easy to align the projected image with the target image perfectly.

@JianhongBai, do you think this is because of an unknown scale factor between the depth and the camera poses?
Is there a good rule of thumb for matching the pose scale to the depth scale?
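
For reference, here is a minimal sketch of the kind of depth-based reprojection being attempted, assuming standard column-vector world-to-camera extrinsics [R, t; 0, 1] (the convention @HyeonHo99 mentions, which may differ from the row layout the dataset uses) and the shared intrinsic $K$ from above; all names are hypothetical:

```python
# Hypothetical sketch: reproject view-1 pixels into view 2 using a depth map
# and two 4x4 column-vector world-to-camera extrinsics E1, E2 = [R, t; 0, 1].
import numpy as np

K = np.array([[545.45,   0.0, 360.0],
              [  0.0, 545.45, 240.0],
              [  0.0,   0.0,    1.0]])

def warp_pixel_coords(depth1, E1, E2):
    h, w = depth1.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T  # 3 x N

    # Back-project view-1 pixels to camera-1 space, then lift to world space.
    cam1 = np.linalg.inv(K) @ pix * depth1.reshape(1, -1)
    world = np.linalg.inv(E1) @ np.vstack([cam1, np.ones((1, cam1.shape[1]))])

    # Transform into camera 2 and project with the shared intrinsics.
    cam2 = (E2 @ world)[:3]
    uv2 = K @ cam2
    return (uv2[:2] / uv2[2]).T.reshape(h, w, 2)  # where each pixel lands in view 2
```

A global mismatch between the depth scale and the pose scale would scale `cam1` radially and make the warp over- or under-shoot the target view, which would match the misalignment described above.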
