Alignment of multi-camera pointclouds needs work #18

jackjansen opened this issue Jul 12, 2022 · 75 comments

@jackjansen
Contributor

jackjansen commented Jul 12, 2022

We need to fix the alignment once and for all.

The original issue is https://gitlab.com/VRTogether_EU/cwipc/cwipc_util/-/issues/41 but this link is dead now.

There is an old repo for experiments at https://github.com/cwi-dis/pointclouds-alignment

New experiment data is at https://github.com/cwi-dis/cwipc_test/tree/master/pointcloud-registration-test

Edit 20-Nov-2023: most of the text above is outdated. Kept for future reference.

Current plan of attack:

  • Create an algorithm that returns the approximate misalignment of each camera (as a distance in meters)
  • Implement a few of the alignment algorithms in a way that they can be run automatically
  • Create a registration algorithm that does something like:
    • Given the misalignment from step 1, run one or more of the alignment algorithms.
    • Re-measure the misalignment. Check that it has gone down.
    • Repeat until happy, or until there are no more improvements.
    • Record the results in cameraconfig.json. Also record the current misalignment there, because it is a good value for voxelization (later, during production).
  • After all this is done look at the other registration ideas @Silvia024 found, and possibly also that algorithm that @nachoreimat found.
  • After all this is done try to automate the coarse registration with Aruco codes
  • After all that is done see whether we can use multiple Aruco codes to handle a large number of cameras with fields of view that don't fully overlap.
  • If (big if) the registration procedure works with point clouds of people (as opposed to only working with point clouds of boxes, or not working at all) we should try applying it to the cwipc-sxr captures, to see if we can get better alignment.
  • In parallel to all of the steps above we should think about whether there's a paper in here somewhere, and who should be the primary person to lead this.
@jackjansen
Contributor Author

Current best procedure (copied from Slack):

I think I've just done the best calibration we've ever done. Keeping note here so we can try again (in other words: @shirsub can try this procedure in vrsmall).

  1. Use 4 cameras, not 3.
  2. First do the coarse calibration cwipc_calibrate --kinect --nofine with the A4 sheet on the floor.
  3. Edit cameraconfig, set correct high/low, near/far, radius, erosion=2
  4. Do cwipc_view --kinect with nothing in view, and ensure there are no points. If there are points edit cameraconfig to fix.
  5. At the centerpoint stack 3 IKEA boxes so each camera sees 2 sides (and each side is seen by two cameras)
  6. cwipc_view --kinect and w to save a pointcloud.
  7. cwipc_calibrate --kinect --reuse --nocoarse --nograb pointcloud_xxxxxxxxx.ply --corr 0.05
  8. (The 0.05 is how far the per-camera pointclouds are apart in the grabbed image. For many years I picked lower numbers here all the time, but the trick is you should pick higher numbers.)
  9. Pick algorithm 6 (cumulative multiscale ICP), variant 2 (point2plane)
  10. Inspect result, after - a few times to make the points smaller. If you look at the stack of boxes from above you get a very good idea of the quality.
  11. The way the algorithm works at the moment means (I think) that it will be happy after doing 3 steps, so with the new calibration I did another grab and calibrated with --corr 0.02, which gave me a pretty good result (at least I think so).

@jackjansen
Contributor Author

I think we should start looking at opencv to help with calibration. For example, we could use an Aruco marker (see https://docs.opencv.org/4.x/d5/dae/tutorial_aruco_detection.html) to help with detection of (0, 0, 0) and do the coarse calibration automatically.
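
A minimal sketch of what that detection step could look like, assuming OpenCV ≥ 4.7 and its cv2.aruco module (the aruco API changed between OpenCV releases); the dictionary choice and the find_origin_marker helper are illustrative, not existing cwipc code:

```python
# Hypothetical sketch: find the origin ArUco marker in one camera's RGB frame.
import cv2
import numpy as np

def find_origin_marker(rgb_image: np.ndarray, marker_id: int = 0):
    """Return the (u, v) pixel corners of the requested marker, or None if it is not visible."""
    dictionary = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50)  # assumed dictionary
    detector = cv2.aruco.ArucoDetector(dictionary, cv2.aruco.DetectorParameters())
    corners, ids, _rejected = detector.detectMarkers(rgb_image)
    if ids is None:
        return None
    for marker_corners, found_id in zip(corners, ids.flatten()):
        if found_id == marker_id:
            return marker_corners.reshape(4, 2)
    return None
```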

This also gives us camera positions, so we can then determine the camera ordering for the next step (which pairs of cameras we should align).

We could then capture a 3D object (maybe the boxes, but possibly a person is good enough). I have the feeling that we could use the histogram of point-to-point distances to determine how well the coarse calibration worked: I'm expecting a plateau on the left of the histogram, then a drop, then a very long tail. Somewhere along the drop would seem to be a good value for the --corr parameter.

We then use cumulative multi scale ICP or something, and after that recompute the histogram. corr should now be lower.

Then I think we repeat with the other camera pairs.

Question is what to do after all pairs have been done, because possibly the last calibration has broken the first one.

Another option could be to first compute the corr for each pair, and use that to determine the order of doing the aligning, but I'm not sure whether "best first" or "worst first" is the best option. I think "best first" but I'm not sure.

But I do think that after we've finished calibration we should repeat the calculation of corr, because this will give us the optimal cell size for voxelization.

@fonskuijk I would really like your opinion on this idea...

@jackjansen
Contributor Author

jackjansen commented Nov 12, 2023

Edit 20-Nov-2023: most of the information in this comment is now outdated. See below. Keeping it here for future historic reference.

Okay, let's use this issue to keep note of what we've done (and found out) so far.

  • There are 6 pointclouds in cwipc_test that have been grabbed in vrbig and vrsmall. One of the stack of boxes, one of Jack facing the front camera and one of Jack facing between the front camera and the next camera.
  • From visual inspection the registration in vrbig looks pretty good, the registration in vrsmall has one camera off by a few centimeters.
  • There's a script sandbox/voxelize_curve.py (now badly named) that voxelizes a point cloud with ever decreasing voxelsizes. It creates a CSV file with an entry per voxelsize, showing how many points there are, how many points with 1 contributing camera, how many points with 2 contributing cameras, etc.
  • Each voxelized point cloud is also saved.
  • The script has been run on the two boxes pointclouds, after colorizing them per camera, i.e. after running --filter 'colorize(1,"camera")' on them.
  • The resulting CSV files can be imported into voxelize_curve_spreadsheet.numbers to graph them.
  • (note that various fixes were made to cwipc_view and cwipc_downsample() to make this all work somewhat)

The results are difficult to interpret. I was expecting to see some sort of a threshold effect between cellsize > epsilon and cellsize < epsilon, where epsilon is how good the current registration is (in other words: when the cellsize is so large that points from camera 1 and camera 2 would be merged I would expect a super-linear drop in the point count).

This doesn't happen. Except for very small and very large cell sizes I'm seeing a more-or-less linear decrease in the number of points (of approximately a factor of 2, as is to be expected since I'm using sqrt(0.5) as the factor between cell sizes).

point-count-factors-per-cellsize

Next I decided to look at the number of contributing cameras for each result point (after voxelizing) at each cellsize. Again, I was expecting to see something interesting at the boundary of cellsize == epsilon: something like a sudden increase in the number of points with two contributing cameras (at the expense of the number of points with one contributing camera).

But again no such luck: things look vaguely linear. Moreover, even at small cellsizes there are quite a few points that already have three contributing cameras. This should not be possible with the boxes, I think: the cameras and the boxes were positioned such that each side could really only be seen by two cameras.

n-camera-contributions-cumulative

Either I am making a stupid mistake in my thinking, or there is another bug in the way cwipc_downsample() works. I will investigate the latter first.

Edit: there was indeed another stupid bug in cwipc_downsample() which led to sometimes too many contributing cameras being specified in a point.

Here is the correct version of the previous graph:
Screenshot 2023-11-15 at 17 01 50

@jackjansen
Contributor Author

Came up with a wild plan (thanks @Silvia024 !).

What if we take a synthetic point cloud, and then create 4 point clouds from that (North, East, South, West), with each of the new point clouds having half the points of the original (basically the ones that can be seen from that direction).

Now we can slightly perturb each of the NESW point clouds, and recombine them into a single "4-camera" point cloud.

This we can now present to voxelize_curve.py and see what it reports. And we know the ground truth, because we have done the perturbations.

So this should allow us to see what the magic numbers are that we are looking for.

Moreover, this scheme (or variations of it, for example introducing some noise) could also be used to do automatic testing of our point cloud registration algorithms (once we have written them:-).
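
A rough sketch of this generator idea, assuming the synthetic cloud is an (N, 3) NumPy array; the half-space split and the direction names are simplifications for illustration, not the actual cwipc_test code:

```python
# Each virtual camera keeps the half of the points facing it; two cameras get a
# known rigid offset so the ground-truth registration error is known.
import numpy as np

DIRECTIONS = {                       # unit view directions, y is up
    "north": np.array([0.0, 0.0, 1.0]),
    "east":  np.array([1.0, 0.0, 0.0]),
    "south": np.array([0.0, 0.0, -1.0]),
    "west":  np.array([-1.0, 0.0, 0.0]),
}

def make_four_camera_clouds(points: np.ndarray, error: float = 0.03):
    """Return {camera_name: (M, 3) cloud}; 'east' and 'west' are offset by `error` meters."""
    centered = points - points.mean(axis=0)
    clouds = {}
    for name, direction in DIRECTIONS.items():
        visible = points[centered @ direction > 0.0]   # the half facing this camera
        if name in ("east", "west"):                   # perturb two of the four cameras
            visible = visible + error * direction
        clouds[name] = visible
    return clouds
```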

@jackjansen
Contributor Author

That was a bit of a disappointment, so far. Or actually maybe it just showed our erroneous thinking:-)

Implemented the generation of registration-test pointclouds, with a settable offset for cameras 2 and 4 (cameras 1 and 3 remain correct).

Created pointclouds with an error of 0.03 and 0.003.

The results are in https://github.com/cwi-dis/cwipc_test/tree/master/pointcloud-registration-test
but the quick message is that the curves still look pretty linear. There is a clear "minimum voxelsize" where points from multiple cameras start to be combined, but this "minimum voxelsize" is quite a bit smaller than the pre-determined error with which the point clouds were generated.

I could say something (based on the data for these pointclouds) like: "As soon as more than 15% of the points come from two cameras our voxelsize is larger than the error" but this would just be data-fitting.

@Silvia024 can you have a look at the data?

@jackjansen
Contributor Author

jackjansen commented Nov 14, 2023

Another wild idea. Comments please.

For each voxelsize we voxelize the whole pointcloud, just as we do now.
But we also take the 4 per-camera pointclouds and voxelize those individually. We add up the point counts of those clouds.

We now compare the two numbers. This comparison is a measure for how much camera-combining has happened.

Edit 20-Nov-2023: We decided not to go down this path, because we dropped the idea of using voxelization to compute alignment quality.

@jackjansen
Contributor Author

jackjansen commented Nov 14, 2023

And yet another wild idea. We are now doing this on the whole pointcloud, maybe that is a bad idea because it confuses our numbers.

What if we construct the 6 pairwise point clouds (cam1+2, cam1+3, etc) and run this analysis separately on each pairwise pointcloud?

Edit: this did not work. I experimented with it by creating the generated point clouds (0.03 and 0.003) with only two cameras (dropping the other two). The curves are just as meaningless as for the full 4-camera cloud.

I am pretty sure that the same is true for the previous wild idea.

@Silvia024

Silvia024 commented Nov 14, 2023

maybe tomorrow we can chat better about this?

btw I just found this: IntelRealSense/librealsense#10795
and it seems quite relevant and it might be of interest. I will try to read it tonight

@Silvia024

I found two interesting papers...

@jackjansen
Contributor Author

jackjansen commented Nov 15, 2023

Updated two previous comments (the two wild ideas): they turned out not to work. I have also experimented with changing the "next cellsize factor" from sqrt(0.5) to sqrt(sqrt(0.5)) but there is no appreciable bump around the known epsilon with which the point clouds are created.

I now think that the whole idea of using voxelization at successively smaller sizes and looking for a discontinuity simply doesn't work.

To come up with a theory: voxelization is essentially a stochastic procedure, because it depends on the origin of the grid and the individual original points have no relationship to that grid. So whether or not two points are combined when they are less than cellsize apart is a matter of chance: if cellsize > 2*distance they are definitely combined, if cellsize < distance/sqrt(3) they are definitely not, but the area in between is grey. @Silvia024 does this sound reasonable?

@Silvia024 @troeggla please up-vote this comment if you agree that voxelization is a dead end. Then I can remove all the generated data and spreadsheets over in cwipc_test/pointcloud-registration-test (which is quite a lot).

@jackjansen
Contributor Author

My suggestion for the next try is to compute, for each point in cam1cloud, the minimum distance to any point in cam2cloud. The first thing to look at is the histogram of these distances (April 9 comment). I hope we would see a very large peak just below epsilon with a very long tail (all the cam1 points that have no corresponding cam2 point).

I think the KD-Tree is the data structure we want. It appears to be supported in SciPy; example code is in https://medium.com/@OttoYu/exploring-kd-tree-in-point-cloud-c9c767095923
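
A minimal sketch of that per-point minimum-distance computation with SciPy's KD-tree, assuming the two per-camera clouds are available as (N, 3) NumPy arrays (the helper name is made up; matplotlib is only used to illustrate the histogram):

```python
import numpy as np
from scipy.spatial import cKDTree

def nearest_distances(cam1_points: np.ndarray, cam2_points: np.ndarray) -> np.ndarray:
    """For every point of camera 1, the distance to its nearest camera-2 point."""
    tree = cKDTree(cam2_points)                 # build the KD-tree once on the target cloud
    distances, _indices = tree.query(cam1_points, k=1)
    return distances

# Example usage:
#   import matplotlib.pyplot as plt
#   plt.hist(nearest_distances(cloud1_xyz, cloud2_xyz), bins=200)
#   plt.show()
```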

@jackjansen
Contributor Author

The idea of using a KDTree and plotting a histogram of the minimum distances looks very promising.

There's a Jupyter notebook in cwipc_test, I'm going to convert it to a script.

@Silvia024

> Updated two previous comments (the two wild ideas): they turned out not to work. I have also experimented with changing the "next cellsize factor" from sqrt(0.5) to sqrt(sqrt(0.5)) but there is no appreciable bump around the known epsilon with which the point clouds are created.
>
> I now think that the whole idea of using voxelization at successively smaller sizes and looking for a discontinuity simply doesn't work.
>
> To come up with a theory: voxelization is essentially a stochastic procedure, because it depends on the origin of the grid and the individual original points have no relationship to that grid. So whether or not two points are combined when they are less than cellsize apart is a matter of chance: if cellsize > 2*distance they are definitely combined, if cellsize < distance/sqrt(3) they are definitely not, but the area in between is grey. @Silvia024 does this sound reasonable?
>
> @Silvia024 @troeggla please up-vote this comment if you agree that voxelization is a dead end. Then I can remove all the generated data and spreadsheets over in cwipc_test/pointcloud-registration-test (which is quite a lot).

the theory seems correct to me :)
agree that maybe it's not a good strategy

@jackjansen
Contributor Author

I'm getting beautiful graphs, plotting (per camera pair) the cumulative distance-to-other-cloud.

cumdist

@jackjansen
Contributor Author

All the graphs are now in cwipc_test. They look mostly usable. I'm going to change the Y axis from being counts to being fractions, that should make them even easier to interpret (I hope).

@jackjansen
Contributor Author

jackjansen commented Nov 19, 2023

For reference in further discussion below: here is the current state of the graph for the boxes in vrsmall (with one camera about 1cm off):

boxes_cumdist

And here is one of the Jack point clouds (the best one, the sideways one):

jack-sideways_cumdist

And here is the one for the generated point cloud, with a 0.03 m offset of two of the cameras:

genregtest03_cumdist

@jackjansen
Contributor Author

The first thing I would like to do is convert each of these lines into a single number, the correspondence. This number should be an upper bound for the registration error between the two cameras in that pair, for the given capture. If we are pretty sure the cameras have no significant overlap in field of view (pairs (0, 3) and (1, 2) in the graphs above) we return an invalid value (NaN or None or something like that).

Once we have this number we can do two things with it:

  • Use it as the --corr parameter to the various registration algorithms we have
  • After applying an algorithm we can recompute this correspondence number and ascertain it is lower than the previous value.

@jackjansen
Contributor Author

Idea: we should look at the derivative of these lines (which is actually the distance histogram data) and the second derivative. Possibly after smoothing the lines a little.

The second derivative should start positive, then go negative, then go positive again. Maybe that last point is the number we're interested in.
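
A sketch of that derivative idea, assuming the cumulative curve is available as per-bin counts with a known bin width (the helper name and the smoothing window are arbitrary choices, not existing code):

```python
import numpy as np

def second_derivative_crossing(cumulative_counts: np.ndarray, bin_width: float,
                               smooth_window: int = 5):
    """Distance at which the (smoothed) second derivative goes from negative back to positive."""
    kernel = np.ones(smooth_window) / smooth_window
    smoothed = np.convolve(cumulative_counts, kernel, mode="same")   # light smoothing
    first = np.gradient(smoothed, bin_width)     # roughly the histogram itself
    second = np.gradient(first, bin_width)
    for i in range(1, len(second)):
        if second[i - 1] < 0.0 <= second[i]:
            return i * bin_width
    return None                                  # no negative-to-positive crossing found
```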

@jackjansen
Contributor Author

jackjansen commented Nov 20, 2023

Unrelated idea (@Silvia024 comments please): we are currently looking at the symmetric combined difference between camera pairs.

Would it be better to do a per-camera comparison to all other cameras combined? I.e. not compute and graph the distance per camera pair, but instead look at the distance of the points of camera N to the points of all other cameras together?

That should extend the "tail" of the histogram, probably making it more linear-ish. Also, the resulting correspondence number should be more useful, because it is a number that pertains to a single camera (instead of to a pair of cameras).

@jackjansen
Contributor Author

> Would it be better to do a per-camera comparison to all other cameras combined?

Implemented, and it seems to produce much better results. So much better that we need to check that we are not looking at bullshit data.

Here are the graphs (for single-camera-to-all-others) that are for the same dataset as the pair graphs of yesterday:

boxes_cumdist_one2all

jack-sideways_cumdist_one2all

genregtest03_cumdist_one2all

@jackjansen
Contributor Author

And here are the histogram graphs of the same datasets:

boxes_histogram_one2all

jack-sideways_histogram_one2all

genregtest03_histogram_one2all

@Silvia024

> The original issue is https://gitlab.com/VRTogether_EU/cwipc/cwipc_util/-/issues/41

@jackjansen, I can't access this link. It says page not found. Maybe I don't have the right to access it?

@jackjansen
Contributor Author

> The original issue is https://gitlab.com/VRTogether_EU/cwipc/cwipc_util/-/issues/41
>
> @jackjansen, I can't access this link. It says page not found. Maybe I don't have the right to access it?

That link is dead (the gitlab site disappeared when we stopped paying for it).

@jackjansen
Contributor Author

Getting started with a guess at the correspondence (when the histogram is "over the hill", and has descended about half way down). Looks good for the generated point clouds, and usable for the captured ones (in the right ballpark).

boxes_histogram_one2all

jack-sideways_histogram_one2all

genregtest03_histogram_one2all
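
For reference, a sketch of this "over the hill, about half way down" guess as code (the helper name and bin count are arbitrary; this is the heuristic described above, not the exact cwipc implementation):

```python
import numpy as np

def guess_correspondence(distances: np.ndarray, bins: int = 200) -> float:
    """Find the histogram peak, then walk right until the count drops below half the peak."""
    counts, edges = np.histogram(distances, bins=bins)
    peak_index = int(np.argmax(counts))
    for i in range(peak_index, len(counts)):
        if counts[i] < 0.5 * counts[peak_index]:
            return float(edges[i])               # left edge of the first "half-way down" bin
    return float(edges[-1])                      # fallback: the counts never dropped that far
```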

@jackjansen
Contributor Author

Paused working on the aruco markers for now.

Implemented a manual coarse calibration algorithm (basically the same as what we have: select the four coloured corners of the calibration target).

This works, the resulting pointcloud and the correspondence graph are in pointcloud-registration-test/vrsmall-noreg/capture.

Here is the graph:

pointcloud-0001

The correspondence errors are too optimistic, but visual inspection of the graph shows that something like 0.05 is probably the correct number.

But: these pointclouds are so big (500K points per camera) that the analysis is very expensive: it takes about a minute on my M1 Mac, and I didn't wait for it on the Intel Mac. That's a shame, because for our common use case (capturing in a room where the walls are close enough to be captured by the cameras) this would actually be a very useful test capture...

@jackjansen
Contributor Author

In case we decide to go for detecting the Aruco markers in the RGB image and then finding the corresponding points in the point cloud: here are some interesting links about how the conversion of image pixel coordinates to 3D points can be done using librealsense API's:

IntelRealSense/librealsense#8221
https://medium.com/@yasuhirachiba/converting-2d-image-coordinates-to-3d-coordinates-using-ros-intel-realsense-d435-kinect-88621e8e733a
IntelRealSense/librealsense#11031
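
A sketch of that deprojection with pyrealsense2, assuming the depth frame has been aligned to the color frame (for example with rs.align(rs.stream.color)) so that the marker's (u, v) pixel indexes the depth image directly:

```python
import pyrealsense2 as rs

def pixel_to_point(depth_frame, u: int, v: int):
    """Map an RGB pixel (u, v) plus its depth to an (x, y, z) point in meters."""
    depth = depth_frame.get_distance(u, v)       # depth at that pixel, in meters
    intrinsics = depth_frame.profile.as_video_stream_profile().intrinsics
    return rs.rs2_deproject_pixel_to_point(intrinsics, [u, v], depth)
```

Note that the result is still in that camera's own coordinate system; the per-camera extrinsics from cameraconfig are then needed to map it into the shared coordinate system.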

@jackjansen
Contributor Author

Back to fine registration.

There is a thinking error in the idea that only looking at the correspondence between each (camera-N, all-cameras-except-N) is good enough.

This is shown clearly in a new dataset offline-boxes. See there for the full details, but here is a screenshot of the calibration it managed to come up with:

Screenshot 2024-01-01 at 23 55 34

Visually, it is clear what problem 1 is: the green and blue cameras together (or the red and brown cameras together) need to be moved to make things better. But looking at each camera in isolation is never going to work: if we moved the green camera to better align with the red one we would completely lose the alignment with the blue camera.

There is also a problem 2 with the analysis. Here is what our code thinks of the alignment it came up with:

captured-boxes-3_histogram_after_step_6

Those numbers (14 mm to 18 mm) are far too optimistic.

For problem 1 a potential solution is to not only look at correspondences between (camera-N, all-cameras-except-N) but also (camera-N-and-M, all-cameras-except-N-and-M) for every combination (N, M) where those two cameras are "closest to each other". If an N-and-M camera combination ends up as the candidate-to-fix we would try fixing that combined "camera" and apply the transformation to both cameras.

But this feels like a hacker-jack solution: it seems like it might fix this particular problem, but I'm not sure it is really a solution rather than a quick hack.

I have the feeling that tackling problem 2 first may be better.

Ideas, anyone?

@jackjansen
Contributor Author

Some progress with coarse calibration.

I'm manually positioning the pointcloud in the open3d viewer so that the virtual camera is approximately where the physical camera was. I then grab the RGB and D images.

I can now detect the aruco markers in the RGB image.

But this is where the good news stops: the next step is converting the u, v coordinates that the aruco detector returns (coordinates in the RGB image) back to x, y, z coordinates in the point cloud.

I think I'm doing all the right things: I'm taking the depth from D[v, u] and using the camera intrinsics to de-project, and then I'm transforming with the extrinsic matrix.

But I always end up with weird origins. I've tried transposing the extrinsic matrix, I've tried mirroring x, or mirroring y and z (which was suggested somewhere) but nothing seems to work.

@jackjansen
Contributor Author

See isl-org/Open3D#6508 for a decent description of the open3d coordinate system (RH y-UP) and its idiosyncrasies with respect to lookat and forward.

@jackjansen
Contributor Author

The problem may be that I "invert" the affine transform (the extrinsic matrix) by transposing it. That would work for a normal 3x3 matrix in our case (because we know all the matrices are size and shape preserving), but of course it doesn't work for an affine transform: I have to use another vector as the fourth column (and clear the fourth row).

Here is an explanation: http://negativeprobability.blogspot.com/2011/11/affine-transformations-and-their.html

@jackjansen
Contributor Author

And I found https://stackoverflow.com/questions/2624422/efficient-4x4-matrix-inverse-affine-transform which says a similar thing. Also, looking at the source code for open3d PointCloud.create_from_depth_image, it seems that they are indeed doing this, but they have the Eigen Affine3d transform, which has an inverse. I guess I'll have to create that by hand.
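
A small sketch of that by-hand inverse: for a rigid 4x4 extrinsic of the form [R | t; 0 1] the inverse is [Rᵀ | -Rᵀt; 0 1], which is not the same as the plain transpose:

```python
import numpy as np

def invert_rigid_transform(matrix: np.ndarray) -> np.ndarray:
    """Invert a 4x4 rotation+translation matrix without a general matrix inverse."""
    rotation = matrix[:3, :3]
    translation = matrix[:3, 3]
    inverse = np.eye(4)
    inverse[:3, :3] = rotation.T                 # inverse of a rotation is its transpose
    inverse[:3, 3] = -rotation.T @ translation   # the "other vector as the fourth column"
    return inverse
```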

@jackjansen
Contributor Author

Automatic coarse calibration based on Aruco markers is working. It's actually working so well that I have merged it back into master, at a1b2855, so this can now be considered production-ready.

@jackjansen
Contributor Author

Multi-marker coarse alignment (for camera setups where not all cameras can see the (0, 0, 0) origin aruco marker, so we use auxiliary markers) is also working well enough that I've merged it into master, at b923190.

@jackjansen
Contributor Author

So, back to fine calibration or actually first to the analysis of the current calibration.

I'm working with the offline-boxes capture, because that shows the problem most clearly. I've created plots for all analysers that we have (one2all, one2all-filtered, one2all-reverse, one2all-reverse-filtered and pairwise). All the graphs are in cwipc_test.

The most informative graph for this dataset (but note the italics) is the pairwise graph:

captured-boxes-3_histogram_paired

The "camera numbers" here are the OR of the two contributing camera numbers. As humans we can easily see that camera 1 is opposite camera 4 and camera 2 is opposite camera 8. And I also know this is correct, because I know that the cameras are placed in the order 1-2-4-8 clockwise. We can probably detect this algorithmically if we want.

As humans we can estimate the correspondences of the pairs:

  • 1 to 2 (red): about 2cm
  • 2 to 4 (olive): about 4mm
  • 4 to 8 (turquoise): at least 4cm
  • 8 to 1 (dark blue): less than 4mm

We can also see that the correspondence errors that the current "algorithm" (but really "quick hack" is a better term) has come up with are wildly wrong. Not surprising: the current "algorithm" works by finding the peak in the histogram and then moving right until the count drops below 0.5*peak.

I will experiment with mean and stddev to see if I can get some more decent numbers. Then, if they work for this dataset, they should also be tried for the captured Jack dataset.

@jackjansen
Contributor Author

jackjansen commented Jan 14, 2024

Mean and stddev by themselves are not going to work. Here are the results for the graph above:

camera 3: mean=0.02291982490598397, std=0.024645493357004753, peak=0.002320735082210554, corr=0.007816463821634046
camera 5: mean=0.04401491186517467, std=0.028213745113280272, peak=0.012704797435346417, corr=0.058097526853423245
camera 9: mean=0.016755202242697633, std=0.026204388020447566, peak=0.0018051266662951993, corr=0.002804512231245586
camera 6: mean=0.015378887548555181, std=0.023343767740899458, peak=0.001824420385722404, corr=0.003573071088354493
camera 10: mean=0.048489777693837444, std=0.028316377093057312, peak=0.0021639993720285154, corr=0.08227788490063927
camera 12: mean=0.0341007288961789, std=0.023926849570313418, peak=0.01910221049639875, corr=0.04007440525818792

The mean for the "good pairs" (6 and 9) is far too high.

And that is pretty logical, when you think about it: the long tails have an inordinate effect on the mean.

Next thing to try: first compute mean and stddev, then throw away all distances that are larger than (wild guess) mean+stddev, or maybe 2*mean. Then compute mean and stddev on the points that remain.

Edit: another thing to try is to keep only the points in the range [mean-stddev, mean+stddev].

The idea is that for the "bad pairs" this will throw away less of the points, but for the "good pairs" it will throw away more points.
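
A sketch of that iterative bracketing filter on the array of per-point distances (the helper name and the fixed iteration count are assumptions for illustration):

```python
import numpy as np

def bracketed_mean(distances: np.ndarray, iterations: int = 3):
    """Repeatedly keep only distances in [mean-std, mean+std]; return the final mean, std and count."""
    kept = np.asarray(distances, dtype=float)
    for _ in range(iterations):
        mean, std = kept.mean(), kept.std()
        selected = kept[(kept >= mean - std) & (kept <= mean + std)]
        if selected.size == 0:
            break                                # never filter away everything
        kept = selected
    return kept.mean(), kept.std(), kept.size
```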

@jackjansen
Contributor Author

Tried that. Also tried running the filtering multiple times, to see how the mean and stddev behave. Used the bracketing filter [mean-stddev, mean+stddev] on two premises:

  • It sort-of feels more mathematically correct,
  • it just so happens that for the "good pairs" std > mean so we don't throw away any "good points", while for the "bad pairs" std < mean so we throw away points on both sides, so running the filter successively should not change mean too much (while for the "good pairs" it will lower mean).

Here are the results:

camera 3: peak=0.002320735082210554, corr=0.007816463821634046
camera 3: 0 filters: mean=0.02291982490598397, std=0.024645493357004753, nPoint=81262
camera 3: 1 filters: mean=0.012951448749072387, std=0.010969866908687875, nPoint=66992
camera 3: 2 filters: mean=0.009752436971104068, std=0.005658494330190411, nPoint=52946
camera 3: 3 filters: mean=0.009181153840633022, std=0.0032857770409335132, nPoint=32361
camera 5: peak=0.012704797435346417, corr=0.058097526853423245
camera 5: 0 filters: mean=0.04401491186517467, std=0.028213745113280272, nPoint=44935
camera 5: 1 filters: mean=0.04137415890726912, std=0.016101593626257675, nPoint=26525
camera 5: 2 filters: mean=0.04045734373520693, std=0.009445048042243229, nPoint=15533
camera 5: 3 filters: mean=0.040051235318581284, std=0.005529610745340443, nPoint=8873
camera 9: peak=0.0018051266662951993, corr=0.002804512231245586
camera 9: 0 filters: mean=0.016755202242697633, std=0.026204388020447566, nPoint=81399
camera 9: 1 filters: mean=0.006063694689290081, std=0.008691325161748644, nPoint=67977
camera 9: 2 filters: mean=0.003177805834766034, std=0.0024780598363417766, nPoint=59915
camera 9: 3 filters: mean=0.002515016672254931, std=0.0011297240188485047, nPoint=52181
camera 6: peak=0.001824420385722404, corr=0.003573071088354493
camera 6: 0 filters: mean=0.015378887548555181, std=0.023343767740899458, nPoint=93198
camera 6: 1 filters: mean=0.006850735824829795, std=0.00853339253154322, nPoint=79885
camera 6: 2 filters: mean=0.003812802111714147, std=0.0027496213518108494, nPoint=69301
camera 6: 3 filters: mean=0.003089897111975529, std=0.00140128710337592, nPoint=56900
camera 10: peak=0.0021639993720285154, corr=0.08227788490063927
camera 10: 0 filters: mean=0.048489777693837444, std=0.028316377093057312, nPoint=43545
camera 10: 1 filters: mean=0.04792050129221857, std=0.016245794338419692, nPoint=25355
camera 10: 2 filters: mean=0.04781840036709909, std=0.00938153612816915, nPoint=14709
camera 10: 3 filters: mean=0.04774644484942719, std=0.005421145457380683, nPoint=8482
camera 12: peak=0.01910221049639875, corr=0.04007440525818792
camera 12: 0 filters: mean=0.0341007288961789, std=0.023926849570313418, nPoint=73397
camera 12: 1 filters: mean=0.028625500662925147, std=0.011632248990853532, nPoint=52026
camera 12: 2 filters: mean=0.027763304899151773, std=0.0064978872262983645, nPoint=33259
camera 12: 3 filters: mean=0.027708708994364492, std=0.003789831220465292, nPoint=19513

This seems to be going in the right direction: the "bad pairs" (opposing cameras) have their mean staying put at high values. The "good pairs" have their mean going down significantly, towards what appears to be a correct value. The "not so good pairs" (3 and 12) also seem to end up at decent values.

@jackjansen
Contributor Author

Partial success. That is to say: this works pretty well for camera-pair measurements on the boxes:

captured-boxes-3_histogram_paired

These are pretty believable numbers!

Unfortunately it does not work well at all for the one-to-all-others measurements:

captured-boxes-3_histogram_one2all

I think the problem is that this algorithm throws away any points that it can't match (which, in the case of this dataset, includes the mismatched "edges that are sticking out").

Let's first check how the pair-wise measurements work on the other datasets.

@jackjansen
Contributor Author

jackjansen commented Jan 15, 2024

That didn't work very well. I've now made the pair-wise measurement symmetric but this needs work: at the moment it is far too expensive.

And it is also too aggressive in trying to put as many points into the overlapping set as it can. This can be seen with the loot datasets.

We should somehow re-enable the max_distance capping of the kdtree distance finder (I disabled it for now) but still count the points that go over it.

@jackjansen
Contributor Author

For future reference: when we get back to finding the "best" algorithm to align the pointclouds we should look at point-to-plane ICP with a robust kernel. From https://www.open3d.org/docs/latest/tutorial/pipelines/robust_kernels.html#Vanilla-ICP-vs-Robust-ICP I get the impression that the robust kernel is a way to deal with noise. The referenced page uses generated noise, but of course our sensors are also noisy...
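
For later experiments, a sketch of what that would look like with Open3D's robust kernels, following the linked tutorial (source, target and the corr value are assumed inputs; normals are estimated here only because point-to-plane ICP needs them on the target):

```python
import numpy as np
import open3d as o3d

def robust_point2plane_icp(source, target, corr: float) -> np.ndarray:
    """Run point-to-plane ICP with a Tukey robust kernel; returns the 4x4 transformation."""
    target.estimate_normals(
        search_param=o3d.geometry.KDTreeSearchParamHybrid(radius=2 * corr, max_nn=30))
    loss = o3d.pipelines.registration.TukeyLoss(k=corr)     # robust kernel to down-weight noise
    estimation = o3d.pipelines.registration.TransformationEstimationPointToPlane(loss)
    result = o3d.pipelines.registration.registration_icp(
        source, target, corr, np.eye(4), estimation)
    return result.transformation
```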

@jackjansen
Contributor Author

Copied from Slack:

Folks, in your research of registration algorithms, have you come across any that allow "pinning" of one of the variables? I.e. asking the algorithm to find the optimal transformation while specifying, for example, that the y-translation must be zero?
Because if that exists then we could do fine calibration in two steps:

  • First do a fine calibration of the empty capture. This will align all the floors. Then assure that the floors also fall in the plane y=0.
  • Next do the fine calibration with boxes or people or whatever, but pin y=0.

Actually, thinking a bit more, we not only want to pin the y-translation to 0 but also the x-rotation and z-rotation. So the only free variables should be y-rotation, x-translation and z-translation.
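
If no registration library turns out to support pinning directly, one crude workaround might be to run the unconstrained algorithm and then project its result onto the allowed degrees of freedom. A rough sketch (the helper is hypothetical, and only a reasonable approximation when the estimated transform is already close to a pure y-rotation plus x/z translation):

```python
import numpy as np

def pin_to_yaw_and_xz(transform: np.ndarray) -> np.ndarray:
    """Keep only rotation about y and translation in x and z; zero out everything else."""
    rotation = transform[:3, :3]
    yaw = np.arctan2(rotation[0, 2], rotation[2, 2])   # y-axis rotation angle
    cos_y, sin_y = np.cos(yaw), np.sin(yaw)
    pinned = np.eye(4)
    pinned[:3, :3] = np.array([[cos_y, 0.0, sin_y],
                               [0.0,   1.0, 0.0],
                               [-sin_y, 0.0, cos_y]])
    pinned[0, 3] = transform[0, 3]                     # keep x translation
    pinned[2, 3] = transform[2, 3]                     # keep z translation; y stays 0
    return pinned
```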

@ireneviola

We might want to change the loss function so as to only consider a 2D error - which would effectively mean that in every iteration, the algorithm would be forced to change only the parameters that are considered in the loss function, because the others would have no impact on the error. There might be other ways of writing it as an optimization problem, though; we should check it out.

@jackjansen
Contributor Author

jackjansen commented Sep 20, 2024

> The fixer managed to make them all upright, but that's where the good news stops. They're still off by quite a bit, and moreover (and worse): the analysis algorithm produces way too optimistic values.

Inspecting this issue again after half a year of inactivity, but a lot of actually using the current registration setup in production. The comment quoted above (from 11-Dec-2023) describes what is bothering us most at the moment: often, when running cwipc_register --fine, the script will report that it has managed to align all point clouds to within a few millimeters. But actual inspection of the captured point cloud clearly shows that some areas (and often important areas like the head) are off by 5-10 cm.

The "solution" we are currently using is to simply try again with the subject human in a different pose, and hoping for the best.

Fixing this, or at least showing the operator something (for example a graph of the p2p distance distribution) from which they can tell this has happened, is of paramount importance at the moment.

@jackjansen
Contributor Author

Once we have addressed the issue above (the bad numbers coming out of our analysis) my feeling is that we should move to a "mixed" upper strategy. Right now our upper strategy is either pairwise or one-to-all-others, but maybe we should first do one round of pairwise, and after that a round of one-to-all-others.

If we do the pairwise round in the right order (i.e. most overlapping pair first) I think that should get us out of the "local minimum problem" with the boxes.

The right order should be easy to compute: for each pair, compute the upper bound of the percentage/fraction of points that could possibly overlap. High-to-low is the right order.

@jackjansen
Contributor Author

Finally getting back to this. Started by forwarding the 18-alignment branch to the current master.
Will start by addressing the first 21-Sep-2024 point: finding out why our analysis is so optimistic.

@jackjansen
Contributor Author

jackjansen commented Feb 3, 2025

We're going to need some debug tools to see what is happening.

One is #136 so we can visually inspect the different tiles at the same time.

Another is that when we are paused and we use a command that should change our view (such as the colorise option from #136, or selecting a different tileset to see with the number commands) the point cloud is redrawn with the new options.

We may also want an option to cwipc_register that makes it create a log directory with everything that it has been doing at each step, including all ply files, etc.

@jackjansen
Contributor Author

Starting to experiment with capture-2024-1127-1429 to see how good we can get.

First thing noticed is that the radius-filter is messing things up: the back of Thong isn't captured. Turned off the radius filter.

Then I noticed that the "moving Aruco Marker" problem is back. Need to get rid of that too.

@jackjansen
Contributor Author

jackjansen commented Feb 24, 2025

Fixed those issues by modifying cameraconfig. The new one is attached here:

cameraconfig.json

With this cameraconfig we took a capture just after Thong clapped his hands (slightly after ts=1732710676049).

And here is the histogram plot of the final resulting distances between each camera and all others:

Image

We can see a number of (completely different) things from this plot:

  • The histogram plot is not the right one to show; we want the cumulative plot, because it would make it much easier (we hope) to interpret the results.
  • The correspondence error computed for camera 4 is far too optimistic.

We need to fix the first bullet. Then we need to decide what we do first:

  • Fix the second bullet, or
  • Have a "human inspection" outer algorithm, where the user decides which camera to try fixing next.

For reference, here is the log of the steps the algorithm took. As can be seen camera 4, which appears to be the troublemaker from human inspection of the graphs, was never re-aligned, because it always appeared to have the "best" result for its alignment:

grab: captured 972926 points, ts=1732710676183
grab: stopping
grab: stopped
cwipc_register: Saved pointcloud and cameraconfig for step3_capture_fine
cwipc_register: Use fine alignment class MultiCamera
camera 1: 0 filters: mean=0.08726267932263795, std=0.16682569169882122, nPoint=255935
camera 1: 1 filters: mean=0.03994678700741507, std=0.046734246223548034, nPoint=230698
camera 1: 2 filters: mean=0.026024080235830076, std=0.023486786270562738, nPoint=202884
camera 1: 3 filters: mean=0.01830999702134181, std=0.013575160276728114, nPoint=145884
camera 1: corr=0.03188515729806993, matched=134256, total=255935, fraction=0.524570691777209
camera 2: 0 filters: mean=0.12507721274850828, std=0.20507471253062573, nPoint=258325
camera 2: 1 filters: mean=0.03885565536668416, std=0.06712790973243157, nPoint=212320
camera 2: 2 filters: mean=0.020087208061662358, std=0.022716392718550568, nPoint=193486
camera 2: 3 filters: mean=0.011577415504397934, std=0.010806521428930899, nPoint=162272
camera 2: corr=0.02238393693332883, matched=133829, total=258325, fraction=0.5180644536920546
camera 4: 0 filters: mean=0.13837594942240514, std=0.29550212864077785, nPoint=267089
camera 4: 1 filters: mean=0.03304925221287418, std=0.07666745932378718, nPoint=233604
camera 4: 2 filters: mean=0.012464862683069351, std=0.01705470907356439, nPoint=214986
camera 4: 3 filters: mean=0.007079425048334478, std=0.006499748874885224, nPoint=188443
camera 4: corr=0.013579173923219702, matched=160356, total=267089, fraction=0.6003841416157161
camera 8: 0 filters: mean=0.19004005007503835, std=0.3443411966968814, nPoint=219267
camera 8: 1 filters: mean=0.07425747250256175, std=0.1295977439169904, nPoint=191922
camera 8: 2 filters: mean=0.0242699051546891, std=0.04346627822352093, nPoint=162575
camera 8: 3 filters: mean=0.010153926454012606, std=0.01264793364287867, nPoint=144094
camera 8: corr=0.022801860096891276, matched=126830, total=219267, fraction=0.5784272143094948
registration.MultiCamera: Before: overall correspondence error 0.024489670863628233. Per-camera correspondence, ordered worst-first:
	camnum=1, correspondence=0.03188515729806993, weight=0.37648411290308587
	camnum=8, correspondence=0.022801860096891276, weight=0.2679356030621006
	camnum=2, correspondence=0.02238393693332883, weight=0.2642271128895933
	camnum=4, correspondence=0.013579173923219702, weight=0.1627484583790241
registration.MultiCamera: Step 1: camera 1, correspondence error 0.03188515729806993, overall correspondence error 0.024489670863628233
RegistrationComputer_ICP_Point2Point: RegistrationComputer_ICP_Point2Point result: RegistrationResult with fitness=5.278176e-01, inlier_rmse=1.404606e-02, and correspondence_set size of 135087
Access transformation to get result.
RegistrationComputer_ICP_Point2Point: RegistrationComputer_ICP_Point2Point result: RegistrationResult with fitness=5.278176e-01, inlier_rmse=1.404606e-02, and correspondence_set size of 135087
Access transformation to get result.
camera 1: 0 filters: mean=0.08515003121721029, std=0.16219971557567636, nPoint=255935
camera 1: 1 filters: mean=0.03944200513305314, std=0.04602416760871725, nPoint=230970
camera 1: 2 filters: mean=0.02564313404211889, std=0.023322835623908814, nPoint=202680
camera 1: 3 filters: mean=0.017700764204254607, std=0.01343614521583445, nPoint=147451
camera 1: corr=0.031136909420089058, matched=133711, total=255935, fraction=0.5224412448473246
camera 2: 0 filters: mean=0.1261837839728534, std=0.2063795408556695, nPoint=258325
camera 2: 1 filters: mean=0.039161718584824354, std=0.06737647626853334, nPoint=212203
camera 2: 2 filters: mean=0.020558623456058, std=0.023314484283719752, nPoint=193782
camera 2: 3 filters: mean=0.011712788038094267, std=0.011202703058570438, nPoint=161905
camera 2: corr=0.022915491096664707, matched=133953, total=258325, fraction=0.518544469176425
camera 4: 0 filters: mean=0.13674437823952734, std=0.29260257659315536, nPoint=267089
camera 4: 1 filters: mean=0.03293592517235412, std=0.07524826004957648, nPoint=233886
camera 4: 2 filters: mean=0.012751611144142284, std=0.017278143956666384, nPoint=215151
camera 4: 3 filters: mean=0.007182366444128553, std=0.006667451445125218, nPoint=187727
camera 4: corr=0.013849817889253772, matched=159701, total=267089, fraction=0.597931775550472
camera 8: 0 filters: mean=0.190009468584249, std=0.3443543199367892, nPoint=219267
camera 8: 1 filters: mean=0.07421773854638268, std=0.12959852864706503, nPoint=191920
camera 8: 2 filters: mean=0.024223135473280803, std=0.04343843611368515, nPoint=162570
camera 8: 3 filters: mean=0.010127073527619965, std=0.012597436101665258, nPoint=144125
camera 8: corr=0.022724509629285225, matched=126894, total=219267, fraction=0.5787190958967834
registration.MultiCamera: Step 1: per-camera correspondence, ordered worst-first:
	camnum=1, correspondence=0.031136909420089058, weight=0.3675225186194392
	camnum=2, correspondence=0.022915491096664707, weight=0.2705229699892271
	camnum=8, correspondence=0.022724509629285225, weight=0.26703815261298075
	camnum=4, correspondence=0.013849817889253772, weight=0.16593547967402955
registration.MultiCamera: Step 2: camera 2, correspondence error 0.022915491096664707, overall correspondence error 0.02428450512243735
RegistrationComputer_ICP_Point2Point: RegistrationComputer_ICP_Point2Point result: RegistrationResult with fitness=5.189277e-01, inlier_rmse=9.174326e-03, and correspondence_set size of 134052
Access transformation to get result.
RegistrationComputer_ICP_Point2Point: RegistrationComputer_ICP_Point2Point result: RegistrationResult with fitness=5.189277e-01, inlier_rmse=9.174326e-03, and correspondence_set size of 134052
Access transformation to get result.
camera 1: 0 filters: mean=0.08483494273011204, std=0.1620483013680917, nPoint=255935
camera 1: 1 filters: mean=0.03926487600054994, std=0.045942779543020625, nPoint=231066
camera 1: 2 filters: mean=0.025492898181408596, std=0.023267805001552604, nPoint=202776
camera 1: 3 filters: mean=0.01748906771124196, std=0.013412983975476368, nPoint=148107
camera 1: corr=0.030902051686718328, matched=133588, total=255935, fraction=0.5219606540723231
camera 2: 0 filters: mean=0.1262190678707062, std=0.20652847717333697, nPoint=258325
camera 2: 1 filters: mean=0.0391711263301533, std=0.06759738467608756, nPoint=212212
camera 2: 2 filters: mean=0.02054506243217559, std=0.023462209014660125, nPoint=193855
camera 2: 3 filters: mean=0.011630897174464076, std=0.011242480463347224, nPoint=161926
camera 2: corr=0.0228733776378113, matched=133982, total=258325, fraction=0.5186567308622859
camera 4: 0 filters: mean=0.13675038651780155, std=0.29260050589973113, nPoint=267089
camera 4: 1 filters: mean=0.032942291281810095, std=0.0752428663397196, nPoint=233886
camera 4: 2 filters: mean=0.012765479086436991, std=0.01728549702870202, nPoint=215163
camera 4: 3 filters: mean=0.007193978172306782, std=0.006677447133917714, nPoint=187743
camera 4: corr=0.013871425306224497, matched=159725, total=267089, fraction=0.5980216332383588
camera 8: 0 filters: mean=0.1906543841919635, std=0.34475906981817683, nPoint=219267
camera 8: 1 filters: mean=0.07469297567181772, std=0.13012275745184318, nPoint=191883
camera 8: 2 filters: mean=0.024339505929850794, std=0.04363447385844152, nPoint=162386
camera 8: 3 filters: mean=0.010226977582093165, std=0.012665447779570158, nPoint=144069
camera 8: corr=0.022892425361663325, matched=126911, total=219267, fraction=0.5787966269434069
registration.MultiCamera: Step 2: per-camera correspondence, ordered worst-first:
	camnum=1, correspondence=0.030902051686718328, weight=0.36472195067959823
	camnum=2, correspondence=0.0228733776378113, weight=0.2700307617298742
	camnum=8, correspondence=0.022892425361663325, weight=0.2690144151082393
	camnum=4, correspondence=0.013871425306224497, weight=0.16619644385564938
registration.MultiCamera: Step 3: camera 8, correspondence error 0.022892425361663325, overall correspondence error 0.024216661975675832
RegistrationComputer_ICP_Point2Point: RegistrationComputer_ICP_Point2Point result: RegistrationResult with fitness=5.807714e-01, inlier_rmse=7.996496e-03, and correspondence_set size of 127344
Access transformation to get result.
RegistrationComputer_ICP_Point2Point: RegistrationComputer_ICP_Point2Point result: RegistrationResult with fitness=5.807714e-01, inlier_rmse=7.996496e-03, and correspondence_set size of 127344
Access transformation to get result.
camera 1: 0 filters: mean=0.08480573883775841, std=0.1620665131602924, nPoint=255935
camera 1: 1 filters: mean=0.03922983300526544, std=0.04597604179420256, nPoint=231063
camera 1: 2 filters: mean=0.025450178894767753, std=0.023326021048003128, nPoint=202759
camera 1: 3 filters: mean=0.017387675001898764, std=0.013478223722260268, nPoint=148372
camera 1: corr=0.030865898724159034, matched=133465, total=255935, fraction=0.5214800632973215
camera 2: 0 filters: mean=0.12559620279149736, std=0.20564728429253104, nPoint=258325
camera 2: 1 filters: mean=0.0390352048880553, std=0.0672175007515984, nPoint=212322
camera 2: 2 filters: mean=0.02047315371380545, std=0.023432864310322352, nPoint=193868
camera 2: 3 filters: mean=0.011549480510941243, std=0.011214073033693242, nPoint=161861
camera 2: corr=0.022763553544634486, matched=133868, total=258325, fraction=0.5182154263040744
camera 4: 0 filters: mean=0.13742725991263743, std=0.29406412810899096, nPoint=267089
camera 4: 1 filters: mean=0.03322530312368369, std=0.07554521229175706, nPoint=233963
camera 4: 2 filters: mean=0.012957675344570312, std=0.01740890323778628, nPoint=215223
camera 4: 3 filters: mean=0.0073237974863557915, std=0.00673637299277873, nPoint=187617
camera 4: corr=0.01406017047913452, matched=159655, total=267089, fraction=0.5977595483153556
camera 8: 0 filters: mean=0.18983636922486904, std=0.3452892290040007, nPoint=219267
camera 8: 1 filters: mean=0.07397996868578144, std=0.1295160525258807, nPoint=192070
camera 8: 2 filters: mean=0.024127611062895588, std=0.04335302264504482, nPoint=162778
camera 8: 3 filters: mean=0.010089635934008755, std=0.012547536803761297, nPoint=144404
camera 8: corr=0.02263717273777005, matched=127114, total=219267, fraction=0.5797224388530878
registration.MultiCamera: Step 3: per-camera correspondence, ordered worst-first:
	camnum=1, correspondence=0.030865898724159034, weight=0.36426682216896883
	camnum=2, correspondence=0.022763553544634486, weight=0.2687148608547355
	camnum=8, correspondence=0.02263717273777005, weight=0.26605106019808944
	camnum=4, correspondence=0.01406017047913452, weight=0.16845167592862273
registration.MultiCamera: Step 4: camera 4, correspondence error 0.01406017047913452, overall correspondence error 0.024123472521746674
RegistrationComputer_ICP_Point2Point: RegistrationComputer_ICP_Point2Point result: RegistrationResult with fitness=6.012191e-01, inlier_rmse=5.755844e-03, and correspondence_set size of 160579
Access transformation to get result.
RegistrationComputer_ICP_Point2Point: RegistrationComputer_ICP_Point2Point result: RegistrationResult with fitness=6.012191e-01, inlier_rmse=5.755844e-03, and correspondence_set size of 160579
Access transformation to get result.
camera 1: 0 filters: mean=0.08662557728434213, std=0.16551215651781165, nPoint=255935
camera 1: 1 filters: mean=0.039880709274299354, std=0.04655976698744746, nPoint=230882
camera 1: 2 filters: mean=0.026002624358990915, std=0.023780290730512323, nPoint=202696
camera 1: 3 filters: mean=0.017976990098078703, std=0.0138587999016265, nPoint=146282
camera 1: corr=0.031835789999705204, matched=133228, total=255935, fraction=0.5205540469259773
camera 2: 0 filters: mean=0.12559533943164594, std=0.20564740776892507, nPoint=258325
camera 2: 1 filters: mean=0.039034154467309456, std=0.06721660765597363, nPoint=212322
camera 2: 2 filters: mean=0.020471987913061256, std=0.02342908091585363, nPoint=193868
camera 2: 3 filters: mean=0.011548404986942129, std=0.011205868043515447, nPoint=161860
camera 2: corr=0.022754273030457576, matched=133896, total=258325, fraction=0.5183238168973193
camera 4: 0 filters: mean=0.13737522935643365, std=0.2931434419411899, nPoint=267089
camera 4: 1 filters: mean=0.03288494890403441, std=0.07570251691552064, nPoint=233608
camera 4: 2 filters: mean=0.012608328016293704, std=0.017160278276197607, nPoint=215023
camera 4: 3 filters: mean=0.007062456488081894, std=0.0066233372709715264, nPoint=187526
camera 4: corr=0.01368579375905342, matched=159436, total=267089, fraction=0.5969395969133884
camera 8: 0 filters: mean=0.18907186145208518, std=0.34337227269028964, nPoint=219267
camera 8: 1 filters: mean=0.07361104579638122, std=0.12892966931346186, nPoint=191915
camera 8: 2 filters: mean=0.023974464712775416, std=0.04311412789177767, nPoint=162641
camera 8: 3 filters: mean=0.01002621313535229, std=0.012568856406781095, nPoint=144279
camera 8: corr=0.022595069542133382, matched=126797, total=219267, fraction=0.5782767128660492
registration.MultiCamera: Step 4: per-camera correspondence, ordered worst-first:
	camnum=1, correspondence=0.031835789999705204, weight=0.3756565032166122
	camnum=2, correspondence=0.022754273030457576, weight=0.26861006682357413
	camnum=8, correspondence=0.022595069542133382, weight=0.26549980957821573
	camnum=4, correspondence=0.01368579375905342, weight=0.16394756856218545
registration.MultiCamera: Step 5: camera 1, correspondence error 0.031835789999705204, overall correspondence error 0.024507540079677023
RegistrationComputer_ICP_Point2Point: RegistrationComputer_ICP_Point2Point result: RegistrationResult with fitness=5.243753e-01, inlier_rmse=1.400334e-02, and correspondence_set size of 134206
Access transformation to get result.
RegistrationComputer_ICP_Point2Point: RegistrationComputer_ICP_Point2Point result: RegistrationResult with fitness=5.243753e-01, inlier_rmse=1.400334e-02, and correspondence_set size of 134206
Access transformation to get result.
camera 1: 0 filters: mean=0.085744143195895, std=0.1629295893616391, nPoint=255935
camera 1: 1 filters: mean=0.039511715829330274, std=0.04595802880690485, nPoint=230678
camera 1: 2 filters: mean=0.025886279902809808, std=0.023489419406180306, nPoint=202767
camera 1: 3 filters: mean=0.017956762514358533, std=0.01356942217281289, nPoint=146094
camera 1: corr=0.03152618468717142, matched=133663, total=255935, fraction=0.5222536972278118
camera 2: 0 filters: mean=0.12687136098262647, std=0.20733285447164593, nPoint=258325
camera 2: 1 filters: mean=0.03937345673642397, std=0.067556080971456, nPoint=212229
camera 2: 2 filters: mean=0.020832455215532453, std=0.023624509888987198, nPoint=194056
camera 2: 3 filters: mean=0.011799441161706633, std=0.011456482620224587, nPoint=161692
camera 2: corr=0.02325592378193122, matched=133740, total=258325, fraction=0.5177199264492403
camera 4: 0 filters: mean=0.13614773009717254, std=0.29084277080503623, nPoint=267089
camera 4: 1 filters: mean=0.032746877794534585, std=0.07493767861597596, nPoint=233744
camera 4: 2 filters: mean=0.012676105267106623, std=0.01725133933258455, nPoint=215073
camera 4: 3 filters: mean=0.007098475064730394, std=0.006650654639468999, nPoint=187561
camera 4: corr=0.013749129704199392, matched=159479, total=267089, fraction=0.597100591937519
camera 8: 0 filters: mean=0.18906210509871343, std=0.3433760976908357, nPoint=219267
camera 8: 1 filters: mean=0.07359989895358737, std=0.128931325738454, nPoint=191915
camera 8: 2 filters: mean=0.02395911555900265, std=0.04310054457832315, nPoint=162639
camera 8: 3 filters: mean=0.01003126268730873, std=0.012575232086168385, nPoint=144316
camera 8: corr=0.022606494773477114, matched=126814, total=219267, fraction=0.5783542439126726
registration.MultiCamera: Step 5: per-camera correspondence, ordered worst-first:
	camnum=1, correspondence=0.03152618468717142, weight=0.3721059849319554
	camnum=2, correspondence=0.02325592378193122, weight=0.2745048520886307
	camnum=8, correspondence=0.022606494773477114, weight=0.26563709066924135
	camnum=4, correspondence=0.013749129704199392, weight=0.16471000269823738
registration.MultiCamera: Step 6: camera 1, correspondence error 0.03152618468717142, overall correspondence error 0.0244992751064434
RegistrationComputer_ICP_Point2Point: RegistrationComputer_ICP_Point2Point result: RegistrationResult with fitness=5.222303e-01, inlier_rmse=1.388377e-02, and correspondence_set size of 133657
Access transformation to get result.
RegistrationComputer_ICP_Point2Point: RegistrationComputer_ICP_Point2Point result: RegistrationResult with fitness=5.222303e-01, inlier_rmse=1.388377e-02, and correspondence_set size of 133657
Access transformation to get result.
camera 1: 0 filters: mean=0.08570856522150984, std=0.16282013235384463, nPoint=255935
camera 1: 1 filters: mean=0.03950176109090798, std=0.04594035830972884, nPoint=230675
camera 1: 2 filters: mean=0.025884192435114257, std=0.02348344326392318, nPoint=202769
camera 1: 3 filters: mean=0.017959348605171687, std=0.013564917302260449, nPoint=146057
camera 1: corr=0.03152426590743214, matched=133653, total=255935, fraction=0.5222146248070799
camera 2: 0 filters: mean=0.12693157241935005, std=0.20743039849530176, nPoint=258325
camera 2: 1 filters: mean=0.03937401916220994, std=0.0675644555255504, nPoint=212220
camera 2: 2 filters: mean=0.020838858484963026, std=0.02362778629463241, nPoint=194067
camera 2: 3 filters: mean=0.01180275307823051, std=0.011458399774637492, nPoint=161691
camera 2: corr=0.023261152852868002, matched=133735, total=258325, fraction=0.5177005709861608
camera 4: 0 filters: mean=0.13608604503595922, std=0.2907401368034397, nPoint=267089
camera 4: 1 filters: mean=0.03273966268685515, std=0.07490889046678796, nPoint=233754
camera 4: 2 filters: mean=0.012673769895350047, std=0.017246906948641092, nPoint=215072
camera 4: 3 filters: mean=0.007096693873413673, std=0.006646845878896774, nPoint=187556
camera 4: corr=0.013743539752310447, matched=159497, total=267089, fraction=0.5971679852034341
camera 8: 0 filters: mean=0.18906203631999458, std=0.3433761397946863, nPoint=219267
camera 8: 1 filters: mean=0.0735998203724273, std=0.1289313834808062, nPoint=191915
camera 8: 2 filters: mean=0.023959022832760257, std=0.043100641604935076, nPoint=162639
camera 8: 3 filters: mean=0.010030749932079227, std=0.012574562557327482, nPoint=144315
camera 8: corr=0.02260531248940671, matched=126815, total=219267, fraction=0.578358804562474
registration.MultiCamera: Step 6: per-camera correspondence, ordered worst-first:
	camnum=1, correspondence=0.03152426590743214, weight=0.37208097885016417
	camnum=2, correspondence=0.023261152852868002, weight=0.27456570456985097
	camnum=8, correspondence=0.02260531248940671, weight=0.2656233765227408
	camnum=4, correspondence=0.013743539752310447, weight=0.1646445880448745
registration.MultiCamera: Step 6: Giving up: went only from 0.03152618468717142 to 0.03152426590743214
registration.MultiCamera: After 6 steps: overall correspondence error 0.02449924277602858. Per-camera correspondence, ordered worst-first:
	camnum=1, correspondence=0.03152426590743214, weight=0.37208097885016417
	camnum=2, correspondence=0.023261152852868002, weight=0.27456570456985097
	camnum=8, correspondence=0.02260531248940671, weight=0.2656233765227408
	camnum=4, correspondence=0.013743539752310447, weight=0.1646445880448745
	camindex=0, change=0.004923383770767776
	camindex=1, change=0.0035834263613848762
	camindex=2, change=0.004234437482021266
	camindex=3, change=0.0047952394388900855
registration.MultiCamera: Voxelizing with 0.03464716140173069: point count 31566, was 1000616
registration.MultiCamera: Pointcounts per tile, after voxelizing:
	tile 0: 31566
	tile 1: 6022
	tile 2: 6626
	tile 3: 644
	tile 4: 7621
	tile 5: 903
	tile 6: 198
	tile 7: 113
	tile 8: 7565
	tile 9: 25
	tile 10: 638
	tile 11: 15
	tile 12: 573
	tile 13: 56
	tile 14: 349
	tile 15: 218
cwipc_register: fine aligner ran for 107.283 seconds
cwipc_register: analyzer ran for 14.915 seconds
cwipc_register: Sorted correspondences after fine calibration
	camnum=1, correspondence=0.03152426590743214, weight=0.37208097885016417
	camnum=2, correspondence=0.023261152852868002, weight=0.27456570456985097
	camnum=8, correspondence=0.02260531248940671, weight=0.2656233765227408
	camnum=4, correspondence=0.013743539752310447, weight=0.1646445880448745
cwipc_register: Saved pointcloud and cameraconfig for step4_after_fine

@jackjansen
Contributor Author

Serious improvements. Here are the before/after correspondences (for the above dataset, and a similar capture):

cwipc_register: Sorted correspondences before fine calibration
	camnum=1, correspondence=0.029772669903727095, weight=0.32975381448799584
	camnum=4, correspondence=0.015593297675407158, weight=0.17546715061016732
	camnum=2, correspondence=0.014681153906851572, weight=0.1639654860227558
	camnum=8, correspondence=0.012609081220802731, weight=0.13812289235045297
cwipc_register: analyzer ran for 2.103 seconds
cwipc_register: Sorted correspondences after fine calibration
	camnum=1, correspondence=0.011843114105968925, weight=0.13234881827053135
	camnum=2, correspondence=0.007111442839985053, weight=0.07982065666339981
	camnum=4, correspondence=0.005414429270679016, weight=0.06188110117293287
	camnum=8, correspondence=0.004999972304687699, weight=0.05537202835582819

@jackjansen
Contributor Author

I think the only improvement that could still be done is to do different steps (the outer algorithm):

  1. Start with a synthetic floor (optionally). Call this the "current set"
  2. Find the camera that is nearest to the current set.
  3. Align it to the current set.
  4. Add it to the current set.
  5. Repeat until there are no cameras left.

Maybe after this we could do one more pass over all the cameras but I'm not sure this is worthwhile.

@jackjansen
Contributor Author

It turns out that the approach from the previous comment is indeed needed, for some situations. Because we still sometimes have the issue that the cameras are "pairwise aligned" (the problem we saw last year with the boxes datasets).

@jackjansen
Contributor Author

Unfortunately this doesn't work.

The problem is that the initial step, aligning the first two cameras, will over-enthusiastically rotate the "camera position" around the origin, to achieve the best possible overlap between the two point clouds. Which is of course completely wrong.

A possible solution may be to limit the target point cloud to just the points that could conceivably be matched to the source point cloud, but I'm not sure that will fly.

@jackjansen
Contributor Author

Hmm, thinking out loud: maybe we should pass a much smaller correspondence to the alignment step, basically reducing the points taken into account far too much, but then at least we know that the only points taken into account are points that are probably correct...

@jackjansen
Contributor Author

It may be that our analyzer functions are still too optimistic. I'm seeing a case where the distribution plot of the distances clearly doesn't correspond to reality.

@jackjansen
Contributor Author

Indeed! And it's the floor that is making it so optimistic! If I remove the floor I am getting much more realistic numbers.
