Alignment of multi-camera pointclouds needs work #18
Comments
Current best procedure (copied from Slack): I think I've just done the best calibration we've ever done. Keeping note here so we can try again (in other words:
|
I think we should start looking at This also gives us camera positions, so we can then determine the camera ordering for the next step (which pairs of cameras we should align). We could then capture a 3D object (maybe the boxes, but possibly a person is good enough). I have the feeling that we could use the histogram of point-to-point distances to determine how well the coarse calibration worked: I'm expecting a plateau on the left of the histogram, then a drop, then a very long tail. Somewhere along the drop would seem to be a good value for the We then use cumulative multi-scale ICP or something, and after that recompute the histogram. Then I think we repeat with the other camera pairs. The question is what to do after all pairs have been done, because possibly the last calibration has broken the first one. Another option could be to first compute the But I do think that after we've finished calibration we should repeat the calculation of @fonskuijk I would really like your opinion on this idea... |
Came up with a wild plan (thanks @Silvia024 !). What if we take a synthetic point cloud, and then create 4 point clouds from that (North, East, South, West), with each of the new point clouds having half the points of the original (basically the ones that can be seen from that direction). Now we can slightly perturb each of the NESW point clouds, and recombine them into a single "4-camera" point cloud. This we can now present to So this should allow us to see what the magic numbers are that we are looking for. Moreover, this scheme (or variations of it, for example introducing some noise) could also be used to do automatic testing of our point cloud registration algorithms (once we have written them:-). |
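A minimal sketch of this idea, assuming plain numpy `(N, 3)` arrays; the function name and the crude "faces-this-direction" visibility test are illustrative stand-ins, not the eventual implementation:

```python
import numpy as np

def make_four_camera_cloud(points, offsets):
    """Split a synthetic cloud into four 'camera' clouds (N/E/S/W), each holding
    roughly the half of the points that face that direction, perturb each by a
    known offset, and recombine into one '4-camera' cloud.
    points: (N, 3) array; offsets: four (3,) translation errors."""
    directions = np.array([[0, 0, 1], [1, 0, 0], [0, 0, -1], [-1, 0, 0]], float)  # N, E, S, W
    centred = points - points.mean(axis=0)
    horiz = centred * [1.0, 0.0, 1.0]          # crude visibility test: ignore height
    facing = horiz @ directions.T > 0          # (N, 4): True if the point faces that camera
    per_camera = [points[facing[:, cam]] + np.asarray(offsets[cam]) for cam in range(4)]
    return per_camera, np.vstack(per_camera)

# Example: cameras 2 and 4 get a 3 mm error, cameras 1 and 3 stay correct.
rng = np.random.default_rng(0)
synthetic = rng.uniform(-0.5, 0.5, size=(100_000, 3))
offsets = [(0, 0, 0), (0.003, 0, 0), (0, 0, 0), (0, 0.003, 0)]
per_cam, combined = make_four_camera_cloud(synthetic, offsets)
```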
That was a bit of a disappointment, so far. Or actually maybe it just showed our erroneous thinking:-) Implemented the generation of registration-test pointclouds, with a settable offset for cameras 2 and 4 (1 and 3 will be correct). Created pointclouds with an error of The results are in https://github.com/cwi-dis/cwipc_test/tree/master/pointcloud-registration-test I could say something (based on the data for these pointclouds) like: "As soon as more than 15% of the points come from two cameras our voxelsize is larger than the error" but this would just be data-fitting. @Silvia024 can you have a look at the data? |
Another wild idea. Comments please. For each voxelsize we voxelize the whole pointcloud, just as we do now. We now compare the two numbers. This comparison is a measure for how much camera-combining has happened.
|
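A sketch of one way to compute this comparison with Open3D, assuming the two numbers are the summed per-camera voxel counts and the voxel count of the merged cloud (the next comment reports the idea did not pan out, but this shows what was meant):

```python
import open3d as o3d

def combining_measure(per_camera_clouds, voxel_size):
    """Voxelize each per-camera Open3D point cloud separately and the merged cloud
    as a whole, then compare point counts.  If cameras overlap well, the merged
    cloud loses many more points to voxelization than the per-camera clouds do."""
    separate = sum(len(pc.voxel_down_sample(voxel_size).points) for pc in per_camera_clouds)
    merged = o3d.geometry.PointCloud()
    for pc in per_camera_clouds:
        merged += pc
    combined = len(merged.voxel_down_sample(voxel_size).points)
    return combined / separate   # close to 1.0: little combining; lower: more combining
```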
Edit: this did not work. I experimented with it by creating the generated point clouds (0.03 and 0.003) with only two cameras (dropping the other two). The curves are just as meaningless as for the full 4-camera cloud. I am pretty sure that the same is true for the previous wild idea. |
maybe tomorrow we can discuss this in more detail? btw I just found this: IntelRealSense/librealsense#10795 |
I found two interesting papers...
|
Updated two previous comments (the two wild ideas): they turned out not to work. I have also experimented with changing the "next cellsize factor" from I now think that the whole idea of using voxelization at successively smaller sizes and looking for a discontinuity simply doesn't work.
@Silvia024 @troeggla please up-vote this comment if you agree that voxelization is a dead end. Then I can remove all the generated data and spreadsheets over in |
My suggestion for the next try is to compute I think the KD-Tree is the data structure we want. And it appears to be supported in SciPy; example code is in https://medium.com/@OttoYu/exploring-kd-tree-in-point-cloud-c9c767095923 |
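A minimal sketch of the per-pair distance histogram using `scipy.spatial.cKDTree`; the array names, bin count and distance range are illustrative:

```python
import numpy as np
from scipy.spatial import cKDTree
import matplotlib.pyplot as plt

def pair_distance_histogram(points_a, points_b, bins=200, max_dist=0.05):
    """For every point of camera A, find the nearest point of camera B with a
    KD-tree and histogram the distances.  points_a/points_b: (N, 3) arrays."""
    tree = cKDTree(points_b)
    distances, _ = tree.query(points_a)           # nearest-neighbour distance per point
    counts, edges = np.histogram(distances, bins=bins, range=(0.0, max_dist))
    return counts, edges

# Hypothetical usage, with cam1 and cam2 as (N, 3) numpy arrays from one capture:
# counts, edges = pair_distance_histogram(cam1, cam2)
# plt.stairs(counts, edges); plt.xlabel("point-to-point distance (m)"); plt.show()
```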
The idea of using a KDTree and plotting a histogram of the minimum distances looks very promising. There's a Jupyter notebook in cwipc_test; I'm going to convert it to a script. |
the theory seems correct to me :) |
All the graphs are now in cwipc_test. They look mostly usable. I'm going to change the Y axis from being counts to being fractions, that should make them even easier to interpret (I hope). |
For reference in further discussion below: here is the current state of the graph for the boxes in vrsmall (with one camera about 1cm off): And here is one of the Jack point clouds (the best one, the sideways one): And here is the one for the generated point cloud, with a |
First thing I would like to do is convert each of these lines into a number. Once we have this number we can do two things with it:
|
Idea: we should look at the derivative of these lines (which is actually the distance histogram data) and the second derivative. Possibly after smoothing the lines a little. The second derivative should start positive, then go negative, then go positive again. Maybe that last point is the number we're interested in. |
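A sketch of that derivative idea, assuming `counts` is the raw distance-histogram data; the smoothing window and the "last sign change" heuristic are guesses that would need tuning:

```python
import numpy as np

def inflection_estimate(counts, smooth=5):
    """Smooth the histogram counts, then look at the first and second derivative.
    The second derivative should go positive -> negative -> positive; the last
    negative-to-positive sign change is the candidate bin we are interested in."""
    kernel = np.ones(smooth) / smooth
    smoothed = np.convolve(np.asarray(counts, float), kernel, mode="same")
    d1 = np.gradient(smoothed)
    d2 = np.gradient(d1)
    sign_changes = np.where(np.diff(np.sign(d2)) > 0)[0]   # d2 turns positive here
    return sign_changes[-1] if len(sign_changes) else None
```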
Unrelated idea (@Silvia024 comments please): we are currently looking at the Would it be better to do a per-camera comparison to all other cameras combined? I.e. not compute and graph the distance per camera pair, but instead look at the distance of the points of camera That should extend the "tail" of the histogram, probably making it more linear-ish. Also, the resulting number |
Implemented, and it seems to produce much better results. So much better that we need to check that we are not looking at bullshit data. Here are the graphs (for single-camera-to-all-others) that are for the same dataset as the pair graphs of yesterday: |
@jackjansen, I can't access this link. It says page not found. Maybe I don't have the right to access it? |
That link is dead (the gitlab site disappeared when we stopped paying for it). |
Paused working on the aruco markers for now. Implemented a manual coarse calibration algorithm (basically the same as what we have: select the four coloured corners of the calibration target). This works; the resulting pointcloud and the correspondence graph are in Here is the graph: The correspondence errors are too optimistic, but visual inspection of the graph shows that something like But: these pointclouds are so big (500K points per camera) that the analysis is very expensive: it takes about a minute on my M1 Mac, and I didn't wait for it on the Intel Mac. That's a shame, because for our common use case (capturing in a room where the walls are close enough to be captured by the cameras) this would actually be a very useful test capture... |
In case we decide to go for detecting the Aruco markers in the RGB image and then finding the corresponding points in the point cloud: here are some interesting links about how the conversion of image pixel coordinates to 3D points can be done using librealsense API's: IntelRealSense/librealsense#8221 |
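For reference, a minimal sketch of that deprojection step with `pyrealsense2`, assuming the colour and depth frames have already been aligned (e.g. with `rs.align`) so the Aruco pixel coordinates are valid in the depth frame; names are illustrative:

```python
import pyrealsense2 as rs

def pixel_to_point(depth_frame, x, y):
    """Convert an (integer) pixel in the aligned depth image, e.g. an Aruco corner,
    to a 3D point in the camera's own coordinate system."""
    intrinsics = depth_frame.profile.as_video_stream_profile().get_intrinsics()
    depth = depth_frame.get_distance(x, y)                      # metres
    return rs.rs2_deproject_pixel_to_point(intrinsics, [x, y], depth)
```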
Some progress with coarse calibration. I'm manually positioning the pointcloud in the open3d viewer so that the virtual camera is approximately where the physical camera was. I then grab the RGB and D images. I can now detect the aruco markers in the RGB image. But this is where the good news stops: the next step is converting the I think I'm doing all the right things: I'm taking the depth from But I always end up with weird origins. I've tried transposing the extrinsic matrix, I've tried mirroring |
See isl-org/Open3D#6508 for a decent description of the open3d coordinate system (RH y-UP) and its idiosyncrasies with respect to lookat and forward. |
The problem may be that I "invert" the affine transform (the Here is an explanation: http://negativeprobability.blogspot.com/2011/11/affine-transformations-and-their.html |
And I found https://stackoverflow.com/questions/2624422/efficient-4x4-matrix-inverse-affine-transform which says a similar thing. Also, looking at the source code for open3d |
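For a rigid transform (rotation plus translation, no scaling) the inverse can be written down directly, which is what those references describe; a small numpy sketch:

```python
import numpy as np

def invert_affine(T):
    """Invert a rigid 4x4 transform without a general matrix inverse:
    the rotation block is transposed and the translation becomes -R^T @ t.
    Only valid when the upper-left 3x3 block is a pure rotation."""
    R = T[:3, :3]
    t = T[:3, 3]
    Tinv = np.eye(4)
    Tinv[:3, :3] = R.T
    Tinv[:3, 3] = -R.T @ t
    return Tinv
```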
Automatic coarse calibration based on Aruco markers is working. It's actually working so well that I have merged back into |
Multi-marker coarse alignment, for camera positions where not all cameras can see the |
So, back to fine calibration or actually first to the analysis of the current calibration. I'm working with the The most informative graph for this dataset (but note the italics) is the pairwise graph: The "camera numbers" here are the As a human we can estimate the correspondences of the pairs:
We can also see that the correspondence errors that the current "algorithm" (but really "quick hack" is a better term) has come up with are wildly wrong. Not surprising: the current "algorithm" works by finding the peak in the histogram and then moving right until we get below a value that is less than I will experiment with |
Mean and stddev by themselves are not going to work. Here are the results for the graph above:
The mean for the "good pairs" (6 and 9) is far too high. And that is pretty logical, when you think about it: the long tails have an inordinate effect on the mean. Next thing to try: first compute mean and stddev, then throw away all distances that are larger than (wild guess)
The idea is that for the "bad pairs" this will throw away fewer of the points, but for the "good pairs" it will throw away more points. |
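A sketch of that filtering loop, assuming `distances` is the array of nearest-neighbour distances for one pair; the number of iterations and the 2-sigma cut-off are the wild guesses mentioned above:

```python
import numpy as np

def trimmed_correspondence(distances, n_stddev=2.0, iterations=3):
    """Repeatedly drop distances larger than mean + n_stddev * stddev and recompute.
    Returns the trimmed mean and stddev plus the fraction of points that survived,
    which should differ between well-aligned and badly-aligned pairs."""
    kept = np.asarray(distances, float)
    for _ in range(iterations):
        mean, std = kept.mean(), kept.std()
        kept = kept[kept <= mean + n_stddev * std]
    return kept.mean(), kept.std(), len(kept) / len(distances)
```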
Tried that. Also tried running the filtering multiple times, to see how the mean and stddev behave. Used the bracketing filter
Here are the results:
This seems to be going in the right direction: the "bad pairs" (opposing cameras) have their |
Partial success. That is to say: this works pretty well for camera-pair measurements on the boxes: These are pretty believable numbers! Unfortunately it does not work well at all for the one-to-all-others measurements: I think the problem is that this algorithm throws away any points that it can't match (which, in the case of this dataset, includes the mismatched "edges that are sticking out"). Let's first check how the pair-wise measurements work on the other datasets. |
That didn't work very well. I've now made the pair-wise measurement symmetric but this needs work: at the moment it is far too expensive. And it is also too aggressive in trying to put as many points into the overlapping set as it can. This can be seen with the loot datasets. We should somehow re-enable the |
For future reference: when we get back to finding the "best" algorithm to align the pointclouds we should look at point-to-plane ICP with a robust kernel. From https://www.open3d.org/docs/latest/tutorial/pipelines/robust_kernels.html#Vanilla-ICP-vs-Robust-ICP I get the impression that the robust kernel is a way to deal with noise. The referenced page uses generated noise, but of course our sensors are also noisy... |
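For later reference, a minimal Open3D sketch of point-to-plane ICP with a Tukey robust kernel, along the lines of that tutorial; the normal-estimation radius, `threshold` and `sigma` values are placeholders:

```python
import numpy as np
import open3d as o3d

def robust_icp(source, target, threshold=0.02, sigma=0.01):
    """Point-to-plane ICP with a Tukey robust kernel, so noisy outlier
    correspondences contribute less to the optimisation."""
    target.estimate_normals(
        o3d.geometry.KDTreeSearchParamHybrid(radius=0.05, max_nn=30))
    loss = o3d.pipelines.registration.TukeyLoss(k=sigma)
    estimation = o3d.pipelines.registration.TransformationEstimationPointToPlane(loss)
    return o3d.pipelines.registration.registration_icp(
        source, target, threshold, np.identity(4), estimation)
```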
Copied from Slack: Folks, in your research of registration algorithms, have you come across any that allow "pinning" of one of the variables? I.e. ask the algorithm to find the optimal transformation while specifying, for example, that the y-translation must be zero?
Actually, thinking a bit more, we not only want to pin the y-translation to 0 but also the x-rotation and z-rotation. So the only free variables should be y-rotation, x-translation and z-translation. |
We might want to change the loss function so that it only considers a 2D error - which would effectively mean that in every iteration, the algorithm would be forced to change only the parameters that are considered in the loss function, because the others would have no impact on the error. There might be other ways of writing it as an optimization problem, though; we should check it out |
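One way it could be written as an optimization problem, as a sketch only: parameterize the transform by the free variables alone (y-rotation, x- and z-translation) and minimize a nearest-neighbour loss. The optimizer and the loss here are illustrative choices, not a worked-out design:

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.optimize import minimize

def fit_constrained(source, target):
    """Fit only y-rotation, x-translation and z-translation (y-translation and
    x/z-rotation pinned to zero), minimising the mean nearest-neighbour distance.
    source/target: (N, 3) numpy arrays."""
    tree = cKDTree(target)

    def loss(params):
        theta, tx, tz = params
        c, s = np.cos(theta), np.sin(theta)
        R = np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])    # rotation about y only
        moved = source @ R.T + [tx, 0.0, tz]                # y-translation pinned to 0
        d, _ = tree.query(moved)
        return d.mean()

    return minimize(loss, x0=[0.0, 0.0, 0.0], method="Nelder-Mead")
```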
Inspecting this issue again after half a year of inactivity, but a lot of actually using the current registration setup in production. The comment quoted above (from 11-Dec-2023) seems to be the thing that is bothering us most at the moment: Often, when running The "solution" we are currently using is to simply try again with the subject human in a different pose, and hoping for the best. Fixing this, or at least showing the operator something (for example a graph of the p2p distance distribution) from which they can tell this has happened, is at the moment of paramount importance. |
Once we have addressed the issue above (the bad numbers coming out of our analysis) my feeling is that we should move to a "mixed" upper strategy. Right now our upper strategy is either pairwise or one-to-all-others, but maybe we should first do one round of pairwise, and after that a round of one-to-all-others. If we do the pairwise round in the right order (i.e. most overlapping pair first) I think that should get us out of the "local minimum problem" with the boxes. The right order should be easy to compute: for each pair, compute the upper bound of the percentage/fraction of points that could possibly overlap. High-to-low is the right order. |
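A sketch of how that ordering could be computed, with the overlap upper bound taken as the fraction of one camera's points that have any neighbour from the other camera within a generous radius (the radius value is a placeholder):

```python
import numpy as np
from scipy.spatial import cKDTree

def overlap_fraction(points_a, points_b, radius=0.10):
    """Upper bound on the overlap between two camera clouds: the fraction of
    A's points that have at least one B point within `radius`.
    Sorting all pairs high-to-low on this gives the pairwise registration order."""
    tree = cKDTree(points_b)
    d, _ = tree.query(points_a, distance_upper_bound=radius)
    return np.mean(np.isfinite(d))      # misses are reported as inf
```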
Finally getting back to this. Started by forwarding the |
We're going to need some debug tools to see what is happening. One is #136 so we can visually inspect the different tiles at the same time. Another is that when we are paused and we use a command that should change our view (such as the colorise option from #136, or selecting a different tileset to see with the number commands) the point cloud is redrawn with the new options. We may also want an option to |
Starting to experiment with The first thing I noticed is that the radius-filter is messing things up: the back of Thong isn't captured. Turned off the radius filter. Then I noticed that the "moving Aruco Marker" problem is back. Need to get rid of that too. |
Fixed those issues by modifying cameraconfig. The new one is attached here: With this cameraconfig we took a capture just after Thong clapped his hands (slightly after And here is the histogram plot of the final resulting distances between each camera and all others: We can see a number of (completely different) things from this plot:
We need to fix the first bullet. Then we need to decide what we do first:
For reference, here is the log of the steps the algorithm took. As can be seen camera 4, which appears to be the troublemaker from human inspection of the graphs, was never re-aligned, because it always appeared to have the "best" result for its alignment:
|
Serious improvements. Here are the before/after correspondences (for the above dataset, and a similar capture):
|
I think the only improvement that could still be done is to do different steps (the outer algorithm):
Maybe after this we could do one more pass over all the cameras but I'm not sure this is worthwhile. |
It turns out that that previous comment is indeed needed, for some situations. Because we still sometimes have the issue that the cameras are "pairwise aligned" (the problem we saw last year with the boxes datasets). |
Unfortunately this doesn't work. The problem is that the initial step, aligning the first two cameras, will over-enthusiastically rotate the "camera position" around the origin, to achieve the best possible overlap between the two point clouds. Which is of course completely wrong. A possible solution may be to limit the target point cloud to just the points that could conceivably be matched to the source point cloud, but I'm not sure that will fly. |
Hmm, thinking out loud: maybe we should use a much smaller correspondence value for the alignment step, basically reducing the points taken into account far too much, but then at least we know that the only points taken into account are points that are probably correct... |
It may be that our analyzer functions are still too optimistic. I'm seeing a case where the distribution plots of the distances clearly don't correspond to reality. |
Indeed! And it's the floor that is making it so optimistic! If I remove the floor I am getting much more realistic numbers. |
We need to fix the alignment once and for all.
The original issue is https://gitlab.com/VRTogether_EU/cwipc/cwipc_util/-/issues/41, but this link is dead now. There is an old repo for experiments at https://github.com/cwi-dis/pointclouds-alignment
New experiment data is at https://github.com/cwi-dis/cwipc_test/tree/master/pointcloud-registration-test
Current plan of attack:
cameraconfig.json. Also record there the current misalignment, because it is a good value for voxelization (later, during production).