Possible memory management issues in DART BoxedLcpConstrainedSolver? #48
Comments
Original comment by Addisu Z. Taddese (Bitbucket: azeey, GitHub: azeey). Thanks Jaldert Rombouts (Jaldert) for the reproducible example. I looked at your .sdf file to see if there were things that were not set up properly before delving into DART. I found that doing the following prevents the crash from happening:
Here is the new file: https://gist.github.com/azeey/cad3bb49f0e096f1e90b8f9adfd3ce4b And this is the result: That being said, I don't think ign-gazebo should be crashing given the .sdf file you provided, so we'll keep this issue open and investigate the cause of the crash.
Original comment by Michael Grey (Bitbucket: mxgrey, GitHub: mxgrey). If there’s a collision constraint violation between static objects (i.e. a fundamentally unsolvable scenario), I’m not especially surprised that a catastrophic crash is occurring. I can imagine that kind of scenario may produce NaN values in the simulation state, and those NaN values may spill out to other parts of the program, eventually causing some kind of undefined behavior or unhandled exception. It would be preferable for the simulation to kindly explain what the problem is, but it’s possible that the number of sanity checks needed throughout the simulation pipeline to figure that out would be prohibitively expensive. Maybe dartsim could have more assertions to check for NaN values.
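As a rough illustration of the kind of NaN assertion suggested above, here is a minimal Python sketch. It is not DART's API: the names `check_finite` and `NonFiniteStateError` and the sample velocity vector are hypothetical, chosen only to show how a non-finite value could be reported with a readable message instead of propagating through the program.

```python
import math
from typing import Iterable


class NonFiniteStateError(ValueError):
    """Raised when a simulation quantity contains NaN or infinity."""


def check_finite(name: str, values: Iterable[float]) -> None:
    """Raise a descriptive error if any entry of `values` is NaN or +/-inf."""
    for index, value in enumerate(values):
        if not math.isfinite(value):
            raise NonFiniteStateError(
                f"{name}[{index}] is {value!r}; "
                "the constraint problem may be unsolvable"
            )


# Hypothetical usage: validate a velocity vector before handing it to a solver.
velocities = [0.0, 1.5, float("nan")]
try:
    check_finite("joint_velocities", velocities)
except NonFiniteStateError as err:
    print(f"simulation halted: {err}")
```

A check like this costs one pass over each vector, so where it is placed (every step, or only at constraint-group boundaries) is the trade-off the comment alludes to.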
Original comment by Jaldert Rombouts (Bitbucket: Jaldert). Thank you very much for looking into this and for your helpful response! I can confirm that your suggestions fix the issue for the model I shared. Now I'm trying to understand exactly what parameters are relevant and why, so that I can avoid similar mistakes in future models. To do this systematically, I've instrumented a templated model where I'm varying the following:
Your recommended configuration is:
Running over all 36 (3 x 2 x 2 x 3) combinations yields the following table (successes bolded):
Footnotes: (1) Fails after a few seconds. (2) Fails almost immediately. (3) Fails after/around box hits. (4) Recommended configuration for the original model.
Conclusions:
This gives rise to some follow-up questions:
Your hypothesis sounded valid to me, so I also ran the configurations with a z-offset of -5cm with the expectation that the model would always crash. However, that is not what I observed. In fact, in this case, inertial can be left unspecified, and thus only fixed_joint and n_rollers affect the success/failure of running the world. This shows that it is possible for a simulation with fundamentally unsolvable collision constraints to run fine.
From a user perspective, auto-checking for common violations and auto-generating (missing) parameters such as inertial matrices could make modelling more foolproof. An example: (Py)Bullet by default recomputes the inertial matrices when loading from URDF/SDF (see the "URDF_USE_INERTIA_FROM_FILE" flag of loadURDF). I'm guessing this was done because many model files contain mistakes in the inertial parameters. The downside of this is that those matrices might be wrong in more subtle ways (e.g. for non-uniform mass distribution) that keep the simulation running but produce unrealistic results. Generating some output on exactly what gets auto-computed could be a solution to that.
I have attached conveyor_with_box_experiments.tgz in a separate comment (couldn’t find a way to attach it to this comment). This archive contains all SDFs for the table. I'm happy to provide any further information!
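To make the auto-generation idea concrete, here is a small Python sketch that computes diagonal inertia terms for a uniform-density box and a solid cylinder and prints what was computed. It uses the standard textbook formulas and is not what ign-gazebo, DART, or Bullet actually do. The names `Inertia`, `box_inertia`, `cylinder_inertia`, and `report`, and the example masses and dimensions, are hypothetical. The cylinder axis is assumed to lie along the local z axis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Inertia:
    """Diagonal moments of inertia about the centre of mass (kg*m^2)."""
    ixx: float
    iyy: float
    izz: float


def box_inertia(mass: float, x: float, y: float, z: float) -> Inertia:
    """Uniform-density box with side lengths x, y, z."""
    c = mass / 12.0
    return Inertia(c * (y * y + z * z), c * (x * x + z * z), c * (x * x + y * y))


def cylinder_inertia(mass: float, radius: float, length: float) -> Inertia:
    """Uniform-density solid cylinder whose axis is the local z axis."""
    radial = mass * (3.0 * radius ** 2 + length ** 2) / 12.0
    axial = 0.5 * mass * radius ** 2
    return Inertia(radial, radial, axial)


def report(link_name: str, inertia: Inertia) -> str:
    """Format a log line stating exactly which values were auto-computed."""
    return (
        f"[auto-inertia] {link_name}: "
        f"ixx={inertia.ixx:.6g} iyy={inertia.iyy:.6g} izz={inertia.izz:.6g}"
    )


# Hypothetical links: a conveyor roller and the dropped box.
print(report("roller", cylinder_inertia(mass=0.5, radius=0.05, length=0.4)))
print(report("box", box_inertia(mass=1.0, x=0.2, y=0.2, z=0.2)))
```

Logging the computed values, as `report` does, addresses the concern raised above: a user can see when the uniform-density assumption was applied and override it for links where it does not hold.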
Original comment by Jaldert Rombouts (Bitbucket: Jaldert).
SDFs for running experiments in table (as well as negative z-offsets not shown in table).
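The archive contents are not reproduced here. As a sketch of how a templated sweep over 36 (3 x 2 x 2 x 3) combinations could be enumerated, the Python snippet below builds the full grid with `itertools.product`. The parameter names z_offset, inertial, fixed_joint, and n_rollers come from the comments above, but which parameter takes three values, the concrete values themselves, and the file-naming scheme are all assumptions made for illustration.

```python
from itertools import product

# ASSUMPTION: the values and the 3/2/2/3 split below are illustrative only;
# the thread does not state which values were used for each parameter.
GRID = {
    "z_offset_cm": [0, 1, 5],
    "inertial": [False, True],
    "fixed_joint": [False, True],
    "n_rollers": [1, 5, 10],
}


def enumerate_configs(grid):
    """Return one dict per combination of parameter values."""
    names = list(grid)
    return [
        dict(zip(names, values))
        for values in product(*(grid[name] for name in names))
    ]


def config_filename(config):
    """Build a hypothetical, unique SDF file name for one configuration."""
    parts = [f"{key}-{value}" for key, value in config.items()]
    return "conveyor_" + "_".join(parts) + ".sdf"


configs = enumerate_configs(GRID)
print(f"{len(configs)} configurations")
print(config_filename(configs[0]))
```

Encoding every parameter in the file name keeps each run traceable back to its row in a results table.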
I believe this has been fixed by gazebo-forks/dart#6, which is the fork of DART used by ign-gazebo. I ran all the SDF files in
Original report (archived issue) by Jaldert Rombouts (Bitbucket: Jaldert).
The original report had attachments: conveyor_with_box_crash.gif, conveyor_with_box_experiments.tgz
Prerequisites
Put an X between the brackets on this line if you have done all of the following:
Description
Thanks for building Ignition Gazebo. I've been experimenting with it and I really like the new architecture.
One of my experiments was building a conveyor model and system plugin. The model consists of a box with a set of rollers (cylinders) with revolute joints on top. I've written a custom system plugin to set JointVelocityCmds for all these rollers.
The model runs fine when it is alone in the world - I've left it running for hours without issues.
However, I’m running into issues when I drop a box on top of the running conveyor: I quickly get memory errors like "free(): invalid next size (fast)" and occasionally a segmentation fault or "double free or corruption (out)". Interestingly, this issue seems to require a combination of the conveyor running and the box moving along it. I tried variations where the model is not running and the box is dropped/placed at similar poses where the simulation crashes, and I didn't manage to crash it in this manner.
Taking the GDB backtrace at face value, it seems to be an issue in DART. In particular, it seems to happen in constraint::BoxedLcpConstraintSolver::solveConstrainedGroup(dart::constraint::ConstrainedGroup&) () from /usr/lib/x86_64-linux-gnu/libdart.so.6.10 (see attachment for high resolution version).
I was not entirely sure if this issue should be posted here or on ign-physics. I’ve posted it here because reproduction requires ign-gazebo, but I’m also happy to open the issue elsewhere.
Steps to Reproduce
1. RevoluteConveyorController.so library
2. Run the conveyor_with_box.sdf example
Expected behavior:
The box should drop onto the rollers, move along it and finally fall off the end. The simulation should keep running.
Actual behavior:
The box drops and starts moving along the conveyor, then Ignition Gazebo crashes with memory errors like "free(): invalid next size (fast)" and occasionally a segmentation fault or "double free or corruption (out)".
Reproduces how often:
Always. There is variation in the exact timing to crash (looking at the number of simulation iterations) as well as the exact error (see detailed description above).
Versions
Additional Information