-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update versions of ccd, fcl and octomap. Avoid their installation and hide their binary symbols from libdrake #7417
Conversation
WORKSPACE, line 195 at r1 (raw file):
Missing space causing havoc with the bazel codestyle check? Comments from Reviewable |
Are upstream not responsive to patches anymore? |
Last PR merge is from May and there were only two merges during this year, the repo seems not to have too much activity. We can try to ask Daniel Fiser if he needs help maintaining the repo, if TRI is interested into getting the changes quickly into upstream. |
I had no issues with getting some changes in quickly a while back. There are just not too many people making PRs, that is all. |
With my platform reviewer hat on, indeed I would want to see affirmative evidence of upsteam being intransigent (i.e., either ignoring or rejecting pull requests) before I approved using a RobotLocomotion-specific fork. |
This reverts commit b70d4d4.
Please ignore the last two commits. I will check with @stonier if we want to use the fork or wait for upstream to incorporate changes. |
The fork is not because upstream is necessarily intransigent. My usual process through years of working with many open source projects whilst trying to maintain velocity on a product has been:
Sending upstream a PR might often move up the list if I want to elicit feedback from upstream earlier. Resync might occur much later if I want to delay handling/testing against additional updates coming in from elsewhere in the interim. If so, pull the review modifications back into the fork. The primary point here is to not bottleneck development and allows more extensive local testing against the changes. Do we have any reason to bottleneck development while we wait an indeterminate amount of time for review? |
About two years ago almost everything that was external was on a fork and as projects diverged we spent inordinate amounts of time to get back on master. We have almost done that save for libbot and pybind11. The pybind11 example is reasonable one as that diverged and we have had to re-write everything and it has taken probably four times as long as it would have. Forks also do not exist in a vacuum that prevents others using that fork. In the other remaining fork that we have, libbot2, other projects picked up the fork, nothing ever got upstreamed, and removing the fork would cause much pain, so we are probably stuck with it, which is unfortunate. |
The problem with divergence is not the fork itself. It is in the diligence by the group managing the fork. Whilst we we were building a fleet of robots with a software ecosystem far vaster than Drake, we typically had 10-20 forks of open source software at any time. The prospect of maintaining those in perpetuity was enough to motivate changes going back upstream and required little nagging and only minimal guidance from myself. When you work on the bleeding edge with deadlines and require critical fixes there is no other way. Perhaps the only thing that is permitting it here is that Drake development is meandering along with no critical deadlines. At some point this will/should become necessary and we should have a habit for good practice then. |
Jamie's history is on point. I think there is definitive historical evidence of developer drag from maintaining and (trying to) merge up a RobotLocomotion fork for all time. To the question of bottlenecks, I have no goal of slowing down development. A personal fork of Drake can refer to a personal fork of ccd. A shared feature branch of Drake can point to a shared feature branch of ccd. All of that can get CI testing. I don't think there is friction there. In the unusual and to-be-shunned event that manual testing is required, interested parties can check out and explore the branch. But really, CI is king. If CI is passing, that should really be good enough. If its not, then we need to enhance CI. (At some point, UIs and things aren't covered by CI, but relatively few of our externals relate to UIs; CCD certainly does not.) In the event that we need a small tweak on a few-hours timeline instead of few-days that upstream might be to accept a PR, we have allowed Drake to point off-master of upstream in the past, and I think we could do so again. In those cases, there is an upstream PR filed to point at, and if its not merged in a timely manner then we can react and use off-master of upstream, while reviewing the patch ourselves. |
(once again) there is a universe outside of Drake and it is these people that will be bottlenecked. |
Can you expand on that? I don't understand. How does having a RL fork of CCD help potential Drake contributors or users? |
The goal of this work is to enable Drake-ROS compatibility. There are groups that need to create their own plans around this and have even started work on it already. I would prefer to give them a deterministic ETA on when they can expect it to arrive rather than 'unknown pending assistance from people outside our group'. I would also like to see this get more testing from a larger audience than private branch maintainers sooner than later. Managing resync with upstream is not a difficult, nor time consuming process so long as you do not let it slide. I have done this many, many times in the past. |
This is a strawman argument. I said that if upstream is too slow to get things onto master, we can fall back to using our own fork, either in the short term for a hotfix, or to a RL fork if its an ongoing pain point. My request (and I think Jamie's) is that we treat that as a second-best choice, and at least try to upstream changes immediately. There is an argument for doing upstream PRs in parallel to Drake moving ahead with the external edits, to reduce development long poles, but we have to do all of the same work anyway so the total effort is similar, possibly more when dealing with parallelization. And the difference will be minimal when upstream is fast.
What testing can or cannot happen is orthogonal to our git structure choices. |
If this can be integrated and picked up by users, they are going to discover unanticipated problems earlier rather than later. This, in my experience, has saved having to make repeat upstream patches. |
That is certainly a possibility. I am willing to take that gamble and trust our CI. |
Yes, I concur. |
@j-rivero if it hasn't been picked up by Monday next week (don't have high hopes given that their a 1+ month old pull requests hanging) then let's merge here and review next steps with respect to upstream. |
The only meaningful effect of this PR is to pull in ~30-40 new commits from upstream. If we change this PR to point at upstream github (danfis/libccd at commit 64f02f741ac94fccd0fb660a5bffcbe6d01d9939), it will have the same effect as currently, and could go in immediately. |
The more pressing problem is that PR breaks two thirds of their builds. Not necessarily anything to do with intransigence. |
Aye, agreed. |
There is a regression in FCL in function fcl::distance() but Drake is not currently using that function so the test can be safely removed. Full reference: RobotLocomotion#7417 (comment)
The reasoning makes sense to me. I've removed the test, let's see the result of the CI. Thanks Andres. |
The CI looks good now. I've reported the bug upstream flexible-collision-library/fcl#244 |
@jose Ready for feature review? @jamiesnape is a good candidate. Comments from Reviewable |
Ready. Excuse me if the question is to stupid but Do I need to interact with |
You should assign me via the GitHub interface (I have self-assigned now). |
Reviewed 2 of 13 files at r4, 5 of 12 files at r5, 3 of 5 files at r7, 2 of 6 files at r8, 2 of 3 files at r9. tools/workspace/generate_export_header.bzl, line 30 at r9 (raw file):
Should tools/workspace/fcl/fcl.BUILD.bazel, line 60 at r9 (raw file):
This is not captured by the glob? Comments from Reviewable |
Good catch. The problem is that ctx.attr.export_macro_name default value include the name of the library so we can not use it directly. What I did in d649028 was to implement
Looking at the code I agree with you but for some reason I'm getting this error if I remove the explicit line and just leave the glob. |
I see, it is because it is a generated file. Either pass the label instead of the file name or add a comment. |
Reviewed 1 of 1 files at r10. Comments from Reviewable |
+@SeanCurtis-TRI for platform review. |
+(status: curate commits before merging)
Reviewed 2 of 13 files at r4, 4 of 12 files at r5, 3 of 5 files at r7, 2 of 6 files at r8, 2 of 3 files at r9, 1 of 1 files at r10. Comments from Reviewable |
Comment added in e152671 |
Reviewed 1 of 1 files at r11. Comments from Reviewable |
Review status: all files reviewed at latest revision, 1 unresolved discussion. tools/workspace/fcl/fcl.BUILD.bazel, line 54 at r11 (raw file):
BTW typo: drop "it". Comments from Reviewable |
ouch, thanks b508052 |
Reviewed 1 of 1 files at r12. Comments from Reviewable |
Reviewed 1 of 1 files at r12. Comments from Reviewable |
Following @stonier instructions this PR changes the build system to use the latest commit in the recently created fork.The automated testing has been waiting in the OSRF buildfarm for some hours but it should appear at: https://build.osrfoundation.org/job/drake-ci-pr_any-xenial-amd64/42/ when launched.Update 11/28:
The PR affects to the libccd, libfcl and octomap versions present in Drake:
libccd version has been updated to current tip in upstream repo from current v2 in Drake (full list of changes) The list shows that most of the changes are related to build system. As far as I can see, Optimize ccdQuatRotVec danfis/libccd#14 is the only PR affecting features behaviour.
octomap has been updated from v1.7.2 to v1.8.1 (full list of changes and changelog) which is the version in ROS Kinetic.
libfcl has been updated from an upstream commit done after 0.5 release done in Jan 2017 to current master (full list of changes). Most of the changes seems related to build system and tests but there are at least two of them affecting code behaviour: correct nearest points from distance() and problem in signed distance for the geometry shape
All installations of the three libraries have been removed from the code. The binary symbols are totally removed from the public ABI for libfcl and libccd (using the visibility mechanism implemented upstream). Octomap symbols are still in there but should match the ones present in ROS. This PR does not fix the case of mixing libdrake with other octomap versions different than 1.8.1 in a third party software.
Tested in the OSRF buildfarm: https://build.osrfoundation.org/view/All/job/drake-ci-pr_any-xenial-amd64/76/consoleFull
This change is