Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Underflow in costmaps lead to segmentation fault in planner server #2835

Closed
jonipol opened this issue Feb 28, 2022 · 9 comments
Closed

Underflow in costmaps lead to segmentation fault in planner server #2835

jonipol opened this issue Feb 28, 2022 · 9 comments

Comments

@jonipol
Copy link
Contributor

jonipol commented Feb 28, 2022

Bug report

Required Info:

  • Operating System:
    • Ubuntu 20.04
  • ROS2 Version:
    • Galactic binaries (ros:galactic - docker)
  • Version or commit hash:
    • 1.0.8-1focal.20220203.051427
  • DDS implementation:
    • Cyclonedds

Steps to reproduce issue

1. Have obstacle layer in global costmap with laser scan
2. Move robot near the edge of costmap (I have got this error when robot is near the edge of costmap on position with negative y coordinates.)
3. Planner might crash

Expected behavior

Costmaps are generated, if goal is given the plan is made, execution continues normally

Actual behavior

Planner server crashes to segmentation fault

Additional information

So far I have managed to reproduce this crash on simulation on two different worlds and on our physical robot.
image

This segmentation fault seems to happen when the offset in costmap_2d.hpp, on bresenham2D function, is updated with offset_b being negative. This can lead to underflow of the unsigned int offset, which then leads to segmentation fault when at(offset) is called.

[planner_server-19] Stack trace (most recent call last) in thread 2892:
[planner_server-19] #20   Object "", at 0xffffffffffffffff, in 
[planner_server-19] #19   Source "../sysdeps/unix/sysv/linux/x86_64/clone.S", line 95, in __clone [0x7f81a1b2a292]
[planner_server-19] #18   Source "/build/glibc-eX1tMB/glibc-2.31/nptl/pthread_create.c", line 477, in start_thread [0x7f81a1899608]
[planner_server-19] #17   Object "/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.28", at 0x7f81a1cedde3, in 
[planner_server-19] #16   Source "/usr/include/c++/9/thread", line 195, in _M_run [0x7f81a16bf509]
[planner_server-19]         192: 	{ }
[planner_server-19]         193: 
[planner_server-19]         194: 	void
[planner_server-19]       > 195: 	_M_run() { _M_func(); }
[planner_server-19]         196:       };
[planner_server-19]         197: 
[planner_server-19]         198:     void
[controller_server-18] [WARN] [1645688710.606256448] [local_costmap.local_costmap]: /camera_4/slope_filtered_3d buffer updated in 0.56s, it should be updated every 0.50s.
[planner_server-19] #15   Source "/usr/include/c++/9/thread", line 251, in operator() [0x7f81a16c3837]
[planner_server-19]         248: 	{
[planner_server-19]         249: 	  using _Indices
[planner_server-19]         250: 	    = typename _Build_index_tuple<tuple_size<_Tuple>::value>::__type;
[planner_server-19]       > 251: 	  return _M_invoke(_Indices());
[planner_server-19]         252: 	}
[planner_server-19]         253:       };
[planner_server-19] #14   Source "/usr/include/c++/9/thread", line 244, in _M_invoke<0> [0x7f81a16c8cdd]
[planner_server-19]         241: 	template<size_t... _Ind>
[planner_server-19]         242: 	  typename __result<_Tuple>::type
[planner_server-19]         243: 	  _M_invoke(_Index_tuple<_Ind...>)
[planner_server-19]       > 244: 	  { return std::__invoke(std::get<_Ind>(std::move(_M_t))...); }
[planner_server-19]         245: 
[planner_server-19]         246: 	typename __result<_Tuple>::type
[planner_server-19]         247: 	operator()()
[planner_server-19] #13   Source "/usr/include/c++/9/bits/invoke.h", line 95, in __invoke<std::_Bind<void (nav2_costmap_2d::Costmap2DROS::*(nav2_costmap_2d::Costmap2DROS*, double))(double)> > [0x7f81a16cb062]
[planner_server-19]          92:       using __result = __invoke_result<_Callable, _Args...>;
[planner_server-19]          93:       using __type = typename __result::type;
[planner_server-19]          94:       using __tag = typename __result::__invoke_type;
[planner_server-19]       >  95:       return std::__invoke_impl<__type>(__tag{}, std::forward<_Callable>(__fn),
[planner_server-19]          96: 					std::forward<_Args>(__args)...);
[planner_server-19]          97:     }
[planner_server-19] #12   Source "/usr/include/c++/9/bits/invoke.h", line 60, in __invoke_impl<void, std::_Bind<void (nav2_costmap_2d::Costmap2DROS::*(nav2_costmap_2d::Costmap2DROS*, double))(double)> > [0x7f81a16cc81e]
[planner_server-19]          57:   template<typename _Res, typename _Fn, typename... _Args>
[planner_server-19]          58:     constexpr _Res
[planner_server-19]          59:     __invoke_impl(__invoke_other, _Fn&& __f, _Args&&... __args)
[planner_server-19]       >  60:     { return std::forward<_Fn>(__f)(std::forward<_Args>(__args)...); }
[planner_server-19]          61: 
[planner_server-19]          62:   template<typename _Res, typename _MemFun, typename _Tp, typename... _Args>
[planner_server-19]          63:     constexpr _Res
[planner_server-19] #11   Source "/usr/include/c++/9/functional", line 484, in operator()<> [0x7f81a16cda4e]
[planner_server-19]         481: 	{
[planner_server-19]         482: 	  return this->__call<_Result>(
[planner_server-19]         483: 	      std::forward_as_tuple(std::forward<_Args>(__args)...),
[planner_server-19]       > 484: 	      _Bound_indexes());
[planner_server-19]         485: 	}
[planner_server-19]         486: 
[planner_server-19]         487:       // Call as const
[controller_server-18] [WARN] [1645688710.814640196] [controller_server]: Control loop missed its desired rate of 10.0000Hz
[planner_server-19] #10   Source "/usr/include/c++/9/functional", line 400, in __call<void, 0, 1> [0x7f81a16cf2e9]
[planner_server-19]         397: 	_Result
[planner_server-19]         398: 	__call(tuple<_Args...>&& __args, _Index_tuple<_Indexes...>)
[planner_server-19]         399: 	{
[planner_server-19]       > 400: 	  return std::__invoke(_M_f,
[planner_server-19]         401: 	      _Mu<_Bound_args>()(std::get<_Indexes>(_M_bound_args), __args)...
[planner_server-19]         402: 	      );
[planner_server-19]         403: 	}
[planner_server-19] #9    Source "/usr/include/c++/9/bits/invoke.h", line 95, in __invoke<void (nav2_costmap_2d::Costmap2DROS::*&)(double), nav2_costmap_2d::Costmap2DROS*&, double&> [0x7f81a16d0396]
[planner_server-19]          92:       using __result = __invoke_result<_Callable, _Args...>;
[planner_server-19]          93:       using __type = typename __result::type;
[planner_server-19]          94:       using __tag = typename __result::__invoke_type;
[planner_server-19]       >  95:       return std::__invoke_impl<__type>(__tag{}, std::forward<_Callable>(__fn),
[planner_server-19]          96: 					std::forward<_Args>(__args)...);
[planner_server-19]          97:     }
[planner_server-19] #8    Source "/usr/include/c++/9/bits/invoke.h", line 73, in __invoke_impl<void, void (nav2_costmap_2d::Costmap2DROS::*&)(double), nav2_costmap_2d::Costmap2DROS*&, double&> [0x7f81a16d0f73]
[planner_server-19]          70:     __invoke_impl(__invoke_memfun_deref, _MemFun&& __f, _Tp&& __t,
[planner_server-19]          71: 		  _Args&&... __args)
[planner_server-19]          72:     {
[planner_server-19]       >  73:       return ((*std::forward<_Tp>(__t)).*__f)(std::forward<_Args>(__args)...);
[planner_server-19]          74:     }
[planner_server-19]          75: 
[planner_server-19]          76:   template<typename _Res, typename _MemPtr, typename _Tp>
[planner_server-19] #7    Source "/robot_ws/src/navigation2/nav2_costmap_2d/src/costmap_2d_ros.cpp", line 412, in mapUpdateLoop [0x7f81a161ebd1]
[planner_server-19]         410:     // Measure the execution time of the updateMap method
[planner_server-19]         411:     timer.start();
[planner_server-19]       > 412:     updateMap();
[planner_server-19]         413:     timer.end();
[planner_server-19]         414: 
[planner_server-19]         415:     RCLCPP_DEBUG(get_logger(), "Map update time: %.9f", timer.elapsed_time_in_seconds());
[planner_server-19] #6    Source "/robot_ws/src/navigation2/nav2_costmap_2d/src/costmap_2d_ros.cpp", line 458, in updateMap [0x7f81a161f549]
[planner_server-19]         455:       const double & x = pose.pose.position.x;
[planner_server-19]         456:       const double & y = pose.pose.position.y;
[planner_server-19]         457:       const double yaw = tf2::getYaw(pose.pose.orientation);
[planner_server-19]       > 458:       layered_costmap_->updateMap(x, y, yaw);
[planner_server-19]         459: 
[planner_server-19]         460:       auto footprint = std::make_unique<geometry_msgs::msg::PolygonStamped>();
[planner_server-19]         461:       footprint->header.frame_id = global_frame_;
[planner_server-19] #5    Source "/robot_ws/src/navigation2/nav2_costmap_2d/src/layered_costmap.cpp", line 161, in updateMap [0x7f81a16130fb]
[planner_server-19]         158:     double prev_miny = miny_;
[planner_server-19]         159:     double prev_maxx = maxx_;
[planner_server-19]         160:     double prev_maxy = maxy_;
[planner_server-19]       > 161:     (*plugin)->updateBounds(robot_x, robot_y, robot_yaw, &minx_, &miny_, &maxx_, &maxy_);
[planner_server-19]         162:     if (minx_ > prev_minx || miny_ > prev_miny || maxx_ < prev_maxx || maxy_ < prev_maxy) {
[planner_server-19]         163:       RCLCPP_WARN(
[planner_server-19]         164:         rclcpp::get_logger(
[planner_server-19] #4    Source "/robot_ws/src/navigation2/nav2_costmap_2d/plugins/obstacle_layer.cpp", line 390, in updateBounds [0x7f819080d2e2]
[planner_server-19]         388:   // raytrace freespace
[planner_server-19]         389:   for (unsigned int i = 0; i < clearing_observations.size(); ++i) {
[planner_server-19]       > 390:     raytraceFreespace(clearing_observations[i], min_x, min_y, max_x, max_y);
[planner_server-19]         391:   }
[planner_server-19]         392: 
[planner_server-19]         393:   // place the new obstacles into a priority queue... each with a priority of zero to begin with
[planner_server-19] #3    Source "/robot_ws/src/navigation2/nav2_costmap_2d/plugins/obstacle_layer.cpp", line 636, in raytraceFreespace [0x7f819080eb6f]
[planner_server-19]         633:     unsigned int cell_raytrace_min_range = cellDistance(clearing_observation.raytrace_min_range_);
[planner_server-19]         634:     MarkCell marker(costmap_, FREE_SPACE);
[planner_server-19]         635:     // and finally... we can execute our trace to clear obstacles along that line
[planner_server-19]       > 636:     raytraceLine(marker, x0, y0, x1, y1, cell_raytrace_max_range, cell_raytrace_min_range);
[planner_server-19]         637: 
[planner_server-19]         638:     updateRaytraceBounds(
[planner_server-19]         639:       ox, oy, wx, wy, clearing_observation.raytrace_max_range_,
[planner_server-19] #2    Source "/robot_ws/src/navigation2/nav2_costmap_2d/include/nav2_costmap_2d/costmap_2d.hpp", line 466, in raytraceLine<nav2_costmap_2d::Costmap2D::MarkCell> [0x7f819081358f]
[planner_server-19]         463:       // Subtract minlength from length since initial point (x0, y0)has been adjusted by min Z
[planner_server-19]         464:       length = (unsigned int)(scale * abs_dx) - min_length;
[planner_server-19]         465: 
[planner_server-19]       > 466:       bresenham2D(
[planner_server-19]         467:         at, abs_dx, abs_dy, error_y, offset_dx, offset_dy, offset, length);
[planner_server-19]         468:       return;
[planner_server-19]         469:     }
[planner_server-19] #1    Source "/robot_ws/src/navigation2/nav2_costmap_2d/include/nav2_costmap_2d/costmap_2d.hpp", line 502, in bresenham2D<nav2_costmap_2d::Costmap2D::MarkCell> [0x7f8190817514]
[planner_server-19]         499:         error_b -= abs_da;
[planner_server-19]         500:       }
[planner_server-19]         501:     }
[planner_server-19]       > 502:     at(offset);
[planner_server-19]         503:   }
[planner_server-19]         504: 
[planner_server-19]         505: 
[planner_server-19] #0    Source "/robot_ws/src/navigation2/nav2_costmap_2d/include/nav2_costmap_2d/costmap_2d.hpp", line 534, in operator() [0x7f819080fb3e]
[planner_server-19]         531:     }
[planner_server-19]         532:     inline void operator()(unsigned int offset)
[planner_server-19]         533:     {
[planner_server-19]       > 534:       costmap_[offset] = value_;
[planner_server-19]         535:     }
[planner_server-19]         536: 
[planner_server-19]         537:   private:
[planner_server-19] Segmentation fault (Address not mapped to object [0x7f828ca16658])
[ERROR] [planner_server-19]: process has died [pid 1629, exit code -11, cmd '/robot_ws/install/nav2_planner/lib/nav2_planner/planner_server --ros-args --log-level info --ros-args -r __node:=planner_server --params-file /tmp/launch_params_up3n6wv8 --params-file /tmp/tmppcmyy2f9 -r /tf:=tf -r /tf_static:=tf_static -r /tf:=tf -r /tf_static:=tf_static'].

For this issue I created a quick workaround simply by adding a if statement to lines 494 and 502. I did not yet have time to investigate this further and did not have chance to test this on rolling or with turtlebot for example.

unsigned int costmap_size = size_x_ * size_y_;
...
    if (costmap_size > offset) {
      at(offset);
    }

Please let me know if more information, configurations or something is needed.

@SteveMacenski
Copy link
Member

SteveMacenski commented Mar 1, 2022

@AlexeyMerzlyakov can you take this one? Looks like another related issue that you're most recently familiar with this part of the codebase.

This does seem a little odd to be problematic, to my knowledge, nothing has changed in this part of the code from ROS 1 navigation, and I don't recall even running into this wrt Costmap2D in ROS 1. But for all I know, this could be a long-standing problem.

Can you run some experiment in the default bringup / map in order to show this? That would help if we could reproduce ourselves. You can relocalize the robot into any position on the map

@AlexeyMerzlyakov
Copy link
Collaborator

I've tried to reproduce the problem locally, but so far it's all works. @jonipol, could I ask you to share the results with default Nav2 bringup scripts? This indeed would be greatly helpful for us and allow to reproduce and understand the problem.

@jonipol
Copy link
Contributor Author

jonipol commented Mar 9, 2022

I'm sorry I have not had time to try this out yet. I will be returning from a work trip next week and will try to allocate some time to investigate this issue as it is really weird one. We just did migration from foxy to galactic, and this issue came up on the galactic branch but did not happen in Foxy one. Will update here once I manage to try this out.

@SteveMacenski
Copy link
Member

SteveMacenski commented Mar 17, 2022

@jonipol can you try on main? Perhaps this is a bug that we already fixed in Rolling. @AlexeyMerzlyakov did you test with Galactic at tag 1.0.8 https://github.com/ros-planning/navigation2/releases/tag/1.0.8 to see if that's the case? I know you did some work on this recently that might not have been backport-able.

@AlexeyMerzlyakov
Copy link
Collaborator

AlexeyMerzlyakov commented Mar 18, 2022

@AlexeyMerzlyakov did you test with Galactic at tag 1.0.8 https://github.com/ros-planning/navigation2/releases/tag/1.0.8 to see if that's the case?

Today I've re-built nav2 stack for this tag (in a VM with Galactic installed from deb-packages) and tried to reproduce the problem moving robot to the edge of both: global and local costmaps. In case if robot inside costmap boundaries, nothing happens. In case if robot is slightly out of costmap, appropriate server (controller or planner) reports this, but still no crashes or segfaults. I've also tried to move robot to the edges of global costmap having negative X or Y coordinates. Still no effect:
Screenshot_2022-03-18_14-51-25

@jonipol, it is not finally clear what does "Move robot near the edge of costmap" mean? Does it mean robot is moving on the costmap using "2D pose esitmate" RViz command or is it physical move: immediate changing of robot coordinates in a Gazebo?

@jonipol
Copy link
Contributor Author

jonipol commented Mar 18, 2022

Sorry for unclear explanation. In this case both work, giving pose estimate and navigating the robot near edge of global costmap.

Today I managed to do some digging into this one. Tried the default config for global costmap. Corrected topic and was not able to get the crash. Did some comparisons and more testing. Found out that setting raytrace_min_range > 0.0 (0.2 in my case) makes the crash happen for me.

@AlexeyMerzlyakov
Copy link
Collaborator

@jonipol, thank you for pointing out raytrace_min_range! Confirmed the problem for TB3 standard simulation on both: Galactic and Rolling releases. I'll continue investigation/debugging on next week and give more details as they will be appeared.

@AlexeyMerzlyakov
Copy link
Collaborator

AlexeyMerzlyakov commented Mar 21, 2022

There are two versions of raytraceLine()/bresenham() algorithms. One is placed in nav2_voxel_grid/include/nav2_voxel_grid/voxel_grid.hpp and used for 3D space voxel grids. Another - in nav2_costmap_2d/include/nav2_costmap_2d/costmap_2d.hpp which is used for 2D costmaps. The root cause of this problem is in 2D space and it is the same as was fixed in #2460 but for 3D version of Bresenham algorithm: incorrect component shift in length calculation inside raytraceLine() routine, which leads to negative shifts of costmap_ array for some cases.

So, all related bugfixes were (partially) back-ported from 3D raytraceLine() to 2D version of algorithms in #2857. Local verification shown the problem disappeared on my side. @jonipol, could you please check: does the fix proposed work on your side?

@jonipol
Copy link
Contributor Author

jonipol commented Mar 22, 2022

Yes, the proposed fix does solve the issue on hand for me!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants