Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Segmentation fault occurs with Type Adapter when using large data size in ROS message #2417

Closed
allbluelai opened this issue Jan 28, 2024 · 7 comments · Fixed by #2443
Closed
Labels
bug Something isn't working help wanted Extra attention is needed

Comments

@allbluelai
Copy link

Bug report

Required Info:

  • Operating System:
    • Ubuntu 22.04
  • Installation type:
    • source install
  • DDS implementation:
    • test on Fast-RTPS, CycloneDDS, iceoryx DDS
  • Client library (if applicable):
    • rclcpp

Steps to reproduce issue

TestMsg.msg:

# uint32[1048576] data # work fine
uint32[2097152] data # fail
uint8[128] name
uint8 name_size

myString.hpp:

#ifndef MY_STRING_HPP
#define MY_STRING_HPP

#include "rclcpp/rclcpp.hpp"
#include "tutorial_interfaces/msg/test_msg.hpp"

template <>
struct rclcpp::TypeAdapter<
    std::string,
    tutorial_interfaces::msg::TestMsg>
{
  using is_specialized = std::true_type;
  using custom_type = std::string;
  using ros_message_type = tutorial_interfaces::msg::TestMsg;

  static void
  convert_to_ros_message(
      const custom_type &source,
      ros_message_type &destination)
  {
    std::memcpy(destination.name.data(), source.data(), source.size());
    destination.name_size = source.size();
  }

  static void
  convert_to_custom(
      const ros_message_type &source,
      custom_type &destination)
  {
    destination.resize(source.name_size);
    std::memcpy(destination.data(), source.name.data(), source.name_size);
  }
};

RCLCPP_USING_CUSTOM_TYPE_AS_ROS_MESSAGE_TYPE(std::string, tutorial_interfaces::msg::TestMsg);

#endif // !MY_STRING_HPP

Here, std::string is a custom type.

pub.cpp:

#include <chrono>
#include <memory>
#include <string>
#include "rclcpp/rclcpp.hpp"
#include "myString.hpp"

using namespace std::chrono_literals;
using TopicPub = std::string;

class MinimalPublisher : public rclcpp::Node
{
public:
  MinimalPublisher()
      : Node("minimal_publisher"), count_(0)
  {
    auto timer_callback =
        [this]() -> void
    {
      std::string name = "myName";
      this->publisher_->publish(name);
    };

    publisher_ = this->create_publisher<TopicPub>("topic", 10);
    timer_ = create_wall_timer(30ms, timer_callback);
  }

private:
  rclcpp::TimerBase::SharedPtr timer_;
  rclcpp::Publisher<TopicPub>::SharedPtr publisher_;
  size_t count_;
};

int main(int argc, char *argv[])
{
  rclcpp::init(argc, argv);
  rclcpp::spin(std::make_shared<MinimalPublisher>());
  rclcpp::shutdown();
  return 0;
}

sub.cpp:

#include <memory>
#include "rclcpp/rclcpp.hpp"
#include <chrono>
#include "myString.hpp"

using TopicSub = std::string;

class MinimalSubscriber : public rclcpp::Node
{
public:
  MinimalSubscriber()
      : Node("minimal_subscriber")
  {
    auto callback = [this](const TopicSub &msg)
    {
      std::cout << "msg = " << msg << std::endl;
    };

    subscription_ = this->create_subscription<TopicSub>("topic", 10, callback);
  }

private:
  rclcpp::Subscription<TopicSub>::SharedPtr subscription_;
};

int main(int argc, char *argv[])
{
  rclcpp::init(argc, argv);
  rclcpp::spin(std::make_shared<MinimalSubscriber>());
  rclcpp::shutdown();
  return 0;
}

Expected behavior

Message sent successfully in the publisher and received in the subscriber.

Actual behavior

[ros2run]: Segmentation fault in the publisher.

Additional information

In TestMsg.msg, I define two different data size messages (1MB and 2MB). It works well with 1MB but fails with 2MB.

@Zard-C
Copy link
Contributor

Zard-C commented Jan 28, 2024

Hi, imo

  1. the version of ROS2(rolling? iron?)
  2. if possible, use gdb to located where it crashed.

@allbluelai
Copy link
Author

Hello, @Zard-C

Hi, imo

  1. the version of ROS2(rolling? iron?)

Apologies for not mentioning the version. I run the code in ROS2 Humble.

  1. if possible, use gdb to located where it crashed.

It crashes when it calls publish().
some gdb info:

$ ros2 run --prefix 'gdb -ex run --args' topics pub
GNU gdb (Ubuntu 12.1-0ubuntu1~22.04) 12.1
Copyright (C) 2022 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /home/john/Documents/test_share/install/topics/lib/topics/pub...
Starting program: /home/john/Documents/test_share/install/topics/lib/topics/pub 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff57ff640 (LWP 8092)]
[New Thread 0x7ffff4ffe640 (LWP 8093)]
[New Thread 0x7ffff47fd640 (LWP 8094)]
[New Thread 0x7ffff3ffc640 (LWP 8095)]
[New Thread 0x7ffff37fb640 (LWP 8096)]
[New Thread 0x7ffff2ffa640 (LWP 8097)]
[New Thread 0x7ffff27f9640 (LWP 8098)]
[New Thread 0x7ffff1f38640 (LWP 8099)]
[New Thread 0x7ffff1737640 (LWP 8100)]

Thread 1 "pub" received signal SIGSEGV, Segmentation fault.
0x00005555555b8908 in rclcpp::Publisher<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<void> >::publish<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > (this=<error reading variable: Cannot access memory at address 0x7fffff7f13b8>, msg=<error reading variable: Cannot access memory at address 0x7fffff7f13b0>) at /home/john/ros2_humble/install/rclcpp/include/rclcpp/rclcpp/publisher.hpp:364
364	  publish(const T & msg)
(gdb) bt
#0  0x00005555555b8908 in rclcpp::Publisher<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<void> >::publish<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >
    (
    this=<error reading variable: Cannot access memory at address 0x7fffff7f13b8>, 
    msg=<error reading variable: Cannot access memory at address 0x7fffff7f13b0>)
    at /home/john/ros2_humble/install/rclcpp/include/rclcpp/rclcpp/publisher.hpp:364
#1  0x00005555555b5f77 in MinimalPublisher::MinimalPublisher()::{lambda()#1}::operator()() const (__closure=0x5555558ece80)
    at /home/john/Documents/test_share/src/topics/src/pub.cpp:20
#2  0x00005555555db20a in rclcpp::GenericTimer<MinimalPublisher::MinimalPublisher()::{lambda()#1}, (void*)0>::execute_callback_delegate<MinimalPublisher::MinimalPublisher()::{lambda()#1}, (void*)0>() (this=0x5555558ece50)
    at /home/john/ros2_humble/install/rclcpp/include/rclcpp/rclcpp/timer.hpp:244
#3  0x00005555555da5bd in rclcpp::GenericTimer<MinimalPublisher::MinimalPublisher()::{lambda()#1}, (void*)0>::execute_callback() (this=0x5555558ece50)
    at /home/john/ros2_humble/install/rclcpp/include/rclcpp/rclcpp/timer.hpp:230
#4  0x00007ffff7be33e7 in rclcpp::Executor::execute_timer(std::shared_ptr<rclcpp::TimerBase>) () from /home/john/ros2_humble/install/rclcpp/lib/librclcpp.so
--Type <RET> for more, q to quit, c to continue without paging--
#5  0x00007ffff7be1e19 in rclcpp::Executor::execute_any_executable(rclcpp::AnyExecutable&) () from /home/john/ros2_humble/install/rclcpp/lib/librclcpp.so
#6  0x00007ffff7bf2e81 in rclcpp::executors::SingleThreadedExecutor::spin() ()
   from /home/john/ros2_humble/install/rclcpp/lib/librclcpp.so
#7  0x00007ffff7bf0355 in rclcpp::spin(std::shared_ptr<rclcpp::node_interfaces::NodeBaseInterface>) ()
   from /home/john/ros2_humble/install/rclcpp/lib/librclcpp.so
#8  0x00007ffff7bf0482 in rclcpp::spin(std::shared_ptr<rclcpp::Node>) ()
   from /home/john/ros2_humble/install/rclcpp/lib/librclcpp.so
#9  0x00005555555b2748 in main (argc=1, argv=0x7fffffff1ba8)
    at /home/john/Documents/test_share/src/topics/src/pub.cpp:36

fujitatomoya added a commit to fujitatomoya/ros2_test_prover that referenced this issue Jan 30, 2024
@fujitatomoya
Copy link
Collaborator

this can be reproducible with mainline rolling branch. as reported, with 1MB data, this works okay but 2MB.

[Current thread is 1 (Thread 0x7f4736c2e680 (LWP 423640))]
(gdb) bt
#0  0x000055b12dd539b8 in rclcpp::Publisher<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<void> >::publish<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > (
    this=<error reading variable: Cannot access memory at address 0x7ffddbdcbbc8>, 
    msg=<error reading variable: Cannot access memory at address 0x7ffddbdcbbc0>)
    at /root/ros2_ws/colcon_ws/install/rclcpp/include/rclcpp/rclcpp/publisher.hpp:340
#1  0x000055b12dd512f9 in MinimalPublisher::MinimalPublisher()::{lambda()#1}::operator()() const (__closure=0x55b12f9b9348)
    at /root/ros2_ws/colcon_ws/src/ros2/ros2_test_prover/prover_rclcpp/src/rclcpp_2417_pub.cpp:20
...

(gdb) frame 1
#1  0x000055b12dd512f9 in MinimalPublisher::MinimalPublisher()::{lambda()#1}::operator()() const (__closure=0x55b12f9b9348)
    at /root/ros2_ws/colcon_ws/src/ros2/ros2_test_prover/prover_rclcpp/src/rclcpp_2417_pub.cpp:20
20	      this->publisher_->publish(name);
(gdb) print name
$1 = "myName"
(gdb) whatis this
type = MinimalPublisher * const
(gdb) whatis publisher_
type = rclcpp::Publisher<std::string, std::allocator<void> >::SharedPtr

it seems to be okay before calling rclcpp::Publisher, right after the function call, memory is corrupted?

@fujitatomoya fujitatomoya added the bug Something isn't working label Jan 30, 2024
@roncapat
Copy link
Contributor

roncapat commented Feb 6, 2024

It may be curious to understand where the message gets created by rclcpp during conversion (stack or heap) and if the issue happens on different machines or architectures at the same size boundary. Another thing is: has anyone tried to do a dycotomic research between 1Mb & 2Mb to find the exact spot things start to break?

@thomasmoore-torc
Copy link

This is likely being caused because the ROSMessageType is being allocated on the stack in the Publisher::publish() methods:

ROSMessageType ros_msg;
rclcpp::TypeAdapter<MessageT>::convert_to_ros_message(*msg, ros_msg);
return this->do_inter_process_publish(ros_msg);

ROSMessageType ros_msg;
rclcpp::TypeAdapter<MessageT>::convert_to_ros_message(msg, ros_msg);
// In this case we're not using intra process.
return this->do_inter_process_publish(ros_msg);

The safe thing would be to allocate the message on the heap:

auto ros_msg = create_ros_message_unique_ptr();
rclcpp::TypeAdapter<MessageT>::convert_to_ros_message(*msg, *ros_msg); 
return this->do_inter_process_publish(*ros_msg); 

However, it would probably be good to have a similar TODO as in AnySubscriptionCallback::dispatch() to allow small messages to be allocated on the stack:

// TODO(wjwwood): consider avoiding heap allocation for small messages
// maybe something like:
// if constexpr (rosidl_generator_traits::has_fixed_size<T> && sizeof(T) < N) {
// ... on stack
// }

Even better would be for the option to create on the stack or heap to be configurable.

@roncapat
Copy link
Contributor

roncapat commented Feb 7, 2024

Spotted then ;) It is something that needs to be avoided IMHO.

@audrow
Copy link
Member

audrow commented Feb 15, 2024

Perhaps this could be fixed by using a shared or unique pointer, like is done in other locations?

auto ros_msg_ptr = std::make_shared<ROSMessageType>();
// TODO(clalancette): This is unnecessarily doing an additional conversion
// that may have already been done in do_intra_process_publish_and_return_shared().
// We should just reuse that effort.
rclcpp::TypeAdapter<MessageT>::convert_to_ros_message(*msg, *ros_msg_ptr);
this->do_intra_process_publish(std::move(msg));
this->do_inter_process_publish(*ros_msg_ptr);

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working help wanted Extra attention is needed
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants