-
Notifications
You must be signed in to change notification settings - Fork 255
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] Support recording and replay service #1414
[WIP] Support recording and replay service #1414
Conversation
1cb3812
to
bd6e02b
Compare
Current status @MichaelOrlov
|
I have a question on filter for mcap implementation. rosbag2/rosbag2_storage_mcap/src/mcap_storage.cpp Lines 524 to 543 in 744cdf0
If But sqlite implementation has the different behavior. rosbag2/rosbag2_storage_sqlite3/src/rosbag2_storage_sqlite3/sqlite_storage.cpp Lines 524 to 548 in 744cdf0
For this PR, new filter parameters for play are added. I have to modify filter related codes. |
@Barry-Xu-2018 There are a bit of discussion about topic filters in this PR #944. It is a bit stale but some info for the context and probably other questions in this area. I need to reaiterate on it and move forward with it. As reagrds to the MCAP behavior. cc: @emersonknapp |
Okay. Thank you. |
f709616
to
ba418e9
Compare
I found a problem based on current contents of service event info. Service and client all enable introspection
These 2 client gids are different. So we have no way to identify these 2 events for one request. I have 2 solutions.
I'd like to hear your opinions. |
@Barry-Xu-2018 I am sorry I will be on vacation for the next two weeks and will not be able to take a look at your findings. @fujitatomoya @emersonknapp Could you please take a look on the issue found by @Barry-Xu-2018 in the previous post? |
Friendly ping @emersonknapp @fujitatomoya, do you have any idea about this issue #1414 (comment) ? |
@Barry-Xu-2018 sorry to be late to get back to you.
before how to fix, how come these gids are different? i think that gid (event_type: 0) is the client gid via seems like, this is against the design. if the gid is not unique through the transaction, we cannot tell service request and response match. |
Completely agree. The current implementation cannot confirm whether the sent request record and the received request record are pointing to the same request. This doesn't follow the design. After checking code for rmw_fastrtps, I find service take request request.sample_identity_ = info_seq[0].sample_identity; // Get writer_guid
// Use response subscriber guid (on related_sample_identity) when present.
const eprosima::fastrtps::rtps::GUID_t & reader_guid =
info_seq[0].related_sample_identity.writer_guid();
if (reader_guid != eprosima::fastrtps::rtps::GUID_t::unknown()) {
request.sample_identity_.writer_guid() = reader_guid; // writer_guid will be replace by reader guid.
} Where do I report this bug ? Each rmw ? |
I think that is what needs to be
i think that this is just unexpected behavior because request_header is overwritten and reused from |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some comments based on my first review.
std::unordered_map<std::string, std::shared_ptr<PlayerPublisher>> publishers_; | ||
using SharedPlayerPublisher = std::shared_ptr<PlayerPublisher>; | ||
using SharedPlayerClient = std::shared_ptr<PlayerClient>; | ||
std::unordered_map< |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible that the name of topic and service are the same?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. It is possible. I will consider how to handle this issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After checking implementation, for service, service event topic name is used as key.
So if topic name is the same as service event topic name, there is a problem. Not sure whether do we need to deal with this case?
Codes are ready to be reviewed. Please help to review them. This PR depended on below PR:
One issue
|
@Barry-Xu-2018 Could you please rebase your branch on top of the latest |
67678ff
to
93768b5
Compare
Done. I fixed the conflicts. @MichaelOrlov |
@Barry-Xu-2018 Sorry, it seems will need to rebase and fix conflicts one more time. |
Signed-off-by: Barry Xu <[email protected]>
Signed-off-by: Barry Xu <[email protected]>
Signed-off-by: Barry Xu <[email protected]>
Signed-off-by: Barry Xu <[email protected]>
Signed-off-by: Barry Xu <[email protected]>
Signed-off-by: Barry Xu <[email protected]>
Signed-off-by: Barry Xu <[email protected]>
Signed-off-by: Barry Xu <[email protected]>
93768b5
to
2cea90d
Compare
It's ok. |
This pull request has been mentioned on ROS Discourse. There might be relevant details there: https://discourse.ros.org/t/ros-2-tsc-meeting-minutes-2023-09-21/33733/1 |
@Barry-Xu-2018 I see that the current implementation depends on |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Barry-Xu-2018 This PR has grown and become very large with a lot of changes on the multiple layers.
First of all, I would like to say thank you for your thorough and tremendous work!
While it is thorough and addresses all aspects of this big feature – it has become difficult to manage/review it and move forward with it.
I would suggest to split it on two parts. One part could be bigger with everything among playback of the service requests. At least move changes in the rosbag2_transport::Player
class to the separate follow-up PR. It turns out that playback of the service request is not trivial to implement.
I have made a first pass of the review and here is my major notes and proposals:
It seems we don’t have tests for the rosbag2_cpp::Info::read_service_info(...)
and Python wrapper for information about services. It would be nice to add at least one sanity check.
I have a proposal about new parameters for topic/services exclusions:
- Keep the old parameter
exclude
but rename it to theexclude_regex
. it will be applicable to both topics and services. - Make exclude_topics and exclude_services as
std::vector<std::string>
i.e. space separated list of topics and services names for CLI arguments.
IMO it will be more user-friendly. We have had recently an incoming issue from one of the users when it was expected that the exclude parameter is a list of topics. Like opposite to the topics parameter.
It looks intuitively reasonable to expect it to be defined this way.
I would like to request to rename PlayerClient
to the PlayerServiceClient
and consider moving it to its own separate cpp and hpp files.
The PlayerClient
is pretty large and independent from its parent PlayerImpl
class. We already have a lot of code inside the Player class and with this inner class inside it will become more cluttered and more difficult to navigate and support.
We are doing deserialization for the same service message twice. Once in the PlayerClient::is_include_request_message(..)
and the second time in the PlayerClient::async_send_request()
. This is an expensive procedure.
I would like to request structural changes there.
Could you please rework is_include_request_message(..)
to the get_service_request_message(serialized_message)
and return smart pointer to the deserialized message? If there are no valid request message it will return a pointer to null. Then rework async_send_request(deserialized_message)
to accept those smart pointer to the already deserialized message.
I also have a concern about PlayerClient::async_send_request
. We can’t really make it sending request asynchronously at least with the current design. Because we need to wait for future returning from client_->async_send_request(..)
. Since we can't delete ros_message
until the future complete. Otherwise, it will be a dangling pointer to the request_addr
and memory could be reused and we have a chance to use corrupted memory during the client_->async_send_request(..)
. Also according to the notes in the client_->async_send_request(..)
API need to call client_->remove_pending_request(future)
if we wait for future and timeout happened.
As a first iteration, I would suggest to making sending client request synchronous with timeout. Then need to figure out how to design it to be asynchronous.
I have idea to create some static worker thread and wait in it for static condition variable by timeout and check for future. If another request needs to be sent we can signal to condition variable to interrupt the wait earlier than timeout, remove_pending_request(future) and then try to send the new service request.
if args.exclude_services: | ||
play_options.services_regex_to_exclude = args.exclude_services + '/_service_event' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Barry-Xu-2018 As far as understand the args.exclude_services
is supposed to be a list of services regex separated by spaces.
Shouldn't we add space before + '/_service_event'
as well?
# One options out of --all, --all-topics, --all-services, --services, topics or --regex | ||
# must be used |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# One options out of --all, --all-topics, --all-services, --services, topics or --regex | |
# must be used | |
# At least one options out of --all, --all-topics, --all-services, --services, topics or --regex | |
# must be used |
if not self._check_necessary_argument(args): | ||
return print_error('Must specify only one option out of --all, --all-topics, ' | ||
'--all-services, --services, topics and --regex') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if not self._check_necessary_argument(args): | |
return print_error('Must specify only one option out of --all, --all-topics, ' | |
'--all-services, --services, topics and --regex') | |
if not self._check_necessary_argument(args): | |
return print_error('Neet to specify one option out of --all, --all-topics, ' | |
'--all-services, --services, topics and --regex') |
Format format = definition_identifier.format() == Format::SRV ? | ||
Format::MSG : definition_identifier.format(); | ||
DefinitionIdentifier dep(dep_name, format); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Barry-Xu-2018 Could you please clarify why do we need to remap Format::SRV
to the Format::MSG
here?
It would be useful to have a comment in the source code.
#include "rcl/service_introspection.h" | ||
#include "rmw/rmw.h" | ||
|
||
#include "rcpputils/filesystem_helper.hpp" | ||
#include "rosbag2_cpp/service_utils.hpp" | ||
#include "rosbag2_storage/logging.hpp" | ||
#include "rosbag2_storage/metadata_io.hpp" | ||
#include "rosbag2_storage/storage_interfaces/read_only_interface.hpp" | ||
#include "rosbag2_storage/storage_factory.hpp" | ||
|
||
#include "service_msgs/msg/service_event_info.hpp" | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#include "rcl/service_introspection.h" | |
#include "rmw/rmw.h" | |
#include "rcpputils/filesystem_helper.hpp" | |
#include "rosbag2_cpp/service_utils.hpp" | |
#include "rosbag2_storage/logging.hpp" | |
#include "rosbag2_storage/metadata_io.hpp" | |
#include "rosbag2_storage/storage_interfaces/read_only_interface.hpp" | |
#include "rosbag2_storage/storage_factory.hpp" | |
#include "service_msgs/msg/service_event_info.hpp" | |
include "rmw/rmw.h" | |
#include "rosidl_typesupport_cpp/message_type_support.hpp" | |
#include "rcpputils/filesystem_helper.hpp" | |
#include "service_msgs/msg/service_event_info.hpp" | |
#include "rosbag2_cpp/service_utils.hpp" | |
#include "rosbag2_storage/metadata_io.hpp" | |
#include "rosbag2_storage/storage_factory.hpp" |
Better organize includes. Adds missing and remove unused.
|
||
void PlayerImpl::PlayerClient::async_send_request(const rclcpp::SerializedMessage & message) | ||
{ | ||
void * ros_message = new uint8_t[message_members_->size_of_]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be safer to use smart pointer rather than raw pointer to avoid memory leaks and better manage the liveliness of the data.
message_members_->init_function( | ||
ros_message, rosidl_runtime_cpp::MessageInitialization::ZERO); | ||
|
||
int ret = rmw_deserialize(&message.get_rcl_serialized_message(), ts_, ros_message); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are doing deserialization for the same message twice. Once in the is_include_request_message(..)
and second time here. This is an expensive procedure.
I would like to request structural changes.
Could you please rework is_include_request_message(..)
to the get_request_message
and return smart pointer to the deserialized message.
Then rework async_send_request()
to accept those smart pointer to the already deserialized message.
void PlayerImpl::PlayerClient::async_send_request(const rclcpp::SerializedMessage & message) | ||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a concern about PlayerClient::async_send_request
. We can’t really make it sending request asynchronously at least with the current design. Because we need to wait for future returning from client_->async_send_request(..)
. Since we can't delete ros_message
until the future complete. Otherwise, it will be a dangling pointer to the request_addr
and memory could be reused and we have a chance to use corrupted memory during the client_->async_send_request(..)
. Also according to the notes in the client_→async_send_request(..)
API need to call client_→remove_pending_request(future)
if we waited for future and timeout happened.
get_message_type_support_handle<service_msgs::msg::ServiceEventInfo>(); | ||
if (type_support_info == nullptr) { | ||
throw std::runtime_error( | ||
"It failed to get message type support handle of service event info !"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"It failed to get message type support handle of service event info !"); | |
"Failed to get message type support handle of service event info !"); |
Maybe also add output to the log?
type_support_info, | ||
reinterpret_cast<void *>(&msg)); | ||
if (ret != RMW_RET_OK) { | ||
throw std::runtime_error("It failed to deserialize message !"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
throw std::runtime_error("It failed to deserialize message !"); | |
throw std::runtime_error("Failed to deserialize message !"); |
I'm sorry for the late reply. I just finished a long holiday.
No. This PR is ready. But this PR depended on ros2/rclcpp#2209 which is under review. So I want to submit PR after 2209 is merged. |
Thank you for your review. |
This pull request has been mentioned on ROS Discourse. There might be relevant details there: https://discourse.ros.org/t/ros-2-tsc-meeting-minutes-2023-10-12/34178/1 |
This pull request has been mentioned on ROS Discourse. There might be relevant details there: https://discourse.ros.org/t/ros-2-tsc-meeting-minutes-2023-11-16/34757/1 |
@Barry-Xu-2018 do we want to close this PR, instead we need post the review comments either on #1480 or #1481 along with the implementation? |
Related issue: ros2/ros2#1285
The design document https://github.com/ros2/rosbag2/blob/rolling/docs/design/rosbag2_record_replay_service.md