
Support --repeat-latched in ROS 2 (transient local) #1159

Open · amacneil opened this issue Nov 11, 2022 · 23 comments
Labels: enhancement (New feature or request)

@amacneil (Contributor) commented Nov 11, 2022

Description

ROS 1 supported a --repeat-latched feature (ros/ros_comm#1850) to repeat latched topics at the start of each bag split.

As far as I can tell, rosbag2 does not have this feature. Is there any reason we should not add it?

amacneil added the enhancement (New feature or request) label on Nov 11, 2022
@MichaelOrlov (Contributor)

@amacneil Yes, we have a reason not to do this: to keep replay deterministic.
This is a duplicate of #1051; please see my detailed explanation in #1051 (review).

@amacneil (Contributor, Author)

I'll admit I am not super familiar with how tf_static and transient local works in ROS 2 (I am more familiar with latched topics in ROS 1). ros2/ros2#464 seems to have some details.

We have users who would like to open one file from a split bag and have the transforms work correctly. Your comment in that issue seems to be based on the fact that you don't have a way in rosbag2 to "play" a split bag file, but we do support opening a single file in Foxglove. I think it would be good to get more feedback from users on whether they need this feature.

@MichaelOrlov (Contributor)

If users want something, it doesn't mean we should jump in and implement it without considering other cases and drawbacks.

My main point is not that rosbag2 doesn't support playing a split bag, but rather that we need determinism in replay. Basically, we should not inject or copy messages that we never received on the wire, in any case.

There are many other cases where transient local durability is used, for instance for camera images or LIDAR streams. And what then? Should we copy all messages from a camera topic from the previous file to the next file when doing a split?

If someone wants to repeat the tf_static messages that were published once at the beginning before playing back a bag recorded after a split, they can use rosbag2 play with the duration option and topic filtering to play back just the tf_static messages, and then play back the bag recorded after the split normally.
Another option is to use ros2 bag convert to create a new bag containing the tf_static messages from the first bag plus the rest of the content from the bag recorded after the split.

@jhurliman

> There are many other cases where transient local durability is used, for instance for camera images or LIDAR streams. And what then? Should we copy all messages from a camera topic from the previous file to the next file when doing a split?

My understanding of transient local durability is that there is a user-configured history size, so it would only copy all messages from the previous file for a topic if the previous file was very short or the history size was unusually large. The default history depth is 10, and 5 for the sensor data QoS profile.
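For reference, a minimal rclcpp sketch of the publisher side of this: transient local durability combined with an explicit KEEP_LAST depth. The topic name, message type, and depth below are purely illustrative.

#include <rclcpp/rclcpp.hpp>
#include <std_msgs/msg/string.hpp>

int main(int argc, char ** argv)
{
  rclcpp::init(argc, argv);
  auto node = std::make_shared<rclcpp::Node>("latched_publisher");

  // Transient local durability with a KEEP_LAST history of 10: the publisher
  // caches its last 10 messages and re-delivers them to late-joining
  // subscribers that also request transient local durability.
  rclcpp::QoS qos(rclcpp::KeepLast(10));
  qos.transient_local();

  auto pub = node->create_publisher<std_msgs::msg::String>("chatter", qos);
  std_msgs::msg::String msg;
  msg.data = "latched once at startup";
  pub->publish(msg);

  rclcpp::spin(node);
  rclcpp::shutdown();
  return 0;
}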

@MichaelOrlov (Contributor)

@jhurliman The way it is organized in DDS and ROS 2 is that the publisher keeps a cache of N messages when transient local durability is used, and when a new subscriber joins, it receives the last N messages from the publisher's cache rather than just the single latest message.

The case where transient local durability is used for the tf_static topic and the publisher only publishes a few messages at the beginning is just one particular case.

There can be other cases where transient local is used for camera image or LIDAR topics with a continuous data stream. A newly created subscriber will get the last N images from the cache and then everything that follows from the ongoing stream, since the publisher keeps publishing new messages.
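To illustrate the late-joiner behavior described above, a minimal subscriber sketch (same illustrative topic as in the previous snippet). Note that the subscription must also request transient local durability; a volatile subscription still matches but receives no cached history.

#include <rclcpp/rclcpp.hpp>
#include <std_msgs/msg/string.hpp>

int main(int argc, char ** argv)
{
  rclcpp::init(argc, argv);
  auto node = std::make_shared<rclcpp::Node>("late_joiner");

  // Request transient local durability so the matched publisher re-sends
  // up to its history depth (N) of cached messages to this new subscriber.
  rclcpp::QoS qos(rclcpp::KeepLast(10));
  qos.transient_local();

  auto sub = node->create_subscription<std_msgs::msg::String>(
    "chatter", qos,
    [](const std_msgs::msg::String & msg) {
      // The first callbacks deliver the publisher's cached messages,
      // then live messages follow as they are published.
      RCLCPP_INFO(rclcpp::get_logger("late_joiner"), "got: %s", msg.data.c_str());
    });

  rclcpp::spin(node);
  rclcpp::shutdown();
  return 0;
}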

@jhurliman @amacneil After some consideration, I tend to agree that we can implement this feature, but not as the default behavior, only as an option enabled via a CLI parameter.

Some technical details and considerations on how to implement this feature:

  • Unfortunately, we can't re-read messages from the publisher cache without creating a new subscription. We would have to go back to the topics with transient local QoS settings that were already written, read out all of their messages, and repeat them in the new bag after the split. Or we could limit the number of messages to repeat via a CLI parameter, for instance --repeat-latched 10, which would repeat the first 10 messages. Or maybe the last 10 messages? The last 10 messages seems more consistent, but is probably more difficult to implement.
  • Please also consider the performance penalty of doing this read-out and saving into the new file during the split.
  • The alternative of creating new subscriptions for the same topics to read out the publisher cache would take even more time and resources due to discovery delays.
  • The split operation should take a very small amount of time and resources, since we lock the cache from dumping messages to storage during this operation, and the probability of dropping messages increases the longer the split takes.
  • If I am not mistaken, the current implementation of the rosbag2_storage_mcap plugin doesn't really support a read-write interface; it's either read or write mode. Please consider this as well.

Thoughts?
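For the first bullet above, a rough sketch of the "go back and read out the already-written topics" idea using the public rosbag2_cpp reader API. The previous split's URI, the list of transient-local topics, and the message cap are placeholders for values the recorder would already have; this is not how the recorder is wired up today.

#include <memory>
#include <string>
#include <vector>

#include <rosbag2_cpp/reader.hpp>
#include <rosbag2_storage/serialized_bag_message.hpp>
#include <rosbag2_storage/storage_filter.hpp>

// Re-read messages for the given transient-local topics from an already
// closed split file so they can be repeated at the start of the next file.
std::vector<std::shared_ptr<rosbag2_storage::SerializedBagMessage>>
read_latched_messages(
  const std::string & previous_split_uri,
  const std::vector<std::string> & latched_topics,
  size_t max_total_messages)
{
  rosbag2_cpp::Reader reader;
  reader.open(previous_split_uri);

  // Only iterate over the latched (transient local) topics.
  rosbag2_storage::StorageFilter filter;
  filter.topics = latched_topics;
  reader.set_filter(filter);

  std::vector<std::shared_ptr<rosbag2_storage::SerializedBagMessage>> out;
  while (reader.has_next() && out.size() < max_total_messages) {
    out.push_back(reader.read_next());
  }
  return out;
}

This read-back would have to happen while the split is in progress, which is exactly where the performance concern in the other bullets comes from.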

emersonknapp changed the title from "Support --repeat-latched in ROS 2" to "Support --repeat-latched in ROS 2 (transient local)" on Jan 21, 2023
@ga58lar commented May 30, 2023

@amacneil @MichaelOrlov Hi everyone, are there any updates on this feature?
This feature is exactly what my colleagues and I are currently looking for to record tf_static messages on each split and thus make it possible to play a single bag of a split with all information.

@MichaelOrlov (Contributor)

@ga58lar It is unlikely we will implement this feature, since it contradicts the DDS communication approach and would unavoidably introduce a significant performance impact on the split operation and cause message drops during recording.

I would suggest a workaround.
If you need to play back only the messages from the tf_static topic at the beginning of some bag file, or at some arbitrary time, you can use the ros2 bag convert command (https://github.com/ros2/rosbag2#converting-bags) to extract the tf_static messages into a separate bag file, and then use a separate ros2 bag play instance to play back those messages whenever you need them.
You can use a launch YAML file to define the order of the playback calls.

@emersonknapp (Collaborator) commented Jun 7, 2023

The way I understand this, there are these different cases being considered here:

  1. During playback (ros2 bag play) from the beginning - we want latched topics to persist between bag splits so that late joiners receive the messages, even post split.
  2. When starting play from a later timestamp that happens to be in a later split file - we still want those publishers to have the messages they theoretically would have
  3. When analyzing bags as data objects (and perhaps separating splits from each other), we want to see messages in the bag representing the current latched/transient_local message buffers

I believe that Case 1 actually already works correctly. When you start playback from the beginning, publishers are created with the appropriate QoS -- so if the original topic in the recorded system had durability=TRANSIENT_LOCAL, then the rosbag2 playback publisher will do the same. It will then have the same latched behavior for new joiners, even across splits, because playback publishers are not shut down and recreated across splits.

Case 2, on the other hand, would still exhibit that behavior, but only for messages originally published after the timestamp where playback starts. To give the publishers the missing internal state, we could check for latched publishers and do a "burst publish" of their topics from the beginning of the bag up to the starting point, which would approximate that behavior.

Case 3 would prove more difficult, though I don't see it as an engineering impossibility. We might create a separate entry type, an "internal latch-buffer" topic at the beginning of new splits, that is not played back during normal operation but is only used for the "burst publish latch priming" when starting playback from that file.

Unfortunately, the History Depth (latching count) of the publisher is not always broadcast in a way that would let rosbag2 know what it is. DDS considers History Depth (as well as Lifespan, but that's less relevant here) to be an internal QoS configuration; other participants don't need to know about it, so it's not guaranteed in every implementation to be conveyed in discovery.

As @MichaelOrlov brought up, though, the potential extra features I mentioned are not without performance cost.

@tonynajjar commented Apr 3, 2024

Hey, I just encountered this issue when trying to visualize split bag files in Foxglove. @amacneil or @MichaelOrlov, I understand that this is WIP, but did you find a workaround I could use at the moment, either on the Foxglove or the rosbag2 side? The only options for me are to publish /tf_static more frequently, which defeats its purpose, or to move my static TFs to /tf and publish at a high rate, which is a waste of resources. Thanks

Maybe @ga58lar you have an idea as well?

P.S: @MichaelOrlov your proposed workaround would not work in my use case unfortunately

@ga58lar commented Apr 3, 2024

@tonynajjar Unfortunately, I have not found a really good solution that avoids the criticism raised above.
I am now publishing the static transforms on every bag-split event of the recording. I think the ease of use for split bags is worth the overhead.
In my case, all static transforms are located within one URDF file and I am using the robot_state_publisher.
Therefore, I just added one subscription to events/write_split and call publishFixedTransforms() in the callback.

https://github.com/ros/robot_state_publisher/blob/ros2/src/robot_state_publisher.cpp#L151

  // Requires #include <rosbag2_interfaces/msg/write_split_event.hpp> and a
  // rclcpp::Subscription<rosbag2_interfaces::msg::WriteSplitEvent>::SharedPtr
  // bag_split_sub_ member in the class.
  // subscribe to bag split events
  bag_split_sub_ = this->create_subscription<rosbag2_interfaces::msg::WriteSplitEvent>(
    "events/write_split",
    rclcpp::SensorDataQoS(),
    std::bind(&RobotStatePublisher::callbackBagSplit, this, std::placeholders::_1),
    subscriber_options);

https://github.com/ros/robot_state_publisher/blob/ros2/src/robot_state_publisher.cpp#L364

void RobotStatePublisher::callbackBagSplit([[maybe_unused]] rosbag2_interfaces::msg::WriteSplitEvent::SharedPtr message)
{
  // re-publish tf_static from urdf file on every bag split
  publishFixedTransforms();
  return;
}

It would be better, though, to subscribe to /tf_static as well and re-publish that topic, instead of loading the static transforms on each split.

@tonynajjar commented Apr 3, 2024

Thanks for sharing @ga58lar, it does look like an okay workaround when all your static transforms come from the robot_state_publisher; unfortunately, that's not the case for me, they come from at least 3 different sources. I find it quite dirty to enable regular republishing of /tf_static from all the sources :/

@tonynajjar commented Apr 10, 2024

For anyone wanting to implement this in their own fork in the future: I implemented it on Humble: angsa-robotics#1 (inspired by #1051).

I don't like maintaining a long-term fork but I ran out of options. @MichaelOrlov I understand why you don't want this to be the default behavior but I think having the option to enable it is not so bad

@Aposhian (Contributor)

As @tonynajjar mentioned, this feature is very useful for analyzing single split files, either with static analysis (e.g. Foxglove) or even with playback. By embedding transient local topics in each split, playing back each split appears as if the recorder were just a late subscriber at the time the split started. So while, yes, this feature is not exactly correct, it is incredibly useful, and I think it makes sense to have it as an opt-in feature.

@MichaelOrlov (Contributor)

@Aposhian @tonynajjar OK, it looks like this paints a picture for a feature design proposal.

  1. Add a new --repeat-latched CLI option to the recorder.
  2. The --repeat-latched CLI option will accept a queue depth, i.e. the number of messages to memorize on each topic. The default value is 0, meaning the feature is disabled.
  3. A uint32_t repeat_latched field shall be added to the rosbag2_transport::RecordOptions data structure. A default value of 0 will mean that the --repeat-latched CLI option was not selected.
  4. The repeat-latched feature will memorize in RAM only the first N messages from all topics with QoS durability = RMW_QOS_POLICY_DURABILITY_TRANSIENT_LOCAL.
  5. The repeat-latched feature will save the previously memorized messages at the beginning of each new bag file after the split.
  6. The repeat-latched feature will rewrite the sent and received timestamps of the memorized messages to be current before dumping them into the bag file, i.e. the same timestamps as the latest received message.
  7. The same rosbag2_transport::RecordOptions.repeat_latched parameter shall be supported by the ros2 bag rewrite CLI.
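To make the proposal above a bit more concrete, here is a rough sketch of the record option and a per-topic in-memory cache (items 3-6). Only repeat_latched matches the proposal; every other name is hypothetical, and the timestamp fields follow the newer SerializedBagMessage layout (older distros have a single time_stamp field).

#include <cstdint>
#include <deque>
#include <memory>
#include <string>
#include <unordered_map>
#include <vector>

#include <rosbag2_storage/serialized_bag_message.hpp>

// Item 3: the new RecordOptions field; 0 keeps the feature disabled.
struct RecordOptionsSketch
{
  uint32_t repeat_latched = 0;
};

// Items 4-6: memorize the first N messages per transient-local topic and
// hand back timestamp-rewritten copies to write at the start of each split.
class RepeatLatchedCache
{
public:
  explicit RepeatLatchedCache(uint32_t depth) : depth_(depth) {}

  // Called for every recorded message on a transient-local topic.
  void on_message(
    const std::string & topic,
    std::shared_ptr<const rosbag2_storage::SerializedBagMessage> msg)
  {
    auto & queue = cache_[topic];
    if (depth_ != 0 && queue.size() < depth_) {
      queue.push_back(std::move(msg));
    }
  }

  // Called right after a split: copies of the memorized messages with their
  // timestamps rewritten to "now" so they land at the start of the new file.
  std::vector<rosbag2_storage::SerializedBagMessage>
  messages_for_new_split(int64_t now_ns) const
  {
    std::vector<rosbag2_storage::SerializedBagMessage> out;
    for (const auto & entry : cache_) {
      for (const auto & msg : entry.second) {
        rosbag2_storage::SerializedBagMessage copy = *msg;
        copy.recv_timestamp = now_ns;
        copy.send_timestamp = now_ns;
        out.push_back(copy);
      }
    }
    return out;
  }

private:
  uint32_t depth_;
  std::unordered_map<
    std::string,
    std::deque<std::shared_ptr<const rosbag2_storage::SerializedBagMessage>>> cache_;
};

At split time the recorder would write the result of messages_for_new_split() into the new file before resuming normal writing.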

@Aposhian (Contributor) commented Aug 8, 2024

@MichaelOrlov do you have any issues with @tonynajjar's implementation? https://github.com/angsa-robotics/rosbag2/pull/1/files

@tonynajjar

@Aposhian I suppose the objection is the same as mentioned here since it's the same implementation

@MichaelOrlov (Contributor)

@tonynajjar @Aposhian We just need to rewrite it to adhere to the design proposal in #1159 (comment).

@Rayman (Contributor) commented Jan 24, 2025

One issue I have with this proposal is that the queue depth could be different per topic. As an example:

  • /map is a large data structure; you want only the last one.
  • For /tf_static you want the last published TFMessage per publisher on the topic. Suppose there are five nodes that each publish a TFMessage; then a late joiner wants to receive 5 TFMessages.
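One hypothetical way to express that kind of per-topic control on top of the earlier proposal; none of these fields exist in rosbag2, this only illustrates the point:

#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical: a global default depth (--repeat-latched N) plus per-topic
// overrides, e.g. 1 for /map and a larger value for /tf_static so that one
// TFMessage per static-transform publisher can be repeated.
struct RepeatLatchedOptionsSketch
{
  uint32_t default_depth = 0;
  std::unordered_map<std::string, uint32_t> per_topic_depth;
};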

@MichaelOrlov (Contributor) commented Jan 24, 2025

@emersonknapp You mentioned in today's tooling WG meeting that there is a way to know the publisher queue size, but that not all DDS implementations support it.
Could you please point out how to get the publisher queue size on the subscriber side? I would like to consider this option.

@emersonknapp (Collaborator) commented Jan 24, 2025

@MichaelOrlov the History Depth is the number of messages that get persisted in the queue and re-sent to new participants when using Transient Local Durability. In at least one of the dds implementations, this History Depth is filled correctly when we discover the reported qos in get endpoint info by topic https://github.com/ros2/rosbag2/blob/rolling/rosbag2_transport%2Fsrc%2Frosbag2_transport%2Frecorder.cpp#L697

In the other one, the History Depth number is always filled with the default value (I think that's 10). I can't remember right now which one of cyclone vs fastdds does which behavior.

If I remember correctly, the dds spec doesn't dictate this behavior because History Depth does not affect matching compatibility, so it's not required to be reported. I could be wrong about that, in which case this is a bug in the one implementation.

@MichaelOrlov (Contributor)

I traced it down: all 3 DDS RMW layers (CycloneDDS, FastDDS, and ConnextDDS) use the graph cache, common_context->graph_cache.get_writers_info_by_topic(..), to get that info.
Now the question is how that information gets into the graph cache from the different DDS implementations.

@MichaelOrlov (Contributor) commented Jan 28, 2025

Okay. I've researched the topic of the publisher history depth and how it gets into the graph_cache.
Essentially, the QoS gets into the graph_cache via the dds_qos_to_rmw_qos(const DDSQoSPolicyT & dds_qos, rmw_qos_profile_t * qos) conversion function, which is specialized for each DDS RMW implementation.
The only limitation I found is that the history depth is invalid when the history kind is KEEP_ALL; in CycloneDDS, in particular, it is set to -1. However, the ROS 2 CycloneDDS RMW layer maps it to 0 in the case of the KEEP_ALL history kind.
As regards FastDDS, the history depth is preserved in any case. However, it is expected to hold an invalid value in the case of the KEEP_ALL history kind.

In all other cases, the history depth is expected to have a valid value! If not, it should be considered a bug.

For the --repeat-latched feature, I would propose limiting the queue depth to the minimum of publisher->qos.depth(), which we can get from the node->get_publishers_info_by_topic(topic_name) call, and the N repeat-latched messages specified in the CLI option.
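A small sketch of that min() on the recorder side, assuming we only trust the reported depth when the publisher's history kind is KEEP_LAST (the helper name is made up):

#include <algorithm>
#include <cstddef>
#include <string>

#include <rclcpp/rclcpp.hpp>

// Effective number of messages to repeat for one topic: the minimum of the
// CLI value N and the depth reported by matched publishers. Falls back to N
// when the reported depth is unusable (e.g. KEEP_ALL history).
size_t effective_repeat_depth(
  const rclcpp::Node::SharedPtr & node,
  const std::string & topic_name,
  size_t cli_repeat_latched)
{
  size_t depth = cli_repeat_latched;
  for (const auto & info : node->get_publishers_info_by_topic(topic_name)) {
    const rmw_qos_profile_t profile = info.qos_profile().get_rmw_qos_profile();
    if (profile.history == RMW_QOS_POLICY_HISTORY_KEEP_LAST && profile.depth > 0) {
      depth = std::min(depth, profile.depth);
    }
  }
  return depth;
}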

As a follow-up, we will need to write a simple unit test with a publisher and a subscriber node and a call to node->get_publishers_info_by_topic(topic_name) to see in which cases and with which DDS implementations it doesn't work as expected.
Then, if faulty behavior is found and it is not with the KEEP_ALL history kind, we need to create a bug report and a new issue for the relevant DDS implementation or RMW layer.

@MichaelOrlov (Contributor)

cc: @mjcarroll This feature, --repeat-latched N for the Rosbag2 recorder, could be a good candidate for the summer code exercise.
We more or less agreed on the design of this feature. See details in the #1159 (comment) and my latest message above #1159 (comment).

However, the problem is that almost everybody wants this feature ASAP.
I think if nobody takes it and completes it before summer, it could be a candidate.
