
Handle 'best_available' QoS policies #598

Merged (2 commits) on May 3, 2022
Conversation

@jacobperron (Member)

Connects to ros2/rmw#320

If users set certain QoS policies to BEST_AVAILABLE, then we query the graph for endpoint info.
The policy is updated such that it is compatible with all endpoints while maintaining the
highest possible level of service.


Signed-off-by: Jacob Perron <[email protected]>
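As a rough illustration of the matching rule described above, here is a minimal, self-contained sketch of the 'best available' selection for a subscription's reliability policy (the enum and function names below are stand-ins for illustration, not the real rmw types):

```cpp
#include <vector>

// Stand-in for the rmw reliability kinds (not the real rmw enum).
enum class Reliability { BestEffort, Reliable };

// Pick the highest level of service that still matches every discovered
// publisher: a subscription requesting Reliable only matches Reliable
// publishers, so if any matched publisher offers only BestEffort we must
// fall back to BestEffort; otherwise we keep Reliable.
Reliability best_available_for_subscription(
  const std::vector<Reliability> & offered_by_publishers)
{
  for (Reliability offered : offered_by_publishers) {
    if (offered == Reliability::BestEffort) {
      return Reliability::BestEffort;
    }
  }
  // No endpoints, or all of them offer Reliable: request the highest level.
  return Reliability::Reliable;
}
```

Other adaptable policies (durability, deadline, liveliness) follow the same "most compatible, highest service" idea, each with its own ordering.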
@j-rivero (Contributor)

I see a couple of functions rmw_create_client and rmw_create_service that also deal with qos_profiles. Do we need to implement the logic for them too?

@jacobperron (Member, Author)

Good point about services and clients. I'm not sure if we should support matching for them or not. At the very least, we should define what happens if a user provides 'best available' policies.

We have explicit policies for services and clients, and I think it is rare, and perhaps not recommended, to create services and clients that do not use these policies. So, I'm not sure if we should try handling 'best available' policies for services. I need to think about it a little more.

@ivanpauno (Member) left a comment


LGTM!

About what to do with services, I'm not sure.
Doing the same as we do with publishers, using a default, or failing all sound okay to me.

@j-rivero (Contributor)

A final thought about design: since we are adding almost the same code in every rmw_ implementation, something like:

  rmw_qos_profile_t adapted_qos_policies = *qos_policies;
  rmw_ret_t ret = rmw_dds_common::qos_profile_get_best_available_for_topic_publisher(
    node, topic_name, &adapted_qos_policies, rmw_get_subscriptions_info_by_topic);
  if (RMW_RET_OK != ret) {
    return nullptr;
  }

Would it make sense to move it to a different layer of the stack (rmw_dds_common? or rmw), or does it fit better at this layer, in the rmw implementations?

@wjwwood (Member) left a comment


lgtm

I think the boilerplate that is there is unlikely to change; it's essentially just calling the common function. So I wouldn't try to de-duplicate it, because doing so would probably involve a macro, and I don't think that's a good idea personally. But maybe a C++ function that returns the adapted QoS policy might shorten it slightly.

@jacobperron (Member, Author)

I'm not sure how much more of the boilerplate we could move into a common place (e.g. rmw_dds_common). As William points out, we could try to change the signature of rmw_dds_common::qos_profile_get_best_available_for_topic_publisher to slim it down a bit, maybe using a std::optional or std::tuple to communicate an error. I don't think it would change the call site much, though, e.g.

  std::optional<rmw_qos_profile_t> adapted_qos_policies =
    rmw_dds_common::qos_profile_get_best_available_for_topic_publisher(
      node, topic_name, qos_policies, rmw_get_subscriptions_info_by_topic);
  if (!adapted_qos_policies) {
    return nullptr;
  }

@jacobperron (Member, Author)

I'm proposing we "ignore" the best available policy for services by defaulting to values in rmw_qos_profile_services_default. 9bb3bdf
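In other words, for services and clients the 'best available' values are not matched against the graph; they are simply replaced with the corresponding entries from the services default profile. A simplified sketch of that substitution follows (the types below are stand-ins for illustration; only the idea of rmw_qos_profile_services_default, which uses reliable communication, comes from the real API):

```cpp
// Simplified stand-ins for the rmw QoS types (illustration only).
enum class Policy { BestAvailable, BestEffort, Reliable };

struct QoSProfile
{
  Policy reliability;
};

// Mirrors the idea of rmw_qos_profile_services_default, which uses
// reliable communication for services.
constexpr QoSProfile kServicesDefault{Policy::Reliable};

// Replace any 'best available' value with the services default instead of
// querying the graph, as proposed for services and clients.
QoSProfile adapt_service_qos(QoSProfile requested)
{
  if (requested.reliability == Policy::BestAvailable) {
    requested.reliability = kServicesDefault.reliability;
  }
  return requested;
}
```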

@j-rivero (Contributor)

I see some failures in my local workspace related to fastrtps_dynamic. Not sure if they are related to my local setup:

space in code/ros2/ws_qos_adaptative ❯ colcon test-result
build/test_quality_of_service/Testing/20220426-1502/Test.xml: 22 tests, 0 errors, 3 failures, 0 skipped
build/test_quality_of_service/test_results/test_quality_of_service/test_deadline__rmw_fastrtps_dynamic_cpp.gtest.xml: 1 test, 1 error, 0 failures, 0 skipped
build/test_quality_of_service/test_results/test_quality_of_service/test_lifespan__rmw_fastrtps_dynamic_cpp.gtest.xml: 1 test, 1 error, 0 failures, 0 skipped
build/test_quality_of_service/test_results/test_quality_of_service/test_liveliness__rmw_fastrtps_dynamic_cpp.gtest.xml: 1 test, 1 error, 0 failures, 0 skipped

...

25: test_subscriber_terminates_in_a_finite_amount_of_time[BoundedPlainSequences] (test_communication.TestPublisherSubscriber)
25: Test that the subscriber terminates after a finite amount of time. ... [INFO] [test_publisher-8]: process started with pid [757421]
25: [INFO] [test_subscriber-9]: process started with pid [757423]
13: [INFO] [1650985483.968604443] [publisher]: publisher: publishing 'publisher: testing 22'                    
13: [INFO] [1650985483.968782138] [subscriber]: subscriber: subscriber heard [publisher: testing 22]
13: [INFO] [1650985484.218879834] [publisher]: publisher: publishing 'publisher: testing 23'                    
13: [INFO] [1650985484.219087234] [subscriber]: subscriber: subscriber heard [publisher: testing 23]
25: [INFO] [test_subscriber-9]: process has finished cleanly [pid 757423]
25: [INFO] [test_publisher-8]: sending signal 'SIGINT' to process[test_publisher-8]
25: [test_publisher-8] [INFO] [1650985484.277540590] [rclcpp]: signal_handler(signum=2)
13: [INFO] [1650985484.468542173] [publisher]: publisher: publishing 'publisher: testing 24'                    
13: [INFO] [1650985484.468859520] [subscriber]: subscriber: subscriber heard [publisher: testing 24]
13: -- run_test.py: return code -11                                                                             
13: -- run_test.py: generate result file '/home/jrivero/code/ros2/ws_qos_adaptative/build/test_quality_of_service/test_results/test_quality_of_service/test_deadline__rmw_fastrtps_dynamic_cpp.gtest.xml' with failed test
13: -- run_test.py: verify result file '/home/jrivero/code/ros2/ws_qos_adaptative/build/test_quality_of_service/test_results/test_quality_of_service/test_deadline__rmw_fastrtps_dynamic_cpp.gtest.xml'
13/22 Test #13: test_deadline__rmw_fastrtps_dynamic_cpp .........***Failed   10.14 sec
...
26: Starting test run test_communication.test_requester_replier__rclcpp__rmw_connextdds_Release.launch_tests[Arrays]
26: [INFO] [launch]: All log files can be found below /home/jrivero/.ros/log/2022-04-26-16-21-08-673136-space-727240
26: [INFO] [launch]: Default logging verbosity is set to INFO
26: [INFO] [daemon-stop-1]: process started with pid [727243]
26: [INFO] [daemon-stop-1]: process has finished cleanly [pid 727243]                                           
26: test_requester_finishes_in_a_finite_amount_of_time[Arrays] (test_communication.TestRequesterReplier)
26: Test that the second executable terminates after a finite amount of time. ... [INFO] [test_replier-2]: process started with pid [727246]
26: [INFO] [test_requester-3]: process started with pid [727248]
26: [INFO] [test_requester-3]: process has finished cleanly [pid 727248]                                        
26: [INFO] [test_replier-2]: sending signal 'SIGINT' to process[test_replier-2]
26: [test_replier-2] [INFO] [1650982869.257146156] [rclcpp]: signal_handler(signum=2)
15: -- run_test.py: return code -11                                                                             
15: -- run_test.py: generate result file '/home/jrivero/code/ros2/ws_qos_adaptative/build/test_quality_of_service/test_results/test_quality_of_service/test_liveliness__rmw_fastrtps_dynamic_cpp.gtest.xml' with failed test
15: -- run_test.py: verify result file '/home/jrivero/code/ros2/ws_qos_adaptative/build/test_quality_of_service/test_results/test_quality_of_service/test_liveliness__rmw_fastrtps_dynamic_cpp.gtest.xml'
15/22 Test #15: test_liveliness__rmw_fastrtps_dynamic_cpp .......***Failed    8.14 sec
...
5: OK
25: test_subscriber_terminates_in_a_finite_amount_of_time[UnboundedSequences] (test_communication.TestPublisherSubscriber)
25: Test that the subscriber terminates after a finite amount of time. ... [INFO] [test_publisher-35]: process started with pid [757704]
25: [INFO] [test_subscriber-36]: process started with pid [757706]
14: [INFO] [1650985494.360158289] [publisher]: publisher: publishing 'publisher: testing 78'                 
14: [INFO] [1650985494.360315324] [subscriber]: subscriber: subscriber heard [publisher: testing 78]
14: [INFO] [1650985494.485499664] [publisher]: publisher: publishing 'publisher: testing 79'                 
14: [INFO] [1650985494.485702816] [subscriber]: subscriber: subscriber heard [publisher: testing 79]
25: [INFO] [test_subscriber-36]: process has finished cleanly [pid 757706]                                   
25: [INFO] [test_publisher-35]: sending signal 'SIGINT' to process[test_publisher-35]
25: [test_publisher-35] [INFO] [1650985494.594500480] [rclcpp]: signal_handler(signum=2)
14: [INFO] [1650985494.610140462] [publisher]: publisher: publishing 'publisher: testing 80'
14: -- run_test.py: return code -11                                                                          
14: -- run_test.py: generate result file '/home/jrivero/code/ros2/ws_qos_adaptative/build/test_quality_of_service/test_results/test_quality_of_service/test_lifespan__rmw_fastrtps_dynamic_cpp.gtest.xml' with failed test
14: -- run_test.py: verify result file '/home/jrivero/code/ros2/ws_qos_adaptative/build/test_quality_of_service/test_results/test_quality_of_service/test_lifespan__rmw_fastrtps_dynamic_cpp.gtest.xml'
14/22 Test #14: test_lifespan__rmw_fastrtps_dynamic_cpp .........***Failed   10.14 sec

We can wait for valid Jenkins results before investing time in inspecting the errors.

@jacobperron (Member, Author)

CI is green: ros2/rmw#320 (comment)

@ivanpauno (Member)

> I see some failures in my local workspace related to fastrtps_dynamic. Not sure if they are related to my local setup:

See #599

@jacobperron jacobperron merged commit ee60ba6 into master May 3, 2022
@delete-merged-branch delete-merged-branch bot deleted the jacob/qos_best_available branch May 3, 2022 17:34
timonegk added a commit to timonegk/rmw_fastrtps that referenced this pull request May 21, 2022