Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ability to override endpoint qos settings from a qos profile matching the topic name #7

Merged

Conversation

ivanpauno
Copy link
Member

Similar to ros2/rmw_connext#460.

The idea is the following:

  • DDS XML qos profiles can currently be used to specify qos settings ros isn't aware of (e.g.: publisher flow controller), or to modify how ros "SYSTEM DEFAULT" qos settings work (if desired, the XML qos file can use topic filters).
    Except for those two cases, the qos settings specified in the ros code cannot be overridden.
  • There was, up to recently, no mechanism to externally override qos settings specified in code in ROS.
    Now there's one based one parameters, but it's by default disabled (has to be manually opted in), and not widely used yet e.g. setting the parameter qos_overrides./my_topic.publisher.reliability to reliable will modify the reliability of all publishers publishing to the topic /my_topic, for the node where the parameter was set.

With this PR, if a qos profiles library name was provided and that library has a profile named as the topic name, that profile will be used and override the user provided ros qos settings.

I'm not sure if this is the best way to use XML QoS profiles, so I'm open to alternative ways to achieve something similar.
I also see there are some functions to list "potential qos profile names", but those don't seem to be being used.

@asorbini
Copy link
Collaborator

* There was, up to recently, no mechanism to externally override qos settings specified in code in ROS.
  Now there's one based one parameters, but it's by default disabled (has to be manually opted in), and not widely used yet e.g. setting the parameter `qos_overrides./my_topic.publisher.reliability` to `reliable` will modify the reliability of all publishers publishing to the topic `/my_topic`, for the node where the parameter was set.

How does this feature work? Does it destroy/re-create the endpoint with the modified QoS? I'm assuming that's the case, otherwise I think we would need some set_qos() APIs in the RMW, and they would only be able to change "mutable" QoS settings, in the case of DDS (and reliability is not one of them, for example).

In the case of delete/create, how is the application prevented from accessing a stale reference? Or is it up to the application to explicitly handle the parameter's events?

With this PR, if a qos profiles library name was provided and that library has a profile named as the topic name, that profile will be used and override the user provided ros qos settings.

I'm not sure if this is the best way to use XML QoS profiles, so I'm open to alternative ways to achieve something similar.
I also see there are some functions to list "potential qos profile names", but those don't seem to be being used.

The functions you discovered are "leftovers" from some experiments I had done, playing around with the idea of having some extra logic in the RMW to map QoS profiles to specific endpoints, based on some comments that had come up in the RMW WG status meetings. As you can see from the functions, I got a bit carried away in trying to specify multiple levels of possible overrides and fallback options, so I decided to to set it aside, but left them in for the time being.

Overall, I like the idea of being able to override the QoS of a specific endpoint completely from XML, so I'm generally in favor of merging this PR but I'd like to suggest a slightly different approach to try to make it more "future proof".

We could introduce an environment variable RMW_CONNEXT_OVERRIDE_ENDPOINT_QOS_POLICY (name open to suggestions) to control how the RMW determines the QoS used by a certain endpoint.

The "policy" could be one of several:

  • none (default if variable is not set): don't override an endpoint's QoS in any way. The endpoint will use the QoS derived by combining the default DDS QoS for that (type of) endpoint (e.g. as reported by DDS_Publisher_get_default_qos()) with the ROS QoS settings.
  • profile_by_topic_name: search the QoS Library set via RMW_CONNEXT_QOS_PROFILE_LIBRARY(or all libraries in some order, e.g. alphabetic based on some sort function) for a profile matching the endpoint's topic. If found, use it to create the endpoint (e.g. with DDS_Publisher_create_datawriter_with_profile). Ignore any ROS QoS settings.
  • dds_default: Use the default QoS computed by the middleware for the endpoint, i.e. create the endpoint with the DDS_DATAWRITER|DATAREADER_QOS_DEFAULT special value. This would allow topic filters in the QoS profile set as default to come into play (because I think at the moment the RMW doesn't really support them). Ignore any ROS QoS settings.

In the future, we could consider adding additional policies, e.g. "profile_by_node_name", or one that tries multiple policies in some fallback order, and uses the first one that matches.

What do you think?

@ivanpauno
Copy link
Member Author

How does this feature work? Does it destroy/re-create the endpoint with the modified QoS? I'm assuming that's the case, otherwise I think we would need some set_qos() APIs in the RMW, and they would only be able to change "mutable" QoS settings, in the case of DDS (and reliability is not one of them, for example).

That feature uses "read-only" parameters, which can only be modified when loading the node but not afterwards.
Thus, you don't need to create/destroy an endpoint neither a set_qos method.

@ivanpauno
Copy link
Member Author

We could introduce an environment variable RMW_CONNEXT_OVERRIDE_ENDPOINT_QOS_POLICY (name open to suggestions) to control how the RMW determines the QoS used by a certain endpoint.

The "policy" could be one of several:

That makes sense to me.

none (default if variable is not set): don't override an endpoint's QoS in any way. The endpoint will use the QoS derived by combining the default DDS QoS for that (type of) endpoint (e.g. as reported by DDS_Publisher_get_default_qos()) with the ROS QoS settings.

I think topic filters could also be used in this case (haven't checked if those are working or not).
e.g. You want to set a flow controller for some large data topics but not for every topic, and otherwise use ROS qos settings.

@ivanpauno
Copy link
Member Author

We could introduce an environment variable RMW_CONNEXT_OVERRIDE_ENDPOINT_QOS_POLICY (name open to suggestions) to control how the RMW determines the QoS used by a certain endpoint.

6667f97 implemented that suggestion.

@asorbini
Copy link
Collaborator

asorbini commented Mar 1, 2021

I think topic filters could also be used in this case (haven't checked if those are working or not).
e.g. You want to set a flow controller for some large data topics but not for every topic, and otherwise use ROS qos settings.

The problem with using topic filters is that, as far as I understand them, they need an explicit profile name to be applied. They are not considered when determining the "default QoS" (as returned by DDS_XXX_get_default_YYY_qos()). I'd like them to be applied when DDS_XXX_QOS_DEFAULT is used (as hinted by my suggestion for a dds_default policy, but after reviewing the documentation again (for 6.0.1, since I couldn't find it for 5.3.1), I don't think that's the case.

Based on the docs, the DataWriter/DataReader must be created using one of the DDS_XXX_create_YYY_with_profile() APIs for the topic filters to be taken into consideration.

@asorbini
Copy link
Collaborator

asorbini commented Mar 1, 2021

6667f97 implemented that suggestion.

Thank you! I'll review it more thoroughly later today, but I think I already found a "counter argument" to my last comment in your code: I had totally missed the DDS_XXX_get_default_YYY_qos_w_topic_name() functions since they are undocumented, but they definitely "fill the gap" that I was complaining about and they should enable the use of topic filter in conjunction with an XML profile set as default.

Copy link
Collaborator

@asorbini asorbini left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After reviewing the changes internally, we came up with a slightly different proposal, which tries to make the QoS configuration logic a bit more explicit. I apologize for partially going back on some of my previous statements, but I hope this proposal can bring us to a more solid yet still as user-friendly solution. Bear with me and I'll try to explain.

The proposal stems from the observation that the use case of being able to customize the QoS of specific endpoints while bypassing any setting provided explicitly by the application is a new use case which could be of interest not only to users of the ROS2 API, but also to those using the DDS API directly. This "black box" approach which aims at externally altering application logic represents a new, and currently unsupported, usage pattern: the "assumption" is usually that the application will explicitly request the use of (externally controllable) default values when it desires to do so. If the application elects not to rely on these default values, than any value it passes in is taken "at face value", and applied as is ("they know what they're doing and we shouldn't second guess").

What we are trying to enable here is the ability to, in ROS terms, configure an endpoint as if the application had specified "system default" for all the QoS policies, while in DDS terms, the same could be expressed as creating an endpoint assuming that the application specified DDS_XXX_QOS_DEFAULT instead of whatever concrete values it actually passed to the create() calls. We want to be able to do this without making any changes to the code.

Based on my earlier "policy" proposal, we could already achieve this behavior by:

  1. Use the dds_default policy to completely disable any ROS-level QoS settings.
  2. Use Connext's XML-based configuration to define a default QoS profile which contains:
    • data(reader|writer)_qos elements with topic filters to match each "builtin endpoint" that is automatically created by ROS2 for each node (e.g. the various parameter services). These could be provided to users as a copy/paste'able artifact.
    • data(reader|writer)_qos elements with topic filters to match other endpoints created by the application.

What this solution doesn't cover is the ability to selectively enable this "(no) QoS override" feature only for certain topics, while keeping the default behavior of deriving configuration from application arguments applied on top of Connext defaults for all other endpoints.

In order to support this new use case, we could have users specify the list of topics for which no customization should be applied on top of DDS defaults. This approach would make the selection more explicit, and avoid relying on the "magic mapping" between topic names and qos profile names, which introduces an "additional layer of concepts" (and possible complexity) for users, and potential brittleness of the resulting systems.

The list of topics could be provided, for example, as a regular expression or a "glob" (or a list of such elements). I feel like the list should be expressed in the most "natural" terms for users of the RMW (a.k.a. ROS2 users), so I think that (at least in an ideal world) it should use the "unmangled" topic names without forcing users to learn about the underlying mapping. It would be more natural for a user to say "disable QoS override for topic hello_world and client/service hello_world", rather than "disable QoS override for topics rt/hello_world, rq/hello_worldRequest, and rr/hello_worldReply".

These leads me towards having users being able to use some ROS2-specific DSL to configure these rules, e.g. a YAML object with keys such as topics:, clients:, etc., which might be better stored in a file but could also be passed as an inline string. It also sounds like a "phase two" kind of feature, since it smells of "scope creep" (and something that might not even belong inside the RMW...). Also, users are still forced to learn about the name mangling in order to specify the right topic filters in XML for now.

So, while keeping the general concept of an "override endpoint qos policy" configured via dedicated env variable (RMW_CONNEXT_OVERRIDE_ENDPOINT_QOS_POLICY), we could change its semantics to mean "control how the RMW overrides the default QoS reported by Connext". The policy would have the following values:

  • none always (default if var is empty): use default values reported by DDS_XXX_get_default_YYY_qos_w_topic_name() with ROS configuration applied on top of it.
  • dds_default never: use default values reported by DDS_XXX_get_default_YYY_qos_w_topic_name() without any change for all created endpoints.
  • dds_topics: use default values reported by DDS_XXX_get_default_YYY_qos_w_topic_name() for all endpoints whose DDS topic name (including ROS2 mangling) matches a regular expression, all other endpoints are configured with the default policy (always). The regular expression could be provided "inline" through the same env variable (e.g. "dds_topics: REGEX"), or we could introduce another env variable to specify it (but I'd hesitate to go with this option for fear of bloating the number of configuration variables).

In the future we may decide to add another policy type, which accepts "domain-specific" topic names without the ROS2 mangling.

If we were to go with this revised proposal, we could also remove the "qos library" env variable, since users would only be relying on the profile that they set as default.

The only use case I can think of that this solution would not support, when compared to the current one, is the ability to set one profile as default, and then bypass it/further customize it with other profiles only for certain endpoints. Users would need to consolidate different profiles into a single one using topic filters, but the consolidated profile can still rely on "profile inheritance" (and in Connext 6.x it could also use "QoS snippets" to further "modularize" the configuration).

@asorbini
Copy link
Collaborator

asorbini commented Mar 5, 2021

I introduced some changes which caused conflicts because they modified slightly the context initialization logic, moving some of to some new functions (see rmw_context_impl_t::initialize_participant, and rmw_api_connextdds_init).

@ivanpauno
Copy link
Member Author

These leads me towards having users being able to use some ROS2-specific DSL to configure these rules, e.g. a YAML object with keys such as topics:, clients:, etc., which might be better stored in a file but could also be passed as an inline string. It also sounds like a "phase two" kind of feature, since it smells of "scope creep" (and something that might not even belong inside the RMW...). Also, users are still forced to learn about the name mangling in order to specify the right topic filters in XML for now.

Yes, that really sounds like too much.
I also agree a general mechanism for overriding qos doesn't belong to rmw, that's why the "qos parameter overrides" were added.
This was only a "DDS style" alternative to the above feature that pretends to be simple.

dds_topics: use default values reported by DDS_XXX_get_default_YYY_qos_w_topic_name() for all endpoints whose DDS topic name (including ROS2 mangling) matches a regular expression, all other endpoints are configured with the default policy (always). The regular expression could be provided "inline" through the same env variable (e.g. "dds_topics: REGEX"), or we could introduce another env variable to specify it (but I'd hesitate to go with this option for fear of bloating the number of configuration variables).

Sounds reasonable to me, I'm not sure if you prefer a list of topics or a regex after "dds_topics: ".
e.g.: dds_topic: topicA;topicB;topicC (list)
or: dds_topic: topic(A|B|C) (regex)

I don't have a preference, if we want to go for a regex I will simply use std::regex with the default grammar (ECMAScript).

@asorbini
Copy link
Collaborator

I don't have a preference, if we want to go for a regex I will simply use std::regex with the default grammar (ECMAScript).

Personally, I like regular expressions, and it sounds like it would be relatively easy to support them, so I would go with that option, since it provides more flexibility, and it can still easily reproduce the "simple list" version.

My only concern is that we should try to avoid an "escape sequence hell" for users. I haven't thought it through too much, but as long as specifying the regex as a shell variable doesn't introduce an additional level of escaping (on top of what C++ already requires when specifying string literals for regex, e.g. "\\\\" to create a regex that matches a single \) then I can't think of any real con's for going with regular expressions (other than people's anecdotal hostility to them, and the fact that we will need to clearly spell out the syntax to use in the docs, and maybe add a couple of "interesting" examples).

Copy link
Collaborator

@asorbini asorbini left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other than a few naming changes and some opportunistic clean up that I'd like to bundle up in this PR, LGTM!

@asorbini
Copy link
Collaborator

I see there are also some conflicts that I introduced with some recent commits this morning and that will need to be resolved, sorry about that!

@ivanpauno ivanpauno force-pushed the ivanpauno/qos-profile-file-topic-overrides branch from bf8a912 to e1c633a Compare March 12, 2021 16:28
Signed-off-by: Ivan Santiago Paunovic <[email protected]>
…ted possible qos profile names

Signed-off-by: Ivan Santiago Paunovic <[email protected]>
@ivanpauno
Copy link
Member Author

I see there are also some conflicts that I introduced with some recent commits this morning and that will need to be resolved, sorry about that!

No problem, last force push should've resolved all the conflicts

@ivanpauno
Copy link
Member Author

never: use default values reported by DDS_XXX_get_default_YYY_qos_w_topic_name() without any change for all created endpoints.

I noticed that when the never policy is used the specified qos for the ros_discovery_info might not match with the ones of other nodes and create some "unknown node name" issues.
Maybe for that topic the "always" qos override policy should be used, or this behavior might be considered fine and that's just something that the people using the never policy have to watch out for.

@asorbini
Copy link
Collaborator

I noticed that when the never policy is used the specified qos for the ros_discovery_info might not match with the ones of other nodes and create some "unknown node name" issues.
Maybe for that topic the "always" qos override policy should be used, or this behavior might be considered fine and that's just something that the people using the never policy have to watch out for.

I agree that one option would be to exclude ros_discovery_info from the override policy, or rather have it behave as if the policy was always always (no pun intended).

One way to achieve this would be to propagate the internal argument from rmw_connextdds_create_datawriter() to rmw_connextdds_get_datawriter_qos() in dds_api_ndds.cpp (and doing the same for the corresponding DataReader functions). I planned on using that flag to mark any endpoint that is created by the RMW itself, and so far these include only the reader and writer created for ros_discovery_info. In general, it would be acceptable to say "all internal endpoints are exempt from the endpoint_qos_override_policy, and the RMW will always customize their QoS on top of Connext's default values, as per the always policy".

On the other hand, we could argue that what you observed is expected behavior of the never policy, in order to maximize the "flexibility" of the policy. In order to be fully interoperable with an application which uses the always policy (and thus ROS2's explicit QoS settings), users must extend their profile with rules to match the "built-in" ROS2 endpoints.

These endpoints include the reader/writer on ros_discovery_info created by the RMW, but also all of the endpoints that are implicitly created by rcl for every ROS2 Node (e.g. to inspect and modify the Node's parameters).

The best solution to support this would be to provide users with an XML file containing a profile with pre-configured rules that users can either use as their base profile, or copy/paste the required rules for built-in endpoint into their profile, e.g.:

<dds xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:noNamespaceSchemaLocation="http://community.rti.com/schema/5.3.1/rti_dds_qos_profiles.xsd" version="5.3.1">
    <!-- Qos Library -->
    <qos_library name="ROS2Defaults">
        <qos_profile name="NodeBuiltinEndpoints" base_name="BuiltinQosLib::Generic.Common">
            <datawriter_qos topic_filter="ros_discovery_info">
                <reliability>
                    <kind>RELIABLE_RELIABILITY_QOS</kind>
                </reliability>
                <history>
                    <kind>KEEL_LAST_HISTORY_QOS</kind>
                    <depth>1</depth>
                </history>
                <durability>
                    <kind>TRANSIENT_LOCAL_DURABILITY_QOS</kind>
                </durability>
            </datawriter_qos>
            <datareader_qos topic_filter="ros_discovery_info">
                ...
            </datareader_qos>

            <datawriter_qos topic_filter="rq/*/describe_parametersRequest">
                ...
            </datawriter_qos>

            ...

        </qos_profile>
    </qos_library>
</dds>

I think in the long run this might be a better approach than making an exception for internal endpoints, which I fear will invevitably lead to the eventual introduction of an additional policy that also includes them (something like never-for-real-this-time).

In fact, going with the exception route would force us to find a way to somehow designate the built-in endpoints created by rcl also as internal, and that would probably become a maintenance nightmare. Otherwise, for example, if any of these endpoints were to use TRANSIENT_LOCAL volatility (as in the case of ros_discovery_info), they would be "downclassed" to VOLATILE with the never policy and fail to match their default counterparts.

Copy link
Collaborator

@asorbini asorbini left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code looks good and ready for merging (beside a minor change).

Before doing that though, I'd like to add a section to the README to briefly document the variable and the accepted values. You can just add it to the end of the "Runtime Configuration" section, and we can sort the list in alphabetical order in a later commit (I meant the variables to be in alphabetical order, but I just noticed I messed the order up when I renamed RMW_CONNEXT_DO_NOT_OVERRIDE_PUBLISH_MODE to RMW_CONNEXT_USE_DEFAULT_PUBLISH_MODE).

By the way, can you also run a quick test (or maybe you already did) to verify that literals on the shell don't need any extra level of escaping and they are propagated to std::regex() as expected?

Thanks again for all this work, it's going to be a useful feature for sure!

@ivanpauno
Copy link
Member Author

I think in the long run this might be a better approach than making an exception for internal endpoints, which I fear will invevitably lead to the eventual introduction of an additional policy that also includes them (something like never-for-real-this-time).

👍 I was thinking exactly the same.
I think that adding an example qos profile with default qos for the rmw internal and rcl topics as you mentioned is a good idea.

Signed-off-by: Ivan Santiago Paunovic <[email protected]>
Signed-off-by: Ivan Santiago Paunovic <[email protected]>
Signed-off-by: Ivan Santiago Paunovic <[email protected]>
@ivanpauno
Copy link
Member Author

ivanpauno commented Mar 16, 2021

I've added documentation in 86d0155.

I would like to add a complete qos profile example for the never qos override policy case, but didn't get to it yet.

Copy link
Collaborator

@asorbini asorbini left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks ready for merging.

I'll create an issue for an example XML config file for the never policy, so we can add it after closing this PR.


- Topics are prefixed with `rt`. e.g.: `/my/fully/qualified/ros/topic` is converted to `rt/my/fully/qualified/ros/topic`.
- The service request topics are prefixed with `rq` and suffixed with `Request`. e.g.: `/my/fully/qualified/ros/service` request topic is `rq/my/fully/qualified/ros/serviceRequest`.
- The service response topics are prefixed with `rr` and suffixed with `Response`. e.g.: `/my/fully/qualified/ros/service` response topic is `rr/my/fully/qualified/ros/serviceResponse`.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this section looks good as is, at least for an initial merge.

I'm considering starting a proper manual in Sphinx for rmw_connextdds[micro], putting it on public hosting (e.g. readthedocs), and move most documentation out of the readme, to avoid bloating it inevitably.

With that available, I would probably move this information about topic mangling to a different section, one more general than this one about env variables (e.g. "Configuring QoS in rmw_connextdds").

@asorbini asorbini merged commit 1edeca2 into ros2:master Mar 16, 2021
@asorbini
Copy link
Collaborator

I didn't catch that the get_default_qos_w_topic_name() functions are not available in Micro. Fixed in c193fc5.

asorbini pushed a commit that referenced this pull request Mar 16, 2021
…hing the topic name (#7)

* Add different policies of how to use DDS XML qos profiles.

* rename some symbols

* Remove qos_library field from context impl and all functions that listed possible qos profile names

* Delete debug log

* fix unnoticed bug

* Add docs

Signed-off-by: Ivan Santiago Paunovic <[email protected]>
@ivanpauno
Copy link
Member Author

I didn't catch that the get_default_qos_w_topic_name() functions are not available in Micro. Fixed in c193fc5.

Thanks for that, I also realized that I could've deleted this two:

https://github.com/rticommunity/rmw_connextdds/blob/c193fc5939eb4b3bd68e188355a08ba08d418b83/rmw_connextdds_common/include/rmw_connextdds/dds_api_rtime.hpp#L38-L40

I will send a PR with that.

@ivanpauno ivanpauno deleted the ivanpauno/qos-profile-file-topic-overrides branch March 17, 2021 17:40
@asorbini asorbini added dashing PR pertaining the Dashing release eloquent PR pertaining the Eloquent release foxy PR pertaining the Foxy release galactic PR pertaining the Galactic release labels Apr 16, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
dashing PR pertaining the Dashing release eloquent PR pertaining the Eloquent release foxy PR pertaining the Foxy release galactic PR pertaining the Galactic release
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants