-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Additional items for the ROS 2 Galactic Roadmap #901
Comments
In general, feel free to open a pull request against https://github.com/ros2/ros2_documentation/blob/master/source/Roadmap.rst and tag me; I'm happy to review it. I will tell you that I have serious reservations about item number 2 there. I think it was a mistake to introduce the synchronous service client into Python, and I don't think we should replicate that mistake into rclcpp. But we can discuss it on the PR. |
@clalancette If you don't mind me already trying to clarify point 2 (I ran into this problem while porting and passed it on to Dejan): Do you mean the problems mentioned in https://index.ros.org/doc/ros2/Tutorials/Sync-Vs-Async/? I think a nice asynchronous future-based API would make everyone happy as well.¹ The problem with the existing API is that while it returns a future, you can't really seem to retrieve its result if you're inside a node ( ¹ My impression of a callback API in contrast is that it makes it hard to control timing and to handle errors. |
Yep, those are exactly the problems I'm hoping to avoid. It's just too easy for users to call a synchronous service call inside of the callbacks and then cause a deadlock.
It is definitely possible to call a service from inside a node. There's an example of doing just this in https://github.com/ros2/demos/blob/master/demo_nodes_cpp/src/services/add_two_ints_client_async.cpp . That solution uses a callback to get the result when it is ready. The other way to do it is to store the future as returned from You are definitely correct that our documentation doesn't go over all of this very well. So that's a place where we should improve. |
Yes, the callback works, but I am interested in a version without callbacks. Please correct me if I'm wrong, but I see a callback-based service-client model as basically the same as two pub-sub pairs (request and response), with most of the associated problems. You send off the request, but if e.g. there is no service running, your callback just will never be called, so you can't really handle that kind of error. You can't set a limit on how long the call may take. The order of responses for two service calls becomes unpredictable. Between the request and the callback, the state of your node may have totally changed. You probably know all of this and agree that straight-line code is just much easier to reason about. For me, "straight-line code" doesn't need to be synchronous, it can also mean a future that you wait on for a certain amount of time, and that's what I meant by "nice asynchronous future-based API". Basically this: auto res = client_->async_send_request(...);
if (res.wait_for(10ms) == std::future_status::ready) {
... res.get() ...
} Does that make sense? So the open questions for me are
Regarding point 2, you pointed out
I tried the following, and it only printed the first log message. Do you mean some other way of waiting? auto result_future = client_->async_send_request(request);
AddTwoInts::Response::SharedPtr result;
std::thread wait_thread([this, &result_future, &result](){
RCLCPP_INFO(this->get_logger(), "In thread");
auto status = result_future.wait_for(1000ms);
RCLCPP_INFO(this->get_logger(), "Finished waiting");
});
wait_thread.join(); Or is the issue that you need to return from the function that made the request before you can wait (which means no straight-line code)? |
In point of fact, that is exactly how it is implemented. I agree that it has some problems, but:
We do have
You can if you use the future and a timeout on the future. But you are right, you can't do it with the callback. That might be a feature we could add, though I'd have to think about it a bit more.
That's true; if I make two calls back-to-back, they both arrive at the destination at about the same time, but the second one completes first, I'll get the second response first. But that seems like the service call is being misused in that case. In the "straight-line" code, you'd do:
In the callback case, you'd do:
I'm not arguing that the second case is clearer, but I am arguing that you can achieve the same semantics.
While that is true, that is true of anything that takes a substantial amount of time to complete. So I'm not sure that this is an argument against a callback mechanism.
Yes, that's exactly the problem with that code. Fundamentally, in the default case, you have a single threaded executor calling callbacks as events happen. Within those callbacks, you can't do things that wait on further callbacks coming in; you are blocked inside of the callback. The way around it is of course threading, either by hand (
Here's the thing about service calls as straight-line code. Let's assume for a second that we can solve the deadlock case, and you could call service calls "straight-line" from callbacks. I still don't think that is a good idea, because now your callbacks can take arbitrary amounts of time, and now you are potentially starving the rest of callbacks in the system while you are waiting for your service result. |
I can answer in more detail tomorrow, but I wanted to say thanks for troubling yourself to answer so quickly but also clearly. For now, let me just add one comment on this:
Isn't the question of straight-line/callback style independent from whether timeouts are used? I think we agree that any good solution should discourage waiting indefinitely. |
For full description and discussions see ros2#901.
@clalancette @nnmm I opened a PR for the items above: #914. I will wait until you converge on the item 2 and added afterwards. |
Sort of. I was more thinking about the case where you did this in a callback:
If the future took 990ms to return, you would still succeed, but you would have starved the rest of the system for essentially a full second. That number is somewhat contrived, but any timeout you choose put in there has the potential to add that amount of latency to downstream callbacks. In general, in this callback-based system, it is best to do as little work as possible in the callbacks and return quickly. That way you keep the latency down on the entire system. |
@clalancette Could you provide or point to an example how to write "straight-line"ish code with a multi-threaded executor? Would that mean that in writing a node in a particular way, I need to make assumptions about in what executor it is used? That would sound like it breaks encapsulation |
Yep, and there are other reasons your callback might never run, like an unreliable network.
I was thinking about a setup where no other callbacks are being run while the current one is still active, as with the default single-threaded executor (although I don't know if that is fundamentally incompatible with receiving the service call result). I'd argue it's a general advantage of straight-line code that it tends to have less cognitive complexity: It's more "well-behaved" by default which helps prevent bugs. It potentially also means people need to know/think less to use it, as well as write less code – as an example, imagine a service handler that computes its response by calling two other services in parallel (gist with example code here).
I concur with @fred-apex-ai, while this shouldn't become a how-to discussion, in the example above I already was using a
I see what you mean, users can set a too large timeout and jam their system when the called service does too much work itself, is down, or experiences a latency spike. They already can do that in various other ways though, by computing for too long or with Despite that it could be useful to make straight-line service service calls from inside a node. Of course one scenario would be service calls that are quick relative to the latency requirements of your messages. Then there probably are nodes where queuing up incoming messages is the intended behavior. I'm sure other people have more to add to this. In an effort to bring this to a conclusion, my summary would be:
I think you're better equipped to comment on what should be in the roadmap, Chris. ¹ And they should be able to mitigate the problem with a multithreaded executor, at the cost of making things happen behind their backs again, right? |
One take away for me is to realize the cost associated with any service call both in terms of latency and algorithm complexity which leads me to recommend to avoid service calls when possible and design the hot path functions so they have all the required input when they start. |
For full description and discussions see ros2#901.
First, apologies for the long delay in response here; I was on vacation all last week, and I'm just catching up. But I also wanted to spend some time to put examples together to show you how you can achieve the straight-line code you are looking for (even if I still don't think it is a good idea, for other reasons). With that said, I'll respond to some of the comments:
https://github.com/clalancette/executor_examples now has a number of examples of calling a service "in-line" in a callback. Each example has a bit of explanatory text explaining why it does or does not work. In short, the examples in there that work are:
(the other examples have explanations on why they don't work for straight-line code)
Right, that's why my general recommendation is to do as little work as possible in the callback, and then do "real" processing elsewhere.
Yes, agreed.
Agreed.
Yes, exactly. That was my initial objection.
Yes, though at the moment I still don't know exactly what that looks like.
Agreed that there should be more documentation around this. I'll also add in one more thought that occurred to me as I was playing around/thinking about this. One other way to allow more "straight-line" code would be to add Unix
I'm trying to keep the roadmap a list of things that could reasonable get done for Galactic. At the moment, Open Robotics doesn't have the resources to implement any of the above. So while I'm happy to add something to the roadmap, I would only do so if someone else commits to it. |
IMO, using the |
We could certainly consider adding an option to node construction. Note that this ends up being syntactic sugar, since you can achieve this in a more verbose way by creating your own callback groups.
Changing the default would be a big change. Right now, any code that is written using the default can use resources shared with other callbacks with impunity (as it is guaranteed only one runs at a time, even with the |
Yes, you can already get the behavior now; I just think there's a lot to be said for the user experience of flipping a bit one time in one location instead of worrying about the callback group object for everything in the node. I will try to create a draft PR for this over the holidays since I expect it should be a very simple change. |
Sorry I haven't chimed in yet, but this conversation (specifically about item number 2) kept getting longer and I could never catch up till now :) First I want to address what I think is the core issue here for an async/await style service call (what you all have been calling "straight-line service call" I think). @clalancette said and then @nnmm replied:
Then @clalancette also said:
I think @clalancette is right here that we don't want people blocking inside of callbacks because it will prevent other work from being done, especially so in a single threaded executor, but also it ties up resources in a multithreaded executor. And anyways we don't want to have the business logic care about what executor is used. The obvious way to fix this in my experience is what @clalancette suggested elsewhere, you spawn a thread and continue the work there. You can observe this pattern in many frameworks, but it is very cumbersome some times. A good example is in older Javascript, where there is no blocking and no async/await syntax, so everything becomes This obviously isn't ideal, but it's what you have to do without language support for async/await. More modern languages like Python3 and newer versions of Javascript have first-class support for async/await, which allows you to write code that appears to be synchronous (what you all have been calling "straight line code") but is actually asynchronous. So, at best I think we could:
Would an example that looks something like this be ok with you guys? void ros_timer_callback_from_spin()
{
auto future = client_->async_send_request(
request,
10s,
[](auto response) { /* success, could continue logic here */ },
[](auto error) { /* could be timeout or error */ });
} My opinion on this is that we cannot do better (than the example just above) because of two reasons:
I think we all agree what we'd like is something like this: void some_callback_from_spin(...)
{
// ...
auto result = await service_client->make_async_request(..., 10s);
if (result.did_timeout) {
// error handling
}
auto response = result.get();
print_result(response);
} But I think the best we can do is something like this: void some_callback_from_spin(...)
{
// ...
auto result = await service_client->make_async_request(
...,
10s,
[](auto response) {
print_result(response);
},
[](auto error) {
// error handling
}
);
} |
I should say too, that even if we want to do what I proposed the technical details of how to do that are still a bit fuzzy to me, but I think they can be done. It will probably require the use of ROS timers and/or threads, and in which case we'd probably want to expose those details somehow, because while convenient this may cause scaling issues (creating a new thread each time for instance vs using a thread-pool, etc.). |
Dear @wjwood, thanks a lot for taking the time to share your thoughts and an explicit proposal. This asynchronous discussion about asynchronous calls really adds a lot of latency (pun intended): two months since we started and it's getting hard to follow. Adding an error callback would be a big improvement. If callback style is desired, how about going all in and having a version without I'm not sure about the type passed to the error callback in the above example. Currently, the service handler can't send errors back to the client (unless modeled by the response data type), right? Our conclusionWe (i.e. @nnmm and myself) would be very pleased if you could add the below items on the agenda for future work. We would support you with buying a coffee or beer if that helps :)
yes, sounds great
yes, please
oh yeah, I think this issue contains enough links and knowledge on its own
If that helps and it can be made a bit more user friendly, that would help, too.
We aren't async/await experts over here, but isn't async/await a different thing from what was discussed here? The function that uses the "straight-line" call would look and behave like a normal synchronous function from the outside, but a function using await would have to be async itself and couldn't be used as a subscription callback? |
@fred-apex-ai What I'll do is suggest you open up issues corresponding to each of the features you'd like to see. I can't guarantee that we'll work on it, but it will at least be tracked and can be considered for a future release. It looks like most of these feature requests are for rclcpp, so I'll suggest opening them against https://github.com/ros2/rclcpp As for some documentation updates, it would be really helpful if you could look through the existing documentation in this repository and find a place where you feel like what you learned here applies. And then open a PR with some of those changes. I'm happy to review changes as they come in. Thanks! |
For full description and discussions see ros2#901.
We are currently straight porting one rather large project (160+ K lines of code) from ROS 1 to ROS 2. In the process of doing that we encountered several features not yet available in ROS 2: https://github.com/tier4/AutowareArchitectureProposal/blob/ros2/docs/porting-to-ROS2.md.
I'd like that we add some of these features to the agenda for ROS 2 G if possible:
@clalancette I believe that this is best directed to you.
The text was updated successfully, but these errors were encountered: