Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Changing behavior_tree from Simple Commander API causes navigation on cancelled goal #3156

Closed
automech-rb opened this issue Aug 28, 2022 · 21 comments

Comments

@automech-rb
Copy link
Contributor

automech-rb commented Aug 28, 2022

Bug report

Required Info:

  • Operating System:
    • Docker 22.04 Ubuntu
  • ROS2 Version:
    • rolling
  • Version or commit hash:
  • DDS implementation:
    • rmw_cyclonedds_cpp

Steps to reproduce issue

Set default behavior tree default_nav_to_pose_bt_xml as navigate_w_recovery_and_replanning_only_if_path_becomes_invalid.xml in param file.
Use Simple Commander API and request any goal using:

-> goToPose(pose1, behavior_tree=’../my_navigate_to_pose_w_replanning_and_recovery.xml‘)
Here behavior_tree can be any tree (even same as default used by Nav2 but using file)

-> Then cancel the above navigation using cancelTask() API call.

-> Next use goToPose(pose2) or use Rviz2 to request different goal using default bt tree.

Expected behavior

Navigate to new goal(pose2) according to default behaviour tree.

Actual behavior

Nav2 navigates to cancelled goal(pose1) instead of new requested goal.
Gif is shown below as example.

Additional information

ezgif-4-f2f3440b1b
In this video I am using follow_point.xml as 2nd behavior (bug can be reproduced with any behavior), and then using nav2 API to cancel it.
Then rviz2 for another goal request.

I fell like this should have been solved by this PR. But maybe changing bt xml leads to some unforeseen circumstances.

@SteveMacenski
Copy link
Member

SteveMacenski commented Aug 29, 2022

Do you only see this through the simple commander API for the first request or with specific behavior trees? I regularly cancel and reissue goals in rviz2 for testing using the default BT and do not see that behavior.

Do you have logs? What are they saying? If you send a new goal, that should be immediately overridden unless you set the parameter for the blackboard goal ID to something else - let alone the fact that the entire BT should be deleted and reconstructed if you change BTs. https://github.com/ros-planning/navigation2/blob/main/nav2_bt_navigator/src/navigators/navigate_to_pose.cpp#L87

What rviz tool are you using to request the goal? The nav2 provided one or the default rviz one?

CC @jwallace42

@automech-rb
Copy link
Contributor Author

automech-rb commented Aug 30, 2022

Hi @SteveMacenski
This bug can be reproducer may time in a run repeatedly.
I first need to set default_nav_to_pose_bt_xml in param file to navigate_w_..._if_path_becomes_invalid.xml, as the Nav2 default bt tree in not able to reproduce the issue.
Also, just using Rviz2 is not able to reproduce the issue, you need to use Nav2 python API with different behaviour tree to request a goal and then cancel it using API. Then use Rviz2 or python API again to request goal with default navigate_w_..._if_path_becomes_invalid.xml or any xml as stated in my report.

Here are the logs:

[bt_navigator-16] [INFO] [1661818536.533826771] [bt_navigator]: Begin navigating from current location to (-3.50, -6.92)
[controller_server-12] [INFO] [1661818536.715852307] [controller_server]: Received a goal, begin computing control effort.
[controller_server-12] [INFO] [1661818537.816026861] [controller_server]: Passing new path to controller.
[controller_server-12] [INFO] [1661818538.816038295] [controller_server]: Passing new path to controller.
[controller_server-12] [INFO] [1661818539.816025915] [controller_server]: Passing new path to controller.
[bt_navigator-16] [INFO] [1661818540.864537583] [bt_navigator]: Goal canceled
[bt_navigator-16] [WARN] [1661818540.864587410] [bt_navigator]: [navigate_to_pose] [ActionServer] Client requested to cancel the goal. Cancelling.
[controller_server-12] [INFO] [1661818540.916024917] [controller_server]: Goal was canceled. Stopping the robot.
[controller_server-12] [WARN] [1661818540.916074971] [controller_server]: [follow_path] [ActionServer] Aborting handle.
[controller_server-12] [WARN] [1661818540.916251866] [controller_server]: [follow_path] [ActionServer] Client requested to cancel the goal. Cancelling.
[bt_navigator-16] [INFO] [1661818549.518423350] [bt_navigator]: Begin navigating from current location to (3.90, 2.00)
[controller_server-12] [INFO] [1661818549.521891848] [controller_server]: Received a goal, begin computing control effort.
[controller_server-12] [INFO] [1661818567.722652652] [controller_server]: Reached the goal!
[bt_navigator-16] [INFO] [1661818567.758608174] [bt_navigator]: Goal succeeded

Here, the new requested goal x ,y :: (3.90, 2.00) is not respected (even though it is logged correctly by bt_navigator), but instead navigates to old cancelled goal.

Also, I am using rviz tool provided by nav2 but the issue can be reproduced just using python API.
............
I made a script to recreate this behavior (please update the goal locations according to your map and location of bt xml):
For this script don't set default_nav_to_pose_bt_xml to anything. Leave it as default.

from geometry_msgs.msg import PoseStamped
from nav2_simple_commander.robot_navigator import BasicNavigator, TaskResult
import rclpy
from rclpy.duration import Duration
from time import sleep


def main():
    rclpy.init()

    navigator = BasicNavigator()
    navigator.waitUntilNav2Active()

    #Pose 1 to be cancelled
    goal_pose1 = PoseStamped()
    goal_pose1.header.frame_id = 'map'
    goal_pose1.header.stamp = navigator.get_clock().now().to_msg()
    goal_pose1.pose.position.x = 3.9
    goal_pose1.pose.position.y = 2.0
    goal_pose1.pose.orientation.w = 1.0

    #Pose 2 to be navigated
    goal_pose2 = PoseStamped()
    goal_pose2.header.frame_id = 'map'
    goal_pose2.header.stamp = navigator.get_clock().now().to_msg()
    goal_pose2.pose.position.x = 1.13
    goal_pose2.pose.position.y = -8.16
    goal_pose2.pose.orientation.w = 1.0

    navigator.goToPose(goal_pose1)
    sleep(5.0)
    # Cancel goal_pose1
    navigator.cancelTask()

    sleep(5.0)

    navigator.goToPose(goal_pose2, '/home/navigate_w_recovery_and_replanning_only_if_path_becomes_invalid.xml')
    i = 0
    while not navigator.isTaskComplete():
        i = i + 1
        feedback = navigator.getFeedback()
        if feedback and i % 10 == 0:
            print('Estimated time of arrival: ' + '{0:.0f}'.format(
                  Duration.from_msg(feedback.estimated_time_remaining).nanoseconds / 1e9)
                  + ' seconds.')
    # Do something depending on the return code
    result = navigator.getResult()
    if result == TaskResult.SUCCEEDED:
        print('Goal succeeded!')
    elif result == TaskResult.CANCELED:
        print('Goal was canceled!')
    elif result == TaskResult.FAILED:
        print('Goal failed!')
    else:
        print('Goal has an invalid return status!')

    exit(0)


if __name__ == '__main__':
    main()

@jwallace42
Copy link
Contributor

So,

just to recap.

We see the issue only when using the nav2_simple_commander(to cancel the goal) with the https://github.com/ros-planning/navigation2/blob/main/nav2_bt_navigator/behavior_trees/navigate_w_recovery_and_replanning_only_if_path_becomes_invalid.xml.

It seems that the cancel from the nav2_simple_commander does something different that the cancel from the rviz panel. We know the cancel is not correctly canceling the computePathToPose action due to the old path still being present. (Could there be some other exit flow? i.e

I think the next step would be to run with the log level set to debug so we can take a look and see if the cancel request is actually cancelling the active actions in the behavior tree. The other thing I would like to do is send these goals manually though the command line to see if get the same results(you should :)).

I might have some free time to track this down by the E.O.W.

@SteveMacenski
Copy link
Member

I see that we share the same blackboard_ in the BT Action Server even if we change bebehavior trees themselves. We could destroy the blackboard and recreate it if BT's change (that seems sensible; move this block into a new function called on configuration + changing BTs). But to me that doesn't really solve the core issue of how this could be happening in the first place -- then secondarily cleaning up the BT for switching.

The correct Begin navigating from current location to ... pose tells me that its getting the request and setting it in the blackboard - so it seems likely that the error isn't in the BT navigator if it only happens with a specific behavior tree(s), but rather in how the tree is running.

The 2 nodes that happen before computing the path to the pose are GloballyUpdatedGoalCondition and IsPathValidCondition which are both unique to these sets of BTs so that tracks for me as a good place to start looking.

If I were to make a guess based on a few minutes of looking @jwallace42 but without actually tracking down the error itself to validate; I think what's happening is that this part of the BT under ReactiveSequence is returning so that we don't compute a path via ComputePathToPose so its using the old path on the blackboard.

@automech-rb this only happens when the second request uses the if_path_becomes_invalid.xml BTs, correct?

@automech-rb
Copy link
Contributor Author

automech-rb commented Aug 31, 2022

@SteveMacenski Yeah, this only occurs when using if_path_becomes_invalid. Also need to specify bt xml file in goToPose API. I updated the python script as an example. If I don't specify xml file in API, everything works as expected.

As you both commented earlier, I feel this is more of issue of how blackboard_ is handled when using if_path_becomes_invalid xml file in API.

@SteveMacenski
Copy link
Member

SteveMacenski commented Aug 31, 2022

Also need to specify bt xml file in goToPose API. I updated the python script as an example. If I don't specify xml file in API, everything works as expected.

This is where we're really missing each other and need a clear statement of what specifically works and what specifically doesn't. Are you saying that if you just goToPose even using the if_path_becomes_invalid.xml as the default and cancel between requests, its fine if its not specified in the action request? So this only happens when both using if_path_becomes_invalid.xml and also specifying it in at least one request?

@automech-rb
Copy link
Contributor Author

automech-rb commented Sep 1, 2022

Hello @SteveMacenski
Sorry for the confusion. Let me explain it more clearly:

A. This error does not occur:

Set default behavior tree default_nav_to_pose_bt_xml as navigate_w_recovery_and_replanning_only_if_path_becomes_invalid.xml in param file.
Use Simple Commander API and request any goal using:
-> goToPose(pose1)
-> cancelTask()
-> Next use goToPose(pose2)

B. This error occurs:

Case 1 (original bug report method):
Set default behavior tree default_nav_to_pose_bt_xml as navigate_w_recovery_and_replanning_only_if_path_becomes_invalid.xml in param file.
Use Simple Commander API and request any goal using:
-> goToPose(pose1, ’navigate_to_pose_w_replanning_and_recovery.xml‘)
Here behavior_tree file in API can be any tree (even same as default used by Nav2 but using file)
-> Then cancel the above navigation using cancelTask() API call.
-> Next use goToPose(pose2)

Case 2 (script method):
Don't set default_nav_to_pose_bt_xml in param file (leave it as Nav2 default).
Use Simple Commander API and request any goal using:
-> goToPose(pose1)
-> Then cancel the above navigation using cancelTask() API call.
-> Next use goToPose(pose2, 'navigate_w_recovery_and_replanning_only_if_path_becomes_invalid.xml')

In both error cases, I am specifying behavior file in python API, which led to my conclusion that using xml file in python API is one of the cause of this error.
Also in both error case, I need to use 'navigate_w_recovery_and_replanning_only_if_path_becomes_invalid.xml' or else this error does not occur.

@SteveMacenski
Copy link
Member

https://github.com/ros-planning/navigation2/blob/main/nav2_simple_commander/nav2_simple_commander/robot_navigator.py#L143-L145

The goal message is populated each time - so I don't think its from the Python API unless something very odd is happening - it seems more likely with the BT XML handling or something in BT navigator - since I think you said it happens via rviz too.

@jwallace42
Copy link
Contributor

jwallace42 commented Sep 7, 2022

@automech-rb

Can edit the bringup_launch.py file by switching the log level from info to debug.

Then launch rqt_console by
ros2 run rqt_console rqt_console there is a red icon to start recording the log messages.

Once it is starting to record, run the simulation and reproduce the error. Then stop the recording, save the file and set it to me so I can take a look :).

The other thing you should try so we can eliminate the nav2_simple_commander as an issue is sending the goals manually though the command line.

josh@josh-desktop:~/nav2_ws/src/navigation2$ ros2 action send_goal /navigate_to_pose nav2_msgs/action/NavigateToPose "pose: header:1.0 stamp:0.5 sec: 0 nanosec: 0 frame_id: 'map' pose:: 0.0 position: x: 1.0 y: -0.5 z: 0.0: ''" orientation: x: 0.0 y: 0.0 z: 0.0 w: 1.0 behavior_tree: ''"

If you can reproduce the error then we definitely know that it is something below the nav2_simple commander. I bet @SteveMacenski is right and we will see the error again.

@jwallace42
Copy link
Contributor

Or you can send me your branch :)

@automech-rb
Copy link
Contributor Author

automech-rb commented Sep 7, 2022

Hello @jwallace42
I am not able to share the files due to company policies. But i think you can recreate the scenario easily.

I tested the pr but still leads to same error behaviour. I am also adding the log debug file as you stated.

Setup:

Gazebo + Nav2 (with use_composition) + python script
Using Case 2 (script method):
default_nav_to_pose_bt_xml in param file not set to anything (leaving it as Nav2 default).
Use Simple Commander API (python script in zip) and request any goal using:
-> goToPose(pose1)
-> Then cancel the above navigation using cancelTask() API call.
-> Next use goToPose(pose2, 'navigate_w_recovery_and_replanning_only_if_path_becomes_invalid.xml')

Python script and log debug file

log.zip

In test.py, AGV in the end goes to pose1 instead of pose 2.

@SteveMacenski
Copy link
Member

SteveMacenski commented Sep 8, 2022

Not surprised, that PR does something in the same direction as the previous work just resetting the path between runs, but that's not the root cause of why things aren't passing correctly around. Its definitely not bad to do (especially since I love defensive programming + helps solve some future issues with people making bad BTs), but not addressing root cause.

@jwallace42
Copy link
Contributor

jwallace42 commented Sep 8, 2022

Yeah,

I though halt() would be called when the behavior tree completed which was a misunderstanding.

Pretty much, what is happening is the path gets planned. When the task is cancelled the computePathToPose is already completed so the blackboard still contains the old path.

So, when ever a behavior tree ever gets loaded we need to make sure the path is empty. The simplest thing (which I just put in my pr and works, only able to reproduce case2, not case1) is to reset the path every time the behavior tree is loaded.

To clean this up, we would just clear the blackboard on reloading but this causes another issue. There are parameters for the blackboard that get passed down from the bt_navigator to be loaded by the navigator.hpp. As a result, we can't clear all the parameters.

@automech-rb The pr now works if you want to give it a try.

@SteveMacenski
Copy link
Member

so the blackboard still contains the old path ... when ever a behavior tree ever gets loaded we need to make sure the path is empty

Shouldn't a new path be being computed in that BT either way before the controller is called? I'd like to understand the exact reason why a new path isn't available already in that BT when FollowPath BT node is invoked. That to me seems more like the bug -- because even if the path is reset on the blackboard, that means the FollowPath is still being called on an empty path.

@automech-rb
Copy link
Contributor Author

@jwallace42 I tested the PR, it is working now as expected!

@jwallace42
Copy link
Contributor

jwallace42 commented Sep 9, 2022

Just pushed the complete fix to the pr. (got some time to look at the logs closer)

The issue was the globalUpdatedGoal condition node was returning failure on the first cycle instead of success. This would cause the compute path to pose to never get hit and thus leaving the old path on the blackboard. The FollowPath node would then operate on this path.

The reason resetting the path also worked is because isPathValid would fail causing computePathtoPose to trigger. This solution should be good for review :).

@SteveMacenski
Copy link
Member

SteveMacenski commented Sep 9, 2022

Got it! Why didn't this cause problems before though when calling the same tree as the default multiple times in a row?

@jwallace42
Copy link
Contributor

jwallace42 commented Sep 9, 2022

So,

first time through planning gets triggered by is path valid. Then, when we send a action to the behavior tree again the tree does not get reloaded so the globalUpdatedGoal works correctly and causes us to replan.

There is quite a few moving peices... So let me know if my explanation is bad :).

@SteveMacenski
Copy link
Member

Ah, I understand, thanks!

@SteveMacenski
Copy link
Member

PR is approved, just needs updates for test failures

@SteveMacenski
Copy link
Member

Merged!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants