Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fuzzy-matching Trajectory Cache Injectable Traits refactor 🔥🔥 #2941

Merged
merged 82 commits into from
Jan 4, 2025

Conversation

methylDragon
Copy link
Contributor

@methylDragon methylDragon commented Jul 31, 2024

As requested in the original PR, this PR refactors the TrajectoryCache to allow users to inject their own behaviors (which will allow them to cache on and sort by any arbitrary feature, as long as it can be represented as a series of warehouse_ros columns).

Depends on:

Builds on top of:

TODOs:

  • Fix integration-test
  • Formatting pass (will be done after review)
  • Fix tutorials
  • Fix bugs

Preamble

I apologize that this PR is even larger than the one it builds on top of. Most of the added lines are docstrings or tests, and boilerplate to support the behavior injection pattern this refactor is concerned with.

On the other hand, the average file length has decreased, so the code is MUCH more modular and hopefully easy to digest.

I can't split up this PR into multiple smaller ones since technically speaking, in order to preserve cache functionality, all the feature extractors and cache insert policies introduced in this PR will need to go in together.

I would suggest looking at the tests to aid in review (they run the gamut of unit and integration tests).

You can also build and run the example:

PS: If the size is still prohibitive, and we are okay with having partial implementations live in moveit2 while reviews are pending, let me know and I will split up the PR into smaller PRs (though I suspect at that point, that a logical number of splits might end up being somewhere close to 5-10 PRs.)

Navigating This PR

Since this PR is so large, here is a proposed order for comprehending the PR.

  • Fully read this PR description and the example code in the description
  • Build and run the demo to convince yourself that the cache works (instructions in that PR)
  • Look at the new interfaces introduced
    • features/features_interface.hpp, cache_insert_policies/cache_insert_policy_interface.hpp
  • Then look at their implementations and tests
    • features/, cache_insert_policies/
  • Then look at the main TrajectoryCache class
    • trajectory_cache.hpp, trajectory_cache.cpp
  • Then tie it all together by looking at the example usage of the classes in this PR in the demo code in Add Trajectory Cache Example For Refactor moveit2_tutorials#940

Additionally, all docstrings are filled, including file ones, as appropriate. So hopefully any clarificatory questions would have already been pre-empted and answered.

Description

This PR builds on top of the fuzzy-matching Trajectory Cache introduced in:

The implementation in that PR coupled the cache tightly with assumptions about what features to extract and sort by (i.e., a specific set of start/goal constraints, and pruning by execution time.)

This means that users who might want to store different features or a subset of those features, or who might want to fetch and prune on different features (e.g., minimum jerk, path length, etc.) will be unable to.

This PR refactors the cache to allow users to inject their own feature extractors and pruning/insertion logic!

This is done by introducing two new abstract classes that can be injected into the cache methods, acting a lot like class "traits":

  • FeaturesInterface<>: Governs what query/metadata items to extract and append to the warehouse_ros query.
  • CacheInserterInterface<>: Governs the pruning and insertion logic.

For more details on FeaturesInterface, see the Docstrings: https://github.com/moveit/moveit2/blob/cc0feb37cf423076e133523ccdbbf3038b84a01e/moveit_ros/trajectory_cache/include/moveit/trajectory_cache/features/features_interface.hpp

Some notes:

  • I decided to not go with lambdas, because there should be tight coupling between the query and metadata insertion logic for a particular feature.
  • Similarly, cache insertion logic heavily benefits from being stateful, and coupling those chunks of logic together.

Example

In other words, before. the cache was used like this:

auto traj_cache = std::make_shared<TrajectoryCache>(node);
traj_cache->init(/*db_host=*/":memory:", /*db_port=*/0, /*exact_match_precision=*/1e-6);

auto fetched_trajectory =
    traj_cache->fetchBestMatchingTrajectory(*move_group_interface, robot_name, motion_plan_req_msg,
                                            _cache_start_match_tolerance, _cache_goal_match_tolerance,
                                            /*sort_by=*/"execution_time_s", /*ascending=*/true);

if (fetched_trajectory)
{
  // Great! We got a cache hit
  // Do something with the plan.
}
else
{
  // Plan... And put it for posterity!
  traj_cache->insertTrajectory(
      *interface, robot_name, std::move(plan_req_msg), std::move(res->result.trajectory),
      rclcpp::Duration(res->result.trajectory.joint_trajectory.points.back().time_from_start).seconds(),
      res->result.planning_time, /*delete_worse_trajectories=*/true);
}

Now the cache is used like this:

auto traj_cache = std::make_shared<TrajectoryCache>(node);
traj_cache->init(/*db_host=*/":memory:", /*db_port=*/0, /*exact_match_precision=*/1e-6);

std::vector<std::unique_ptr<FeaturesInterface<MotionPlanRequest>>> features;
features.emplace_back(std::make_unique<WorkspaceFeatures>());
features.emplace_back(std::make_unique<StartStateJointStateFeatures>(start_tolerance));
// ...

auto fetched_trajectory =
    traj_cache->fetchBestMatchingTrajectory(move_group, robot_name, motion_plan_req_msg,
                                            /*features=*/features,
                                            /*sort_by=*/TrajectoryCache::getDefaultSortFeature(),
                                            /*ascending=*/true);

// Or more simply, if you want the same feature set as before the refactor, instead of painfully listing the features one by one:
// Type: std::vector<std::unique_ptr<FeaturesInterface<MotionPlanRequest>>>
auto default_features = TrajectoryCache::getDefaultFeatures(_cache_start_match_tolerance, _cache_goal_match_tolerance);

if (fetched_trajectory)
{
  // Great! We got a cache hit
  // Do something with the plan.
}
else
{
  // Plan... And put it for posterity!
  //
  // NOTE: Now instead of passing a trajectory, pass the plan result,
  // it'll contain the execution time and planning time we need!
  //
  // cache_inserter is a CacheInserterInterface<MotionPlanRequest, MotionPlanResponse, msg::RobotTrajectory>
  // It will tell the trajectory cache:
  //   - how to fetch "matching entries"
  //   - how to determine if they should be pruned,
  //   - how to determine when to insert the candidate cache entry
  //   - and what metadata to attach 
  //
  // additional_features allows a user to further add more metadata features for use with fetching
  // though they will not be considered by the cache_inserter
  traj_cache->insertTrajectory(
      move_group, robot_name, std::move(plan_req_msg), std::move(plan),
      /*cache_inserter=*/BestSeenExecutionTimePolicy(),
      /*prune_worse_trajectories=*/true, /*additional_features=*/{});
}

See the motion plan request features here: 79b7f95

The Feature Contract

Users may use an instance of FeaturesInterface<> to fetch a cache entry only if it was supported by the CacheInserterInterface<> instance that they used (or on insert, the feature was added in additional_features).

I could not think of a way to create a coupling between uses of the cache inserters and the features. This is the cost of generality and allowing users to inject arbitrary logic into the cache.

As such, users must take care to look at the docs of the cache inserter to see what features can be used with them.

(This can be mitigated by adding helper methods to get "standard" bundles of features and a "standard" CacheInserter.)

Bonus

I added new features to the default feature extractors set and cleaned up some utilities!

There are now FeaturesInterface<> implementations that can handle path and trajectory constraints!
Multiple goal constraints are also handled (at the cost of increased cardinality.)

I also added "atomic" features that wrap the basic ops you can do with warehouse_ros, to allow users to specify their own metadata to tag cache entries with.

Here: cc0feb3

Pre-Existing Support

The package now provides some starter implementations that covers most general cases of motion planning.

For more information, see the implementations of:

  • FeaturesInterface
  • CacheInsertPolicyInterface

Cache Keying Features

The following are features of the plan request and response that you can key the cache on.

These support separately configurable fuzzy lookup on start and goal conditions!
Additionally, these features "canonicalize" the inputs to reduce the cardinality of the cache, increasing the chances of cache hits. (e.g., restating poses relative to the planning frame).

Supported Features:

  • "Start"
    • WorkspaceFeatures: Planning workspace
    • StartStateJointStateFeatures: Starting robot joint state
  • "Goal"
    • MaxSpeedAndAccelerationFeatures: Max velocity, acceleration, and cartesian speed limits
    • GoalConstraintsFeatures: Planning request goal_constraints
      • This includes ALL joint, position, and orientation constraints (but not constraint regions)!
    • PathConstraintsFeatures: Planning request path_constraints
    • TrajectoryConstraintsFeatures: Planning request trajectory_constraints

Additionally, support for user-specified features are provided for query-only or cache metadata tagging constant features.

Similar support exists for the cartesian variants of these.

Cache Insert and Pruning Policies

The following are cache insertion and pruning policies to govern when cache entries are inserted, and how they are (optionally) pruned.

Supported Cache Insert Policies:

  • BestSeenExecutionTimePolicy: Only insert best seen execution time, optionally prune on best execution time.
  • AlwaysInsertNeverPrunePolicy: Always insert, never prune

Caveat

The increased functionality is now no longer 100% covered. But I tried adding tests where I had time to. I am unfortunately running out of time to iterate on this, so let's be targeted with the improvements!

Build/Test

source /opt/ros/rolling/setup.bash
sudo rosdep init
rosdep update

mkdir -p traj_cache_ws/src
cd traj_cache_ws/src
git clone https://github.com/methyldragon/moveit2.git -b ch3/trajectory-cache-refactor
for repo in moveit2/moveit2.repos $(f="moveit2/moveit2_rolling.repos"; test -r $f && echo $f); do vcs import < "$repo"; done

cd ..
rosdep install --from-paths src --ignore-src --rosdistro rolling -y -r
colcon build --packages-up-to moveit_ros_trajectory_cache --cmake-args -DCMAKE_BUILD_TYPE=Release

source install/setup.bash
colcon test --packages-select moveit_ros_trajectory_cache

Precommit with:

pre-commit run --all

@methylDragon methylDragon force-pushed the ch3/trajectory-cache-refactor branch 12 times, most recently from 91b48e8 to cc0feb3 Compare August 1, 2024 05:50
@methylDragon methylDragon changed the title (DO NOT MERGE) Ch3/trajectory cache refactor (WIP) Pluggable Fuzzy Matching TrajectoryCache refactor Aug 1, 2024
@methylDragon methylDragon changed the title (WIP) Pluggable Fuzzy Matching TrajectoryCache refactor (WIP) Pluggable fuzzy-matching Trajectory Cache refactor Aug 1, 2024
@methylDragon methylDragon changed the title (WIP) Pluggable fuzzy-matching Trajectory Cache refactor (WIP) Fuzzy-matching Trajectory Cache Traits refactor Aug 1, 2024
@methylDragon methylDragon changed the title (WIP) Fuzzy-matching Trajectory Cache Traits refactor (WIP) Fuzzy-matching Trajectory Cache Injectable Traits refactor Aug 1, 2024
@methylDragon methylDragon force-pushed the ch3/trajectory-cache-refactor branch from 79b7f95 to e64cad8 Compare August 1, 2024 07:29
@methylDragon
Copy link
Contributor Author

methylDragon commented Aug 1, 2024

Quick question: clang-format/tidy is erroneously editing the template parameters.

How do I get around it?
Related bug: llvm/llvm-project#46097

Is it acceptable to throw in NOLINT directives?

@stephanie-eng
Copy link
Contributor

You'd be far from the first use-case of NOLINT, but would recommend judicious use of it.

@methylDragon methylDragon force-pushed the ch3/trajectory-cache-refactor branch 9 times, most recently from 77789f3 to 1feb211 Compare August 3, 2024 10:11
sea-bass
sea-bass previously approved these changes Dec 29, 2024
@sea-bass
Copy link
Contributor

sea-bass commented Dec 29, 2024

Latest commit seems to look good -- I am rerunning CI a few times just to confirm. Thanks for working on this, @methylDragon!

@sea-bass
Copy link
Contributor

sea-bass commented Dec 29, 2024

Ah, the 2nd run of CI failed in the rolling/clang-tidy job, actually.

I'd be a bit wary of merging with these flaky tests. Any ideas on how to improve this behavior? And what do you think was merged upstream to be causing this?

Maybe #3143?

@sea-bass sea-bass dismissed their stale review December 29, 2024 13:48

flaky tests

@methylDragon
Copy link
Contributor Author

methylDragon commented Dec 29, 2024

Ah, the 2nd run of CI failed in the rolling/clang-tidy job, actually.

I'd be a bit wary of merging with these flaky tests. Any ideas on how to improve this behavior? And what do you think was merged upstream to be causing this?

Maybe #3143?

It's the ros2_control_node itself that's getting a segfault though. I doubt this is related to moveit :/

The test this PR introduces basically checks if the nodes involved with the launchfile all terminated nicely, and also had the appropriate number of PASS printouts. (No way around it otherwise since move_group requires the robot node and ros2_control to work)

Test is failing because ros2_control's node segfaults and returns a bad exit code. This is a different failure mode from the one my commit fixed.

@sea-bass
Copy link
Contributor

It's the ros2_control_node itself that's getting a segfault though. I doubt this is related to moveit :/

Test is failing because ros2_control's node segfaults and returns a bad exit code. This is a different failure mode from the one my commit fixed.

Those ros2_control segfaults in integration tests are unfortunately false alarms in all these tests -- there is an actual assertion failing here: https://github.com/moveit/moveit2/actions/runs/12533008564/job/34958234846?pr=2941#step:11:10051

    [test_best_seen_execution_time_policy_with_move_group-8] /home/runner/work/moveit2/moveit2/.work/target_ws/src/moveit2/moveit_ros/trajectory_cache/test/cache_insert_policies/test_best_seen_execution_time_policy_with_move_group.cpp:170: Failure
    [test_best_seen_execution_time_policy_with_move_group-8] Value of: ret
    [test_best_seen_execution_time_policy_with_move_group-8]   Actual: false
    [test_best_seen_execution_time_policy_with_move_group-8] Expected: true
    [test_best_seen_execution_time_policy_with_move_group-8] BestSeenExecutionTimePolicy: Empty joint trajectory points.
    [test_best_seen_execution_time_policy_with_move_group-8] 
    [test_best_seen_execution_time_policy_with_move_group-8] [  FAILED  ] MoveGroupFixture.BestSeenExecutionTimePolicyWorks (5209 ms)

@methylDragon
Copy link
Contributor Author

methylDragon commented Dec 30, 2024

It's the ros2_control_node itself that's getting a segfault though. I doubt this is related to moveit :/
Test is failing because ros2_control's node segfaults and returns a bad exit code. This is a different failure mode from the one my commit fixed.

Those ros2_control segfaults in integration tests are unfortunately false alarms in all these tests -- there is an actual assertion failing here: https://github.com/moveit/moveit2/actions/runs/12533008564/job/34958234846?pr=2941#step:11:10051

    [test_best_seen_execution_time_policy_with_move_group-8] /home/runner/work/moveit2/moveit2/.work/target_ws/src/moveit2/moveit_ros/trajectory_cache/test/cache_insert_policies/test_best_seen_execution_time_policy_with_move_group.cpp:170: Failure
    [test_best_seen_execution_time_policy_with_move_group-8] Value of: ret
    [test_best_seen_execution_time_policy_with_move_group-8]   Actual: false
    [test_best_seen_execution_time_policy_with_move_group-8] Expected: true
    [test_best_seen_execution_time_policy_with_move_group-8] BestSeenExecutionTimePolicy: Empty joint trajectory points.
    [test_best_seen_execution_time_policy_with_move_group-8] 
    [test_best_seen_execution_time_policy_with_move_group-8] [  FAILED  ] MoveGroupFixture.BestSeenExecutionTimePolicyWorks (5209 ms)

Ah! I see, I think this is an error on my part while writing the tests.
Let's try with 6b19476

Almost to the finish line! Thanks for the prompt reviews 🙇

Signed-off-by: methylDragon <[email protected]>
@methylDragon methylDragon force-pushed the ch3/trajectory-cache-refactor branch from 6b19476 to 0c4d4d5 Compare December 30, 2024 04:51
@sea-bass
Copy link
Contributor

Seems the lingering flaky failures are now based on exit codes. You could consider trying something like this:

a401e13

@methylDragon
Copy link
Contributor Author

Seems the lingering flaky failures are now based on exit codes. You could consider trying something like this:

a401e13

Hmm, okay. Less ideal since it feels like papering over the issue, but okay since technically the segfaults are unrelated. 03e0dc4

@methylDragon methylDragon requested a review from sjahr December 30, 2024 12:59
@sea-bass
Copy link
Contributor

Aw man, latest run has a real failure in the test again...

@methylDragon
Copy link
Contributor Author

Oh man, let me continue investigating. Sorry for the repeated back and forths..

@methylDragon
Copy link
Contributor Author

methylDragon commented Dec 30, 2024

d78fd22 is the only thing I can think of, otherwise I will need to const define any trajectories in the test (which will be quite wordy)

EDIT: Looks like the CI failure is a different flaky test now (pilz). Unfortunately my local setup makes it difficult to replicate CI, but I did run it a couple times and didn't see the same failure we saw before.

Copy link
Contributor

@sea-bass sea-bass left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems fine now -- in 5 tries, only got another exit code related failure on a Pilz test, which is for sure not linked to this PR.

Thanks for all the hard work!

@methylDragon
Copy link
Contributor Author

methylDragon commented Dec 31, 2024

Seems fine now -- in 5 tries, only got another exit code related failure on a Pilz test, which is for sure not linked to this PR.

Thanks for all the hard work!

Thanks for the in-depth reviews! They helped test stability and fixing some minor bugs in the code.

After this is merged the only thing left for this entire feature is the tutorial:

Pinging @sjahr for re-approval (from stale review) and merge!

@sea-bass
Copy link
Contributor

After this is merged the only thing left for this entire feature is the tutorial:

Note that we've since branched Jazzy off of main and are only landing bug fixes; however, in the tutorials repo the main branch is currently for both Jazzy and Rolling. So we can either:

  • Add a disclaimer to your specific tutorial to use rolling/main from source
  • Branch a Jazzy branch of the tutorials / build a separate docs page for Jazzy (the right, but more annoying thing to do)

Pinging @sjahr for re-approval (from stale review) and merge!

Sebastian, feel free to merge when/if you approve after the updates.

@methylDragon
Copy link
Contributor Author

  • Add a disclaimer to your specific tutorial to use rolling/main from source
  • Branch a Jazzy branch of the tutorials / build a separate docs page for Jazzy (the right, but more annoying thing to do)

Not sure how to do the latter, so I'm just gonna do the former 😬

@sea-bass
Copy link
Contributor

I meant more that the maintainers could help do the branching if this will be the jump-off point for the tutorials as well. But I'm still on the fence since this feature is very much at the "leaf" level of MoveIt.

Copy link
Contributor

@sjahr sjahr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@methylDragon @sea-bass Thanks for your work to bring this over the finish line ❤️

@sjahr sjahr enabled auto-merge January 3, 2025 14:31
@sjahr sjahr added this pull request to the merge queue Jan 3, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Jan 3, 2025
@sea-bass sea-bass added this pull request to the merge queue Jan 4, 2025
Merged via the queue into moveit:main with commit a12f327 Jan 4, 2025
8 of 9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants