This repository has been archived by the owner on Sep 2, 2024. It is now read-only.

[FEATURE] Working FRCNN for keypoints in new package lightweight_cnn_object_localization; completed Torch 1.13.1 cross-compile instructions
eynsai committed Aug 7, 2023
1 parent 964a8fb commit e4cd59e
Showing 9 changed files with 128 additions and 44 deletions.
@@ -18,7 +18,7 @@ The training for Mask-RCNN relies on features from modern Torch/TorchVision. How
2. `python3 convert_mrcnn_to_torchscript.py -d "PATH/TO/IMAGES/FOLDER" -w "PATH/TO/CHECKPOINTS/FOLDER/handrail_mrcnn_finetune_ckpt_199.pth"`
3. `python3 test_mrcnn_torchscript.py -d "PATH/TO/IMAGES/FOLDER" -o "PATH/TO/OUTPUTS/FOLDER" -w "PATH/TO/CHECKPOINTS/FOLDER/handrail_mrcnn_finetune_ckpt_199_torchscript.pt"`

# Keypoint-RCNN for handrail detection
# Keypoint-RCNN for handrail detection (EXPERIMENTAL)

This tool also contains scripts for training and testing an experimental keypoint-based version of RCNN. Note that this does not use the actual Keypoint-RCNN architecture, but instead just trains a Mask-RCNN on data containing keypoint masks. To train it on synthetic data and then test it:

@@ -7,3 +7,11 @@
5. Click the run button. Once all target objects are done, press CTRL+C in the terminal to exit.

NOTE: Due to a weird off-by-one bug that I can't quite eradicate, the first generated set of images (`$OUTPUT_DIR/colored_maps/colored_0000000.png`, `$OUTPUT_DIR/images/image_0000000.png`, and `$OUTPUT_DIR/labels_maps/labels_0000000.png`) has no ground truth data recorded, nor is it counted towards `NUM_IMAGES_EACH`. The easiest way to deal with this issue is to delete these three images after data generation. `generate_data.sh` has an interrupt hook that should handle this, so you probably don't need to do anything, assuming Ignition Gazebo quits gracefully (not a foregone conclusion), but be aware.
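If the interrupt hook doesn't fire, the manual cleanup amounts to removing those three files; a minimal sketch (assuming `OUTPUT_DIR` points at your data directory — the default below is purely illustrative):

```shell
# Remove the first generated image set, which has no ground-truth data recorded.
# OUTPUT_DIR defaults to ./output here purely for illustration.
OUTPUT_DIR=${OUTPUT_DIR:-./output}
rm -f "$OUTPUT_DIR/colored_maps/colored_0000000.png" \
      "$OUTPUT_DIR/images/image_0000000.png" \
      "$OUTPUT_DIR/labels_maps/labels_0000000.png"
echo "cleanup done"
```

`rm -f` keeps the step idempotent, so it is safe to run even when the hook already removed the files.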

## To run keypoint annotation (EXPERIMENTAL):

- `./annotate_keypoints.sh -c CONFIG_NAME -o $OUTPUT_DIR`

Note: If you're running the `../python_rcnn_training/src/*_kprcnn.py` scripts, generate data using this tool.
However, if you're working with the newer models in the `lightweight_cnn_object_localization` package, use the data generation tool in that package instead, as it has received some updates.

@@ -1,3 +1,3 @@
cmake_minimum_required(VERSION 3.0.2)
project(cnn_object_localization)
add_subdirectory(tools/libtorch_mrcnn_test)
project(lightweight_cnn_object_localization)
add_subdirectory(tools/libtorch_frcnn_test/src)
@@ -0,0 +1,61 @@
<?xml version="1.0"?>
<package format="2">
<name>lightweight_cnn_object_localization</name>
<version>0.0.0</version>
<description>The lightweight_cnn_object_localization package</description>

<!-- One maintainer tag required, multiple allowed, one person per tag -->
<!-- Example: -->
<!-- <maintainer email="[email protected]">Jane Doe</maintainer> -->
<maintainer email="[email protected]">anaveen</maintainer>


<!-- One license tag required, multiple allowed, one license per tag -->
<!-- Commonly used license strings: -->
<!-- BSD, MIT, Boost Software License, GPLv2, GPLv3, LGPLv2.1, LGPLv3 -->
<license>TODO</license>


<!-- Url tags are optional, but multiple are allowed, one per tag -->
<!-- Optional attribute type can be: website, bugtracker, or repository -->
<!-- Example: -->
<!-- <url type="website">http://wiki.ros.org/cnn_object_localization</url> -->


<!-- Author tags are optional, multiple are allowed, one per tag -->
<!-- Authors do not have to be maintainers, but could be -->
<!-- Example: -->
<!-- <author email="[email protected]">Jane Doe</author> -->


<!-- The *depend tags are used to specify dependencies -->
<!-- Dependencies can be catkin packages or system dependencies -->
<!-- Examples: -->
<!-- Use depend as a shortcut for packages that are both build and exec dependencies -->
<!-- <depend>roscpp</depend> -->
<!-- Note that this is equivalent to the following: -->
<!-- <build_depend>roscpp</build_depend> -->
<!-- <exec_depend>roscpp</exec_depend> -->
<!-- Use build_depend for packages you need at compile time: -->
<!-- <build_depend>message_generation</build_depend> -->
<!-- Use build_export_depend for packages you need in order to build against this package: -->
<!-- <build_export_depend>message_generation</build_export_depend> -->
<!-- Use buildtool_depend for build tool packages: -->
<!-- <buildtool_depend>catkin</buildtool_depend> -->
<!-- Use exec_depend for packages you need at runtime: -->
<!-- <exec_depend>message_runtime</exec_depend> -->
<!-- Use test_depend for packages you need only for testing: -->
<!-- <test_depend>gtest</test_depend> -->
<!-- Use doc_depend for packages you need only for building documentation: -->
<!-- <doc_depend>doxygen</doc_depend> -->
<buildtool_depend>catkin</buildtool_depend>
<build_depend>message_generation</build_depend>
<exec_depend>message_runtime</exec_depend>


<!-- The export tag contains other, unspecified, tags -->
<export>
<!-- Other tools can request additional information be placed here -->

</export>
</package>
@@ -1,6 +1,6 @@
# Overview

This tool is intended to test the TorchScript compiled model created using the `pytorch_frcnn_training` tool. Much more importantly, it provides documentation on how to code that depends on `libtorch` and `libtorchvision` can be cross-compiled and installed on Astrobee. The below procedures are not particularly elegant or clean, but they work.
This tool is intended to test the TorchScript compiled model created using the `pytorch_frcnn_training` tool. Much more importantly, it provides documentation on how code that depends on `libtorch` and `libtorchvision` can be cross-compiled and installed on Astrobee. The below procedures are not particularly elegant or clean, but they work.

# Building and running this tool locally

@@ -18,17 +18,23 @@ This tool is intended to test the TorchScript compiled model created using the `

### Option 2: Build from scratch

- `cd <TORCH_INSTALL_DIR>` where `<TORCH_INSTALL_DIR>` is the directory you want `pytorch` to be downloaded and installed in.
- `cd <TORCH_INSTALL_DIR>` where `<TORCH_INSTALL_DIR>` is the directory you want `pytorch` to be downloaded, built, and installed in.
- `git clone -b v1.13.1 --recurse-submodule https://github.com/pytorch/pytorch.git`
- `cd pytorch; mkdir build_libtorch; cd build_libtorch`
- `export USE_XNNPACK=0; export USE_CUDA=0; export USE_CUDNN=0; export USE_DISTRIBUTED=0; export USE_MKLDNN=0; export BUILD_TEST=0`.
  - `USE_XNNPACK=0` because a bug in the Astrobee cross-compile toolchain prevents XNNPACK from building when cross-compiling, so we disable it here too for consistency's sake.
- `USE_CUDA=0` and `USE_CUDNN=0` because the Astrobee doesn't have an NVidia GPU.
- `USE_DISTRIBUTED=0` because we aren't doing any distributed processing stuff.
- `USE_MKLDNN=0` because the Astrobee doesn't have an Intel CPU.
- `BUILD_TEST=0` to save time.
- `python3 ../tools/build_libtorch.py`
- NOTE: At this point, you may get some complaints about python dependencies. Install as needed and try again.
- NOTE: By default, the build script will try to use maximum parallelism. If your machine/VM starts struggling and/or you get an error `c++: fatal error: Killed signal terminated program cc1plus`, `export MAX_JOBS=4` or some other small number, and then try again.
- At this point, the script may complain about not having `typing_extensions`. Simply run `pip3 install typing_extensions` and try again.
- The script will automatically set the number of parallel jobs to the maximum. If your machine/VM starts struggling and/or you get an error `c++: fatal error: Killed signal terminated program cc1plus`, reduce the number of parallel jobs using `export MAX_JOBS=...` and then try again.
- `export CMAKE_PREFIX_PATH=<TORCH_INSTALL_DIR>/pytorch/torch/share/cmake/Torch:${CMAKE_PREFIX_PATH}`. Consider adding this or some equivalent to `~/.bashrc`.
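One way to pick a conservative `MAX_JOBS` before invoking the build script (a heuristic, not part of the official instructions): use half the available cores, with a floor of 1.

```shell
# Heuristic: use half the available cores (minimum 1) so parallel
# compiles don't exhaust memory and get OOM-killed.
CORES=$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN)
MAX_JOBS=$((CORES / 2))
if [ "$MAX_JOBS" -lt 1 ]; then MAX_JOBS=1; fi
export MAX_JOBS
echo "Using MAX_JOBS=$MAX_JOBS"
```

If the build still dies with `c++: fatal error: Killed signal terminated program cc1plus`, lower the value further and retry.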

## Step 2: Installing libtorchvision:

- `cd <VISION_DOWNLOAD_DIR>` where `<VISION_DOWNLOAD_DIR>` is the directory you want `vision` to be downloaded, but not installed, in.
- `cd <VISION_DOWNLOAD_DIR>` where `<VISION_DOWNLOAD_DIR>` is the directory you want `vision` to be downloaded and built, but not installed, in.
- `git clone -b v0.14.1 --recurse-submodule https://github.com/pytorch/vision.git`
- `cd vision; mkdir build; cd build`
- `cmake ..`
@@ -45,13 +51,16 @@ This tool is intended to test the TorchScript compiled model created using the `

## Step 4: Running the test

- `./build/libtorch_mrcnn_test <CNN_OBJECT_LOCALIZATION_RESOURCES_PATH>/checkpoints/handrail_finetune_ckpt_199_torchscript.pt` where `<CNN_OBJECT_LOCALIZATION_RESOURCES_PATH>` is the path to the directory containing your checkpoints folder.
- `./libtorch_frcnn_test <PATH/TO/TORCHSCRIPT/MODEL>`

# Cross-compiling this tool and running on Astrobee

- Make sure you are starting from scratch and don't have residue from previous installations of `libtorch` or `libtorchvision` lying around in your cross-compilation rootfs. This can cause mayhem, especially in the case of `libtorchvision`.
- You will need `sudo` privileges.
- To keep things simple, I choose to build everything in `${ARMHF_CHROOT_DIR}/root`. You can choose somewhere else.

## Step 1: Getting ready

- Remove the `./CATKIN_IGNORE` file. (It is there so that people uninterested in this functionality don't have to deal with Torch/TorchVision dependencies.)
- Ensure that your Astrobee install is set up for cross-compile.
- Download or copy `chroot.sh` from https://babelfish.arc.nasa.gov/bitbucket/projects/ASTROBEE/repos/astrobee_platform/browse/rootfs/chroot.sh.
- Make the following modifications (this makes something wonky but hopefully the subsequent steps still work):
@@ -73,22 +82,41 @@ This tool is intended to test the TorchScript compiled model created using the `
# chroot "$r" mount -t proc proc /proc
# add_trap chroot "$r" umount /proc
```
- `sudo su; ./chroot.sh $ARMHF_CHROOT_DIR` (this will give you a shell inside the platform)
- `sudo su`
- `./chroot.sh $ARMHF_CHROOT_DIR` (this will give you a shell inside the platform)

## Step 2: Installing libtorch

- Open a separate shell outside of the platform (this is necessary because git isn't installed in the platform, and we need to clone Torch and TorchVision).
- Outside the platform: `cd ${ARMHF_CHROOT_DIR}/root; git clone -b v1.5.0 --recurse-submodule https://github.com/pytorch/pytorch.git`
- Inside the platform: `cd /root/pytorch; python setup.py build`
- See [this documentation](https://github.com/pytorch/pytorch/blob/4ff3872a2099993bf7e8c588f7182f3df777205b/docs/libtorch.rst) for more info.
- Open a separate shell outside of the platform.

### Step 2.1: typing_extensions

- Outside the platform: `cd ${ARMHF_CHROOT_DIR}/root; pip3 install typing_extensions --target typing_extensions`
- Inside the platform: `export PYTHONPATH=/root/typing_extensions:$PYTHONPATH`
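The `--target` install above works because Python picks the staged package up from `PYTHONPATH`. The mechanism can be demonstrated with a throwaway stub module (names here are illustrative stand-ins, not the real `typing_extensions` install):

```shell
# Demonstrate the PYTHONPATH staging trick with a stub package in a temp dir.
STAGE_DIR=$(mktemp -d)
mkdir -p "$STAGE_DIR/stub_pkg"
printf 'VALUE = 42\n' > "$STAGE_DIR/stub_pkg/__init__.py"
PYTHONPATH="$STAGE_DIR" python3 -c 'import stub_pkg; print(stub_pkg.VALUE)'
```

The same resolution rule is what lets the cross-compile chroot import `typing_extensions` from `/root/typing_extensions` without a system-wide install.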

### Step 2.2: libtorch

- Outside the platform: `cd ${ARMHF_CHROOT_DIR}/root; git clone -b v1.13.1 --recurse-submodule https://github.com/pytorch/pytorch.git`
- Inside the platform: `cd /root/pytorch; mkdir build_libtorch; cd build_libtorch`
- Inside the platform: `export USE_XNNPACK=0; export USE_CUDA=0; export USE_CUDNN=0; export USE_DISTRIBUTED=0; export USE_MKLDNN=0; export BUILD_TEST=0`
- `USE_XNNPACK=0` because a bug in the toolchain causes it to fail to build. [This bug is fixed in a more up-to-date version of gcc](https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101723), but I don't want to poke around the toolchain stuff. If the version of gcc used ever gets upgraded, you can rebuild everything with XNNPACK and probably get a reasonable bump in model inference time in exchange for your efforts.
- `USE_CUDA=0` and `USE_CUDNN=0` because the Astrobee doesn't have an NVidia GPU.
- `USE_DISTRIBUTED=0` because we aren't doing any distributed processing stuff.
- `USE_MKLDNN=0` because the Astrobee doesn't have an Intel CPU.
- `BUILD_TEST=0` to save time.
- Inside the platform: `python3 ../tools/build_libtorch.py`
- The script will attempt to set the number of parallel jobs to the maximum, which it erroneously believes to be 2. You can probably afford to speed up the build process by increasing the number of parallel jobs using `export MAX_JOBS=...`. If your machine/VM starts struggling and/or you get an error `c++: fatal error: Killed signal terminated program cc1plus`, you were too aggressive; back off and try again.

## Step 3: Installing libtorchvision

- Outside the platform: `cd ${ARMHF_CHROOT_DIR}/root; git clone -b v0.6.0 --recurse-submodule https://github.com/pytorch/vision.git`
- Inside the platform: `export CMAKE_PREFIX_PATH=~/pytorch/torch/share/cmake/Torch`
- Inside the platform: `cd /root/vision; mkdir build; cd build; cmake ..; make; make install`
- See [this documentation](https://github.com/pytorch/vision/tree/b68adcf9a9280aef02fc08daed170d74d0892361) for more info.
- Outside the platform: `cd ${ARMHF_CHROOT_DIR}/root`
- Outside the platform: `git clone -b v0.14.1 --recurse-submodule https://github.com/pytorch/vision.git`
- Inside the platform: `cd /root/vision; mkdir build; cd build`
- Inside the platform: `export CMAKE_PREFIX_PATH=~/pytorch/torch/share/cmake/Torch:$CMAKE_PREFIX_PATH`
- Inside the platform: `export USE_XNNPACK=0; export USE_CUDA=0; export USE_CUDNN=0; export USE_DISTRIBUTED=0; export USE_MKLDNN=0; export BUILD_TEST=0`
- Inside the platform: `cmake ..`
- Inside the platform: `make`
- Inside the platform: `make install`
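After `make install`, a quick sanity check (inside the platform) that the TorchVision CMake config landed where `find_package` will look — `/usr/local` being the default install prefix used by the steps above:

```shell
# Sanity check: is the TorchVision CMake config where find_package will look?
# (/usr/local is the default install prefix used by the steps above.)
TV_CONFIG=/usr/local/share/cmake/TorchVision/TorchVisionConfig.cmake
if [ -f "$TV_CONFIG" ]; then
  echo "TorchVision CMake config found: $TV_CONFIG"
else
  echo "TorchVision CMake config missing; re-check the make install step"
fi
```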

## Step 4: Cross compiling Astrobee

@@ -98,6 +126,7 @@ This tool is intended to test the TorchScript compiled model created using the `

## Step 5: Cross compiling ISAAC

- Remove the `../../CATKIN_IGNORE` file. (It is there so that people uninterested in this functionality don't have to deal with Torch/TorchVision dependencies.)
- Make the following modifications:
- File: `${ISAAC_WS}/src/scripts/configure.sh`
- Line: 306
@@ -116,7 +145,7 @@ This tool is intended to test the TorchScript compiled model created using the `
```
- New:
```
--whitelist isaac_astrobee_description isaac_util isaac_msgs inspection cargo isaac_hw_msgs wifi isaac gs_action_helper cnn_object_localization
--whitelist isaac_astrobee_description isaac_util isaac_msgs inspection cargo isaac_hw_msgs wifi isaac gs_action_helper lightweight_cnn_object_localization
```
- Line: 325
- Old:
@@ -125,24 +154,15 @@ This tool is intended to test the TorchScript compiled model created using the `
```
- New:
```
--whitelist isaac_astrobee_description isaac_util isaac_msgs inspection cargo isaac_hw_msgs wifi isaac gs_action_helper cnn_object_localization
```
- File: `${ARMHF_CHROOT_DIR}/usr/local/share/cmake/TorchVision/TorchVisionTargets.cmake` (file is read-only; you'll need to bypass this)
- Line: 57
- Old:
```
INTERFACE_INCLUDE_DIRECTORIES "/root/vision/"
```
- New:
```
INTERFACE_INCLUDE_DIRECTORIES "${ARMHF_CHROOT_DIR}/usr/local/include"
--whitelist isaac_astrobee_description isaac_util isaac_msgs inspection cargo isaac_hw_msgs wifi isaac gs_action_helper lightweight_cnn_object_localization
```
- `cd $ISAAC_WS`
- `./src/scripts/configure.sh -a; source ~/.bashrc; catkin build`

## Step 6: Installing and running on Astrobee

- `./scripts/prepare_shared_libraries --root=$ARMHF_CHROOT_DIR --output=$ISAAC_WS/armhf/opt/isaac/lib --libs=libc10.so,libtorchvision.so,libtorch_cpu.so,libtorch.so`
- This script finds the specified `.so` files in the root directory, then copies them to the output directory.
- Follow the normal Astrobee installation procedure for ISAAC. The test executable should be located in `opt/isaac/bin`.
- `scp` over your model weights.
- On Astrobee: `export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/isaac/lib`
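The find-and-copy behavior described for `prepare_shared_libraries` can be sketched as follows (an illustrative stand-in using temp directories, not the real script or its flags):

```shell
# Illustrative stand-in for prepare_shared_libraries: locate the named .so
# files under a root filesystem and copy them into an output directory.
ROOT_DIR=$(mktemp -d); OUT_DIR=$(mktemp -d)
mkdir -p "$ROOT_DIR/usr/lib"
touch "$ROOT_DIR/usr/lib/libc10.so" "$ROOT_DIR/usr/lib/libtorch.so"
for lib in libc10.so libtorch.so; do
  find "$ROOT_DIR" -name "$lib" -exec cp {} "$OUT_DIR/" \;
done
ls "$OUT_DIR"
```

In the real workflow, `ROOT_DIR` corresponds to `$ARMHF_CHROOT_DIR` and `OUT_DIR` to `$ISAAC_WS/armhf/opt/isaac/lib`.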
@@ -5,7 +5,7 @@ find_package(Torch REQUIRED)
find_package(TorchVision REQUIRED)
find_package(Python3 COMPONENTS Development)

add_executable(libtorch_frcnn_test src/main.cc)
add_executable(libtorch_frcnn_test main.cc)

target_compile_features(libtorch_frcnn_test PUBLIC cxx_range_for)
target_link_libraries(libtorch_frcnn_test
@@ -17,9 +17,7 @@
*/

#include <torch/script.h>
// #include <torch/torch.h> // Uncomment if there are any issues with linking...?
// #include <torchvision/vision.h> // Uncomment if there are any issues with linking...?
#include <torchvision/nms.h>
#include <torchvision/vision.h>

#include <iostream>
#include <memory>
@@ -32,9 +30,6 @@ int main(int argc, const char* argv[]) {
return -1;
}

// This line doesn't do anything, but it makes sure the C++ linker doesn't prune libtorchvision
// vision::cuda_version();

// Load the module
torch::jit::script::Module module;
module = torch::jit::load(argv[1]);
@@ -48,8 +43,9 @@ int main(int argc, const char* argv[]) {

// Run the model and report runtime
auto t_start = std::chrono::high_resolution_clock::now();
auto output = module.forward(inputOuter).toTuple()->elements()[1];
auto output = module.forward(inputOuter).toTuple()->elements()[1].toListRef()[0];
auto t_end = std::chrono::high_resolution_clock::now();
std::cout << output << "\n";
double elapsed_time_ms = std::chrono::duration<double, std::milli>(t_end-t_start).count();
std::cout << "Module inference ran in " << std::to_string(elapsed_time_ms) << " milliseconds.\n";
}
@@ -4,7 +4,7 @@ This tool contains scripts for training and testing the Faster-RCNN, as well as
For testing the TorchScript model in a C++ environment, see the `libtorch_frcnn_test` tool.

Training, testing, conversion to TorchScript, and inference in libtorch all use `torch==1.13.1 torchvision==0.14.1`.
This is the most up-to-date version of Torch that can be compiled for the Astrobee platform; newer versions require higher cmake version than supported by Astrobee's platform.
This is the final 1.X version of Torch.
For a full list of Python dependencies for training and testing, see `requirements.txt`.

NOTE: In its current state, this model training code is a quick-and-dirty experiment.