- The PyTorch model is directly converted to a native model built using the TensorRT API, rather than using torch_tensorrt. The converted model can run completely independently of PyTorch.
- Pure Python implementation that integrates seamlessly with the original repository; easy to use, with one-click conversion and per-model configuration files.
- A good example of using torch2trt to convert complex PyTorch models.
- Future updates will support more features.
- In the encoder, the msdeformattn operator is implemented as a separate custom CUDA operation, which blocks conventional conversion paths such as direct export to ONNX or TorchScript.
- In the decoder, the attn_mask parameter of PyTorch's native nn.MultiheadAttention does not accept 4D tensors with a batch dimension, so the input shapes differ from what the TensorRT implementation expects.
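To illustrate the decoder issue: a per-batch, per-head attention mask is naturally 4D, but `nn.MultiheadAttention` only accepts a 2D `(L, S)` or 3D `(N * num_heads, L, S)` mask, so the 4D mask has to be flattened first. A minimal sketch (shapes are illustrative, not taken from the repository):

```python
import torch
import torch.nn as nn

N, H, L, E = 2, 4, 5, 16  # batch, heads, sequence length, embed dim
mha = nn.MultiheadAttention(embed_dim=E, num_heads=H, batch_first=True)
q = k = v = torch.randn(N, L, E)

# A per-batch, per-head mask is naturally 4-D: (N, H, L, L) ...
mask_4d = torch.zeros(N, H, L, L, dtype=torch.bool)

# ... but nn.MultiheadAttention only accepts 2-D (L, L) or 3-D (N*H, L, L),
# so the 4-D mask must be flattened before the call:
mask_3d = mask_4d.reshape(N * H, L, L)
out, _ = mha(q, k, v, attn_mask=mask_3d)
```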
- Added MSDeformableAttnPlugin as a custom plugin to torch2trt.
- Implemented a multi-head attention in PyTorch that supports batched attn_mask tensors.
- Modified several model components and added numerous custom converter functions for torch2trt to ensure a smooth conversion.
- Integrated branch-free post-processing steps (no data-dependent if statements) into the model to further improve inference speed.
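The batched-mask attention mentioned above can be sketched as follows. This is an illustrative reimplementation, not the repository's exact code: it accepts a 4D boolean `attn_mask` of shape `(N, num_heads, L, S)` directly by relying on broadcasting over the head dimension.

```python
import torch

def mha_with_4d_mask(q, k, v, attn_mask, num_heads):
    """Minimal multi-head attention sketch accepting a 4-D boolean mask
    of shape (N, num_heads, L, S); True marks positions to be masked out."""
    N, L, E = q.shape
    d = E // num_heads  # per-head dimension

    def split(x):
        # (N, T, E) -> (N, num_heads, T, d)
        return x.view(x.shape[0], x.shape[1], num_heads, d).transpose(1, 2)

    qh, kh, vh = split(q), split(k), split(v)
    scores = qh @ kh.transpose(-2, -1) / d ** 0.5      # (N, H, L, S)
    scores = scores.masked_fill(attn_mask, float("-inf"))
    attn = scores.softmax(dim=-1)
    out = attn @ vh                                    # (N, H, L, d)
    return out.transpose(1, 2).reshape(N, L, E)
```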
- The tested version of TensorRT used in this repository is 8.6.1.6.
- The inference image sizes commonly used in the original Mask2Former repository are 800 and 1200. On machines with insufficient memory, conversion may fail with out-of-memory errors. It is recommended to adjust the MIN_SIZE_TEST and MAX_SIZE_TEST parameters in cfg.INPUT to reduce the model's input size.
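For example, the test-time resolution can be lowered before conversion through the detectron2-style config object that Mask2Former uses (the specific values below are illustrative, not defaults from this repository):

```python
# Sketch, assuming `cfg` is the detectron2 config already loaded
# (e.g. via get_cfg() + cfg.merge_from_file(...)).
cfg.INPUT.MIN_SIZE_TEST = 480  # shorter-side target at test time
cfg.INPUT.MAX_SIZE_TEST = 640  # cap on the longer side; lower both to save memory
```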
- Due to differences in operator implementation, there may be discrepancies in inference results compared to native PyTorch. If you encounter unacceptable discrepancies during use, please raise an issue for specific analysis.
- Do not use this repository for model training.
- Follow the official Mask2former library instructions to complete the installation of the native Mask2former.
- Clone my maintained torch2trt library and compile the newly added MSDeformableAttnPlugin.
git submodule init
git submodule update
cd torch2trt
Then change the paths of the TensorRT library and header files in the CMakeLists.txt of torch2trt to your own paths, and then compile and install torch2trt.
cmake -B build . && cmake --build build --target install && sudo ldconfig
python setup.py install
- Download the weights and test images, then run the script as shown below.
cd demo/
python demo_trt.py --config-file ../configs/coco/panoptic-segmentation/maskformer2_R50_bs16_50ep.yaml \
--input input1.jpg \
[--other-options]
--opts MODEL.WEIGHTS /path/to/checkpoint_file
- Configuration file: panoptic-segmentation/maskformer2_R50_bs16_50ep.yaml
- Input image size: 427 × 640
- Inference speed (RTX 3050, batch size 1):

| PyTorch 2.5 | TensorRT fp32 |
| --- | --- |
| 12.25 FPS | 20.36 FPS |
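The benchmark methodology is not specified in this repository; a common way to measure FPS is to warm up first, then average over many iterations. A generic sketch (for GPU inference, `run_once` should include a `torch.cuda.synchronize()` so the timing is accurate):

```python
import time

def measure_fps(run_once, warmup=10, iters=100):
    """Return frames per second for a zero-argument inference callable."""
    for _ in range(warmup):       # warm-up runs excluded from timing
        run_once()
    start = time.perf_counter()
    for _ in range(iters):
        run_once()
    return iters / (time.perf_counter() - start)
```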
- Support Swin backbone (completed)
- Support semantic-segmentation models
- Complete testing and debugging for batch_size > 1
- fp16 int8 quantization
- Convert the mask2former_video model