Feature/sg 1488 yolo nas r integration model (#2001)

* Added DOTA2 dataset * Added DOTA2 dataset * Add missing type * Added dataset setup instructions * Added extreme batch visualization callback * OBBDetectionMetrics inherit from DetectionMetrics * Added AbstractOBBDataset * Added docs to prepare dataset script * Update docs on visualization callback * Update docs on dataset config file * Model, loss and recipes * Addedd export notebook * Added export support * Added predict script for generation a submission to DOTA test-dev * Added missing imports * Make YoloNASRDFLHead inherit a base YoloNASDFLHead * Make YoloNASRDFLHead inherit a base YoloNASDFLHead * Added explicit mention of TRT support * Added explicit mention of TRT support * Added new recipes to test * Added test to ensure flat_collate_tensors_with_batch_index works correct with bool tensors * Move limitations to the top * Added inheritance
Deci-AI · Jun 4, 2024 · e9e48d5 · e9e48d5
1 parent a39e5c3
commit e9e48d5
Show file tree

Hide file tree

Showing 33 changed files with 4,408 additions and 3 deletions.
diff --git a/LICENSE.YOLONAS-R.md b/LICENSE.YOLONAS-R.md
@@ -0,0 +1,16 @@
+# YOLO-NAS-R License
+
+These model weights or any components comprising the model and the associated documentation (the "Software") is licensed to you by Deci.AI, Inc. ("Deci") under the following terms:
+© 2023 – Deci.AI, Inc. 
+
+Subject to your full compliance with all of the terms herein, Deci hereby grants you a non-exclusive, revocable, non-sublicensable, non-transferable worldwide and limited right and license to use the Software. If you are using the Deci platform for model optimization, your use of the Software is subject to the Terms of Use available here (the "Terms of Use"). 
+
+You shall not, without Deci's prior written consent: 
+(i) resell, lease, sublicense or distribute the Software to any person; 
+(ii) use the Software to provide third parties with managed services or provide remote access to the Software to any person or compete with Deci in any way; 
+(iii) represent that you possess any proprietary interest in the Software; 
+(iv) directly or indirectly, take any action to contest Deci's intellectual property rights or infringe them in any way; 
+(V) reverse-engineer, decompile, disassemble, alter, enhance, improve, add to, delete from, or otherwise modify, or derive (or attempt to derive) the technology or source code underlying any part of the Software; 
+(vi) use the Software (or any part thereof) in any illegal, indecent, misleading, harmful, abusive, harassing and/or disparaging manner or for any such purposes. Except as provided under the terms of any separate agreement between you and Deci, including the Terms of Use to the extent applicable, you may not use the Software for any commercial use, including in connection with any models used in a production environment. 
+
+DECI PROVIDES THE SOFTWARE "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE OR NON-INFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS OF THE SOFTWARE BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
diff --git a/documentation/source/model_zoo.md b/documentation/source/model_zoo.md
@@ -67,6 +67,17 @@ All the available models are listed in the column `Model name`.
 > - Latency performance measured for T4 and Jetson Xavier NX with TensorRT, using FP16 precision and batch size 1
 > - Latency performance measured for Cascade Lake CPU with OpenVINO, using FP16 precision and batch size 1
 
+### Pretrained Oriented Object Detection Models
+
+| Model        | Model Name     | Dataset | Resolution | mAP<sup>val<br>0.5 | mAP<sup>test<br>0.5 |
+|--------------|----------------|---------|------------|--------------------|---------------------|
+| YOLO-NAS-R S | yolo_nas_r_s   | DOTA 2  | 1024x1024  | 63.424             | 56.56               |
+| YOLO-NAS-R M | yolo_nas_r_m   | DOTA 2  | 1024x1024  | 64.647             | 57.31               |
+| YOLO-NAS-R L | yolo_nas_r_l   | DOTA 2  | 1024x1024  | 66.223             | 59.82               |
+
+> **NOTE:** <br/>
+> Latency for YoloNAS-R should be nearly identical to original YoloNAS due to the fact most layers are exactly the same
+
 ### Pretrained Semantic Segmentation PyTorch Checkpoints
 
 | Model                 | Model Name        | Dataset    | Resolution | mIoU  | Latency b1<sub>T4</sub> | Latency b1<sub>T4</sub> including IO | Latency (Production)**<sub>Jetson Xavier NX</sub> | Torch Compile Support | 

diff --git a/notebooks/YoloNAS_R_Export_to_ONNX.ipynb b/notebooks/YoloNAS_R_Export_to_ONNX.ipynb
diff --git a/src/super_gradients/common/object_names.py b/src/super_gradients/common/object_names.py
@@ -338,6 +338,10 @@ class Models:
     YOLO_NAS_POSE_M = "yolo_nas_pose_m"
     YOLO_NAS_POSE_L = "yolo_nas_pose_l"
 
+    YOLO_NAS_R_S = "yolo_nas_r_s"
+    YOLO_NAS_R_M = "yolo_nas_r_m"
+    YOLO_NAS_R_L = "yolo_nas_r_l"
+
 
 class ConcatenatedTensorFormats:
     XYXY_LABEL = "XYXY_LABEL"
@@ -460,3 +464,7 @@ class Processings:
     SegmentationResize = "SegmentationResize"
     SegmentationPadShortToCropSize = "SegmentationPadShortToCropSize"
     SegmentationPadToDivisible = "SegmentationPadToDivisible"
+    OBBDetectionLongestMaxSizeRescale = "OBBDetectionLongestMaxSizeRescale"
+    OBBDetectionAutoPadding = "OBBDetectionAutoPadding"
+    OBBDetectionCenterPadding = "OBBDetectionCenterPadding"
+    OBBDetectionBottomRightPadding = "OBBDetectionBottomRightPadding"
diff --git a/src/super_gradients/conversion/onnx/obb_nms.py b/src/super_gradients/conversion/onnx/obb_nms.py
@@ -0,0 +1,204 @@
+from typing import Tuple
+
+import torch
+from super_gradients.common.abstractions.abstract_logger import get_logger
+from torch import nn, Tensor
+
+logger = get_logger(__name__)
+
+
+class OBBNMSAndReturnAsBatchedResult(nn.Module):
+    __constants__ = ("batch_size", "confidence_threshold", "iou_threshold", "class_agnostic_nms", "num_pre_nms_predictions", "max_predictions_per_image")
+
+    def __init__(
+        self,
+        confidence_threshold: float,
+        iou_threshold: float,
+        batch_size: int,
+        class_agnostic_nms: bool,
+        num_pre_nms_predictions: int,
+        max_predictions_per_image: int,
+    ):
+        """
+        Perform NMS on the output of the model and return the results in batched format.
+        This module implements MatrixNMS algorithm for rotated bounding boxes.
+        Hence, iou_threshold has different meaning compared to regular NMS.
+
+        :param confidence_threshold:      The confidence threshold to apply to the model output
+        :param iou_threshold:             The IoU threshold for selecting final detections.
+               An iou_threshold has different meaning compared to regular NMS. In matrix NMS, it is the
+               multiplication of predicted confidence score and decay factor for the bounding box (A decay applied to
+               boxes that that has overlap with the current box).
+        :param batch_size:                A fixed batch size for the model
+        :param class_agnostic_nms:        If True, NMS will be class agnostic
+        :param num_pre_nms_predictions:   The number of predictions before NMS step
+        :param max_predictions_per_image: Maximum number of predictions per image
+        """
+        if max_predictions_per_image > num_pre_nms_predictions:
+            raise ValueError(
+                f"max_predictions_per_image ({max_predictions_per_image}) cannot be greater than num_pre_nms_predictions ({num_pre_nms_predictions})"
+            )
+        super().__init__()
+        self.batch_size = batch_size
+        self.class_agnostic_nms = class_agnostic_nms
+        self.confidence_threshold = confidence_threshold
+        self.iou_threshold = iou_threshold
+        self.num_pre_nms_predictions = num_pre_nms_predictions
+        self.max_predictions_per_image = max_predictions_per_image
+
+    def forward(self, input) -> Tuple[Tensor, Tensor, Tensor, Tensor]:
+        """
+        Take decoded predictions from the model, apply NMS to them and return the results in batched format.
+
+        :param pred_boxes:       [B, N, 5] tensor, float32 in CXCYWHR format
+        :param pred_scores:      [B, N, C] tensor, float32 class scores
+        :return:                 A tuple of 4 tensors (num_detections, detection_boxes, detection_scores, detection_classes) will be returned:
+                                 - A tensor of [batch_size, 1] containing the image indices for each detection.
+                                 - A tensor of [batch_size, max_predictions_per_image, 5] containing the bounding box coordinates
+                                   for each detection in [cx, cy, w, h, r] format.
+                                 - A tensor of [batch_size, max_predictions_per_image] containing the confidence scores for each detection.
+                                 - A tensor of [batch_size, max_predictions_per_image] containing the class indices for each detection.
+
+        """
+        from super_gradients.training.models.detection_models.yolo_nas_r.yolo_nas_r_post_prediction_callback import rboxes_matrix_nms
+
+        pred_boxes, pred_scores = input
+        pred_cls_conf, pred_cls_labels = torch.max(pred_scores, dim=2)
+
+        # Apply confidence threshold
+        pred_cls_conf = torch.masked_fill(pred_cls_conf, mask=pred_cls_conf < self.confidence_threshold, value=0)
+        keep = rboxes_matrix_nms(
+            rboxes_cxcywhr=pred_boxes,
+            scores=pred_cls_conf,
+            labels=pred_cls_labels,
+            class_agnostic_nms=self.class_agnostic_nms,
+            iou_threshold=self.iou_threshold,
+            already_sorted=True,
+        )
+        num_predictions = []
+        batched_pred_boxes = []
+        batched_pred_scores = []
+        batched_pred_classes = []
+        for i in range(self.batch_size):
+            keep_i = keep[i]
+            pred_boxes_i = pred_boxes[i][keep_i]
+            pred_scores_i = pred_cls_conf[i][keep_i]
+            pred_classes_i = pred_cls_labels[i][keep_i]
+            num_predictions_i = keep_i.sum()
+
+            pad_size = self.max_predictions_per_image - pred_boxes_i.size(0)
+            pred_boxes_i = torch.nn.functional.pad(pred_boxes_i, [0, 0, 0, pad_size], value=-1, mode="constant")
+            pred_scores_i = torch.nn.functional.pad(pred_scores_i, [0, pad_size], value=-1, mode="constant")
+            pred_classes_i = torch.nn.functional.pad(pred_classes_i, [0, pad_size], value=-1, mode="constant")
+
+            num_predictions.append(num_predictions_i.reshape(1, 1))
+            batched_pred_boxes.append(pred_boxes_i.unsqueeze(0))
+            batched_pred_scores.append(pred_scores_i.unsqueeze(0))
+            batched_pred_classes.append(pred_classes_i.unsqueeze(0))
+
+        num_predictions = torch.cat(num_predictions, dim=0)
+        batched_pred_boxes = torch.cat(batched_pred_boxes, dim=0)
+        batched_pred_scores = torch.cat(batched_pred_scores, dim=0)
+        batched_pred_classes = torch.cat(batched_pred_classes, dim=0)
+
+        return num_predictions, batched_pred_boxes, batched_pred_scores, batched_pred_classes
+
+    def get_output_names(self):
+        return ["num_predictions", "pred_boxes", "pred_scores", "pred_classes"]
+
+    def get_dynamic_axes(self):
+        return {}
+
+
+class OBBNMSAndReturnAsFlatResult(nn.Module):
+    """
+    Select the output from ONNX NMS node and return them in flat format.
+
+    """
+
+    __constants__ = ("iou_threshold", "confidence_threshold", "batch_size", "class_agnostic_nms", "num_pre_nms_predictions", "max_predictions_per_image")
+
+    def __init__(
+        self,
+        confidence_threshold,
+        iou_threshold: float,
+        batch_size: int,
+        class_agnostic_nms: bool,
+        num_pre_nms_predictions: int,
+        max_predictions_per_image: int,
+    ):
+        """
+        Perform NMS on the output of the model and return the results in flat format.
+        This module implements MatrixNMS algorithm for rotated bounding boxes.
+        Hence, iou_threshold has different meaning compared to regular NMS.
+
+        :param confidence_threshold:      The confidence threshold to apply to the model output
+        :param iou_threshold:             The IoU threshold for selecting final detections.
+               An iou_threshold has different meaning compared to regular NMS. In matrix NMS, it is the
+               multiplication of predicted confidence score and decay factor for the bounding box (A decay applied to
+               boxes that that has overlap with the current box).
+        :param batch_size:                A fixed batch size for the model
+        :param class_agnostic_nms:        If True, NMS will be class agnostic
+        :param num_pre_nms_predictions:   The number of predictions before NMS step
+        :param max_predictions_per_image: Maximum number of predictions per image
+        """
+        super().__init__()
+        self.batch_size = batch_size
+        self.class_agnostic_nms = class_agnostic_nms
+        self.confidence_threshold = confidence_threshold
+        self.num_pre_nms_predictions = num_pre_nms_predictions
+        self.max_predictions_per_image = max_predictions_per_image
+        self.iou_threshold = iou_threshold
+
+    def forward(self, input) -> Tensor:
+        """
+        Take decoded predictions from the model, apply NMS to them and return the results in flat format.
+
+        :param pred_boxes:       [B, N, 5] tensor
+        :param pred_scores:      [B, N, C] tensor
+        :return:                 A single tensor of [Nout, 8] shape, where Nout is the total number of detections across all images in the batch.
+                                 Each row will contain [image_index, cx, cy, w, h, r, class confidence, class index] values.
+                                 Each image will have at most max_predictions_per_image detections.
+
+        """
+        from super_gradients.training.models.detection_models.yolo_nas_r.yolo_nas_r_post_prediction_callback import rboxes_matrix_nms
+
+        pred_boxes, pred_scores = input
+        dtype = pred_scores.dtype
+        pred_cls_conf, pred_cls_labels = torch.max(pred_scores, dim=2)
+
+        # Apply confidence threshold
+        pred_cls_conf = torch.masked_fill(pred_cls_conf, mask=pred_cls_conf < self.confidence_threshold, value=0)
+        keep = rboxes_matrix_nms(
+            rboxes_cxcywhr=pred_boxes,
+            scores=pred_cls_conf,
+            labels=pred_cls_labels,
+            class_agnostic_nms=self.class_agnostic_nms,
+            iou_threshold=self.iou_threshold,
+            already_sorted=True,
+        )
+
+        flat_results = []
+        for i in range(self.batch_size):
+            keep_i = keep[i]
+            selected_boxes = pred_boxes[i][keep_i]
+            selected_scores = pred_cls_conf[i][keep_i]
+            label_indexes = pred_cls_labels[i][keep_i]
+            batch_indexes = torch.full_like(label_indexes, i)
+
+            flat_results_i = torch.cat(
+                [batch_indexes.unsqueeze(-1).to(dtype), selected_boxes, selected_scores.unsqueeze(-1), label_indexes.unsqueeze(-1).to(dtype)], dim=1
+            )
+            flat_results_i = flat_results_i[: self.max_predictions_per_image]
+            flat_results.append(flat_results_i)
+
+        flat_results = torch.cat(flat_results, dim=0)
+        return flat_results
+
+    def get_output_names(self):
+        return ["flat_predictions"]
+
+    def get_dynamic_axes(self):
+        return {
+            "flat_predictions": {0: "num_predictions"},
+        }