CodeCamp #43 Support PETR in 1.1 in projects #2175

Merged: 33 commits, Jan 5, 2023
e820639
rebase
SekiroRong Dec 27, 2022
aca498f
petr3d-
SekiroRong Dec 27, 2022
f5596a5
petr3d-to-petr
SekiroRong Dec 27, 2022
65ff69d
delete_NormalizeMultiviewImage
SekiroRong Dec 27, 2022
06d684b
rename_PETR
SekiroRong Dec 27, 2022
b401698
rename_PETR
SekiroRong Dec 27, 2022
3e6bed0
fix_bug
SekiroRong Dec 27, 2022
7660bc0
fix_bug
SekiroRong Dec 27, 2022
06ab7dc
fix_bug
SekiroRong Dec 27, 2022
2ff66e9
fix_bug
SekiroRong Dec 27, 2022
0e72a91
fix_bug
SekiroRong Dec 27, 2022
2d5209f
fix_bug
SekiroRong Dec 27, 2022
3b5aaa8
fix_bug
SekiroRong Dec 27, 2022
43832c6
revise
SekiroRong Dec 27, 2022
e74655a
remove_builder
SekiroRong Dec 27, 2022
ce970bb
remove_builder
SekiroRong Dec 27, 2022
73bd987
remove_use_external
SekiroRong Dec 27, 2022
d70d1c0
remove_use_external
SekiroRong Dec 27, 2022
24b1c72
remove_PadMultiViewImage
SekiroRong Dec 27, 2022
7851682
remove_PadMultiViewImage
SekiroRong Dec 27, 2022
a46e7d3
remove-AddCamInfo
SekiroRong Dec 28, 2022
56d4821
remove-LidarBox3dVersionTransfrom
SekiroRong Dec 28, 2022
765dbb6
remove-LidarBox3dVersionTransfrom-and-AddCamInfo
SekiroRong Dec 28, 2022
9faa75b
fix__init__
SekiroRong Dec 28, 2022
b504b82
remove-redundent-config
SekiroRong Dec 29, 2022
2d72d17
code-polish
SekiroRong Dec 29, 2022
7f1df2e
remove-builder
SekiroRong Dec 29, 2022
15da09f
remove-builder
SekiroRong Dec 29, 2022
f1be572
remove-redundent-files
SekiroRong Dec 29, 2022
21acc17
replace-forward-train-and-test
SekiroRong Dec 29, 2022
f0af246
remove-redundent__init__
SekiroRong Dec 29, 2022
8c7cd00
remove_petr
SekiroRong Jan 4, 2023
7ee9757
remove-hierarchtecture
SekiroRong Jan 4, 2023
63 changes: 63 additions & 0 deletions projects/PETR/README.md
@@ -0,0 +1,63 @@
# PETR

This is a README for `PETR`.

## Description

Author: @SekiroRong.
This is an implementation of *PETR*.

## Usage

<!-- For a typical model, this section should contain the commands for training and testing. You are also suggested to dump your environment specification to env.yml by `conda env export > env.yml`. -->

### Training commands

In MMDet3D's root directory, run the following command to train the model:

```bash
python tools/train.py projects/PETR/config/petr/petr_vovnet_gridmask_p4_800x320.py
```

### Testing commands

In MMDet3D's root directory, run the following command to test the model:

```bash
python tools/test.py projects/PETR/config/petr/petr_vovnet_gridmask_p4_800x320.py ${CHECKPOINT_PATH}
```

## Results

<!-- List the results as usually done in other model's README. [Example](https://github.com/open-mmlab/mmdetection3d/edit/dev-1.x/configs/fcos3d/README.md)
You should claim whether this is based on the pre-trained weights, which are converted from the official release; or it's a reproduced result obtained from retraining the model in this project. -->

These results were obtained by training with `petr_vovnet_gridmask_p4_800x320.py`, using these [weights](https://drive.google.com/file/d/1ABI5BoQCkCkP4B0pO5KBJ3Ni0tei0gZi/view?usp=sharing) as the pretrained weights.

| Backbone | Lr schd | Mem (GB) | Inf time (fps) | mAP | NDS | Download |
| :----------------------------------------------------------------------------------------------------: | :-----: | :------: | :------------: | :--: | :--: | :----------------------: |
| [petr_vovnet_gridmask_p4_800x320](projects/PETR/config/petr/petr_vovnet_gridmask_p4_800x320.py)  | 1x | 7.62 | 18.7 | 38.3 | 43.5 | [model](<>) \| [log](<>) |

```
mAP: 0.3830
mATE: 0.7547
mASE: 0.2683
mAOE: 0.4948
mAVE: 0.8331
mAAE: 0.2056
NDS: 0.4358
Eval time: 118.7s

Per-class results:
Object Class            AP      ATE     ASE     AOE     AVE     AAE
car                     0.567   0.538   0.151   0.086   0.873   0.212
truck                   0.341   0.785   0.213   0.113   0.821   0.234
bus                     0.426   0.766   0.201   0.128   1.813   0.343
trailer                 0.216   1.116   0.227   0.649   0.640   0.122
construction_vehicle    0.093   1.118   0.483   1.292   0.217   0.330
pedestrian              0.453   0.685   0.293   0.644   0.535   0.238
motorcycle              0.374   0.700   0.253   0.624   1.291   0.154
bicycle                 0.345   0.622   0.262   0.775   0.475   0.011
traffic_cone            0.539   0.557   0.319   nan     nan     nan
barrier                 0.476   0.661   0.279   0.142   nan     nan
```
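
The NDS value in the log above can be cross-checked from the mean metrics. The sketch below assumes the standard nuScenes definition: mAP is weighted by 5, each of the five mean true-positive errors is clipped to at most 1 and converted to a score `1 - error`, and the total is divided by 10. The helper name `nuscenes_nds` is made up for this illustration and is not part of this project.

```python
def nuscenes_nds(mean_ap, tp_errors):
    """Combine mAP and the five mean true-positive errors into NDS."""
    # Clip each error to at most 1 and turn it into a score (1 - error).
    tp_scores = [1.0 - min(1.0, err) for err in tp_errors.values()]
    # mAP carries weight 5; each of the five TP scores carries weight 1.
    return (5.0 * mean_ap + sum(tp_scores)) / 10.0


# Mean metrics copied from the evaluation log above.
errors = {
    'mATE': 0.7547,
    'mASE': 0.2683,
    'mAOE': 0.4948,
    'mAVE': 0.8331,
    'mAAE': 0.2056,
}
nds = nuscenes_nds(0.3830, errors)
print(f'NDS: {nds:.4f}')
```

The computed value agrees with the logged `NDS: 0.4358` up to rounding of the four-digit inputs.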
281 changes: 281 additions & 0 deletions projects/PETR/config/petr/petr_r50dcn_gridmask_c5.py
@@ -0,0 +1,281 @@
_base_ = [
    '../../../../configs/_base_/datasets/nus-3d.py',
    '../../../../configs/_base_/default_runtime.py'
]
backbone_norm_cfg = dict(type='LN', requires_grad=True)

# If point cloud range is changed, the models should also change their point
# cloud range accordingly
point_cloud_range = [-51.2, -51.2, -5.0, 51.2, 51.2, 3.0]
voxel_size = [0.2, 0.2, 8]
img_norm_cfg = dict(
    mean=[103.530, 116.280, 123.675], std=[1.0, 1.0, 1.0], to_rgb=False)
# For nuScenes we usually do 10-class detection
class_names = [
    'car', 'truck', 'construction_vehicle', 'bus', 'trailer', 'barrier',
    'motorcycle', 'bicycle', 'pedestrian', 'traffic_cone'
]
input_modality = dict(
    use_lidar=False,
    use_camera=True,
    use_radar=False,
    use_map=False,
    use_external=False)
model = dict(
    type='Petr3D',
    use_grid_mask=True,
    img_backbone=dict(
        type='ResNet',
        depth=50,
        num_stages=4,
        out_indices=(3, ),
        frozen_stages=-1,
        norm_cfg=dict(type='BN2d', requires_grad=False),
        norm_eval=True,
        style='caffe',
        with_cp=True,
        dcn=dict(type='DCNv2', deform_groups=1, fallback_on_stride=False),
        stage_with_dcn=(False, False, True, True),
        pretrained='ckpts/resnet50_msra-5891d200.pth',
    ),
    pts_bbox_head=dict(
        type='PETRHead',
        num_classes=10,
        in_channels=2048,
        num_query=900,
        LID=True,
        with_position=True,
        with_multiview=True,
        position_range=[-61.2, -61.2, -10.0, 61.2, 61.2, 10.0],
        normedlinear=False,
        transformer=dict(
            type='PETRTransformer',
            decoder=dict(
                type='PETRTransformerDecoder',
                return_intermediate=True,
                num_layers=6,
                transformerlayers=dict(
                    type='PETRTransformerDecoderLayer',
                    attn_cfgs=[
                        dict(
                            type='MultiheadAttention',
                            embed_dims=256,
                            num_heads=8,
                            dropout=0.1),
                        dict(
                            type='PETRMultiheadAttention',
                            embed_dims=256,
                            num_heads=8,
                            dropout=0.1),
                    ],
                    feedforward_channels=2048,
                    ffn_dropout=0.1,
                    with_cp=True,
                    operation_order=('self_attn', 'norm', 'cross_attn', 'norm',
                                     'ffn', 'norm')),
            )),
        bbox_coder=dict(
            type='NMSFreeCoder',
            # type='NMSFreeClsCoder',
            post_center_range=[-61.2, -61.2, -10.0, 61.2, 61.2, 10.0],
            pc_range=point_cloud_range,
            max_num=300,
            voxel_size=voxel_size,
            num_classes=10),
        positional_encoding=dict(
            type='SinePositionalEncoding3D', num_feats=128, normalize=True),
        loss_cls=dict(
            type='FocalLoss',
            use_sigmoid=True,
            gamma=2.0,
            alpha=0.25,
            loss_weight=2.0),
        loss_bbox=dict(type='L1Loss', loss_weight=0.25),
        loss_iou=dict(type='GIoULoss', loss_weight=0.0)),
    # model training and testing settings
    train_cfg=dict(
        pts=dict(
            grid_size=[512, 512, 1],
            voxel_size=voxel_size,
            point_cloud_range=point_cloud_range,
            out_size_factor=4,
            assigner=dict(
                type='HungarianAssigner3D',
                cls_cost=dict(type='FocalLossCost', weight=2.0),
                reg_cost=dict(type='BBox3DL1Cost', weight=0.25),
                iou_cost=dict(
                    type='IoUCost', weight=0.0
                ),  # Fake cost. Just to be compatible with DETR head.
                pc_range=point_cloud_range))))

dataset_type = 'PETRNuScenesDataset'
data_root = 'data/nuscenes/'

file_client_args = dict(backend='disk')

db_sampler = dict(
    data_root=data_root,
    info_path=data_root + 'nuscenes_dbinfos_train.pkl',
    rate=1.0,
    prepare=dict(
        filter_by_difficulty=[-1],
        filter_by_min_points=dict(
            car=5,
            truck=5,
            bus=5,
            trailer=5,
            construction_vehicle=5,
            traffic_cone=5,
            barrier=5,
            motorcycle=5,
            bicycle=5,
            pedestrian=5)),
    classes=class_names,
    sample_groups=dict(
        car=2,
        truck=3,
        construction_vehicle=7,
        bus=4,
        trailer=6,
        barrier=2,
        motorcycle=6,
        bicycle=6,
        pedestrian=2,
        traffic_cone=2),
    points_loader=dict(
        type='LoadPointsFromFile',
        coord_type='LIDAR',
        load_dim=5,
        use_dim=[0, 1, 2, 3, 4],
        file_client_args=file_client_args))
ida_aug_conf = {
    'resize_lim': (0.8, 1.0),
    'final_dim': (512, 1408),
    'bot_pct_lim': (0.0, 0.0),
    'rot_lim': (0.0, 0.0),
    'H': 900,
    'W': 1600,
    'rand_flip': True,
}
train_pipeline = [
    dict(type='LoadMultiViewImageFromFiles', to_float32=True),
    dict(
        type='LoadAnnotations3D',
        with_bbox_3d=True,
        with_label_3d=True,
        with_attr_label=False),
    dict(type='ObjectRangeFilter', point_cloud_range=point_cloud_range),
    dict(type='ObjectNameFilter', classes=class_names),
    dict(type='LidarBox3dVersionTransfrom'),
    dict(
        type='ResizeCropFlipImage', data_aug_conf=ida_aug_conf, training=True),
    dict(
        type='GlobalRotScaleTransImage',
        rot_range=[-0.3925, 0.3925],
        translation_std=[0, 0, 0],
        scale_ratio_range=[0.95, 1.05],
        reverse_angle=False,
        training=True),
    dict(type='NormalizeMultiviewImage', **img_norm_cfg),
    dict(type='PadMultiViewImage', size_divisor=32),
    dict(type='DefaultFormatBundle3D', class_names=class_names),
    dict(type='Collect3D', keys=['gt_bboxes_3d', 'gt_labels_3d', 'img'])
]
test_pipeline = [
    dict(type='LoadMultiViewImageFromFiles', to_float32=True),
    dict(
        type='ResizeCropFlipImage', data_aug_conf=ida_aug_conf,
        training=False),
    dict(type='NormalizeMultiviewImage', **img_norm_cfg),
    dict(type='PadMultiViewImage', size_divisor=32),
    dict(
        type='MultiScaleFlipAug3D',
        img_scale=(1333, 800),
        pts_scale_ratio=1,
        flip=False,
        transforms=[
            dict(
                type='DefaultFormatBundle3D',
                class_names=class_names,
                with_label=False),
            dict(type='Collect3D', keys=['img'])
        ])
]

data = dict(
    samples_per_gpu=1,
    workers_per_gpu=4,
    train=dict(
        type=dataset_type,
        data_root=data_root,
        ann_file=data_root + 'nuscenes_infos_train.pkl',
        pipeline=train_pipeline,
        classes=class_names,
        modality=input_modality,
        test_mode=False,
        use_valid_flag=True,
        # we use box_type_3d='LiDAR' in kitti and nuscenes dataset
        # and box_type_3d='Depth' in sunrgbd and scannet dataset.
        box_type_3d='LiDAR'),
    val=dict(
        type=dataset_type,
        pipeline=test_pipeline,
        classes=class_names,
        modality=input_modality),
    test=dict(
        type=dataset_type,
        pipeline=test_pipeline,
        classes=class_names,
        modality=input_modality))

optimizer = dict(
    type='AdamW',
    lr=2e-4,
    paramwise_cfg=dict(custom_keys={
        'img_backbone': dict(lr_mult=0.1),
    }),
    weight_decay=0.01)

optimizer_config = dict(
    type='Fp16OptimizerHook',
    loss_scale=512.,
    grad_clip=dict(max_norm=35, norm_type=2))

# learning policy
lr_config = dict(
    policy='CosineAnnealing',
    warmup='linear',
    warmup_iters=500,
    warmup_ratio=1.0 / 3,
    min_lr_ratio=1e-3,
    # by_epoch=False
)
total_epochs = 24
evaluation = dict(interval=24, pipeline=test_pipeline)
find_unused_parameters = False

runner = dict(type='EpochBasedRunner', max_epochs=total_epochs)
load_from = None
resume_from = None
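
For reviewers who want a feel for the schedule configured in `lr_config`, the following is a rough sketch of the learning rate it implies. It assumes MMCV 1.x-style hook semantics (cosine annealing stepped per epoch, linear warmup applied per iteration); the function `sketch_lr` is hypothetical and written only for this illustration, not taken from MMCV or this PR.

```python
import math


def sketch_lr(base_lr, epoch, cur_iter, max_epochs=24, warmup_iters=500,
              warmup_ratio=1.0 / 3, min_lr_ratio=1e-3):
    """Approximate LR for a cosine schedule with linear warmup (assumed)."""
    # Cosine annealing from base_lr down to base_lr * min_lr_ratio.
    target_lr = base_lr * min_lr_ratio
    cos_out = math.cos(math.pi * epoch / max_epochs) + 1
    lr = target_lr + 0.5 * (base_lr - target_lr) * cos_out
    # Linear warmup scales the regular LR during the first warmup_iters.
    if cur_iter < warmup_iters:
        k = (1 - cur_iter / warmup_iters) * (1 - warmup_ratio)
        lr *= 1 - k
    return lr


start_lr = sketch_lr(2e-4, epoch=0, cur_iter=0)       # first iteration
peak_lr = sketch_lr(2e-4, epoch=0, cur_iter=500)      # warmup finished
final_lr = sketch_lr(2e-4, epoch=24, cur_iter=10**6)  # end of training
print(start_lr, peak_lr, final_lr)
```

Under these assumptions the rate starts at one third of `lr=2e-4`, reaches `2e-4` after 500 iterations, and decays towards `2e-4 * 1e-3`. Parameters matching `img_backbone` would additionally be scaled by `lr_mult=0.1`.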

# mAP: 0.3050
# mATE: 0.8504
# mASE: 0.2813
# mAOE: 0.6539
# mAVE: 1.0381
# mAAE: 0.2438
# NDS: 0.3496
# Eval time: 313.1s

# Per-class results:
# Object Class            AP      ATE     ASE     AOE     AVE     AAE
# car                     0.500   0.608   0.156   0.122   1.091   0.243
# truck                   0.259   0.876   0.239   0.203   0.975   0.264
# bus                     0.300   0.912   0.222   0.197   2.383   0.446
# trailer                 0.104   1.169   0.253   0.630   0.480   0.070
# construction_vehicle    0.051   1.161   0.495   1.266   0.126   0.399
# pedestrian              0.395   0.755   0.298   1.132   0.854   0.340
# motorcycle              0.293   0.749   0.269   0.983   1.929   0.157
# bicycle                 0.274   0.777   0.271   1.191   0.467   0.031
# traffic_cone            0.482   0.649   0.331   nan     nan     nan
# barrier                 0.392   0.847   0.280   0.162   nan     nan
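
The comment near the top of this config warns that dependent settings must follow any change to the point cloud range. One such setting is `grid_size` in `train_cfg`, which here equals the range extent divided by the voxel size along each axis. A minimal consistency check, assuming that relationship holds (the helper `grid_size_from_range` is hypothetical and not part of MMDet3D):

```python
point_cloud_range = [-51.2, -51.2, -5.0, 51.2, 51.2, 3.0]
voxel_size = [0.2, 0.2, 8]


def grid_size_from_range(pc_range, voxel):
    """Number of voxels along x, y, z for a given range and voxel size."""
    mins, maxs = pc_range[:3], pc_range[3:]
    # round() guards against floating-point noise such as 511.9999.
    return [round((hi - lo) / v) for lo, hi, v in zip(mins, maxs, voxel)]


grid_size = grid_size_from_range(point_cloud_range, voxel_size)
# Matches grid_size=[512, 512, 1] in train_cfg of this config.
assert grid_size == [512, 512, 1]
```

Running a check like this after editing `point_cloud_range` or `voxel_size` makes a stale `grid_size` easy to spot.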