Accuracy and speed of yolov4x-mish #6987

Goru1890 · 2020-11-17T08:38:16Z

What is the improvement of the new new_coords function over traditional yolov4?
Has anyone tried it on COCO?




AlexeyAB · 2020-11-17T13:22:43Z

YOLOv4x-mish - 640x640 - COCO-testdev-2019: 49.4% AP - 67.9% AP50 (stdout.txt)
- GPU RTX 2070 - 23 FPS
- GPU RTX 3090 - 30 FPS
- GPU V100 - ~50 FPS

YOLOv4x-mish - 672x672 - COCO-testdev-2019: 49.6% AP - 68.1% AP50 (stdout.txt)
- GPU RTX 2070 - 21 FPS
- GPU V100 - 45 FPS

So currently it is much better than PP-YOLO, EfficientDet, SpineNet and many other models.

Darknet:

Pytorch: https://github.com/WongKinYiu/PyTorch_YOLOv4

overall performance
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.496
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.681
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.540
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.307
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.537
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.617
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.377
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.616
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.656
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.454
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.700
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.813
Done (t=835.94s)

AlexeyAB · 2020-11-17T13:51:52Z

@mive93 Hi, Could you port it to tkDNN / TRT please?

mive93 · 2020-11-17T14:14:20Z

Hi @AlexeyAB,
sure, I can do that.
What are the main changes wrt yolov4?

arnaud-nt2i · 2020-11-17T15:44:22Z

@AlexeyAB Hi!
Is yolov4x-mish ready for training on a custom dataset?

sctrueew · 2020-11-17T18:54:03Z

@AlexeyAB Hi,

Does OpenCV-dnn also support it?

AlexeyAB · 2020-11-17T20:51:48Z

@arnaud-nt2i

Is yolov4x-mish ready to train on custom dataset ?

I haven't tested it well:

- Try new_coords=1 first; if the results are bad, try training with new_coords=0
- Use pre-trained weights https://github.com/AlexeyAB/darknet/releases/download/darknet_yolo_v4_pre/yolov4x-mish.conv.166
- Make sure that batch=64 and subdivisions <= 16
- If you get NaN, set max_delta=20 for each [yolo] layer and learning_rate=0.001 in [net]
- I haven't added exponential moving average (EMA) yet.
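
The NaN workaround from this list (max_delta=20 in every [yolo] section, learning_rate=0.001 in [net]) can be applied to a cfg file mechanically. Below is a minimal sketch, assuming the cfg is a plain list of `[section]` headers followed by `key=value` lines; the helper names `set_option` and `apply_nan_workaround` are hypothetical and not part of darknet.

```python
def set_option(lines, key, value):
    """Replace `key=...` inside one section body, or append it if missing."""
    new_line = f"{key}={value}"
    for i, line in enumerate(lines):
        if line.split("=")[0].strip() == key:
            lines[i] = new_line
            return
    # Append after the last non-blank line so blank separators are kept.
    pos = len(lines)
    while pos > 0 and not lines[pos - 1].strip():
        pos -= 1
    lines.insert(pos, new_line)


def apply_nan_workaround(cfg_text, max_delta=20, learning_rate=0.001):
    """Set max_delta in every [yolo] section and learning_rate in [net]."""
    sections = []  # list of [header, body_lines]
    for line in cfg_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            sections.append([stripped, []])
        elif sections:
            sections[-1][1].append(line)
        else:
            sections.append(["", [line]])  # text before the first header

    for header, body in sections:
        if header == "[yolo]":
            set_option(body, "max_delta", max_delta)
        elif header == "[net]":
            set_option(body, "learning_rate", learning_rate)

    out = []
    for header, body in sections:
        if header:
            out.append(header)
        out.extend(body)
    return "\n".join(out) + "\n"


example_cfg = (
    "[net]\nbatch=64\nlearning_rate=0.01\n\n"
    "[yolo]\nnew_coords=1\nmax_delta=2\n\n"
    "[yolo]\nnew_coords=1\n"
)
print(apply_nan_workaround(example_cfg))
```

Commented-out options such as `#max_delta=5` are left untouched because their key does not match exactly.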

AlexeyAB · 2020-11-17T21:15:10Z

@mive93 Hi,

If [yolo] new_coords=1 is set, then:

- We use logistic (sigmoid) not only for x,y but for x,y,w,h: 8c9c517#diff-a191a7d286ab1bacf527ae4b5edfbad6951b06a4d80685393577af64eb8e8a8fR950
- The coordinates should be calculated this way: 8c9c517#diff-a191a7d286ab1bacf527ae4b5edfbad6951b06a4d80685393577af64eb8e8a8fR141-R144

So in total:

x = (logistic(in) * 2 - 0.5 + grid_x) / grid_width
y = ...
w = pow( logistic(in)*2, 2) * anchor / network_width
h = ...

- We use nms=0.6 instead of 0.45
- We use diounms() c7e3e2e#diff-2c2b9046564ae9ad1ba54f4b42a3c8acbf98af531e411be6281687f6b6689e98L916
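
The box formulas above can be tried out with a short Python sketch. It is an illustration under stated assumptions, not the darknet code: the `y` and `h` lines are elided in the comment, so the sketch assumes they mirror `x` and `w` with the grid height, the anchor height and the network height. The function name `decode_box_new_coords` and its parameters are hypothetical.

```python
import math


def logistic(v):
    """Sigmoid activation."""
    return 1.0 / (1.0 + math.exp(-v))


def decode_box_new_coords(raw, grid_x, grid_y, grid_w, grid_h,
                          anchor_w, anchor_h, net_w, net_h):
    """Decode one box from raw (pre-activation) outputs (tx, ty, tw, th).

    Returns (x, y, w, h) relative to the network input (0..1 range).
    Assumption: y and h follow the same pattern as x and w.
    """
    sx, sy, sw, sh = (logistic(v) for v in raw)
    x = (sx * 2.0 - 0.5 + grid_x) / grid_w
    y = (sy * 2.0 - 0.5 + grid_y) / grid_h
    w = (sw * 2.0) ** 2 * anchor_w / net_w
    h = (sh * 2.0) ** 2 * anchor_h / net_h
    return x, y, w, h


# Raw outputs of 0 give logistic = 0.5: the box sits at the cell centre
# and has exactly the anchor size.
box = decode_box_new_coords((0.0, 0.0, 0.0, 0.0),
                            grid_x=3, grid_y=5, grid_w=20, grid_h=20,
                            anchor_w=142, anchor_h=110,
                            net_w=640, net_h=640)
print(box)
```

Because `logistic(in) * 2` lies in (0, 2), the decoded width and height stay below 4 times the anchor size, unlike the exponential used when new_coords=0.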

AlexeyAB · 2020-11-17T21:17:01Z

@zpmmehrdad

Does OpenCV-dnn also support it?

Currently no, they need the same fixes.

mive93 · 2020-11-18T11:37:49Z

@AlexeyAB,

thank you, I will come back to you as soon as I have some results.

mive93 · 2020-11-19T14:02:43Z

Hi @AlexeyAB
One question: were the weights computed with new_coords=0?
I'm asking because when I convert the weights and create the network, the output matches only with new_coords=0; however, when I run the demo, new_coords has to be 1 to get correct boxes.
If that is the case, then I have completed the porting and I can push it.

The mAP on tkDNN is the following (with thresh=0.001 and COCO_val_2017)

overall performance
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.463
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.645
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.507
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.305
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.509
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.587
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.365
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.599
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.641
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.463
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.684
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.787
Done (t=175.86s)

Then I can test the performance on the Xavier.
My 2080 Ti is busy with a training run, so performance is a bit degraded; right now I can tell you that FP32 is around 30 FPS and FP16 is around 58 FPS.

AlexeyAB · 2020-11-19T14:12:53Z

@mive93

Thanks!

One question: were the weights computed with new_coords=0?

What do you mean?
We have to use all these calculations: #6987 (comment)

I'm asking because when I convert the weights and create the network the output corresponds if new_coords=0, however when I run the demo new_coords should be 1 to have correct boxes.
If that is the case, then I have completed the porting and I can push it.

yolov4x-mish.cfg uses new_coords=1 for all [yolo] layers.
Do you use new_coords=1 too?

The mAP on tkDNN is the following (with thresh=0.001 and COCO_val_2017)

overall performance
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.463
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.645

That seems too low. It should be ~50.0% AP and ~68.5% AP50 on COCO2017-val for yolov4x-mish 672x672.

mive93 · 2020-11-20T18:00:22Z

Hi @AlexeyAB,

never mind, I solved the export problem.
The issue is that I convert weights and get the debug output for each layer without using the GPU, and new_coords is not implemented for CPU only (maybe you want to change it here https://github.com/AlexeyAB/darknet/blob/master/src/yolo_layer.c#L374).

I am checking now for the mAP loss.
Will let you know as soon as I solve it.

AlexeyAB · 2020-11-20T18:15:13Z

@mive93 Hi,
Thanks, I fixed it: d18e22a

duynguyen51 · 2020-11-21T14:08:59Z

@AlexeyAB Hi,
I use the YOLOv4x-mish config on my own dataset, but the avg loss does not change after more than 1000 iterations (I set batch_size=64). The avg loss remains at 100. How can I fix it? Can I set new_coords=0? Thanks

AlexeyAB · 2020-11-21T14:29:29Z

@duynguyen51 Loss doesn't matter. Show mAP after 30% of total iterations. And use subdivisions=16 or lower.

duynguyen51 · 2020-11-21T14:32:23Z

@duynguyen51 Loss doesn't matter. Show mAP after 30% of total iterations. And use subdivisions=16 or lower.

Thanks, let me check the result after those iter.

duynguyen51 · 2020-11-22T10:12:43Z

@duynguyen51 Loss doesn't matter. Show mAP after 30% of total iterations. And use subdivisions=16 or lower.

Hi, this is my mAP after 30% max_iter

AlexeyAB · 2020-11-22T14:32:17Z

@duynguyen51
Can you set max_delta= for different yolo layers, and restart training from 10 000 iterations? ./darknet detector train ... backup/yolov4x-mish_10000.weights

[yolo]
max_delta=20
...

[yolo]
max_delta=5
...

[yolo]
max_delta=2

duynguyen51 · 2020-11-22T14:35:04Z

@duynguyen51
Can you set max_delta= for different yolo layers, and restart training from 10 000 iterations? ./darknet detector train ... backup/yolov4x-mish_10000.weights
[yolo]
max_delta=20
...

[yolo]
max_delta=5
...

[yolo]
max_delta=2

Thanks, let me try it.

AlexeyAB · 2020-11-22T14:38:17Z

@duynguyen51
Also set learning_rate=0.001

AlexeyAB · 2020-11-23T02:32:59Z

@duynguyen51 If it doesn't help - try to set and train

[net] 
try_fix_nan=1

Goru1890 · 2020-11-23T10:11:40Z

How many GB of VRAM does a graphics card need to train it? It doesn't work on my nvidia gtx 2070 (11 Gb) using 16 subdivisions.

AlexeyAB · 2020-11-23T20:36:12Z

@Goru1890
I can train https://github.com/AlexeyAB/darknet/blob/master/cfg/yolov4x-mish.cfg on RTX 3090 - 24 GB VRAM with parameters:

[net]
width=640
height=640
batch=64
subdivisions=8
optimized_memory=1

arnaud-nt2i · 2020-11-23T23:14:53Z

@AlexeyAB You said :

Make sure that batch=64 and subdivisions <= 16

Is batch=64 really mandatory, or is it just to guarantee a minimum mini-batch size (4)?
E.g., can we set batch=63 and subdivisions=7, or batch=70 and subdivisions=7, like in other networks?

AlexeyAB · 2020-11-23T23:16:58Z

@arnaud-nt2i

Eg: can we set batch=63 and subdivisions=7 or batch=70 subdivisions=7 like in other networks?

Yes, you can. I
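
For reference, the mini-batch size discussed here is batch divided by subdivisions. A minimal sketch of that arithmetic, assuming integer division (the helper name `mini_batch` is hypothetical, not a darknet function):

```python
def mini_batch(batch, subdivisions):
    """Images processed per forward/backward pass (assumes integer division)."""
    return batch // subdivisions


for batch, subdivisions in [(64, 16), (64, 8), (63, 7), (70, 7)]:
    print(f"batch={batch} subdivisions={subdivisions} "
          f"-> mini_batch={mini_batch(batch, subdivisions)}")
```

With batch=64 and subdivisions=16 the mini-batch is 4, which is the minimum mentioned in the question; batch=63 or batch=70 with subdivisions=7 gives 9 or 10.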

arnaud-nt2i · 2020-11-23T23:20:59Z

OK, thanks. Some other questions:

Why not use batch_normalize=2? It led to good results (+~0.5 mAP) in my own tests (on traditional YOLOv4 mish).
Is letter_box mandatory if (mean training image ratio) ~= (network ratio)
and, of course:
train_network_width * train_obj_width / train_image_width ~= detection_network_width * detection_obj_width / detection_image_width
train_network_height * train_obj_height / train_image_height ~= detection_network_height * detection_obj_height / detection_image_height

AlexeyAB · 2020-11-23T23:40:59Z

Why not use batch_normalize=2? It led to good results (+~0.5 mAP) in my own tests (on traditional YOLOv4 mish).

batch_normalize=2: sometimes it works better, sometimes worse.

Is letter_box mandatory if (mean training image ratio) ~= (network ratio)
and, of course:
train_network_width * train_obj_width / train_image_width ~= detection_network_width * detection_obj_width / detection_image_width
train_network_height * train_obj_height / train_image_height ~= detection_network_height * detection_obj_height / detection_image_height

No. letter_box=1 is preferred if the aspect ratio differs across images and network resolutions.
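
The size condition quoted above can be checked numerically. The sketch below only evaluates the formula from the question for one axis; the function names and the 10% tolerance are assumptions made for illustration, not darknet settings.

```python
def object_size_in_network(network_dim, obj_dim, image_dim):
    """Object size in pixels after the image is resized to the network input."""
    return network_dim * obj_dim / image_dim


def sizes_match(train, detect, rel_tol=0.10):
    """True if training and detection object sizes agree within rel_tol.

    `train` and `detect` are (network_dim, obj_dim, image_dim) tuples
    for one axis (width or height). rel_tol is an arbitrary choice.
    """
    a = object_size_in_network(*train)
    b = object_size_in_network(*detect)
    return abs(a - b) <= rel_tol * max(a, b)


# Training: 640 px network, 100 px object in a 1920 px wide image.
# Detection: 672 px network, 100 px object in a 2016 px wide image.
print(object_size_in_network(640, 100, 1920))
print(sizes_match((640, 100, 1920), (672, 100, 2016)))
```

Run the same check for heights; if either axis differs by more than the chosen tolerance, objects appear at a different scale during detection than during training.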

toplinuxsir · 2020-11-24T05:03:13Z

Is it the same as https://github.com/WongKinYiu/ScaledYOLOv4 ?

toplinuxsir · 2020-11-24T05:23:58Z

I am training on my custom dataset. Once iterations go above 1000, mAP is calculated at every iteration. Is that normal? Thanks

Goru1890 · 2020-11-24T07:50:40Z

No. letter_box=1 is prefered if aspect ratio different for different images and network resolutions.

So if my dataset only has images with the same aspect ratio and resolution, can I set letter_box=0?

AlexeyAB · 2020-11-24T11:27:25Z

@toplinuxsir I fixed it.

AlexeyAB · 2020-11-24T11:27:46Z

@Goru1890 Yes you can.

AlexeyAB · 2020-11-24T11:30:12Z

@toplinuxsir

Is it the same as https://github.com/WongKinYiu/ScaledYOLOv4 ?

Yes.
https://arxiv.org/abs/2011.08036

toplinuxsir · 2020-11-25T05:45:23Z

@AlexeyAB
I trained on my custom dataset. For yolov4 it is normal,
but for yolov4x-mish, near 2000 iterations, the avg loss is 1339 and mAP is always 0.
Is that normal?
Thanks

 Tensor Cores are disabled until the first 3000 iterations are reached.
 Last accuracy mAP@0.5 = 0.00 %, best = 0.00 % 
 1986: 1518.558960, 1420.304321 avg loss, 0.001000 rate, 7.550015 seconds, 127104 images, 1987.702662 hours left
Loaded: 4.403241 seconds - performance bottleneck on CPU or Disk HDD/SSD
v3 (iou loss, Normalizer: (iou: 0.05, obj: 4.00, cls: 0.50) Region 168 Avg (IOU: 0.390271), count: 401, total_loss = 2855.301758 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 1.00, cls: 0.50) Region 185 Avg (IOU: 0.521089), count: 40, total_loss = 42.997162 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 0.40, cls: 0.50) Region 202 Avg (IOU: 0.647995), count: 2, total_loss = 1.097103 
 total_bbox = 9902341, rewritten_bbox = 0.237358 % 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 4.00, cls: 0.50) Region 168 Avg (IOU: 0.407130), count: 383, total_loss = 2931.999512 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 1.00, cls: 0.50) Region 185 Avg (IOU: 0.543540), count: 38, total_loss = 45.554615 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 0.40, cls: 0.50) Region 202 Avg (IOU: 0.617228), count: 4, total_loss = 0.458375 
 total_bbox = 9902766, rewritten_bbox = 0.237348 % 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 4.00, cls: 0.50) Region 168 Avg (IOU: 0.394439), count: 446, total_loss = 3446.613525 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 1.00, cls: 0.50) Region 185 Avg (IOU: 0.530023), count: 47, total_loss = 45.623585 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 0.40, cls: 0.50) Region 202 Avg (IOU: 0.604356), count: 3, total_loss = 0.951157 
 total_bbox = 9903262, rewritten_bbox = 0.237336 % 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 4.00, cls: 0.50) Region 168 Avg (IOU: 0.382783), count: 752, total_loss = 5725.847656 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 1.00, cls: 0.50) Region 185 Avg (IOU: 0.483980), count: 57, total_loss = 55.402355 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 0.40, cls: 0.50) Region 202 Avg (IOU: 0.632100), count: 4, total_loss = 0.644276 
 total_bbox = 9904075, rewritten_bbox = 0.237367 % 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 4.00, cls: 0.50) Region 168 Avg (IOU: 0.363011), count: 808, total_loss = 5614.538574 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 1.00, cls: 0.50) Region 185 Avg (IOU: 0.551375), count: 63, total_loss = 66.249748 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 0.40, cls: 0.50) Region 202 Avg (IOU: 0.696846), count: 2, total_loss = 1.296881 
 total_bbox = 9904948, rewritten_bbox = 0.237346 % 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 4.00, cls: 0.50) Region 168 Avg (IOU: 0.382413), count: 793, total_loss = 5648.040039 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 1.00, cls: 0.50) Region 185 Avg (IOU: 0.546819), count: 86, total_loss = 109.324928 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 0.40, cls: 0.50) Region 202 Avg (IOU: 0.663197), count: 3, total_loss = 0.766923 
 total_bbox = 9905830, rewritten_bbox = 0.237375 % 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 4.00, cls: 0.50) Region 168 Avg (IOU: 0.397971), count: 605, total_loss = 4430.015137 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 1.00, cls: 0.50) Region 185 Avg (IOU: 0.540062), count: 65, total_loss = 80.512238 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 0.40, cls: 0.50) Region 202 Avg (IOU: 0.653224), count: 5, total_loss = 2.803994 
 total_bbox = 9906505, rewritten_bbox = 0.237389 % 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 4.00, cls: 0.50) Region 168 Avg (IOU: 0.398305), count: 373, total_loss = 2787.349121 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 1.00, cls: 0.50) Region 185 Avg (IOU: 0.558826), count: 40, total_loss = 45.045372 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 0.40, cls: 0.50) Region 202 Avg (IOU: 0.612750), count: 2, total_loss = 0.871407 
 total_bbox = 9906920, rewritten_bbox = 0.237410 %

arnaud-nt2i · 2020-11-25T08:34:34Z

@AlexeyAB
I trained for my custom dataset , for yolov4 is normal,
but for yolov4x-mish , near 2000 iterations , the avg loss is 1339 and mAP is always 0
Is that normal ?
Thanks

Same but IOU is nan...

Goru1890 · 2020-11-26T09:06:31Z

Same but IOU is nan...

Same issue...

arnaud-nt2i · 2020-11-30T14:51:17Z

Has anyone tried it with "try_fix_nan=1" as well?

arnaud-nt2i · 2020-11-30T15:27:03Z

@AlexeyAB One funny thing I encountered while trying Yolov4x with your exact parameters on an RTX 3090:
The model does not fit into memory ("CUDA out of memory") unless I set optimized_memory=0 !!
config:
W10
CUDA-version: 11010 (11010), cuDNN: 8.0.5, CUDNN_HALF=1, GPU count: 1
CUDNN_HALF=1
OpenCV version: 4.5.0
Prepare additional network for mAP calculation...
0 : compute_capability = 860, cudnn_half = 1, GPU: GeForce RTX 3090
net.optimized_memory = 0
mini_batch = 1, batch = 8, time_steps = 1, train = 0

There seems to be a spike in memory usage at startup when optimized_memory=1 that makes the model crash,
even though long-term memory usage is lower than with optimized_memory=0.

AlexeyAB · 2020-11-30T15:36:26Z

@arnaud-nt2i Thanks for notice!

Does anyone else have the same problem? So should I set [net] optimized_memory=0 by default?

arnaud-nt2i · 2020-12-07T08:51:13Z

@toplinuxsir @Goru1890 @duynguyen51
A lot has been done by AlexeyAB trying to fix yolov4x since our bug reports.
Has somebody tried the latest fix ?

edit: I have found my answer here: WongKinYiu/ScaledYOLOv4#13 (comment)

The last commit seems fine, I will try it and report here.

Goru1890 · 2020-12-07T17:30:22Z

The last commit seems fine, I will try it and report here.

How did it go?

toplinuxsir · 2020-12-07T23:11:43Z

@arnaud-nt2i
opencv/opencv#18975 (comment)

toplinuxsir · 2020-12-09T06:46:16Z

@AlexeyAB, I tried the latest commit for yolov4 and yolov4x-mish and noticed something strange:

Both have higher mAP and higher avg loss.
Although mAP is higher, both models miss more detections than before.
Is that normal?

OkuChou · 2020-12-10T14:59:07Z

@arnaud-nt2i Thanks for notice!

Does anyone else have the same problem? So should I set [net] optimized_memory=0 by default?

Yes, I got the same error. I also use an RTX 3090, and it only runs properly with optimized_memory=0.
However, try_fix_nan=1 works!
I also set max_delta=20 for the last 3 [yolo] layers.
After following your instructions, the "-nan" error disappeared. So far so good...

AlexeyAB · 2020-12-15T02:39:28Z

@mive93 Hi,
Please, fix tkDNN for yolov4-csp and yolov4x-mish models:
Currently, if new_coords=1 is set, then [yolo] shouldn't apply logistic (sigmoid) activation to any values, because activation=logistic is now used in the previous convolutional layer:

darknet/cfg/yolov4x-mish.cfg

Lines 1408 to 1436 in e7d029c

    
[convolutional]
size=1
stride=1
pad=1
filters=255
activation=logistic

[yolo]
mask = 6,7,8
anchors = 12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401
classes=80
num=9
jitter=.1
scale_x_y = 2.0
objectness_smooth=1
ignore_thresh = .7
truth_thresh = 1
#random=1
resize=1.5
iou_thresh=0.2
iou_normalizer=0.05
cls_normalizer=0.5
obj_normalizer=0.4
iou_loss=ciou
nms_kind=diounms
beta_nms=0.6
new_coords=1
max_delta=2

mive93 · 2021-01-04T15:19:41Z

Hi @AlexeyAB
Sorry, I only saw the comment now (I was submitting my PhD thesis and had no time to breathe).
I will look into that in the following days.

mive93 · 2021-01-22T17:08:07Z

Hi @AlexeyAB,
Scaled YOLOv4 is now supported, and I have also updated Yolov4x-mish (ceccocats/tkDNN@adac857).

However, I think your new implementation of the Yolo layer could have problems with Yolov4.
You have the scal_add at the end, but it should not be there (https://github.com/AlexeyAB/darknet/blob/master/src/yolo_layer.c#L674):

            if (l.new_coords) {
                //activate_array(l.output + bbox_index, 4 * l.w*l.h, LOGISTIC);    // x,y,w,h
            }
            else {
                activate_array(l.output + bbox_index, 2 * l.w*l.h, LOGISTIC);        // x,y,
                int obj_index = entry_index(l, b, n*l.w*l.h, 4);
                activate_array(l.output + obj_index, (1 + l.classes)*l.w*l.h, LOGISTIC);
            }
            scal_add_cpu(2 * l.w*l.h, l.scale_x_y, -0.5*(l.scale_x_y - 1), l.output + bbox_index, 1);    // scale x,y

I think my solution is better (tested with all older models and works for everything) (https://github.com/ceccocats/tkDNN/blob/master/src/Yolo.cpp#L91)

            if (new_coords == 1){
                if (this->scaleXY != 1) scalAdd(dstData + index, 2 * dim.w*dim.h, this->scaleXY, -0.5*(this->scaleXY - 1), 1);
            }
            else{
                activationLOGISTICForward(srcData + index, dstData + index, 2*dim.w*dim.h);

                if (this->scaleXY != 1) scalAdd(dstData + index, 2 * dim.w*dim.h, this->scaleXY, -0.5*(this->scaleXY - 1), 1);
                
                index = entry_index(b, n*dim.w*dim.h, 4, classes, input_dim, output_dim);
                activationLOGISTICForward(srcData + index, dstData + index, (1+classes)*dim.w*dim.h);
            }

AlexeyAB · 2021-01-22T17:27:42Z

@mive93 Hi,
My implementation is equivalent to this one in your repo (note that I use int bbox_index instead of int index for scalAdd):

- this way we will not confuse index with bbox_index
- if we want to change the x,y scaling, we only have to change 1 line instead of 2

            int bbox_index = entry_index(b, n*dim.w*dim.h, 0, classes, input_dim, output_dim);
            std::cout<<"new_coords"<<new_coords<<std::endl;
            if (new_coords == 1){
                // nothing
            }
            else{
                activationLOGISTICForward(srcData + bbox_index, dstData + bbox_index, 2*dim.w*dim.h);
              
                int obj_cls_index = entry_index(b, n*dim.w*dim.h, 4, classes, input_dim, output_dim);
                activationLOGISTICForward(srcData + obj_cls_index , dstData + obj_cls_index , (1+classes)*dim.w*dim.h);
            }
            if (this->scaleXY != 1) scalAdd(dstData + bbox_index, 2 * dim.w*dim.h, this->scaleXY, -0.5*(this->scaleXY - 1), 1);
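
The scaling step in these snippets can be illustrated in a few lines of Python. This is a sketch under one assumption: that `scal_add_cpu` / `scalAdd` computes `x * alpha + beta` elementwise. The function names `scal_add` and `scale_xy` below are illustrative stand-ins, not the darknet or tkDNN functions.

```python
def scal_add(values, alpha, beta):
    """Elementwise x * alpha + beta (assumed behaviour of scal_add_cpu / scalAdd)."""
    return [v * alpha + beta for v in values]


def scale_xy(activated_xy, scale_x_y):
    """Apply the x,y scaling used in the snippets above to sigmoid outputs."""
    return scal_add(activated_xy, scale_x_y, -0.5 * (scale_x_y - 1))


sigmoid_outputs = [0.0, 0.5, 1.0]

# scale_x_y = 2.0 reproduces "logistic(in) * 2 - 0.5" from earlier in the thread.
print(scale_xy(sigmoid_outputs, 2.0))

# scale_x_y = 1.0 leaves the values unchanged, which is why the tkDNN code
# can skip the call when scaleXY == 1.
print(scale_xy(sigmoid_outputs, 1.0))
```

Since the same scaling is applied to the x,y values in both branches, calling it once after the if/else gives the same result as calling it inside each branch.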

mive93 · 2021-01-22T17:29:28Z

Yeah you are right, sorry my bad.

AlexeyAB added the enhancement label Nov 17, 2020

AlexeyAB mentioned this issue Nov 19, 2020

Export darknet weights to ONNX #7002

Open

lsd1994 mentioned this issue Nov 26, 2020

Scaled-YOLOv4: Scaling Cross Stage Partial Network - in repo? #7027

Open

linghu8812 mentioned this issue Dec 1, 2020

Export yolov4-large models to ONNX and inference with TensorRT WongKinYiu/ScaledYOLOv4#56

Open

arnaud-nt2i mentioned this issue Dec 1, 2020

opencv dnn module suport Scaled yolov4 model? opencv/opencv#18975

Closed

WongKinYiu mentioned this issue Dec 11, 2020

Does ScaledYOLOv4 use the same encoding of bounding box coordinates as YOLOv3 WongKinYiu/ScaledYOLOv4#90

Open

AlexeyAB mentioned this issue Dec 15, 2020

Scaled-YOLOv4: Scaling Cross Stage Partial Network #7087

Open

This was referenced Feb 3, 2021

fix wrong activation function in yolov4-csp.cfg #7327

Closed

Can't reproduce original YOLOv4 AP on coco test-dev2017? WongKinYiu/PyTorch_YOLOv4#262

Closed

bulatnv mentioned this issue Apr 26, 2021

what is the use of new_coords=1 ?? in previous version we have not tried this but in yolov4-csp,yolov4-mish we used this new_coords concept. #7645

Open

kadirnar mentioned this issue Jan 10, 2022

Does it have support on the Scaled-yolov4 model? eriklindernoren/PyTorch-YOLOv3#765

Closed
