Accuracy and speed of yolov4x-mish #6987

Open
Goru1890 opened this issue Nov 17, 2020 · 49 comments

Comments

@Goru1890

What improvement does the new new_coords option bring over traditional YOLOv4?
Has anyone tried it with COCO?


If you do not get an answer for a long time, try to find the answer among Issues with a Solved label: https://github.com/AlexeyAB/darknet/issues?q=is%3Aopen+is%3Aissue+label%3ASolved
@AlexeyAB
Owner

AlexeyAB commented Nov 17, 2020

  • YOLOv4x-mish - 640x640 - COCO-testdev-2019: 49.4% AP - 67.9% AP50 stdout.txt

    • GPU RTX 2070 - 23 FPS
    • GPU RTX 3090 - 30 FPS
    • GPU V100 - ~50 FPS
  • YOLOv4x-mish - 672x672 - COCO-testdev-2019: 49.6% AP - 68.1% AP50 stdout.txt

    • GPU RTX 2070 - 21 FPS
    • GPU V100 - 45 FPS

So currently it is much better than PP-YOLO, EfficientDet, SpineNet and many other models.

Darknet:

Pytorch: https://github.com/WongKinYiu/PyTorch_YOLOv4

overall performance
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.496
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.681
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.540
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.307
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.537
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.617
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.377
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.616
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.656
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.454
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.700
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.813
Done (t=835.94s)

@AlexeyAB
Owner

@mive93 Hi, Could you port it to tkDNN / TRT please?

@mive93

mive93 commented Nov 17, 2020

Hi @AlexeyAB,
sure, I can do that.
What are the main changes wrt yolov4?

@arnaud-nt2i

@AlexeyAB Hi !
Is yolov4x-mish ready to train on custom dataset ?

@sctrueew

@AlexeyAB Hi,

Does OpenCV-dnn also support it?

@AlexeyAB
Owner

AlexeyAB commented Nov 17, 2020

@arnaud-nt2i

Is yolov4x-mish ready to train on custom dataset ?

I didn't test it well:

@AlexeyAB
Owner

@mive93 Hi,

If [yolo] new_coords=1 is set, then:

  1. We use Logistic (sigmoid) not only for x,y, but for x,y,w,h 8c9c517#diff-a191a7d286ab1bacf527ae4b5edfbad6951b06a4d80685393577af64eb8e8a8fR950

  2. The coordinates should be calculated in this way: 8c9c517#diff-a191a7d286ab1bacf527ae4b5edfbad6951b06a4d80685393577af64eb8e8a8fR141-R144

So in total:

x = (logistic(in) * 2 - 0.5 + grid_x) / grid_width
y = ...
w = pow( logistic(in)*2, 2) * anchor / network_width
h = ...
  3. We use nms=0.6 instead of 0.45

  4. We use diounms() c7e3e2e#diff-2c2b9046564ae9ad1ba54f4b42a3c8acbf98af531e411be6281687f6b6689e98L916
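
The decoding rule listed above can be sketched as follows. This is only an illustration under stated assumptions, not the darknet source: the names Box, logistic and decode_new_coords are hypothetical, and the y and h lines (elided as "..." in the comment) are assumed to mirror x and w using the grid height, anchor height and network height.

```cpp
#include <cmath>

// Box center and size, normalized to the [0, 1] range of the network input.
struct Box { float x, y, w, h; };

// Standard logistic (sigmoid) function.
inline float logistic(float v) { return 1.0f / (1.0f + std::exp(-v)); }

// Decode one raw prediction (tx, ty, tw, th) for grid cell (col, row),
// following the new_coords=1 formulas quoted in the comment above.
// ASSUMPTION: y and h mirror x and w (they are elided in the thread).
Box decode_new_coords(float tx, float ty, float tw, float th,
                      int col, int row, int grid_w, int grid_h,
                      float anchor_w, float anchor_h,
                      int net_w, int net_h) {
    Box b;
    b.x = (logistic(tx) * 2.0f - 0.5f + col) / grid_w;
    b.y = (logistic(ty) * 2.0f - 0.5f + row) / grid_h;
    const float sw = logistic(tw) * 2.0f;  // in (0, 2)
    const float sh = logistic(th) * 2.0f;  // in (0, 2)
    b.w = sw * sw * anchor_w / net_w;      // pow(logistic * 2, 2) * anchor / network_width
    b.h = sh * sh * anchor_h / net_h;
    return b;
}
```

With all four raw inputs at 0 the sigmoid is 0.5, so the center falls in the middle of its cell and the box has exactly the anchor size. Because the squared term stays below 4, each side is bounded by four times the anchor, unlike the unbounded exp() used when new_coords=0.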

@AlexeyAB
Owner

@zpmmehrdad

Does OpenCV-dnn also support it?

Currently no, they need the same fixes.

@mive93

mive93 commented Nov 18, 2020

@AlexeyAB,

thank you, I will come back to you as soon as I have some results.

@mive93

mive93 commented Nov 19, 2020

Hi @AlexeyAB
One question: were the weights computed with new_coords=0?
I'm asking because when I convert the weights and create the network, the layer outputs match only if new_coords=0; however, when I run the demo, new_coords has to be 1 to get correct boxes.
If that is the case, then I have completed the port and I can push it.

The mAP on tkDNN is the following (with thresh=0.001 and COCO_val_2017)

overall performance
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.463
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.645
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.507
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.305
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.509
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.587
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.365
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.599
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.641
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.463
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.684
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.787
Done (t=175.86s)

Then I can test the performance on the Xavier.
My 2080 Ti is busy with a training run, so performance is a bit degraded; right now I can tell you that FP32 runs at around 30 FPS and FP16 at around 58 FPS.

@AlexeyAB
Owner

AlexeyAB commented Nov 19, 2020

@mive93

Thanks!

One question: were the weights computed with new_coords=0?

What do you mean?
We have to use all these calculations: #6987 (comment)

I'm asking because when I convert the weights and create the network the output corresponds if new_coords=0, however when I run the demo new_coords should be 1 to have correct boxes.
If that is the case, then I have completed the porting and I can push it.

yolov4x-mish.cfg uses new_coords=1 for all [yolo] layers.
Do you use new_coords=1 too?

The mAP on tkDNN is the following (with thresh=0.001 and COCO_val_2017)

overall performance
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.463
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.645

This seems too low. It should be ~50.0% AP and ~68.5% AP50 on COCO2017-val for yolov4x-mish at 672x672.

@mive93

mive93 commented Nov 20, 2020

Hi @AlexeyAB,

never mind, I solved the export problem.
The issue is that I convert the weights and dump the debug output of each layer without using the GPU, and new_coords is not implemented in the CPU-only path (maybe you want to change it here: https://github.com/AlexeyAB/darknet/blob/master/src/yolo_layer.c#L374).

I am checking now for the mAP loss.
Will let you know as soon as I solve it.

@AlexeyAB
Owner

@mive93 Hi,
Thanks, I fixed it: d18e22a

@duynguyen51

@AlexeyAB Hi,
I use the YOLOv4x-mish config on my own dataset, but the avg loss does not change after more than 1000 iterations (I set batch_size=64); it stays at around 100. How can I fix it? Can I set new_coords=0? Thanks

@AlexeyAB
Owner

@duynguyen51 Loss doesn't matter. Show mAP after 30% of total iterations. And use subdivisions=16 or lower.

@duynguyen51

@duynguyen51 Loss doesn't matter. Show mAP after 30% of total iterations. And use subdivisions=16 or lower.

Thanks, let me check the result after those iter.

@duynguyen51

@duynguyen51 Loss doesn't matter. Show mAP after 30% of total iterations. And use subdivisions=16 or lower.

chart_yolo_mish

Hi, this is my mAP after 30% max_iter

@AlexeyAB
Owner

@duynguyen51
Can you set max_delta= for different yolo layers, and restart training from 10 000 iterations? ./darknet detector train ... backup/yolov4x-mish_10000.weights

[yolo]
max_delta=20
...

[yolo]
max_delta=5
...

[yolo]
max_delta=2
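
The thread does not spell out what max_delta does. A minimal sketch, assuming it acts as a symmetric clamp on each box delta of a [yolo] layer before the delta is accumulated (the helper name clip_delta is hypothetical):

```cpp
#include <algorithm>

// ASSUMPTION: max_delta limits each box delta to the range [-max_delta, max_delta].
// clip_delta is an illustrative helper, not a darknet function.
inline float clip_delta(float delta, float max_delta) {
    return std::max(-max_delta, std::min(delta, max_delta));
}
```

Under that assumption, the values above (20, 5, 2) tighten the clamp from one [yolo] layer to the next, so a single bad batch has less room to push a layer's deltas toward NaN.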

@duynguyen51

@duynguyen51
Can you set max_delta= for different yolo layers, and restart training from 10 000 iterations? ./darknet detector train ... backup/yolov4x-mish_10000.weights


Thanks, let me try it.

@AlexeyAB
Owner

@duynguyen51
Also set learning_rate=0.001

@AlexeyAB
Owner

@duynguyen51 If that doesn't help, try setting this and training again:

[net] 
try_fix_nan=1

@Goru1890
Author

Goru1890 commented Nov 23, 2020

How many GB of VRAM does a graphics card need to train it? It doesn't work on my NVIDIA GTX 2070 (11 GB) with subdivisions=16.

@AlexeyAB
Owner

@Goru1890
I can train https://github.com/AlexeyAB/darknet/blob/master/cfg/yolov4x-mish.cfg on RTX 3090 - 24 GB VRAM with parameters:

[net]
width=640
height=640
batch=64
subdivisions=8
optimized_memory=1

@arnaud-nt2i

@AlexeyAB You said :

  • Make sure that batch=64 and subdivisions <= 16

Is batch=64 really mandatory, or is it just meant to enforce a minimum mini-batch size (4)?
E.g., can we set batch=63 and subdivisions=7, or batch=70 and subdivisions=7, like in other networks?

@AlexeyAB
Owner

@arnaud-nt2i

Eg: can we set batch=63 and subdivisions=7 or batch=70 subdivisions=7 like in other networks?

Yes, you can. I
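
For reference, the arithmetic behind these combinations can be sketched as below, assuming the mini-batch is the integer quotient batch / subdivisions (the helper name mini_batch is hypothetical):

```cpp
// Number of images processed per forward/backward pass.
// ASSUMPTION: integer division, so batch should be a multiple of subdivisions.
inline int mini_batch(int batch, int subdivisions) {
    return batch / subdivisions;
}
```

Under that assumption, batch=64 with subdivisions=16 gives the mini-batch of 4 mentioned above, batch=64 with subdivisions=8 gives 8, batch=63 with subdivisions=7 gives 9, and batch=70 with subdivisions=7 gives 10.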

@arnaud-nt2i

arnaud-nt2i commented Nov 23, 2020

OK, thanks. Some other questions:

  • Why not use batch normalize=2? It led to good results (+~0.5 mAP) in my own tests (on traditional YOLOv4 mish).
  • Is letter_box mandatory if (mean training image ratio) ~= (network ratio)
    and, of course:
    train_network_width * train_obj_width / train_image_width ~= detection_network_width * detection_obj_width / detection_image_width
    train_network_height * train_obj_height / train_image_height ~= detection_network_height * detection_obj_height / detection_image_height

@AlexeyAB
Owner

Why not use batch normalize=2? It led to good results (+~0.5 mAP) in my own tests (on traditional YOLOv4 mish).

batch normalize=2 sometimes works better, sometimes worse.

Is letter_box mandatory if (mean training image ratio) ~= (network ratio)
and, of course:
train_network_width * train_obj_width / train_image_width ~= detection_network_width * detection_obj_width / detection_image_width
train_network_height * train_obj_height / train_image_height ~= detection_network_height * detection_obj_height / detection_image_height

No. letter_box=1 is preferred when the aspect ratio differs between images and network resolutions.
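
To make the trade-off concrete, here is a minimal sketch of the usual letterbox geometry: the image is scaled to fit inside the network input without changing its aspect ratio, and the remainder is padded. The struct and function names are hypothetical, and the integer rounding is an assumption rather than a copy of darknet's resize code.

```cpp
// Size of the resized image and the padding on each side of it.
struct LetterBox { int new_w, new_h, pad_x, pad_y; };

// Fit an img_w x img_h image into a net_w x net_h input, keeping its aspect ratio.
LetterBox letterbox_dims(int img_w, int img_h, int net_w, int net_h) {
    LetterBox lb;
    // net_w / img_w < net_h / img_h, compared by cross-multiplication to stay in integers.
    if (static_cast<long long>(net_w) * img_h < static_cast<long long>(net_h) * img_w) {
        lb.new_w = net_w;  // width is the limiting side
        lb.new_h = static_cast<int>(static_cast<long long>(img_h) * net_w / img_w);
    } else {
        lb.new_h = net_h;  // height is the limiting side (or the ratios are equal)
        lb.new_w = static_cast<int>(static_cast<long long>(img_w) * net_h / img_h);
    }
    lb.pad_x = (net_w - lb.new_w) / 2;
    lb.pad_y = (net_h - lb.new_h) / 2;
    return lb;
}
```

When the image and the network share the same aspect ratio, both paddings are zero and letterboxing is equivalent to a plain resize, which is why it becomes optional in that case.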

@toplinuxsir

Is this the same as https://github.com/WongKinYiu/ScaledYOLOv4 ?

@toplinuxsir

I am training on my custom dataset. Once the iteration count goes above 1000, mAP is calculated at every iteration. Is that normal? Thanks
image

@Goru1890
Author

Goru1890 commented Nov 24, 2020

No. letter_box=1 is preferred when the aspect ratio differs between images and network resolutions.

So if my dataset only contains images with the same aspect ratio and resolution, may I set letter_box=0?

@AlexeyAB
Owner

@toplinuxsir I fixed it.

@AlexeyAB
Owner

@Goru1890 Yes you can.

@AlexeyAB
Owner

@toplinuxsir

Is this the same as https://github.com/WongKinYiu/ScaledYOLOv4 ?

Yes.
https://arxiv.org/abs/2011.08036

image

@toplinuxsir

toplinuxsir commented Nov 25, 2020

@AlexeyAB
I trained on my custom dataset. With yolov4 training is normal,
but with yolov4x-mish, at nearly 2000 iterations, the avg loss is 1339 and mAP is always 0.
Is that normal?
Thanks

 Tensor Cores are disabled until the first 3000 iterations are reached.
 Last accuracy mAP@0.5 = 0.00 %, best = 0.00 % 
 1986: 1518.558960, 1420.304321 avg loss, 0.001000 rate, 7.550015 seconds, 127104 images, 1987.702662 hours left
Loaded: 4.403241 seconds - performance bottleneck on CPU or Disk HDD/SSD
v3 (iou loss, Normalizer: (iou: 0.05, obj: 4.00, cls: 0.50) Region 168 Avg (IOU: 0.390271), count: 401, total_loss = 2855.301758 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 1.00, cls: 0.50) Region 185 Avg (IOU: 0.521089), count: 40, total_loss = 42.997162 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 0.40, cls: 0.50) Region 202 Avg (IOU: 0.647995), count: 2, total_loss = 1.097103 
 total_bbox = 9902341, rewritten_bbox = 0.237358 % 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 4.00, cls: 0.50) Region 168 Avg (IOU: 0.407130), count: 383, total_loss = 2931.999512 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 1.00, cls: 0.50) Region 185 Avg (IOU: 0.543540), count: 38, total_loss = 45.554615 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 0.40, cls: 0.50) Region 202 Avg (IOU: 0.617228), count: 4, total_loss = 0.458375 
 total_bbox = 9902766, rewritten_bbox = 0.237348 % 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 4.00, cls: 0.50) Region 168 Avg (IOU: 0.394439), count: 446, total_loss = 3446.613525 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 1.00, cls: 0.50) Region 185 Avg (IOU: 0.530023), count: 47, total_loss = 45.623585 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 0.40, cls: 0.50) Region 202 Avg (IOU: 0.604356), count: 3, total_loss = 0.951157 
 total_bbox = 9903262, rewritten_bbox = 0.237336 % 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 4.00, cls: 0.50) Region 168 Avg (IOU: 0.382783), count: 752, total_loss = 5725.847656 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 1.00, cls: 0.50) Region 185 Avg (IOU: 0.483980), count: 57, total_loss = 55.402355 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 0.40, cls: 0.50) Region 202 Avg (IOU: 0.632100), count: 4, total_loss = 0.644276 
 total_bbox = 9904075, rewritten_bbox = 0.237367 % 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 4.00, cls: 0.50) Region 168 Avg (IOU: 0.363011), count: 808, total_loss = 5614.538574 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 1.00, cls: 0.50) Region 185 Avg (IOU: 0.551375), count: 63, total_loss = 66.249748 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 0.40, cls: 0.50) Region 202 Avg (IOU: 0.696846), count: 2, total_loss = 1.296881 
 total_bbox = 9904948, rewritten_bbox = 0.237346 % 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 4.00, cls: 0.50) Region 168 Avg (IOU: 0.382413), count: 793, total_loss = 5648.040039 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 1.00, cls: 0.50) Region 185 Avg (IOU: 0.546819), count: 86, total_loss = 109.324928 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 0.40, cls: 0.50) Region 202 Avg (IOU: 0.663197), count: 3, total_loss = 0.766923 
 total_bbox = 9905830, rewritten_bbox = 0.237375 % 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 4.00, cls: 0.50) Region 168 Avg (IOU: 0.397971), count: 605, total_loss = 4430.015137 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 1.00, cls: 0.50) Region 185 Avg (IOU: 0.540062), count: 65, total_loss = 80.512238 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 0.40, cls: 0.50) Region 202 Avg (IOU: 0.653224), count: 5, total_loss = 2.803994 
 total_bbox = 9906505, rewritten_bbox = 0.237389 % 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 4.00, cls: 0.50) Region 168 Avg (IOU: 0.398305), count: 373, total_loss = 2787.349121 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 1.00, cls: 0.50) Region 185 Avg (IOU: 0.558826), count: 40, total_loss = 45.045372 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 0.40, cls: 0.50) Region 202 Avg (IOU: 0.612750), count: 2, total_loss = 0.871407 
 total_bbox = 9906920, rewritten_bbox = 0.237410 % 
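
To watch for the divergence discussed here without reading the log by eye, the per-iteration status line can be parsed. This is a minimal sketch: the names TrainStatus, parse_status_line and is_diverged are hypothetical, and the format string is derived only from the status line shown in this log, so other darknet versions may print it differently.

```cpp
#include <cmath>
#include <cstdio>

// Fields of a status line such as
// " 1986: 1518.558960, 1420.304321 avg loss, 0.001000 rate, ..."
struct TrainStatus { int iteration; float loss; float avg_loss; float rate; };

// Returns true only if the line matches the status-line layout shown above.
bool parse_status_line(const char* line, TrainStatus* out) {
    return std::sscanf(line, " %d: %f, %f avg loss, %f rate",
                       &out->iteration, &out->loss, &out->avg_loss, &out->rate) == 4;
}

// True if the running average loss is NaN or infinite.
bool is_diverged(const TrainStatus& s) {
    return std::isnan(s.avg_loss) || std::isinf(s.avg_loss);
}
```

The per-region "v3 (iou loss, ..." and "total_bbox" lines do not start with an iteration number, so parse_status_line rejects them and they can be skipped.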


@arnaud-nt2i

@AlexeyAB
I trained for my custom dataset , for yolov4 is normal,
but for yolov4x-mish , near 2000 iterations , the avg loss is 1339 and mAP is always 0
Is that normal ?
Thanks


Same but IOU is nan...

@Goru1890
Author

Same but IOU is nan...

Same issue...

@arnaud-nt2i

Has anyone tried "try_fix_nan=1" as well?

@arnaud-nt2i

arnaud-nt2i commented Nov 30, 2020

@AlexeyAB One funny thing I encountered while trying YOLOv4x with your exact parameters on an RTX 3090:
the model does not fit into memory ("CUDA out of memory") unless I set optimized_memory=0!
Config:
W10
CUDA-version: 11010 (11010), cuDNN: 8.0.5, CUDNN_HALF=1, GPU count: 1
CUDNN_HALF=1
OpenCV version: 4.5.0
Prepare additional network for mAP calculation...
0 : compute_capability = 860, cudnn_half = 1, GPU: GeForce RTX 3090
net.optimized_memory = 0
mini_batch = 1, batch = 8, time_steps = 1, train = 0

There seems to be an initial spike in memory usage when optimized_memory=1 that makes the model crash,
even though the long-term memory usage is lower than with optimized_memory=0.

@AlexeyAB
Owner

@arnaud-nt2i Thanks for pointing this out!

Does anyone else have the same problem? So should I set [net] optimized_memory=0 by default?

@arnaud-nt2i

arnaud-nt2i commented Dec 7, 2020

@toplinuxsir @Goru1890 @duynguyen51
A lot has been done by AlexeyAB trying to fix yolov4x since our bug reports.
Has somebody tried the latest fix ?

edit: I have found my answer here: WongKinYiu/ScaledYOLOv4#13 (comment)

The last commit seems fine, I will try it and report here.

@Goru1890
Author

Goru1890 commented Dec 7, 2020

The last commit seems fine, I will try it and report here.

How did it go?

@toplinuxsir

@AlexeyAB, I tried the latest commit with yolov4 and yolov4x-mish and noticed something strange:

  1. Both have a higher mAP and a higher avg loss.
  2. Although mAP is higher, both models miss more detections than before.
    Is that normal?

@OkuChou

OkuChou commented Dec 10, 2020

@arnaud-nt2i Thanks for pointing this out!

Does anyone else have the same problem? So should I set [net] optimized_memory=0 by default?

Yes, I got the same error. I also use an RTX 3090, and it only runs properly with optimized_memory=0.
However, try_fix_nan=1 works!
I also set max_delta=20 for the last 3 [yolo] layers.
After following your suggestions, the "-nan" error disappeared. So far, so good...

@AlexeyAB
Owner

@mive93 Hi,
Please, fix tkDNN for yolov4-csp and yolov4x-mish models:
Currently, if new_coords=1 is set, then [yolo] shouldn't apply logistic (sigmoid) activation to any values, because activation=logistic is now used in the preceding convolutional layer:

darknet/cfg/yolov4x-mish.cfg

Lines 1408 to 1436 in e7d029c

[convolutional]
size=1
stride=1
pad=1
filters=255
activation=logistic
[yolo]
mask = 6,7,8
anchors = 12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401
classes=80
num=9
jitter=.1
scale_x_y = 2.0
objectness_smooth=1
ignore_thresh = .7
truth_thresh = 1
#random=1
resize=1.5
iou_thresh=0.2
iou_normalizer=0.05
cls_normalizer=0.5
obj_normalizer=0.4
iou_loss=ciou
nms_kind=diounms
beta_nms=0.6
new_coords=1
max_delta=2
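
The filters=255 value in the [convolutional] layer above is consistent with the usual darknet rule filters = (classes + 5) * number of masks: 4 box coordinates plus 1 objectness score plus the class scores, per anchor in mask. A small sketch for recomputing it on a custom dataset (the helper name yolo_filters is hypothetical):

```cpp
// Output channels of the convolutional layer that feeds a [yolo] layer:
// (4 box coordinates + 1 objectness + classes) for each anchor listed in mask.
inline int yolo_filters(int classes, int masks) {
    return (classes + 5) * masks;
}
```

For this cfg, classes=80 and mask = 6,7,8 (three anchors) give 255; a two-class dataset with the same three masks would need filters=21.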

@mive93

mive93 commented Jan 4, 2021

Hi @AlexeyAB
Sorry, I only saw the comment now (I was submitting my PhD thesis and had no time to breathe).
I will look into that in the following days.

@mive93

mive93 commented Jan 22, 2021

Hi @AlexeyAB,
Scaled yolo4 is now supported, and I have also updated Yolov4x-mish (ceccocats/tkDNN@adac857).

However, I think that your new implementation of the YOLO layer could have problems with YOLOv4.
You have the scale-add at the end, but it should not be there (https://github.com/AlexeyAB/darknet/blob/master/src/yolo_layer.c#L674):

            if (l.new_coords) {
                //activate_array(l.output + bbox_index, 4 * l.w*l.h, LOGISTIC);    // x,y,w,h
            }
            else {
                activate_array(l.output + bbox_index, 2 * l.w*l.h, LOGISTIC);        // x,y,
                int obj_index = entry_index(l, b, n*l.w*l.h, 4);
                activate_array(l.output + obj_index, (1 + l.classes)*l.w*l.h, LOGISTIC);
            }
            scal_add_cpu(2 * l.w*l.h, l.scale_x_y, -0.5*(l.scale_x_y - 1), l.output + bbox_index, 1);    // scale x,y

I think my solution is better (tested with all older models and works for everything) (https://github.com/ceccocats/tkDNN/blob/master/src/Yolo.cpp#L91)

            if (new_coords == 1){
                if (this->scaleXY != 1) scalAdd(dstData + index, 2 * dim.w*dim.h, this->scaleXY, -0.5*(this->scaleXY - 1), 1);
            }
            else{
                activationLOGISTICForward(srcData + index, dstData + index, 2*dim.w*dim.h);

                if (this->scaleXY != 1) scalAdd(dstData + index, 2 * dim.w*dim.h, this->scaleXY, -0.5*(this->scaleXY - 1), 1);
                
                index = entry_index(b, n*dim.w*dim.h, 4, classes, input_dim, output_dim);
                activationLOGISTICForward(srcData + index, dstData + index, (1+classes)*dim.w*dim.h);
            }

@AlexeyAB
Owner

@mive93 Hi,
My implementation is equal to this one in your repo (note that I use int bbox_index instead of int index for scalAdd):

  • in this case we will not confuse index with bbox_index
  • if we will want to change x,y-scaling, we should change only 1 line instead of 2 lines
            int bbox_index = entry_index(b, n*dim.w*dim.h, 0, classes, input_dim, output_dim);
            std::cout<<"new_coords"<<new_coords<<std::endl;
            if (new_coords == 1){
                // nothing
            }
            else{
                activationLOGISTICForward(srcData + bbox_index, dstData + bbox_index, 2*dim.w*dim.h);
              
                int obj_cls_index = entry_index(b, n*dim.w*dim.h, 4, classes, input_dim, output_dim);
                activationLOGISTICForward(srcData + obj_cls_index , dstData + obj_cls_index , (1+classes)*dim.w*dim.h);
            }
            if (this->scaleXY != 1) scalAdd(dstData + bbox_index, 2 * dim.w*dim.h, this->scaleXY, -0.5*(this->scaleXY - 1), 1);
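
Why the two variants agree can be seen from what the scale-add step computes. A minimal sketch, assuming scalAdd / scal_add_cpu apply x = x * scale + shift element-wise, as their arguments suggest (the function names below are hypothetical stand-ins, not the library routines):

```cpp
#include <cstddef>

// ASSUMPTION: element-wise x[i] = x[i] * scale + shift.
void scal_add(float* x, std::size_t n, float scale, float shift) {
    for (std::size_t i = 0; i < n; ++i) {
        x[i] = x[i] * scale + shift;
    }
}

// x,y scaling as used in both snippets: shift = -0.5 * (scale_x_y - 1).
void scale_xy(float* xy, std::size_t n, float scale_x_y) {
    scal_add(xy, n, scale_x_y, -0.5f * (scale_x_y - 1.0f));
}
```

With scale_x_y = 2.0 this maps a sigmoid output s to 2*s - 0.5, the same term as in the x formula earlier in the thread. With scale_x_y = 1 the shift is zero and the values are unchanged, so applying the step once after the if/else (optionally guarded by scaleXY != 1) gives the same result as repeating it in both branches.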

@mive93

mive93 commented Jan 22, 2021

Yeah you are right, sorry my bad.
