
Scaled-YOLOv4: Scaling Cross Stage Partial Network #7087

Open
AlexeyAB opened this issue Dec 7, 2020 · 46 comments

@AlexeyAB
Owner

AlexeyAB commented Dec 7, 2020

Scaled-YOLOv4: Scaling Cross Stage Partial Network - the best neural network for object detection (ranked #1 in accuracy on the MS COCO dataset)

Scaled-YOLOv4 is the most accurate published neural network (55.8% AP on Microsoft COCO). In addition, it offers the best speed-to-accuracy trade-off across the entire range from 15 FPS to 1774 FPS. We show that the YOLO and Cross-Stage-Partial (CSP) Network approaches are the best in terms of both absolute accuracy and accuracy-to-speed ratio.

Models:


For training (yolov4-csp.cfg, yolov4x-mish.cfg, yolov4-p5.cfg, yolov4-p6.cfg) - change these lines before each of the 3 (for p5) or 4 (for p6) [yolo] layers:

darknet/cfg/yolov4-p5.cfg

Lines 1810 to 1811 in 9a86fce

filters=340
activation=logistic

filters=<(5 + num_classes) x 4>
activation=logistic - for training and detection by using Darknet: https://github.com/AlexeyAB/darknet
activation=linear - for training and detection by using Pytorch Scaled-YOLOv4 (CSP-branch): https://github.com/WongKinYiu/ScaledYOLOv4/tree/yolov4-csp
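As a worked illustration (mine, not part of the original post): the multiplier in the formula is the number of masks in the given [yolo] layer - 4 in the yolov4-p5.cfg snippet above - so check the mask= line of your own cfg before applying it.

# yolov4-p5.cfg, 80 COCO classes, 4 masks per [yolo] layer: (5 + 80) x 4 = 340
filters=340
# hypothetical 2-class custom dataset with the same 4 masks: (5 + 2) x 4 = 28
filters=28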

For training use pre-trained weights:

Currently, PyTorch is more suitable for training on multiple GPUs.
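For reference, a typical Darknet training invocation looks like the sketch below; data/obj.data is a placeholder for your own .data file, and the pre-trained .conv file name should be checked against the release assets for the cfg you use (yolov4-csp.conv.142 here is an assumption):

./darknet detector train data/obj.data cfg/yolov4-csp.cfg yolov4-csp.conv.142 -map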


[image: scaled_yolov4_res]

@AlexeyAB AlexeyAB pinned this issue Dec 7, 2020
@MKiremitci

@AlexeyAB Great job, sir. When I have time, I will read the paper. Can we train a custom model with Scaled-YOLOv4?

@AlexeyAB
Owner Author

AlexeyAB commented Dec 7, 2020

Can we train a custom model with Scaled-YOLOv4?

Yes, sure, you can. Just use the latest version of Darknet.

@MKiremitci

Thanks :) I will try it

@marvision-ai

marvision-ai commented Dec 7, 2020

@AlexeyAB Very interesting! Great progress. Just to confirm:

  1. Did you also update the yolov4-tiny model to be scaled-yolov4-tiny? Or is it just the regular one?
    -> If yes, is there any 3-layer version of it?
  2. Does OpenCV support the new scaled networks?

@tdurand

tdurand commented Dec 8, 2020

@AlexeyAB congrats! Same question as @marvision-ai: did the yolov4-tiny model weights change, or are they the same?

@philipp-schmidt

Impressive work, congrats Aleksey @AlexeyAB, to you and the authors.
Just a thought after reading the paper: while yolov4-tiny has insane performance, yolov4-tiny-3l seems to sit in a very nice "sweet spot", with a significant boost in accuracy over tiny (just a few percent shy of some of the original yolov3 variants, if I read that correctly, which is incredible considering that's where it all started) at still very impressive speed.
Is there any chance you could share pre-trained weights for yolov4-tiny-3l so we can play around with them? I think this variant of the network deserves more credit.

@marvision-ai

@YashasSamaga Hello, does OpenCV support the new networks?

@marvision-ai

marvision-ai commented Dec 14, 2020

@AlexeyAB any comments please?
@tdurand I believe there isn't any change since the cfg hasn't changed.

@AlexeyAB
Owner Author

I have added an explanation of the necessary fixes, so we are waiting for the fix:

@AlexeyAB
Owner Author

did the yolov4-tiny model weights change, or are they the same?

Yes. I uploaded new weights files yolov4-tiny.weights and yolov4-tiny.conv.29.
In general the cfg-file is the same; it was just trained for 2,000,000 iterations instead of 500,000.

@marvision-ai

did the yolov4-tiny model weights change, or are they the same?

Yes. I uploaded new weights files yolov4-tiny.weights and yolov4-tiny.conv.29.
In general the cfg-file is the same; it was just trained for 2,000,000 iterations instead of 500,000.

I assume I can use these weights with a tiny-3l-spp cfg? SPP is still useful for this one, correct?

@philipp-schmidt

philipp-schmidt commented Dec 15, 2020

@AlexeyAB could we have weights and an updated cfg for yolov4-tiny-3l.cfg as well?

The paper reports 28.7% AP (+6% over tiny) with the same FPS on a TX2.

@AlexeyAB
Owner Author

I assume I can use these weights with a tiny-3l-spp cfg? SPP is still useful for this one, correct?

You can use these weights to train your custom tiny-3l-spp: https://github.com/AlexeyAB/darknet/releases/download/darknet_yolo_v4_pre/yolov4-tiny.conv.29
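A hedged sketch of the corresponding training command; data/obj.data and cfg/yolov4-tiny-3l-spp.cfg are placeholders for your own .data file and custom cfg:

./darknet detector train data/obj.data cfg/yolov4-tiny-3l-spp.cfg yolov4-tiny.conv.29 -map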

@marvision-ai

marvision-ai commented Dec 15, 2020

I assume I can use these weights with a tiny-3l-spp cfg? SPP is still useful for this one, correct?

You can use these weights to train your custom tiny-3l-spp: https://github.com/AlexeyAB/darknet/releases/download/darknet_yolo_v4_pre/yolov4-tiny.conv.29

@AlexeyAB Thank you.
I see there are new activations along with new features in the [yolo] layers. Could we have a new tiny-3l-spp cfg to train with the new weights? Thank you!

@scianand

Which cfg file do I have to use for Scaled-YOLOv4?

@bulatnv

bulatnv commented Dec 22, 2020

Hello, AlexeyAB.
There seems to be an error in the mask for yolov4-tiny.

mask = 1,2,3

Shouldn't it be mask=0,1,2?
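For context (this is not a maintainer's answer - verify against your own copy of the file, since cfgs change between commits), the relevant lines in cfg/yolov4-tiny.cfg look roughly like this:

[yolo]
mask = 1,2,3
anchors = 10,14, 23,27, 37,58, 81,82, 135,169, 344,319
# the other [yolo] layer uses mask = 3,4,5, so with mask = 1,2,3 the smallest anchor (10,14) is left unused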

@Hyvenos

Hyvenos commented Dec 23, 2020

First of all, thank you for your great work.
I just tried the models you provided (yolov4x-mish and yolov4-csp) to compare them with the original YOLOv4 on one image.
I used the latest Darknet update from this repo to run my tests.

For yolov4-csp and yolov4x-mish I used the cfg and weights files you gave in the first post. For the original v4 weights and config, I used the ones you provided about two months ago.
Here are the (visual) results:

Original V4 512
yolov4x-mish 512
yolov4x-mish 640
yolov4-csp 512

Two things jumped out at me:

  • For csp and mish the bounding boxes are more accurate, which was expected.
  • csp and mish missed a lot of detections compared to the original v4, and it seems to be related to object size.
    @AlexeyAB, I found a post of yours saying:
    ・ YOLOv4x-mish - should be trained longer. Also, if you use 416x416 or 512x512, then try to use the anchors from yolov4.cfg.
    This implies that the anchors are not the same between the original v4 and scaled-v4.
    However, looking at the cfg files, I have exactly the same anchors.
    I wonder whether the missed small objects are related to an anchor issue?

One last thing, in which I'm less interested: the original v4 detects the whole crowd as a single person, but mish and csp do not (which is preferable from my point of view).

@AlexeyAB
Owner Author

@Hyvenos Just try to use a lower confidence threshold for the csp and x-mish models, e.g. -thresh 0.15
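For example, for a single-image run (the paths assume the cfg/weights from the first post and an arbitrary test image):

./darknet detector test cfg/coco.data cfg/yolov4-csp.cfg yolov4-csp.weights image.jpg -thresh 0.15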

@Hyvenos

Hyvenos commented Dec 24, 2020

Hi, I tried it, without improvement unfortunately.
The images I linked above were processed with a confidence threshold of 0.25.
I tried again with thresholds 0.15, 0.10 and 0.05, with the same config files and weights I used in my previous post; the only thing I didn't test is mish at 640x640, I only tested it at 512x512.

Here are the results:
Confidence threshold 0.15
Original v4
yolov4x-mish
yolov4-csp
Confidence threshold 0.10
Original v4
yolov4x-mish
yolov4-csp
Confidence threshold 0.05
Original v4
yolov4x-mish
yolov4-csp

Even when lowering the threshold, csp and mish struggle to detect the tiniest objects, and when they do detect them, they output the wrong category.
Here are the anchors:
For csp and mish: anchors = 12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401
Can you confirm these are suitable in most cases?

@AlexeyAB
Owner Author

AlexeyAB commented Dec 24, 2020

@Hyvenos
In general, compared to yolov4 at the same resolution, the yolov4-csp model:

  • increases AP (bbox coordinates) ~ +4%
  • increases AP50 (number of detected objects) ~ +0.4%
    • increases AP50 for large objects, but decreases AP50 for small objects

So:

  • either use yolov4-csp or yolov4x-mish at a much higher resolution than yolov4
  • or use yolov4

If you want to detect small objects using the new csp/x-mish models, then it's better to fine-tune yolov4x-mish.cfg/weights on COCO or your dataset using Darknet with these modifications for each [yolo] layer in yolov4x-mish.cfg (a sketch of the resulting [yolo] section follows this list):

  • objectness_smooth=0
  • scale_x_y = 1.05
  • remove iou_loss=ciou and iou_normalizer=0.07
  • as a last resort: remove new_coords=1 (and change activation=logistic to activation=linear in [convolutional] layers before [yolo] layers)

Also if you want to detect objects in crowds, then train with iou_thresh=1.0 instead of iou_thresh=0.2
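A minimal sketch (my illustration, not from the original comment) of what one edited [yolo] section could look like; only the parameters discussed above are shown:

[yolo]
# keep mask, anchors, classes, num and the other existing lines from your cfg unchanged
objectness_smooth=0
scale_x_y = 1.05
# iou_loss=ciou        <- removed
# iou_normalizer=0.07  <- removed
# for objects in crowds train with iou_thresh=1.0, otherwise keep iou_thresh=0.2
iou_thresh=1.0
# keep new_coords=1 unless nothing else helps; if you do remove it, also change
# activation=logistic to activation=linear in the [convolutional] layers before each [yolo] layer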

@GeekAlexis

@AlexeyAB Can you elaborate a bit more on why it is better to set iou_thresh=1.0 to detect objects in crowds? Thanks in advance.

@AlexeyAB
Owner Author

AlexeyAB commented Jan 4, 2021

@GeekAlexis

  • iou_thresh=1.0 - uses only the 1 best anchor (the one with the highest IoU) for each ground truth, so the other anchors remain available for other ground truths in the same cell. This way there can be 3 objects with different shapes in 1 cell.

  • iou_thresh=0.2 - solves the blinking issue by matching all anchors with IoU > 0.2 to 1 ground truth. It increases rewritten_boxes (a value you can see during training), so if there are 3 ground truths in 1 cell, they will compete for the same 3 anchors (conflict) - all 3 anchors will be trained for 1 bbox (an average between the 3 ground truths).

@stephanecharette
Collaborator

@AlexeyAB This is the kind of thing that should be documented in the wiki instead of an issue. That way, people looking up details on params can find them much more easily than by browsing through all the issues.

@GeekAlexis

GeekAlexis commented Jan 4, 2021

@AlexeyAB Thanks for the clarification. To train for both small and large objects, is it still recommended to use the aforementioned modifications with yolov4-csp?

@AlexeyAB
Owner Author

AlexeyAB commented Jan 5, 2021

yolov4-tiny doesn't use iou_thresh. yolov4-tiny has too little capacity to train 3 anchors instead of 1 per object.

@manoj-8246

@AlexeyAB I am trying to test the scaled model on a video with the command below:
./darknet.exe detector demo cfg/coco.data cfg/yolov4-csp.cfg yolov4-csp.weights sample.mp4 -out_filename output_scaled.mp4

But I am getting the following message and the demo screen hangs:
OpenCV: FFMPEG: tag 0x58564944/'DIVX' is not supported with codec id 12 and format 'mp4 / MP4 (MPEG-4 Part 14)'
[screenshot: scaled_issue_video]

The same command with the YOLOv4 cfg and weights files works without any issue. Is there any change required for the scaled version of the cfg and weights files?

@AlexeyAB
Owner Author

AlexeyAB commented Jan 6, 2021

@manoj-8246 This is not related to YOLO. It is related to OpenCV and your video file; it's better to ask there: https://answers.opencv.org/questions/
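One unconfirmed workaround worth trying (my assumption, not advice from the maintainers): write the output to an .avi container instead of .mp4, which often avoids the FFMPEG DIVX/mp4 tag mismatch:

./darknet.exe detector demo cfg/coco.data cfg/yolov4-csp.cfg yolov4-csp.weights sample.mp4 -out_filename output_scaled.avi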

@haileykwak

@AlexeyAB

  1. What is the difference between yolov4-csp and yolov4x-mish?

  2. I trained Darknet "yolov4-csp-416" on a custom dataset, but the loss is very high. I checked the dataset files using -show_imgs and checked the cfg file. mAP is OK. What could be the problem with the loss?

@RasmusToivanen

Thanks again @AlexeyAB for your great work.

If I understand the graph correctly, there are no yolov4-tiny versions in it. Could those be added for easier comparison against the other versions?

I would also like to see the "previous" yolov4-tiny models and their variants compared to the scaled-yolov4 variants.
I got about 13 FPS on my latest try with the "original" yolov4-tiny model on a Jetson Nano; hopefully this weekend I can get the scaled version running with the new model and reach the 39 FPS (or close) that the repo claims.

@AlexeyAB
Owner Author

@RasmusToivanen The yolov4-tiny.cfg model is the same in Darknet and in the Scaled version: https://github.com/WongKinYiu/ScaledYOLOv4/tree/yolov4-tiny
There are just new pre-trained weights available with AP=22%, AP50=42% (the green dot on the graph): https://github.com/AlexeyAB/darknet/releases/download/darknet_yolo_v4_pre/yolov4-tiny.weights

  • The small YOLOv4-tiny model isn't suitable for a high-end Tesla V100 GPU; YOLOv4-tiny is for Jetson / mobile platforms.
  • The scaled versions YOLOv4-CSP, P5-P7 have only +1% higher AP50 (they detect the same number of objects), but +4% higher AP (they detect more accurate box coordinates).

Here is a comparison of AP50 for YOLOv4-CSP vs YOLOv4 vs YOLOv4-tiny:
[image: AP50 comparison chart]

@sharoseali

@Hyvenos
In general, compared to yolov4 with the same resolution, the model yolov4-csp :

  • increases AP (bbox coordinates) ~ +4%

  • increases AP50 (number of detected objects) ~ +0.4%

    • increases AP50 for large objects, but decreases AP50 for small objects

So:

  • either use yolov4-csp or yolov4x-mish at a much higher resolution than yolov4
  • or use yolov4

If you want to detect small objects using the new csp/x-mish models, then it's better to fine-tune yolov4x-mish.cfg/weights on COCO or your dataset using Darknet with these modifications for each [yolo] layer in yolov4x-mish.cfg

  • objectness_smooth=0
  • scale_x_y = 1.05
  • remove iou_loss=ciou and iou_normalizer=0.07
  • as a last resort: remove new_coords=1 (and change activation=logistic to activation=linear in [convolutional] layers before [yolo] layers)

Also if you want to detect objects in crowds, then train with iou_thresh=1.0 instead of iou_thresh=0.2

Hi @AlexeyAB, for improving the detection of small objects with yolov4, you mentioned 3 changes to the normal yolov4.cfg file in the readme section, i.e.:

for training for small objects (smaller than 16x16 after the image is resized to 416x416) - set layers = 23 instead of layers = 54, and set stride=4 instead of stride=2 (in two places)

I tried this in the past for yolov4.cfg and it produced somewhat better results, but when I made the same changes in yolov4-csp.cfg, it gives an error while loading the network at training time - an input shape mismatch at some layer - so it seems I made a mistake. Can you please suggest: if we want to make the same changes for Scaled-YOLOv4 (CSP), what exactly do I have to change? Thanks in advance.

@xjinai

xjinai commented Feb 17, 2021

@AlexeyAB: Thank you for adding the support of Scaled-YOLOv4 in this repo. Great job!

I've read the Scaled-YOLOv4 paper. I'm interested in trying YOLOv4-P5, YOLOv4-P6 and YOLOv4-P7 on my custom dataset. From your notes above, there are models such as YOLOv4-CSP, YOLOv4x-MISH, and YOLOv4-CSPx-P7 (1536x1536). I have a couple of questions that I hope you can help with:

  1. Do you also have models for YOLOv4-P5 (896x896) and YOLOv4-P6 (1280x1280)? If not, is it possible to add those?

  2. I am assuming that YOLOv4-CSPx-P7 (1536x1536) in your repo is the same as the YOLOv4-P7 in the paper. Can you please confirm? Can I train the YOLOv4-CSPx-P7 model using my custom dataset?

  3. What are the differences between YOLOv4-CSP and YOLOv4x-MISH in your repo? Which one implements the "YOLOv4-CSP" in the paper?

I look forward to your reply. Thank you in advance for any information you may provide.

@SpongeBab

@AlexeyAB: Hi,
I used Scaled-YOLOv4 (PyTorch) to train on my dataset and got the .pt weights.
How can I use them with Darknet?

@akashAD98

akashAD98 commented Apr 21, 2021

@Hyvenos
In general, compared to yolov4 with the same resolution, the model yolov4-csp :

  • increases AP (bbox coordinates) ~ +4%

  • increases AP50 (number of detected objects) ~ +0.4%

    • increases AP50 for large objects, but decreases AP50 for small objects

So:

  • either use yolov4-csp or yolov4x-mish at a much higher resolution than yolov4
  • or use yolov4

If you want to detect small objects using the new csp/x-mish models, then it's better to fine-tune yolov4x-mish.cfg/weights on COCO or your dataset using Darknet with these modifications for each [yolo] layer in yolov4x-mish.cfg

  • objectness_smooth=0
  • scale_x_y = 1.05
  • remove iou_loss=ciou and iou_normalizer=0.07
  • as a last resort: remove new_coords=1 (and change activation=logistic to activation=linear in [convolutional] layers before [yolo] layers)

Also if you want to detect objects in crowds, then train with iou_thresh=1.0 instead of iou_thresh=0.2

So, as you said, remove iou_loss=ciou - do I need to replace this loss function with another loss?

In the readme section for detecting small objects you haven't removed iou_loss=ciou: https://github.com/AlexeyAB/darknet/blob/6f718c257815a984253346bba8fb7aa756c55090/cfg/yolov4.cfg#L1153

I have one more question: for detecting small + big objects on custom data, are there any other parameters we need to consider? Thanks in advance. @WongKinYiu @AlexeyAB

@AlexeyAB
Owner Author

A bug (WongKinYiu/ScaledYOLOv4#89) has been fixed in the Scaled-YOLOv4-CSP yolov4-csp branch for PyTorch: https://github.com/WongKinYiu/ScaledYOLOv4/tree/yolov4-csp#yolov4-csp

Performance is improved (47.8 AP -> 48.7 AP).

@grandprixgp

@Hyvenos
In general, compared to yolov4 with the same resolution, the model yolov4-csp :

* increases AP (bbox coordinates) ~ +4%

* increases AP50 (number of detected objects) ~ +0.4%
  
  * increases AP50 for large objects, but **decreases AP50 for small objects**

So:

* either use yolov4-csp or yolov4x-mish at a much higher resolution than yolov4

* or use yolov4

If you want to detect small objects using the new csp/x-mish models, then it's better to fine-tune yolov4x-mish.cfg/weights on COCO or your dataset using Darknet with these modifications for each [yolo] layer in yolov4x-mish.cfg

* `objectness_smooth=0`

* `scale_x_y = 1.05`

* remove `iou_loss=ciou` and `iou_normalizer=0.07`

* as a last resort: remove `new_coords=1` (and change `activation=logistic` to `activation=linear` in [convolutional] layers before [yolo] layers)

Also if you want to detect objects in crowds, then train with iou_thresh=1.0 instead of iou_thresh=0.2

I hope you don't mind me asking, but what would your recommended modifications be to v4-tiny-3l, which we've already modified to fit our custom data? We are streaming video and need low latency / high FPS, hence the need for tiny, but we would like to improve:

  1. Detection of small objects (we struggle at long distances).
  2. Flickering/blinking, which I think is due to detecting only one object at a time when multiple are present.

Our AP and AP50 values are very high; consistency and accuracy are through the roof outside of these cases (long distances and multiple objects).

Unfortunately, it seems that most of what you've advised for small objects / multiple objects, and some of the comments you've made in this thread, don't apply to tiny, especially tiny-3l.

@AlexeyAB
Owner Author

I added new models: #7414

  • yolov4-p5.cfg/weights
  • yolov4-p6.cfg/weights

arrufat added a commit to arrufat/dlib that referenced this issue Jul 6, 2021
This allows changing the behavior of the network while training.
Setting it to 1 (disable) will have the standard YOLO behavior:
match only the best anchor with a truth.  However, setting it to
a lower value will make any anchor with an IoU > iou_anchor_threshold
with a ground truth be updated as well.
See this thread for more details:
AlexeyAB/darknet#7087
@corentin87

Hi everyone,

Looking at the cfg files, I can see some differences between yolov4-csp and yolov4-mish, but I can't really picture the changes. What is the main idea behind mish compared to csp?
Thank you

@WongKinYiu
Collaborator

main idea:
YOLOv4 = CSPDarknet + PAN
YOLOv4-CSP = CSPDarknet + CSPPAN

@corentin87

main idea:
YOLOv4 = CSPDarknet + PAN
YOLOv4-CSP = CSPDarknet + CSPPAN

Thanks but I'm asking about yolov4-csp vs yolov4-mish, not yolov4 vs yolov4-csp.

@WongKinYiu
Collaborator

YOLOv4 = CSPDarknet(Mish) + PAN(LReLU)
YOLOv4-Mish = CSPDarknet(Mish) + PAN(Mish)
YOLOv4-CSP = CSPDarknet(Mish) + CSPPAN(Mish)

@corentin87

Thank you!

@a0917bc

a0917bc commented Aug 18, 2021

@AlexeyAB could we have weights and an updated cfg for yolov4-tiny-3l.cfg as well?

The paper reports 28.7% AP (+6% over tiny) with the same FPS on a TX2.

Hello, where is the paper?
Thank you!

@yusiyoh

yusiyoh commented Aug 20, 2021

@Hyvenos
In general, compared to yolov4 with the same resolution, the model yolov4-csp :

  • increases AP (bbox coordinates) ~ +4%

  • increases AP50 (number of detected objects) ~ +0.4%

    • increases AP50 for large objects, but decreases AP50 for small objects

So:

  • either use yolov4-csp or yolov4x-mish at a much higher resolution than yolov4
  • or use yolov4

If you want to detect small objects using the new csp/x-mish models, then it's better to fine-tune yolov4x-mish.cfg/weights on COCO or your dataset using Darknet with these modifications for each [yolo] layer in yolov4x-mish.cfg

  • objectness_smooth=0
  • scale_x_y = 1.05
  • remove iou_loss=ciou and iou_normalizer=0.07
  • as a last resort: remove new_coords=1 (and change activation=logistic to activation=linear in [convolutional] layers before [yolo] layers)

Also if you want to detect objects in crowds, then train with iou_thresh=1.0 instead of iou_thresh=0.2

Hi @AlexeyAB, for improving the detection of small objects with yolov4, you mentioned 3 changes to the normal yolov4.cfg file in the readme section, i.e.:

for training for small objects (smaller than 16x16 after the image is resized to 416x416) - set layers = 23 instead of layers = 54, and set stride=4 instead of stride=2 (in two places)

I tried this in the past for yolov4.cfg and it produced somewhat better results, but when I made the same changes in yolov4-csp.cfg, it gives an error while loading the network at training time - an input shape mismatch at some layer - so it seems I made a mistake. Can you please suggest: if we want to make the same changes for Scaled-YOLOv4 (CSP), what exactly do I have to change? Thanks in advance.

Hello, did you find the solution to your question? I am in the same situation, but using PyTorch.

@1319303003

Dear AlexeyAB, are the pre-trained models of Scaled-YOLOv4, such as yolov4-p5 and yolov4-tiny, trained on ImageNet 2012? In your Scaled-YOLOv4 paper, you said you did not use ImageNet pre-trained models. So I want to know: what dataset do you use for the pre-trained yolov4-tiny models? Hoping to hear from you!
