Pretrained Convolutional Weights from darknet53 #6

okanlv · 2018-09-05T21:46:52Z

Thanks for sharing your work.
If I am not mistaken, yolov3 initializes its model weights (up to line 549 in yolov3.cfg) from the darknet53 classifier. If that is the case and your model does not do the same, it might not converge by epoch 160. Have you tried initializing yolov3 with darknet53?

glenn-jocher · 2018-09-07T13:07:29Z

There are two training modes:

If -opt.resume = False then train.py will initialize darknet53 with random weights to start training the network from scratch.
If -opt.resume = True then train.py will initialize darknet53 with random weights which are then replaced with the trained weights from a checkpoint (or from the official yolov3 weights).

In both cases it uses yolov3.cfg to initialize darknet. It uses all 788 lines though, why do you say up to line 549?

okanlv · 2018-09-08T12:39:07Z

The author mentioned in section 3 of YOLO9000 that they have trained Darknet-19 for classification on ImageNet 1000 class classification dataset with 224x224 images for 160 epochs. Then, the same network is fine-tuned with 448x448 images for 10 epochs. For the detection task, the last CONV layer of Darknet-19 is removed and some extra layers are added to create YOLO9000 detection architecture. Extra layers are probably initialized with random weights as mentioned in section 2.2 of You Only Look Once: Unified, Real-Time Object Detection.

YOLOV3 uses Darknet-53 instead of Darknet-19 (section 2.4 of YOLOv3). I have assumed that the last layers of Darknet-53 are discarded and the remaining weights are used to initialize YOLOV3 (up to line 549 in yolov3.cfg). Then, some extra layers (randomly initialized) are added to create YOLOV3.

As you have mentioned, -opt.resume = False initializes all layers with random weights. Because of that, training might take longer to converge, or might not converge to a good solution. A small disclaimer: I have not read the original C code.

LalitPradhan · 2018-10-02T14:19:38Z

@okanlv , Hi. I too have a similar query.

I have a dataset which is small (1.3K) and significantly different from COCO dataset. I wanted to use the pretrained darknet53.

@glenn-jocher The pretrained darknet53 has weights up to conv_73. I did the following:

Initialized all 788 lines with random init weights.
Loaded the weights from darknet53. This way I have pretrained weights up to conv_73 (line 549) and randomly initialized weights for the layers after that (in a nutshell, the 3 YOLO layers).
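The two steps above can be sketched without any framework. This is only an illustration of the idea (random init everywhere, then overwrite whatever the pretrained file covers). The layer names, the vector size and the placeholder values are all hypothetical, and plain dicts stand in for real weight tensors; it is not the loader used in train.py.

```python
import random


def init_random(layer_names, size=4, seed=0):
    """Give every layer a small random weight vector (stand-in for real tensors)."""
    rng = random.Random(seed)
    return {name: [rng.gauss(0.0, 0.01) for _ in range(size)] for name in layer_names}


def load_backbone(weights, pretrained):
    """Overwrite only the layers present in `pretrained`; return the names that were loaded."""
    loaded = []
    for name, values in pretrained.items():
        # Skip anything the model does not have or whose size does not match.
        if name in weights and len(weights[name]) == len(values):
            weights[name] = list(values)
            loaded.append(name)
    return loaded


# Hypothetical names: conv_0 .. conv_73 for the backbone, plus 3 detection heads.
backbone = [f"conv_{i}" for i in range(74)]
heads = ["yolo_head_0", "yolo_head_1", "yolo_head_2"]

weights = init_random(backbone + heads)              # step 1: everything random
pretrained = {name: [1.0] * 4 for name in backbone}  # placeholder "pretrained" values
loaded = load_backbone(weights, pretrained)          # step 2: overwrite the backbone

print(len(loaded), "layers loaded;", len(weights) - len(loaded), "left random")
# 74 layers loaded; 3 left random
```

The layers that are not in the pretrained file simply keep their random initialization, which is the state described above before training starts.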

Then I trained all the layers. Is that incorrect?
a) Should I freeze the pretrained layers, or set a very low lr for them, and train the remaining layers with a decent lr?
b) Or should I use the official yolov3.weights as init and train only the top few layers (the 3 YOLO ones after line 549)?

glenn-jocher · 2018-10-02T14:30:56Z

@LalitPradhan these lines show how to do your option b), transfer learning from the pretrained weights. If you uncomment them, all layers except the 3 YOLO layers are frozen, so only the 3 YOLO layers (which have 650 rows each) will change. You can modify this section according to your needs.

yolov3/train.py

Lines 59 to 62 in 0058431

    
    # # Transfer learning (train only YOLO layers)
    # for i, (name, p) in enumerate(model.named_parameters()):
    #     if p.shape[0] != 650:  # not YOLO layer
    #         p.requires_grad = False

I don't understand your option a). What's the difference between the 2 sets of pretrained weights? How many layers does each have?
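For readers who want to see the selection logic of the commented-out snippet in isolation, here is a framework-free sketch. `Param` is a hypothetical stand-in for a model parameter (it is not a PyTorch class), and the shapes are invented; only the "first dimension equals 650" rule is taken from the snippet. The guard at the end is an addition of this sketch: it fails early when the rule matches nothing, since an optimizer given no trainable parameters cannot do anything useful.

```python
from dataclasses import dataclass


@dataclass
class Param:
    """Hypothetical stand-in for a framework parameter (not a PyTorch class)."""
    name: str
    shape: tuple
    requires_grad: bool = True


def freeze_except(params, keep):
    """Freeze every parameter for which keep(p) is False; return the trainable ones."""
    for p in params:
        if not keep(p):
            p.requires_grad = False
    trainable = [p for p in params if p.requires_grad]
    if not trainable:
        # Nothing matched the rule, so there is nothing left to train.
        raise ValueError("selection rule matched no parameters; nothing left to train")
    return trainable


def make_params():
    # Shapes are invented for illustration; only the first dimension matters here.
    return [
        Param("backbone.conv.weight", (32, 3, 3, 3)),
        Param("backbone.conv2.weight", (64, 32, 3, 3)),
        Param("yolo.weight", (650, 1024)),
    ]


params = make_params()
trainable = freeze_except(params, keep=lambda p: p.shape[0] == 650)
print([p.name for p in trainable])  # ['yolo.weight']

try:
    # A rule that matches nothing freezes the whole model.
    freeze_except(make_params(), keep=lambda p: p.shape[0] == 999)
except ValueError as err:
    print("caught:", err)
```

Selecting layers by a magic first dimension is fragile: if the cfg changes so that no parameter has that size, every parameter ends up frozen.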

LalitPradhan · 2018-10-02T14:39:02Z

@glenn-jocher What I meant is that https://pjreddie.com/media/files/darknet53.conv.74 contains weights for the yolo3.cfg file only up to line 549 (excluding the YOLO layers), while the yolo3 weights file has weights for all the layers, including the 3 YOLO layers.

And thanks for the transfer learning pointer. Do I have to comment out any other part of the code if I uncomment the 3 lines under the transfer learning comment in your code?

LalitPradhan · 2018-10-02T16:06:53Z

@glenn-jocher ,

I did as you mentioned.

If I don't use transfer learning, I get the following error:
Traceback (most recent call last):
  File "train.py", line 198, in <module>
    main(opt)
  File "train.py", line 138, in main
    metrics += model.losses['metrics']
RuntimeError: The expanded size of the tensor (1) must match the existing size (80) at non-singleton dimension 1

I'm guessing there is a mismatch between default COCO classes (80) and my custom classes (1). Can you help me resolve this?

With transfer learning enabled, I get the following error:
Traceback (most recent call last):
  File "train.py", line 198, in <module>
    main(opt)
  File "train.py", line 71, in main
    momentum=.9, weight_decay=5e-4, nesterov=True)
  File "/usr/local/lib/python3.6/site-packages/torch/optim/sgd.py", line 64, in __init__
    super(SGD, self).__init__(params, defaults)
  File "/usr/local/lib/python3.6/site-packages/torch/optim/optimizer.py", line 38, in __init__
    raise ValueError("optimizer got an empty parameter list")
ValueError: optimizer got an empty parameter list

I could not figure anything out from this. Can you tell what the problem might be?

Update: I found the mistake. In the cfg file I had not changed the number of classes in the YOLO layers or the filters in the conv layer before each YOLO layer.

But now, since I have to train with a different number of classes, I think I will have to initialize some of the weights myself.
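As a small worked example of the cfg change mentioned in the update: in YOLOv3 each detection scale predicts 3 boxes, and each box carries 4 coordinates, 1 objectness score and one score per class, so the conv layer before each YOLO layer needs 3 * (classes + 5) filters. The helper name below is made up for this sketch.

```python
def yolo_head_filters(num_classes, anchors_per_scale=3):
    """Filters needed in the conv layer that feeds a YOLO layer.

    Each anchor predicts x, y, w, h, objectness (5 values) plus one score per class.
    """
    return anchors_per_scale * (num_classes + 5)


print(yolo_head_filters(80))  # 255, the COCO default
print(yolo_head_filters(1))   # 18, for a single-class dataset
```

Because the output size of those conv layers changes with the class count, their pretrained weights no longer fit a single-class model.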

okanlv · 2018-10-02T19:25:05Z

@LalitPradhan
I highly recommend reading the "Training YOLO on VOC" section at https://pjreddie.com/darknet/yolo/. I have not used transfer learning for yolov3 before, so I can only give you suggestions for training from scratch. For transfer learning, however, I would suggest training all the layers with a lower learning rate instead of training only the yolo layers, following the paper "How transferable are features in deep neural networks?". @glenn-jocher Btw, what are the trainable parameters of a yolo layer? If there are any, shouldn't https://pjreddie.com/media/files/yolov3.weights contain yolo parameters as well?

You could use the following steps as a guide to train yolov3 on your dataset:

Darknet53 is trained on ImageNet 1000 class classification dataset. If your dataset is very different from ImageNet (like satellite images), you should probably train Darknet53 from scratch.
You should generate labels for your dataset in yolo3 format by modifying https://pjreddie.com/media/files/voc_label.py. Follow the steps provided on https://pjreddie.com/darknet/yolo/ for the VOC dataset to learn how to present your dataset and its labels. If the code runs successfully, you should see a labels directory containing a .txt file for each image, with one line per ground truth object in the image that looks like:
<object-class> <x> <y> <width> <height>
where x, y, width, and height are relative to the image's width and height. Make sure that x, y, width, and height stay within the range [0,1]. voc_label.py should also generate a .txt file containing the paths of every image in the dataset (or in the training set, if you do not want to train yolov3 on the whole dataset).
Next, modify https://github.com/ultralytics/yolov3/blob/master/data/coco.names and https://github.com/ultralytics/yolov3/blob/master/cfg/coco.data for your dataset. train should point at the .txt file containing the paths of every image in your training set, and classes should equal the number of classes in your dataset.
Now, you should use k-means to calculate the anchor box size for your dataset. You can use https://github.com/Jumabek/darknet_scripts.
Modify anchors and classes terms of yolo layers in https://github.com/ultralytics/yolov3/blob/master/cfg/yolov3.cfg for your dataset. Be careful to sort anchors with respect to their area in ascending order because the first yolo layer detects the biggest 3 anchors (mask = 6,7,8), the second yolo layer detects the next biggest 3 anchors (mask = 3,4,5) and the last yolo layer detects the smallest 3 anchors (mask = 0,1,2).
Load the pretrained Darknet53 weights and initialize weights after conv_73 randomly. Use the same learning rate for all yolov3 layers during the training. In the original yolov3 code, "steps" learning rate policy is used with "burn-in". It is implemented in this repo. You can read issue Darknet Polynomial LR Curve #18 for further information.

LalitPradhan · 2018-10-09T15:22:33Z

@okanlv Thanks for the advice. It sorted my issue out.

BaijuMishra · 2018-11-23T11:09:43Z

Guys, can you please guide me on how to do transfer learning in YOLOv3?

glenn-jocher · 2018-11-23T21:51:21Z

@BaijuMishra if you uncomment these lines and resume training from the official yolov3 weights then only the 3 yolo layers will train:

yolov3/train.py

Lines 66 to 69 in ab9ee6a

    
    # # Transfer learning (train only YOLO layers)
    # for i, (name, p) in enumerate(model.named_parameters()):
    #     if p.shape[0] != 650:  # not YOLO layer
    #         p.requires_grad = False

BaijuMishra · 2018-11-26T06:57:09Z

Hi Glenn, Thank you for the response :)

I am confused about one thing:

Do we need to delete or change the last layers of the yolov3.cfg file?

Regards,
Baiju

glenn-jocher · 2018-11-26T14:09:08Z

@BaijuMishra No, no need to change yolov3.cfg.

alvin-p · 2019-06-11T17:45:27Z

Hello
Thanks a lot for the repo.
I am fairly new to YOLO, so please forgive if the question is not very good.
How long does it take to train Tiny YOLOv3 on the COCO dataset from scratch without pretrained weights? Should it be trained for 100 epochs or 270? I remember that in an older version of your repo it was trained for 270 epochs, now it is 100.

glenn-jocher · 2019-06-11T18:06:57Z

@alvin-p I think we had a misunderstanding of the darknet batch count, so we've corrected down a factor of 4, so 67 epochs would be the nominal training time on COCO.
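The epoch figures in this exchange can be related with simple arithmetic. The inputs below are assumptions that do not appear in this thread: a darknet schedule of 500200 batches of 64 images and a COCO training list of about 117263 images. Under those assumptions the nominal schedule is about 273 epochs, and dividing by 4 gives about 68.

```python
def epochs_from_batches(max_batches, batch_size, num_images):
    """Number of passes over the dataset implied by a batch-count schedule."""
    return max_batches * batch_size / num_images


# Assumed values (not taken from this thread): darknet schedule and COCO train size.
full = epochs_from_batches(max_batches=500200, batch_size=64, num_images=117263)

print(round(full))      # 273
print(round(full / 4))  # 68
```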

alvin-p · 2019-06-11T18:14:01Z

Hi, thanks a lot for the quick reply! So the tiny model needs only 68 epochs on full COCO, without using pretrained weights and multiscale training? Do you use then 64 as a batch_size?
Thanks a lot, this helps very much! :)

glenn-jocher · 2019-06-11T18:26:53Z

@alvin-p darknet training is multiscale. I would not advise training without a backbone.

alvin-p · 2019-06-11T18:33:09Z

@glenn-jocher thanks! Are the weights of the backbone also adapted during gradient descent or are they frozen?

glenn-jocher · 2019-06-11T18:36:14Z

@alvin-p all the parameters in the model are modified by the optimizer when training under default settings, including those making up the backbone layers.

alvin-p · 2019-06-11T18:46:57Z

@glenn-jocher Thank you for your time and advice, I really appreciate it :)

glenn-jocher · 2019-08-25T11:40:23Z

@sanazss ah that's interesting. You can read more about backbones here:
AlexeyAB/darknet#3464 (comment)

Their utility is debatable. Can you demonstrate repeatable results on an open source dataset?

duyao-art · 2020-05-14T12:26:36Z

@glenn-jocher Sorry, I have a question about the transfer learning snippet. Why is yolo.shape[0] equal to 650? How is it calculated? Thanks.

glenn-jocher · 2020-05-14T16:37:22Z

@duyao-art your question seems to lack the minimum requirements for a proper response, or is insufficiently detailed for us to help you. Please note that most technical problems are due to:

Your changes to the default repository. If your issue is not reproducible in a fresh git clone version of this repository we can not debug it. Before going further run this code and ensure your issue persists:

sudo rm -rf yolov3  # remove existing
git clone https://github.com/ultralytics/yolov3 && cd yolov3 # clone latest
python3 detect.py  # verify detection
python3 train.py  # verify training (a few batches only)
# CODE TO REPRODUCE YOUR ISSUE HERE

Your custom data. If your issue is not reproducible with COCO data we can not debug it. Visit our Custom Training Tutorial for exact details on how to format your custom data. Examine train_batch0.jpg and test_batch0.jpg for a sanity check of training and testing data.
Your environment. If your issue is not reproducible in a GCP Quickstart Guide VM we can not debug it. Ensure you meet the requirements specified in the README: Unix, MacOS, or Windows with Python >= 3.7, PyTorch >= 1.4 etc. You can also use our Google Colab Notebook and our Docker Image to test your code in a working environment.

If none of these apply to you, we suggest you close this issue and raise a new one using the Bug Report template, providing screenshots and minimum viable code to reproduce your issue. Thank you!

okanlv mentioned this issue Oct 30, 2018

Resume training from official yolov3 weights #2

Closed

okanlv closed this as completed Nov 7, 2018

okanlv mentioned this issue Nov 7, 2018

Can u get the mAP as reported in darknet ? #9

Closed

glenn-jocher mentioned this issue Nov 7, 2018

RuntimeError: invalid argument 2: size '[16 x 3 x 15 x 13 x 13]' is invalid for input with 689520 elements at /pytorch/aten/src/TH/THStorage.cpp:84 #33

Closed

YourGc mentioned this issue May 6, 2019

RuntimeError: reduce failed to synchronize: device-side assert triggered #263

Closed

chrisway613 mentioned this issue Apr 3, 2020

Exception with NMS when using gpus #1004

Closed

winnerCR7 mentioned this issue Jul 3, 2020

After interrupting training, load weights/last.pt to continue training #1368

Closed

thibault390 mentioned this issue May 7, 2021

Abandon (core dumped) #1755

Closed
