No output boxes after training !! #80

Open
khorchefB opened this issue Mar 9, 2017 · 76 comments

@khorchefB

Hello,

Currently I am trying to train yolo.cfg (YOLOv2) with 2 labels. (I want to recognise Darth Vader and Yoda in my test images.) I changed the number of classes in yolo.cfg and renamed it yolo-5C.cfg.
I put the 2 labels in labels.txt, created the annotation files, and finally started training on CPU with this command:

./flow --model yolo-5C.cfg --load bin/yolo.weights --dataset pascal/VOCdevkit/IMG --annotation pascal/VOCdevkit/ANN --train --trainer adam

I changed the following parameters in the file flow.py:

  • epochs = 100
  • batch = 16
  • learning rate = 1e-5

There are 120 images (40 with only Darth Vader, 40 with only Yoda, and 40 with both) and 120 annotations.

My problem is that after 12 hours of training on CPU, running the test with the --test argument displays no boxes in the output images. But when I decrease the threshold to 0.00001, it displays many boxes. I want to understand how I can improve my training to get correct object detections. Can you please give me some advice?

Thanks.

@thtrieu
Owner

thtrieu commented Mar 11, 2017

I see you are training YOLOv2. What is your loss? I suspect it has not converged.

@hyzcn

hyzcn commented Mar 11, 2017

@thtrieu Hi! I am also training a two-class YOLOv2 on my own dataset, which has around 50,000 images. I use the same settings as @KamelBouyacoub, and I trained from the pre-trained ImageNet weights downloaded from the darknet website. At first the loss decreased rapidly for around 10 epochs, then it stayed around 1.8 to 2 and didn't decrease any more, with the learning rate at 1e-6 for those epochs. How long does it usually take to converge? What loss value typically gives meaningful output? Could you kindly suggest some causes or improvements? Thanks!

@khorchefB
Author

I retrained my graph a second time, and here is what it displays after 800 iterations with a learning rate of 1e-2:

[screenshot: capture]

Please, I need help. Can you give me some advice for training?

Thanks

@thtrieu
Owner

thtrieu commented Mar 13, 2017

Currently I am trying to train yolo.cfg (YOLOv2) with 2 labels. (I want to recognise Darth Vader and Yoda in my test images.) I changed the number of classes in yolo.cfg and renamed it yolo-5C.cfg.
I put the 2 labels in labels.txt, created the annotation files, and finally started training on CPU with this command:
./flow --model yolo-5C.cfg --load bin/yolo.weights --dataset pascal/VOCdevkit/IMG --annotation pascal/VOCdevkit/ANN --train --trainer adam

If you want to work with 2 labels, there are two modifications to make in the .cfg: set [region].classes = 2 and change the last convolutional layer's filter count (it should be 35 instead of 425).

Make sure you did the above, then please avoid training on everything right away. First, train on a very small dataset (3-5 images) containing both classes. Only when you successfully overfit this small dataset (an inexpensive end-to-end test of the whole system) should you move on to training on your whole dataset.

If overfitting fails, I'll help you look into the details.
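The filter count above follows from YOLOv2's output layout: each of the `num` anchors predicts 4 box coordinates, 1 objectness score, and one score per class. A quick sanity check (the helper name is mine):

```python
def last_layer_filters(num_classes, num_anchors=5, num_coords=4):
    # Each anchor predicts num_coords box values, 1 objectness score,
    # and one score per class; the last conv layer needs one filter per value.
    return num_anchors * (num_coords + 1 + num_classes)

print(last_layer_filters(2))   # 2 classes  -> 35, as above
print(last_layer_filters(80))  # 80 classes -> 425, the stock yolo.cfg value
```

The same arithmetic gives 125 for the 20 VOC classes, matching the stock tiny-yolo-voc.cfg.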

@hyzcn

hyzcn commented Mar 13, 2017

@thtrieu Hi! I'm another poster with a similar issue, as mentioned in previous posts. I already changed my class number to 2 and tried to overfit the net with around 8 images; the loss converges a bit lower but then gets stuck around 1.6. Are the 3-5 images you mention drawn randomly, or are there guidelines? Also, is the loss of a successful overfit around 0, or is there some magnitude that indicates success? I have been stuck for a few days; thanks in advance for your reply!!

@thtrieu
Owner

thtrieu commented Mar 13, 2017

In my experiments, the overfitting loss can be around or smaller than 0.1. With noise augmentation disabled, it can very well be a near-perfect 0.0.

The 3-5 images can be anything (randomly drawn from the training set is fine), but they should preferably contain all of your classes (e.g. with car and dog classes, the 3-5 images should contain both of them rather than only one). Not being able to overfit such a small training set means the learning rate is too big, or there is a bug in the code.

I recommend disabling noise augmentation during this overfit step by setting the argument allobj = None in https://github.com/thtrieu/darkflow/blob/master/net/yolo/data.py#L69, setting the learning rate smaller (say 1e-5), and trying to overfit again.

@hyzcn

hyzcn commented Mar 13, 2017

@thtrieu thanks for the information, I'll try on that! 👍

@andreapiso

I am retraining YOLOv2 on VOC 2012 with 20 classes and did not change any parameters. The loss is now at 0.01 and I still cannot see any bounding boxes after 7000 steps. Should I just keep training, or is this a sign that something is wrong?

@Dref360

Dref360 commented Mar 15, 2017

Have you looked at postprocess in net/yolo/test.py? There is a _thresh dict that may disrupt your output. I had to remove it to make it work.

@thtrieu
Owner

thtrieu commented Mar 16, 2017

@Dref360 that dict was removed in newer versions; please update your code.

@AndreaPisoni Please give the steps to reproduce your error.

@hemavakade

hemavakade commented Mar 22, 2017

Hi, I am trying to train YOLOv2 on my own dataset. I have created annotation files in PASCAL VOC format. I am trying to detect shoes and bags in the images. As suggested by users (@ryansun1900, @y22ma, @thtrieu) on this repo, I used 3-5 images and annotations to train.

I used tiny-yolo-voc.weights and tiny-yolo-voc.cfg. In tiny-yolo-voc.cfg I changed the number of classes and the filter count of the last convolutional layer to 2 and 35 respectively.

I used a learning rate of 1e-3.

This is the command I used to train,

./flow --train --trainer momentum --model cfg/tiny-yolo-voc-2c.cfg --load bin/tiny-yolo-voc.weights --annotation <path/to/annotation> --dataset <path/to/sampledata> --gpu 0.4

After 200 epochs I got NaN in both the loss and the moving-average loss. I printed out the output matrices during training using

fetches = [self.train_op, loss_op, self.top.out, self.top.inp.out, self.top.inp.inp.out, self.top.inp.inp.inp.out]

I looked for matrices that still had values in them and found some around step 176, so I loaded that checkpoint and reran training with a smaller learning rate of 1e-6. I finally managed to reduce the loss to 4.6001 (moving-average loss 4.5986). I tried to test using the checkpoint with the following command:

./flow --test <path/to/test/> --model cfg/tiny-yolo-voc-2c.cfg --load 890

But the images do not have bounding boxes.

Can you please guide me? I am not sure if I have missed a step in between.

@thtrieu
Owner

thtrieu commented Mar 22, 2017

I think you are doing fine; the model just has not converged. A trained VOC model with 20 classes has a loss around 4.5, so two classes should end up significantly smaller than that.

And since you are doing it with only 3-5 images, you should be overfitting, i.e. loss << 1.0.

@hemavakade

hemavakade commented Mar 22, 2017

@thtrieu, what do you suggest in that case?

I have also disabled noise augmentation during the over-fitting.

Update: I brought the loss down to almost 0.01. I had to use a different optimizer; RMSProp works better. But when I test, there are still no bounding boxes. This is the command I am using:

./flow --test <path/to/test/> --model cfg/tiny-yolo-voc-2c.cfg --load -1 --gpu 0.4

I checked the output box probabilities and they are very low, below 1e-3.
To check that I am doing everything right, I used the same images I trained on as test data; then it does draw the bounding boxes, and the probabilities are high, around 0.9. Do you suggest training on a larger dataset starting from the overfit model?

@eugtanchik

eugtanchik commented Mar 23, 2017

I have the same problem training a 2-class model on my own toy dataset. The training process converges, judging by the decreasing loss, but nothing is drawn during testing. What am I doing wrong?

@hemavakade

Update: I got it working! I have bounding boxes. I used yolo.weights and yolo.cfg; I think that model is trained on the COCO dataset, which suits my dataset and classes much better.

@eugtanchik

eugtanchik commented Mar 23, 2017

@hemavakade,
Obviously, I get boxes with yolo.weights and yolo.cfg too. But I want it to work with my own dataset under darkflow, so that I can fine-tune the model further.

@hemavakade

@eugtanchik I am not sure I understood you. I loaded yolo.weights but used it to overfit my own dataset. Do you mean that yolo.cfg and yolo.weights are not YOLOv2?

@eugtanchik

eugtanchik commented Mar 23, 2017

@hemavakade,
I mean that yolo.weights was trained with the darknet framework, or am I wrong? Sure, it is YOLOv2, but what about the number of classes in your case? It is not clear to me what to do if my classes are not included in the COCO dataset. As far as I know, there is no way to produce yolo.weights from darkflow, only TensorFlow checkpoints or protobuf.

@hemavakade

@eugtanchik Well, I have more classes; I was first trying to get it working with a small number of classes.

To train further for the other classes, I will try the following options:

  • I will use this model checkpoint and train it again with the other classes.
  • If that does not work, there is a section on the YOLO website about using the pre-trained ImageNet weights, and I will try working with that.

@eugtanchik

@hemavakade,
Maybe this is a good idea. But there must be a way to train any model from scratch without pre-training. It seems to me that there is some bug in the code; I have not found it yet.

@eugtanchik

My problem was fixed simply by letting more steps finish; then I saw some detections. It works fine!

@dkarmon

dkarmon commented Apr 8, 2017

The solutions suggested here didn't solve my problem.
I used pre-trained weights to train my model on a different dataset with fewer classes.
During training the loss decreased and converged at some point. Afterwards I tried testing the output model on both the test and train datasets, and in both cases there are no bounding boxes.

Please advise!

@nattari

nattari commented Apr 27, 2017

I am facing a similar issue. I trained on my own dataset with 3 classes, starting from the pre-trained ImageNet model (darknet19_448.conv.23) for YOLOv2.
I do not see any bounding boxes. I am using the default settings, but do the anchor box parameters need to be updated depending on your data?
Any help in this context would be very useful!
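On the anchor question: YOLOv2's default anchors were obtained by clustering the box shapes of the VOC/COCO training sets, so if your objects have very different shapes it can help to recompute anchors from your own annotations. A rough, self-contained sketch on toy data (the paper actually clusters with an IoU-based distance rather than plain Euclidean k-means, and the function name is mine):

```python
import random

def kmeans_anchors(boxes, k=5, iters=50, seed=0):
    # Cluster (width, height) pairs, in grid-cell units, into k anchor shapes.
    random.seed(seed)
    centers = random.sample(boxes, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for w, h in boxes:
            i = min(range(k), key=lambda j: (w - centers[j][0])**2 + (h - centers[j][1])**2)
            clusters[i].append((w, h))
        # Recompute each center as the mean of its cluster; keep the old
        # center if a cluster ends up empty.
        centers = [
            (sum(w for w, _ in c) / len(c), sum(h for _, h in c) / len(c))
            if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return sorted(centers)

# Toy data: small boxes and large boxes.
boxes = [(1.0, 1.2)] * 10 + [(5.0, 4.0)] * 10
print(kmeans_anchors(boxes, k=2))  # roughly [(1.0, 1.2), (5.0, 4.0)]
```

The resulting (w, h) pairs would go into the `anchors =` line of the [region] section.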

@denisli

denisli commented Apr 30, 2017

Same issue.

Here are the steps I took:
I copied the tiny-yolo-voc.cfg file to yolo-new.cfg. Although I really need 6 classes, I am training for 20, since I could not figure out how to change the number of classes without the tensors becoming inappropriately sized. I was training from scratch and reached a loss of 0.6 or 0.7.

When testing with both the training set and testing set, there were no bounding boxes.

If someone could advise how to change from 20 classes to 6, that would be appreciated as well.

@nattari

nattari commented Apr 30, 2017

It worked for me. In YOLOv2 it is relatively easy to change the config file to incorporate your data (no additional code changes). You need to train for more iterations: initially I wasn't detecting any boxes, but after training for 40k iterations I could finally see detections, though the results were poor (you need to tune the anchors).
I used pre-trained ImageNet weights.

@youyuge34

@sharoseali
It seems I confused you. Just add labels.txt and leave the source code unmodified.

@sharoseali

Where do I add labels.txt?

@youyuge34

@sharoseali
Just follow the main Darkflow page; it goes in the root directory.

@sharoseali

@youyuge34 I have checked the weights file and its corresponding cfg file; they give the same error. Even yolov2-tiny-voc is not working with its cfg. Joseph Redmon should be informed about these issues. Anyhow, I am going to start training again on tiny-voc, which I previously trained; let's see how it behaves this time.

@youyuge34 have you played with darknet on Linux and the COCO dataset? If yes, what was your experience?

I have 2000 XML files in VOC format. I am thinking of converting them to COCO format, but I don't know how to train with COCO data on Windows.

@mohamedabdallah1996

I faced the same problem, but I reduced the threshold to 0.0001 and saw many bounding boxes.
So try reducing the threshold and look at your confidences.
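Lowering the threshold only reveals boxes the model already predicts with low confidence; it does not make the model better. A minimal sketch of what the threshold does, on made-up predictions in the dict shape darkflow returns:

```python
def filter_boxes(predictions, threshold):
    # Keep only detections whose confidence clears the threshold,
    # mirroring what the --threshold option does at postprocessing time.
    return [p for p in predictions if p["confidence"] >= threshold]

preds = [
    {"label": "yoda",  "confidence": 0.00004},
    {"label": "vader", "confidence": 0.92},
]
print(len(filter_boxes(preds, 0.5)))    # 1: only the 0.92 box survives
print(len(filter_boxes(preds, 1e-5)))   # 2: a tiny threshold lets everything through
```

If boxes only appear at thresholds like 1e-5, the model's confidences are near zero and more training (or the overfit test above) is the real fix.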

@mohamedabdallah1996

@thtrieu I reached a loss of ~1.6 training on 32 classes, but the confidence for all the objects is still 0.0,
which means the model didn't learn anything. How can I reduce the loss further in order to get higher confidence? I changed the batch size and the learning rate, but the loss stays in the same range!

I need your help please!
thanks in advance

@Dhagash4

Dhagash4 commented Jun 4, 2018

How can I change the number of iterations? I am training with 1500 images divided into six classes; is there any way to change the number of iterations?

@Dhagash4

Dhagash4 commented Jun 4, 2018

@denisli Can you show me how to increase the iterations? At step 554 I only got a loss of 5.34. I am training 1500 images for 6 classes; is that enough, or should I increase my dataset?

@sharoseali

sharoseali commented Jun 4, 2018 via email

@Dhagash4

Dhagash4 commented Jun 5, 2018

@sharoseali Now I will try class by class; I have 1000 images for that class. Let's see if I can get bounding boxes with 1000 epochs. Thank you for guiding me; I will let you know the result.

@sharoseali

sharoseali commented Jun 5, 2018 via email

@Dhagash4

Dhagash4 commented Jun 5, 2018

I am training two classes, with 945 images for one class and 405 for the other, currently using the tiny-yolo-voc weights. Should I change the weights?

@sharoseali

sharoseali commented Jun 5, 2018 via email

@Dhagash4

Dhagash4 commented Jun 7, 2018

@sharoseali I got bounding boxes after 5000 steps, but when I downloaded an image from Google and tested on it, nothing was detected. How can I solve that? Is it an overfitting problem? Also, the detections are not labelled: it just draws bounding boxes with nothing written on them (such as "stop sign"). And it detects nothing in video. What should I do? I am training with the LISA extension dataset from the VIVA website.

@sharoseali

sharoseali commented Jun 9, 2018 via email

@fogonthedowns

Start Command:

HDF5_DISABLE_VERSION_CHECK=2 nohup ./flow --model cfg/tiny-yolo-v2-aviator.cfg --load bin/tiny-yolo-v2.weights --train --annotation /home/ubuntu/model/labels --dataset /home/ubuntu/model/aviators --epoch 10 --batch 8 --savepb True --load 18250 --gpu 0.9 &

Dataset:

~/model/labels$  ls -1 | wc -l
187
~/model/aviators$ ls | wc -l
187

Loss

Finish 996 epoch(es)
step 22909 - loss 0.5680124759674072 - moving ave loss 0.582944897404057
step 22910 - loss 1.782407283782959 - moving ave loss 0.7028911360419472
step 22911 - loss 0.20126786828041077 - moving ave loss 0.6527288092657936
step 22912 - loss 0.4742392301559448 - moving ave loss 0.6348798513548087
step 22913 - loss 0.3661291003227234 - moving ave loss 0.6080047762516002
step 22914 - loss 0.6089756488800049 - moving ave loss 0.6081018635144406
step 22915 - loss 0.4250970184803009 - moving ave loss 0.5898013790110266
step 22916 - loss 0.6636741161346436 - moving ave loss 0.5971886527233883
step 22917 - loss 0.3915417194366455 - moving ave loss 0.576623959394714
step 22918 - loss 0.17965593934059143 - moving ave loss 0.5369271573893017
step 22919 - loss 0.31156492233276367 - moving ave loss 0.514390933883648
step 22920 - loss 0.6093173623085022 - moving ave loss 0.5238835767261334
step 22921 - loss 0.49582234025001526 - moving ave loss 0.5210774530785216
step 22922 - loss 0.6295650601387024 - moving ave loss 0.5319262137845396
step 22923 - loss 0.39114269614219666 - moving ave loss 0.5178478620203054
step 22924 - loss 0.5364546775817871 - moving ave loss 0.5197085435764536
step 22925 - loss 0.46883073449134827 - moving ave loss 0.514620762667943
step 22926 - loss 0.6072037220001221 - moving ave loss 0.5238790586011609
step 22927 - loss 0.3584549129009247 - moving ave loss 0.5073366440311373
step 22928 - loss 0.7908065319061279 - moving ave loss 0.5356836328186364
step 22929 - loss 0.48035216331481934 - moving ave loss 0.5301504858682546
step 22930 - loss 0.3851150870323181 - moving ave loss 0.515646945984661
step 22931 - loss 1.296918511390686 - moving ave loss 0.5937741025252635
Finish 997 epoch(es)
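For reference, the "moving ave loss" column in logs like the one above is an exponential moving average of the per-step loss. A decay of 0.9 reproduces the log's numbers exactly (the decay value is inferred from the log, not taken from the source):

```python
def moving_average(prev_mva, loss, decay=0.9):
    # Exponential moving average: mostly the old average, nudged by the new loss.
    return decay * prev_mva + (1 - decay) * loss

# Check against two consecutive lines of the log above (steps 22909 -> 22910):
mva = moving_average(0.582944897404057, 1.782407283782959)
print(round(mva, 6))  # 0.702891, matching the logged moving ave at step 22910
```

This is why a single noisy step (like the 1.78 spike above) barely moves the average: it contributes only 10% of the new value.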

Config:

(more above this line)
size=2
stride=2

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=1

[convolutional]
batch_normalize=1
filters=1024
size=3
stride=1
pad=1
activation=leaky

###########

[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=512
activation=leaky

[convolutional]
size=1
stride=1
pad=1
filters=30
activation=linear

[region]
anchors =  0.57273, 0.677385, 1.87446, 2.06253, 3.33843, 5.47434, 7.88282, 3.52778, 9.77052, 9.16828
bias_match=1
classes=1
coords=4
num=5
softmax=1
jitter=.2
rescore=0

object_scale=5
noobject_scale=1
class_scale=1
coord_scale=1

absolute=1
thresh = .6

What loss is typical of "convergence"? I ran 1000 epochs (22k+ steps!), which resulted in a very, very low loss of ~0.1. However, bounding boxes were only drawn around images the model had previously seen (i.e. images from the training set). I suspect either my training set isn't large enough, or the model is way overfit and will only match images it has already seen.

  1. What is the difference between an epoch and a step? I notice many people reference steps (and their relationship to checkpoints; see @denisli above).
  2. What is an acceptable number of training images? I believe I have 200.
  3. At what loss does "convergence" typically take place? Are you talking about epochs or steps?
  4. Does this library divide my training images into test, train, and validation sets? How can I fight overfitting?
  5. How are you determining overfitting? Is it simply a loss < 0.1?
  6. How long should training take? This has been training for a day on AWS with GPUs and it's getting expensive!
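On question 1: a step is one gradient update on a single batch, while an epoch is one pass over the whole dataset, so the number of steps per epoch is roughly the dataset size divided by the batch size. The numbers in the log above are consistent with that (the integer division is an assumption about how darkflow rounds, checked against the logged step counts):

```python
images, batch = 187, 8
steps_per_epoch = images // batch   # 23 batches per pass over the 187 images
print(steps_per_epoch)              # 23

# The log above reaches step 22931 at the end of epoch 997:
print(997 * steps_per_epoch)        # 22931
```

So with a small dataset, thousands of epochs accumulate very quickly into tens of thousands of steps.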

@aaronhan92

It worked for me. It is relatively easy in Yolov2 to change the config file to incorporate your data (no additional changes). You need to train for more iterations. Initially, I wasn't detecting any bounding but after training for 40k iterations, I finally could see detection though the result was poor (you need to tune anchors).
I used pre-trained imagenet weights.

Can you share your code?

@AzadeAlizade

AzadeAlizade commented Nov 4, 2018

Hey everyone,
I have the same issue. After running through every step on this page and training on data in Pascal VOC format, no objects are detected.
I changed the threshold to 0 and some objects are detected, but they are not really useful.
What should I do?

@ManasaNadimpalli


Hi, I am training darknet using YOLOv3. I trained on 200 images, and I can see the label but no bounding boxes around the objects. Can I know what the reason is?

@RamShankarKumar

I am testing an image using the method from "Using darkflow from another python application" in the Spyder IDE. My program runs fine, but at the end I get an empty array with no predictions. What should I do now?

[screenshot: gitpic]

@ridhimagarg

Hi,

I am also facing the same issue: my model is not able to detect the bounding boxes, and when I set the threshold to 0.00001 it shows far too many boxes.

@ManasaNadimpalli
Were you able to find a solution?

Please give some suggestions. I modified the .cfg file according to my class count (classes = 1).

@Alex0795

@KamelBouyacoub how do you lower the threshold so that it shows many boxes? Please help me with that.

@aseembh2001

aseembh2001 commented Jul 29, 2019

I had the same problem with not getting bounding boxes.
I trained on 87 images for one class.
I decreased the learning rate to 1e-5 and was able to get the correct bounding boxes, although not with very high confidence (~20%).
Hope this helps!!

@absognety

I see you are doing YOLOv2. How much is the loss? I suspect yours has not converged.

I am also facing the same issue as @KamelBouyacoub.
My loss after 1000 epochs is at 61.586:

Finish 986 epoch(es)
step 1973 - loss 62.078346252441406 - moving ave loss 62.335486664291444
step 1974 - loss 62.121891021728516 - moving ave loss 62.31412710003515
Finish 987 epoch(es)
step 1975 - loss 62.219764709472656 - moving ave loss 62.30469086097891
step 1976 - loss 61.881935119628906 - moving ave loss 62.26241528684391
Finish 988 epoch(es)
step 1977 - loss 62.222434997558594 - moving ave loss 62.25841725791538
step 1978 - loss 61.85980224609375 - moving ave loss 62.21855575673322
Finish 989 epoch(es)
step 1979 - loss 62.035133361816406 - moving ave loss 62.20021351724154
step 1980 - loss 61.879722595214844 - moving ave loss 62.168164425038874
Finish 990 epoch(es)
step 1981 - loss 61.71182632446289 - moving ave loss 62.12253061498128
step 1982 - loss 61.67131042480469 - moving ave loss 62.077408595963625
Finish 991 epoch(es)
step 1983 - loss 61.771820068359375 - moving ave loss 62.0468497432032
step 1984 - loss 61.894561767578125 - moving ave loss 62.0316209456407
Finish 992 epoch(es)
step 1985 - loss 61.739654541015625 - moving ave loss 62.00242430517819
step 1986 - loss 61.7847900390625 - moving ave loss 61.980660878566624
Finish 993 epoch(es)
step 1987 - loss 61.47736740112305 - moving ave loss 61.93033153082227
step 1988 - loss 61.691654205322266 - moving ave loss 61.90646379827227
Finish 994 epoch(es)
step 1989 - loss 61.599735260009766 - moving ave loss 61.87579094444602
step 1990 - loss 61.71918487548828 - moving ave loss 61.860130337550245
Finish 995 epoch(es)
step 1991 - loss 61.71525573730469 - moving ave loss 61.84564287752569
step 1992 - loss 61.526390075683594 - moving ave loss 61.81371759734149
Finish 996 epoch(es)
step 1993 - loss 61.45462417602539 - moving ave loss 61.77780825520988
step 1994 - loss 61.457122802734375 - moving ave loss 61.74573970996233
Finish 997 epoch(es)
step 1995 - loss 61.439453125 - moving ave loss 61.715111051466096
step 1996 - loss 61.43961715698242 - moving ave loss 61.68756166201773
Finish 998 epoch(es)
step 1997 - loss 61.436065673828125 - moving ave loss 61.662412063198765
step 1998 - loss 61.47761535644531 - moving ave loss 61.643932392523425
Finish 999 epoch(es)
step 1999 - loss 61.33710479736328 - moving ave loss 61.61324963300741
step 2000 - loss 61.340763092041016 - moving ave loss 61.586000978910775
Checkpoint at step 2000
Finish 1000 epoch(es)
Training finished, exit.

Does this mean it is not converging?

@slntopp

slntopp commented Feb 27, 2020

I have the same issue:
loss < 0.45 after 15k steps, with 1000+ images for each class. I tried overfitting with 20 images, and that was fine.
I am using the tiny-yolo-voc cfg and weights. Are there any solutions?

@xinyee1997

I faced this problem too. No bounding boxes at all. Any solution?

@ozanpkr

ozanpkr commented Apr 1, 2020

@thtrieu
When I trained YOLOv2 with only the PascalVOC2012 car-labelled data, I got a loss of 0.000007. However, I cannot see any bounding boxes when I test on an image. Does that mean overfitting? How can I solve it?

@ludwikbukowski

Same here.
