No detection is found after training #294

Open
WinfredHuang opened this issue Nov 9, 2017 · 49 comments

Comments

@WinfredHuang

My computer: i5 4210M, GTX850M, Windows 10, CUDA 8, Visual Studio 2017 (with 2015 toolset installed)
Training with a dataset called Chars74K, using a subset of the digits 0-9 and the letter E, 11,176 pictures in total, divided into two roughly equal parts for training and testing respectively.
Since training is too slow, I'd like to perform an intermediate check. After 500 epochs, I ran the following command (note: I train with the GPU, but detect with the CPU):
.\darknet_no_gpu detector test cfg\chars74k.data tiny-yolo-chars74k-test.cfg backup\tiny-yolo-chars74k_500.weights -thresh 0.1 img001-00002.png
But it returns no bounding boxes.
I'm sure that chars74k.data is correct, and that batch and subdivisions are set to 1 in the tiny-yolo-chars74k-test.cfg file (for training I use a slightly modified cfg file where they're 48 and 8 respectively). There is a similar issue #257, but no solution for my case.
Is it true that even for character detection (a much simpler problem than VOC or COCO), you have to run 10,000 epochs before you can see any result (even an inaccurate one)? Or am I making a mistake in training or detecting?
P.S. Chars74K can be found here:
http://www.ee.surrey.ac.uk/CVSSP/demos/chars74k/
I'll post more details if any of you ask.

@TheMikeyR

Did you try -thresh 0 to see if you get any detections? A threshold of 10% might not be low enough after only 500 iterations. (I assume you mean iterations and not epochs: if you have e.g. 1000 images and a batch size of 500, it takes 2 iterations to train one epoch.)
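As a quick sanity check on that arithmetic, a minimal sketch (the numbers are the hypothetical ones from the example above):

import math

num_images = 1000   # hypothetical training-set size from the example above
batch = 500         # images consumed per iteration
print(math.ceil(num_images / batch))   # -> 2 iterations per epoch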

You can check the console output while training to see how well it trains on your current data, e.g. look at avg loss, or avg recall, which tells you how many objects YOLO detected out of the positive labels in that iteration. It might get 8 objects as input but only detect 1, or maybe even 0, which means it needs more training.

To learn more about the darknet output you can check this: https://timebutt.github.io/static/understanding-yolov2-training-output/

This is a nice guide which helped me train YOLO on a custom dataset: https://timebutt.github.io/static/how-to-train-yolov2-to-detect-custom-objects/

@WinfredHuang
Author

Thank you for your response. Yes, I mean iterations.
However, the recall indicates that most of the objects are detected, often 7 out of 8 or 5 out of 6. The average loss is about 1.0-1.2. I think the network is converging well, even though it's only been 500 iterations.
The test image is one of the training samples, so I'm expecting a confident detection. However, even if I set the threshold to zero, it still shows zero detections.
Nevertheless, I'll check the links you provided. I'd appreciate any further help.

@TheMikeyR

Seems odd that you don't get any detections at all. Have you tried running an example to see if everything works as it should?

@WinfredHuang
Author

Yes, if I use the pretrained weights to detect, it detects correctly. (I used the VOC dataset.)

@TheMikeyR

TheMikeyR commented Nov 9, 2017

Okay, I've just looked at the dataset, and it seems like the images are really small compared to what YOLO expects. I'm not sure what input resolution you have in your cfg, but you should have images with a resolution greater than the network resolution (i.e. greater than width=416 height=416), whereas from what I can see the dataset has images at a resolution of 156x195 (I didn't look through all of them, just took one sample).

YOLO might not be the best framework for detecting numbers and symbols; you might have better luck with something like Tesseract OCR. But if you want to continue with YOLO, you should consider another dataset.

@WinfredHuang
Author

Thank you for your response. Actually, our goal is to recognise printed numbers shot by a fixed camera. I find it hard to believe YOLO can't handle such a simple task, as I actually observed the average loss dropping from 28 to 0.4 and the average recall rising from 0.5 to nearly 1.

@TheMikeyR

If you want to do that, you should train YOLO on data like that. You could e.g. generate your own small dataset, annotate it yourself, and apply image augmentations to increase the size of the dataset. A good rule is to have at least 2000 images per class to train with, but the more the better, since more data makes it harder to overfit.

@WinfredHuang
Author

WinfredHuang commented Nov 10, 2017

Well... perhaps I was too optimistic about training. At 500 batches, the average loss is still at 1.0-1.2. The ideal loss is slightly above 0.06. I don't know, but I think I need much more training.
Have you tested whether the weights work if the loss doesn't reach 0.06?
Also, is it possible to adapt YOLO to a dataset of smaller images? If so, which part should I modify?

@WinfredHuang
Author

I tried Tesseract OCR, but it doesn't fit our needs for now. I'll keep it as an alternative solution anyway.

@TheMikeyR

TheMikeyR commented Nov 10, 2017

On my dataset, after 50k iterations I had an avg loss of ~1.0, and my test works really well: I get pretty good detections, though sometimes false negatives too, which I believe is because my training set lacks those specific cases.

I think you should stay at 288x288, but you can try to go lower and see if it works out. The problem is YOLO's grid size: it will be hard to detect more than one object per grid cell. YOLO looks at the entire image in context rather than at image regions the way Faster-RCNN does.

You should modify the config.cfg you are loading: at the very top there are width and height definitions; change those to e.g. 288x288. At the very bottom of the config there is a random=1 parameter, which defines whether YOLO randomly resizes the network input during training; if you want to lock it to 288x288, set it to 0.
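For example, the relevant fragments would look something like this (a sketch; the rest of the cfg stays as it is):

[net]
# network input resolution
width=288
height=288

# at the bottom of the [region] section:
random=0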

One idea is to take the letters you want to train on and make a script that puts them on a random background image (pulled from Google etc., or from a database) and then saves the location of the letter, as sketched below. Apply random crops, rotations, and skewing to the letters before inserting them on a random background. That way you get a bigger dataset and can also make images the size of an actual digital photo. Remember: the smaller the objects you want to detect, the bigger the network you want; a 288x288 network is much worse at detecting small objects than a 608x608 one.
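A minimal sketch of such a generator, assuming Pillow and hypothetical file names (glyph.png, background.jpg); a real script would loop over many glyphs, backgrounds, and output paths:

import random
from PIL import Image

glyph = Image.open("glyph.png")                   # e.g. a 128x128 character crop
canvas = Image.open("background.jpg").resize((416, 416))

# Random augmentation: scale and rotate the glyph before pasting.
scale = random.uniform(0.5, 2.0)
glyph = glyph.resize((int(glyph.width * scale), int(glyph.height * scale)))
glyph = glyph.rotate(random.uniform(-15, 15), expand=True)

# Paste at a random position and write the YOLO-format annotation
# (the class id 0 here is illustrative).
x = random.randint(0, 416 - glyph.width)
y = random.randint(0, 416 - glyph.height)
canvas.paste(glyph, (x, y))
cx = (x + glyph.width / 2) / 416                  # normalized center x
cy = (y + glyph.height / 2) / 416                 # normalized center y
canvas.save("sample.jpg")
with open("sample.txt", "w") as f:
    f.write(f"0 {cx} {cy} {glyph.width / 416} {glyph.height / 416}\n")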

I can recommend reading the papers if you haven't already; they explain a lot about how YOLO works and its pros and cons: YOLO and YOLOv2.

@WinfredHuang
Author

First, thank you for your kind response.
You've given me a good hint about the network definition. Actually, that's what I was thinking about, but I wasn't sure how to modify it. Now I'm clear about that. Your hint about fixing the input size is helpful too.
By the way, I have found systematic tagging errors in my dataset. I've also realised that I need to dig into the papers a bit deeper to understand what every parameter in the cfg file means. I have also created test images of my own, which have several characters in a line, but still no detection.
I'll update my progress if there is any.

@WinfredHuang
Author

WinfredHuang commented Nov 10, 2017

The problems I found in my previous experiments are:

  1. Labelling problems. Category indices started from 1 (they should start from 0), and there were invalid category indices.
  2. The output layer was not modified to match the classes. My dataset has 11 classes, so the last convolutional layer should have 9 * 9 * 80 outputs (with a 288 * 288 input), not 9 * 9 * 125 outputs; see the quick calculation below.
  3. (Possibly) The anchor values should be recalculated rather than reusing the values for VOC or COCO.
  4. The input size is now fixed instead of randomly resized.

I'll post more progress if there is any.
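For reference, the filter count in point 2 follows from the region-layer formula (a worked calculation, using the values established in this thread):

filters = num * (classes + coords + 1) = 5 * (11 + 4 + 1) = 80

With a 288 * 288 input downsampled by a factor of 32, the grid is 288 / 32 = 9, hence 9 * 9 * 80 outputs (and 5 * (20 + 4 + 1) = 125 for the 20 VOC classes).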

@TheMikeyR

TheMikeyR commented Nov 10, 2017

@AurusHuang Regarding anchor values: you don't have to recalculate them. From my understanding, YOLO automatically decides on new anchors based on the training data (I've experimented with calculating my own anchors myself, but it resulted in worse detections than YOLO's own). I might be wrong, though, since this is from my own experience and how I interpret the paper.

For the last filter count (the output size based on classes), you are definitely right: the last filter count should be 80. In the default tiny-yolo.cfg, it is line 119 that needs to be changed to 80.

###########

[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=512
activation=leaky

[convolutional]
size=1
stride=1
pad=1
filters=425 # Change to 80 for your data
activation=linear

[region]
anchors =  0.57273, 0.677385, 1.87446, 2.06253, 3.33843, 5.47434, 7.88282, 3.52778, 9.77052, 9.16828
bias_match=1
classes=80 # Change to 11 for your data
coords=4
num=5
softmax=1
jitter=.2
rescore=0

object_scale=5
noobject_scale=1
class_scale=1
coord_scale=1

absolute=1
thresh = .6
random=1

I would expect your classes.names file to look like the list below. I'm not sure how you did it, and the order doesn't matter, but it is important that each class number in your annotations matches the position of that class in classes.names (class 0 = first line) to get correct labelling:

0
1
2
3
4
5
6
7
8
9
E

Your annotation file should be formatted as:

[category number] [object center X] [object center Y] [object width] [object height]

(all four box values normalized to [0, 1] relative to the image width and height)

This is an example of one frame in my dataset, frame 2321.txt. In this frame there are two different classes: class 0, which is person, and class 4, which is the aeroplane class. The rest is, as listed above, the information on where the objects are located in frame 2321.jpg:

0 0.7653645833333333 0.5199074074074074 0.2817708333333333 0.48055555555555557
4 0.78515625 0.21435185185185185 0.3296875 0.4287037037037037
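For anyone producing these files by hand, a minimal conversion sketch from a pixel-space box to one annotation line (the function name and box values are illustrative, not from this repo):

def to_yolo(cls, xmin, ymin, xmax, ymax, img_w, img_h):
    """Convert a pixel-space box to one YOLO annotation line."""
    cx = (xmin + xmax) / 2.0 / img_w    # normalized center x
    cy = (ymin + ymax) / 2.0 / img_h    # normalized center y
    w = (xmax - xmin) / float(img_w)    # normalized width
    h = (ymax - ymin) / float(img_h)    # normalized height
    return f"{cls} {cx} {cy} {w} {h}"

print(to_yolo(0, 240, 130, 380, 360, 500, 500))   # -> "0 0.62 0.49 0.28 0.46"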

Hope this provides some more insight; if you need more help, feel free to ask.

@WinfredHuang
Author

I think I'm doing the above things right. I'm now trying to run a much simpler demo to find out what is wrong.

@WinfredHuang
Author

Well, I must admit that I was too naive... @TheMikeyR
I followed the tutorial you gave me and tested on the NFPA 704 dataset with only one class. I performed tests when the average loss reached 0.9 (about 1 hour in), 0.5 (about 2.5 hours in) and 0.17 (about 4 hours in).
It shows no detections at 0.9 and 0.5, but shows several detections ranging from 24% to 60% at 0.17.
Maybe larger and more varied datasets just take much longer to converge.

@TheMikeyR

I'm happy that you managed to get some results now! Yeah, the training time can be long. I'm using a K40 on AWS, and I usually let it train for 2-3 days before I get results satisfying enough to compare with my older trainings.

@WinfredHuang
Author

I'm now fighting a possible overfitting problem...
The loss has now ceased to decline. It has stayed at its current level (although the exact level differs between runs) for over 2000 iterations. Changing the policy, network size, and batch size doesn't seem to help.
Maybe 2000 iterations is still not enough?

@TheMikeyR

It depends on your dataset. You can let it keep running, take out the .weights file at each backup step, and test it.
At some point it can't get better with the data you are using, and you might need image augmentation to get better results.

@WinfredHuang
Author

What do you mean by "image augmentation"?
Now at 5000 iterations, about 60 epochs. Still no detections, since the loss stays at 1.00...

@TheMikeyR

@AurusHuang Random cropping, rotation, and skewing of your dataset to make the neural network generalise better; there have been many articles on gaining better performance this way. Some augmentations help, others don't. I've had quite some success taking a small dataset and rotating every image by 90, 180 and 270 degrees.
Here is a library which can be used and also gives great examples: https://github.com/aleju/imgaug. Remember that the object locations need to follow the augmentation, so if you rotate the image you should also rotate the detection coordinates (see the sketch below).
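A minimal sketch with imgaug, assuming a hypothetical sample.jpg and an illustrative pixel-space box; the library transforms the boxes together with the image:

import imageio
import imgaug.augmenters as iaa
from imgaug.augmentables.bbs import BoundingBox, BoundingBoxesOnImage

image = imageio.imread("sample.jpg")              # hypothetical input image
bbs = BoundingBoxesOnImage(
    [BoundingBox(x1=100, y1=80, x2=220, y2=200, label="0")],  # illustrative box
    shape=image.shape)

# Rotate by a random angle and crop slightly.
seq = iaa.Sequential([
    iaa.Affine(rotate=(-90, 90)),
    iaa.Crop(percent=(0, 0.1)),
])

# The same transform is applied to the image and to its boxes.
image_aug, bbs_aug = seq(image=image, bounding_boxes=bbs)

The augmented boxes would then be converted back to YOLO's normalized format before writing the annotation files.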

@WinfredHuang
Author

Speaking of character recognition, what if I:
(1) use a smaller dataset, like 100 images per class instead of 1016?
(2) use a more "focused" dataset, i.e. characters of similar typefaces?
(3) generate samples of bigger size from smaller pictures, like placing the 128 * 128 sample randomly on a 512 * 512 canvas?
(4) use a smaller learning rate?
I'm sure that in our project we only need to detect a few similar typefaces (say, Verdana or Helvetica). The images are shot by a camera, with possible linear or affine distortion.

@TheMikeyR

The bigger the dataset, the better, is a general rule of thumb. I can't really tell you which option is better; it's all a trial-and-error process, so try things out and see what you get. You can read papers on the subject to get an idea of what other people did and had success with.
Feel free to keep me updated with your experience!

@WinfredHuang
Author

Well, I think adjusting the learning strategy is really important!
Since my training was converging too quickly (reaching a stable loss well above 0.06), I lowered the learning rate by a factor of 3 and chose a monotonically descending step policy. It seems promising for now. However, lowering the learning rate slows down the training process, so I need more time to see whether it's working.

@wakanawakana

I recommend plotting the training logs.
My runs succeeded when the IOU went above 0.5 and the objectness exceeded the detection threshold.
Example:
[figure: training-log plot (figure_1)]

@WinfredHuang
Author

That's a good idea indeed.
Can you tell me which plotting tool you used above?

@wakanawakana

Try PlotYoloLog.py
this is my project
https://github.com/wakanawakana/python/
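If you'd rather roll your own, a minimal sketch that parses the per-iteration summary lines darknet prints (the "N: loss, avg loss, rate, ..." format quoted later in this thread; the log file name is an assumption):

import re
import matplotlib.pyplot as plt

iters, losses = [], []
pattern = re.compile(r"^\s*(\d+):\s*[\d.]+,\s*([\d.]+)\s+avg")
with open("train.log") as f:                 # assumed log file name
    for line in f:
        m = pattern.match(line)
        if m:
            iters.append(int(m.group(1)))    # iteration number
            losses.append(float(m.group(2))) # running average loss

plt.plot(iters, losses)
plt.xlabel("iteration")
plt.ylabel("avg loss")
plt.show()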

@WinfredHuang
Author

It seems I must generate a more "arbitrary" dataset if I want to detect text at arbitrary positions.
I randomly resized the original 128 * 128 dataset (from half size to double size) and pasted each image onto a new 416 * 416 canvas. The training now progresses completely differently.
Also, in the original dataset the characters are always in the centre; in the new dataset the characters are scattered over the canvas.
This has taught me not to use a highly regular dataset for training if the test objects are random. If the dataset IS regular, introduce some irregularity yourself.
I don't know if this is appropriate, but I'll post my progress if there is any.
Thanks @wakanawakana, I'll keep this tool and use it if necessary.

@wakanawakana

If the originals are 128 * 128, wouldn't it be faster to fix the network at 128 * 128? Because YOLO uses one grid cell per 32 px, that gives a 4 x 4 grid.

Probably:

[net]
width=128
height=128

[convolutional]
filters=80

[region]
coords=4
classes=11
num=5
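(For the arithmetic behind those numbers: 128 / 32 = 4, so the region layer predicts over a 4 x 4 grid, and the last layer needs filters = num * (classes + coords + 1) = 5 * (11 + 4 + 1) = 80, as calculated earlier in the thread.)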

@WinfredHuang
Author

WinfredHuang commented Nov 17, 2017

Maybe, but I'd better modify the dataset to provide a context similar to the actual detection setting.
I'm now trying to determine a proper learning rate. Currently, no matter what learning rate I set, the loss and the recall drop simultaneously.
It now takes 254 iterations to complete an epoch. Maybe I should observe a bit longer to determine whether it's working.
Sometimes slow progress is due to a frugal boss... he never permits us to buy a decent GPU... With a GTX 1080 Ti it would take only 42 iterations to complete an epoch, but I can only experiment with my poor GTX850M for now...
Also, @wakanawakana, I'm unable to log the running results; I don't know how. Simply redirecting with darknet <blabla> > log.txt gets darknet stuck.

@WinfredHuang
Author

WinfredHuang commented Nov 17, 2017

I think @wakanawakana has given me an insight about Obj and No Obj.
Only when Obj reaches a high level (0.5 or higher), No Obj drops to nearly zero, and the loss drops to a low level should we stop training.
I guess that Obj is the confidence that a target has been detected. @TheMikeyR

@wakanawakana

I think YOLOv2 generates its predictions from the grid, so just pasting the target image somewhere in the 416 * 416 space does not by itself make it learn.

Logs during training can be captured by redirection (>) etc.
I am using "TrainYolo.py" because I want to watch the logs.
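(If plain > redirection appears to hang, a common alternative, assuming a Unix-like shell, is tee, which writes the log while still printing to the console; on Windows PowerShell, Tee-Object plays the same role. File names here follow the ones used earlier in the thread; adjust to yours:)

darknet detector train cfg/chars74k.data tiny-yolo-chars74k.cfg 2>&1 | tee train.log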

@TheMikeyR

@AurusHuang So Obj is YOLO's confidence that the detected object is a target, and No Obj is YOLO's confidence that an object is not a target? And the number represents the subdivision, e.g. 8 images? Hmm, I still think it is easier to look at count and avg recall, but the confidences might tell more about whether the training is overfitting.

@WinfredHuang
Author

Sorry for closing this accidentally.
I'll post a sample picture and the cfg file (renamed to txt) that I use. I want to be sure that no errors exist in my dataset and config file.
Also, if you have any idea of how to set a suitable learning policy, feel free to share it with me.
img006-00490
img006-00490.txt
tiny-yolo-chars74kb.cfg.txt

@wakanawakana

wakanawakana commented Nov 17, 2017

Are you training the number centres over the whole 416 x 416 space (the positions where YOLO generates predictions)?
As a rough calculation, covering every position would take 173056 (= 416 x 416) augmented images per number.
With a 13 x 13 grid, I think you can thin the positions out much more,
because YOLO's centre prediction is grid position + Δx.
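(For reference, this is the YOLOv2 paper's box parameterisation, which is what "grid position + Δx" refers to; (c_x, c_y) is the grid-cell offset and (p_w, p_h) are the anchor priors, all in grid-cell units:)

b_x = σ(t_x) + c_x
b_y = σ(t_y) + c_y
b_w = p_w · exp(t_w)
b_h = p_h · exp(t_h)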

@wakanawakana

An implementation you can refer to:
https://github.com/leetenki/YOLOv2/blob/master/YOLOv2_animal_train.md

@WinfredHuang
Author

WinfredHuang commented Nov 17, 2017

Well... a Japanese document...
I'm Chinese, and I can't really read Japanese.
I'm not using that repository anyway, since I'm working under Windows, and I'm not allowed to use Python for training and testing given what I need to achieve.
Also, I have difficulty reading your comments... @wakanawakana

@wakanawakana

To generate predictions at random coordinates within the 416 x 416 space, automate the image generation and the coordinate generation for training with a program: let the 128 x 128 icons be learned across the whole 416 x 416 coordinate space.

@WinfredHuang
Author

Well... the picture above of the number 5 is only one of thousands of samples. The positions and sizes of the numbers are randomised by a Python script I wrote. The sizes of the numbers vary from 64 * 64 to 256 * 256, but the canvas size is fixed at 416 * 416.
I did this because I had previously used the 128 * 128 source images directly to train a 416 * 416 network. That didn't seem to be a good idea, as the positions and sizes of the numbers were all fixed. The goal was to create a dataset similar to the ones used for other purposes, like COCO and VOC.

@wakanawakana

Unfortunately, I cannot understand what you want to do.

@groot-1313

groot-1313 commented Jan 4, 2018

Does YOLO calculate new anchor values based on our own annotated training data? Aren't the pre-calculated anchors supposed to be entered manually in the cfg file for the network to read while training?

@ManivannanMurugavel

ManivannanMurugavel commented Feb 1, 2018

It works for me when I set thresh to 0.1; my training loss is 0.28.

@bicepjai

bicepjai commented Feb 5, 2018

A basic question regarding anchor boxes: what are these values relative to, i.e. what are their units?

  1. pixels in terms of the original image, in the range [0, 416]
  2. [0, 1] relative to the original image height and width (416)
  3. pixels relative to the grid cell size, in the range [0, 32]
  4. [0, 13] in units of grid cells
  5. [0, 1] relative to the grid cell height and width (32 px if the number of grids is 13)

@chexov

chexov commented Feb 22, 2018

@bicepjai the units of "anchors" are the width and height of every anchor box, as a percentage:
anchors = [1, 1, 0.5, 0.5]
This means the first anchor is 100% x 100% and the second 50% x 50% of the picture size.

@hahakid

hahakid commented Mar 14, 2018

Try to use batch=1 for test/valid?

@ManivannanMurugavel

Please use this link for single object training in YOLO
https://medium.com/@manivannan_data/how-to-train-yolov2-to-detect-custom-objects-9010df784f36

@TTTWang

TTTWang commented Mar 29, 2019

Well, I have the same problem when trying to recognize digits. I generate my training dataset by randomly creating numbers on different backgrounds.
Like this (with the labelled text files):
2
2.txt
40
40.txt
The image size is exactly the same as the setting in the cfg file (416x416). After 100 iterations, the output is:
Region 82 Avg IOU: 0.937253, Class: 0.884135, Obj: 0.696360, No Obj: 0.004303, .5R: 1.000000, .75R: 1.000000, count: 8
Region 94 Avg IOU: 0.960411, Class: 0.990598, Obj: 0.895991, No Obj: 0.002897, .5R: 1.000000, .75R: 1.000000, count: 8
Region 106 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.000000, .5R: -nan, .75R: -nan, count: 0
Region 82 Avg IOU: 0.937253, Class: 0.884135, Obj: 0.696360, No Obj: 0.004303, .5R: 1.000000, .75R: 1.000000, count: 8
Region 94 Avg IOU: 0.960411, Class: 0.990598, Obj: 0.895991, No Obj: 0.002897, .5R: 1.000000, .75R: 1.000000, count: 8
Region 106 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.000000, .5R: -nan, .75R: -nan, count: 0
Region 82 Avg IOU: 0.937253, Class: 0.884135, Obj: 0.696360, No Obj: 0.004303, .5R: 1.000000, .75R: 1.000000, count: 8
Region 94 Avg IOU: 0.960411, Class: 0.990598, Obj: 0.895991, No Obj: 0.002897, .5R: 1.000000, .75R: 1.000000, count: 8
Region 106 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.000000, .5R: -nan, .75R: -nan, count: 0
Region 82 Avg IOU: 0.937253, Class: 0.884135, Obj: 0.696360, No Obj: 0.004303, .5R: 1.000000, .75R: 1.000000, count: 8
Region 94 Avg IOU: 0.960411, Class: 0.990598, Obj: 0.895991, No Obj: 0.002897, .5R: 1.000000, .75R: 1.000000, count: 8
Region 106 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.000000, .5R: -nan, .75R: -nan, count: 0
109: 0.203249, 0.320155 avg, 0.001000 rate, 511.135857 seconds, 1744 images
However, when I use the weights file from 100 iterations, it detects nothing even when I set the thresh to 0.1. Maybe I need to be more patient and test with the 1000-iteration weights file.

@Nikhilbharadwaj08

[Quotes @TheMikeyR's earlier comment in full: anchor values, the last-layer filter count, the classes.names file, and the annotation format.]

Hi,
what if YOLO fails to find any object in an image?
What statement (syntax) should be used so that it returns "object not found"?

@atharvaagate

[Quotes @WinfredHuang's original post in full.]

Hello! I trained my model on a custom dataset, but the trained model does not predict anything whatsoever. What might have gone wrong?

@ManivannanMurugavel

ManivannanMurugavel commented Oct 8, 2021 via email
