
Any suggestion for training tiny-yolo from scratch? #696

Closed
Ringhu opened this issue Dec 9, 2019 · 12 comments

Ringhu commented Dec 9, 2019

Thanks for the contribution first!
I have a question. I'm doing some research on tiny-yolo, so I need to reproduce its result, namely the mAP. In the README.md you mention the mAP@0.5 at size 416 is 33.0, while I only get 30.7 when I train tiny-yolo from scratch. My training command is:
python3 train.py --cfg=cfg/yolov3-tiny.cfg --batch-size=64 --device=1,2 --weights=
And I do the training with 2 RTX 2080Ti GPU.

The results are attached: results.txt.

Is there any suggestion for my training to increase the mAP to 33.0?

Ringhu changed the title from "Any suggestion for training tiny from scratch?" to "Any suggestion for training tiny-yolo from scratch?" on Dec 9, 2019

glenn-jocher commented Dec 9, 2019

@Ringhu you need --accumulate 1:

python3 train.py --cfg cfg/yolov3-tiny.cfg --batch-size 64 --accumulate 1 --weights ''
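For context, `--accumulate N` sums gradients over N consecutive mini-batches before each optimizer step. The sketch below is a minimal pure-Python illustration of that schedule, not the repo's actual code; the function name and the numeric "gradients" are made up for the example.

```python
def accumulation_steps(minibatch_grads, accumulate):
    """Sum gradients over `accumulate` consecutive mini-batches,
    then apply one optimizer step with the summed gradient."""
    applied = []       # gradient actually applied at each optimizer step
    running = 0
    for i, g in enumerate(minibatch_grads, start=1):
        running += g   # accumulate instead of stepping immediately
        if i % accumulate == 0:
            applied.append(running)
            running = 0  # reset after the optimizer step
    return applied

# With --accumulate 1, every mini-batch triggers an optimizer step;
# with --accumulate 4, one step covers four mini-batches.
print(len(accumulation_steps([1, 2, 3, 4], accumulate=1)))  # 4
print(len(accumulation_steps([1, 2, 3, 4], accumulate=4)))  # 1
```

So with `--batch-size 64`, `--accumulate 1` keeps the effective batch at 64 instead of multiplying it up.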


Ringhu commented Dec 9, 2019

@glenn-jocher Thanks for the reply. I will try it soon.


glenn-jocher commented Dec 9, 2019

@Ringhu BTW, I would also git pull the current version of the repo, as it changes often. And rather than looking at your results.txt, note that a results.png file is created after training finishes.


Ringhu commented Dec 10, 2019

@glenn-jocher the .png file is attached: [results.png]
Could you please explain why setting --accumulate to 1 helps?

glenn-jocher commented:

@Ringhu this is very strange behavior over the last couple of epochs. I've never seen the validation losses drop like that. Something may be wrong with your training. Are you using an unmodified git clone?

About --accumulate: the total batch size is always --batch-size multiplied by --accumulate. Your first command was effectively --batch-size 64 --accumulate 4 (4 is the argparse default), which gives a total batch size of 256, much larger than recommended. We recommend a total batch size of 64, using --batch 64 --accum 1, or --batch 32 --accum 2, for example.
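The arithmetic above can be sketched with a trivial helper (this function is illustrative, not part of the repo):

```python
def total_batch_size(batch_size, accumulate):
    # Effective batch size seen by the optimizer per step:
    # each step applies gradients summed over `accumulate` mini-batches.
    return batch_size * accumulate

# The three configurations discussed above:
print(total_batch_size(64, 4))  # 256 -- argparse default, too large
print(total_batch_size(64, 1))  # 64  -- recommended
print(total_batch_size(32, 2))  # 64  -- also recommended
```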

glenn-jocher commented:

Also use multi-scale training (--multi-scale).

Basically use everything mentioned in #310
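For reference, multi-scale training resizes each batch to a randomly chosen resolution that is a multiple of the network stride. The sketch below shows the general idea, picking a stride-aligned size in roughly a ±50% band around the nominal 416; the exact range and schedule used by the repo may differ, and the function name here is hypothetical.

```python
import random

def pick_train_size(img_size=416, stride=32, scale=0.5):
    """Pick a random training resolution that is a multiple of the
    network stride, within +/- `scale` of the nominal image size."""
    lo = int(img_size * (1 - scale)) // stride  # 6  -> min size 192
    hi = int(img_size * (1 + scale)) // stride  # 19 -> max size 608
    return random.randint(lo, hi) * stride

sizes = sorted({pick_train_size() for _ in range(200)})
print(sizes)  # multiples of 32 between 192 and 608
```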


Ringhu commented Dec 11, 2019

Hi @glenn-jocher, here is my updated command:
python3 se_train.py --batch-size=64 --accumulate=1 --cfg=cfg/yolov3-tiny.cfg --multi-scale --evolve --cache-images --name=baseline --device=4,5,6,7 --adam
I changed the 'evolve' part of the code to test after every epoch. However, there are some problems and I don't know if this is normal:
1. Training is really slow; one epoch now takes about half an hour.
2. The mAP doesn't increase. I have trained 20 epochs so far, but the mAP stays around 0.02.
The log is here: [log screenshot]
Looking forward to your advice.

glenn-jocher commented:

This command reproduces our mAP results when training yolov3-spp.cfg from scratch. See https://github.com/ultralytics/yolov3#reproduce-our-results

$ python3 train.py --weights '' --cfg yolov3-spp.cfg --epochs 273 --batch 16 --accum 4 --multi --pre

[results.png]

I suggest you simply clone the default repo without changes and train using the above command, swapping your cfg in of course. --cache is a good idea for smaller datasets as well.


Ringhu commented Dec 13, 2019

Hi @glenn-jocher, I git pulled the current version of the repo yesterday and trained yolov3 for 24 epochs as a test. However, the result still does not look good: [training log screenshot]
My command is:
python3 train.py --batch-size=32 --accumulate=2 --cfg=cfg/yolov3.cfg --multi-scale --device=6,7 --name=baseline --adam --weights= --prebias
I don't know if this is normal or if something is wrong with my training.

glenn-jocher commented:

@Ringhu don't use --adam


Ringhu commented Dec 17, 2019

Hi @glenn-jocher. Just an update on my training: I finally trained tiny-yolo to an mAP@0.5 of 33.1 with this command:
python3 train.py --batch-size=64 --accumulate=1 --cfg=cfg/yolov3-tiny.cfg --multi-scale --prebias
The results are here: [AP table and baseline results charts]
Thank you for all your advice!

glenn-jocher commented:

@Ringhu yeah that all looks correct! Good work :)
