
args to reproduce tiny-yolo metrics on img_size=320 #1111

Closed
akshaychawla opened this issue Apr 28, 2020 · 13 comments
Labels: Stale

@akshaychawla

Hello,

I'm trying to reproduce the following tiny-yolo mAP metric:
[Screenshot: tiny-yolo mAP metric table]

Are the following arguments correct?

python train.py --data data/coco2014.data --weights '' --img-size 320 --epochs 300 --cfg cfg/yolov3-tiny.cfg --batch-size 64 --accumulate 1 

Note that img-size is set to 320 instead of 416, since I will only be testing on images of size 320x320.

@glenn-jocher (Member)

@akshaychawla to test pretrained weights:

$ python test.py --cfg yolov3-tiny.cfg --weights yolov3-tiny.pt --img 320

To train from scratch see https://github.com/ultralytics/yolov3#reproduce-our-results

$ python train.py --weights '' --cfg yolov3-tiny.cfg --epochs 300 --batch-size 64 --img 320 640

Reduce --batch-size if you get CUDA out-of-memory errors.

@akshaychawla (Author)

Thanks @glenn-jocher , I'll test out these arguments and post the results here.

@glenn-jocher (Member)

FYI the training is multi-scale 320-640, which produces better results at 320 (actually at all resolutions) than simply training at 320.
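For reference, multi-scale training here means each batch is resized to a random resolution between the two bounds, rounded to the network stride. A minimal PyTorch sketch of the idea, not this repo's exact implementation (the imgs tensor and the bounds are assumptions):

import random
import torch.nn.functional as F

def multi_scale_resize(imgs, min_size=320, max_size=640, stride=32):
    # Pick a random training resolution that is a multiple of the network stride.
    size = random.randrange(min_size, max_size + 1, stride)
    # Resize the whole (N, C, H, W) batch to the chosen square resolution.
    return F.interpolate(imgs, size=(size, size), mode='bilinear', align_corners=False)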

@akshaychawla (Author)

akshaychawla commented May 1, 2020

NOTE: The results shown in this comment are misleading, please look at the latest comments below which contain correct information.


Update on training:

commit f1d73a29e549654c99674bf07dd8f7a2f5c19d18
python train.py --weights '' --cfg yolov3-tiny.cfg --epochs 300 --batch-size 64 --img-size 320 640 --data 'data/coco2017.data' --device '0,1' --name 'baseline' 

(this will be updated when the training finishes)

[Screenshot: training curves for the run so far]

In comparison to #696 , the differences are:

  1. They reach ~0.20 mAP at 100 epochs, while this run is at ~0.15 mAP at 100 epochs. But I'll wait until the training finishes, since there seems to be a large jump coming up at 200 epochs.
  2. I don't set the --multi-scale flag, but it seems to be doing that anyway, since the image sizes range from 320 to 640.
  3. The --prebias flag isn't available in the current version of train.py. Is it the same as burn-in? (See the warmup sketch at the end of this comment.)

Other things:
Running without mixed precision on 2x Tesla V100 GPUs.
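In case it helps with the --prebias question above: burn-in (warmup) usually means ramping the learning rate from near zero to its nominal value over the first batches. A sketch of that common pattern (the function and argument names are assumptions, not this repo's exact code):

def warmup_lr(optimizer, ni, n_burn=1000, base_lr=0.01):
    # Linearly ramp the learning rate over the first n_burn integrated batches (ni).
    if ni <= n_burn:
        for g in optimizer.param_groups:
            g['lr'] = base_lr * ni / n_burn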

@glenn-jocher (Member)

@akshaychawla yes, there have been a lot of updates since #696, which train to higher mAPs. Your command looks fine, but your results do not look right. This is the most recent training of the two models we have:
[Plot: most recent training results for yolov3-spp and yolov3-tiny]

@glenn-jocher (Member)

@akshaychawla if you have two GPUs at your disposal I would simply install apex on your system, and train one model on each. I think you'll find that yolov3-spp works much better and is still extremely fast.
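For anyone following along, the usual Apex mixed-precision wiring looks roughly like this; a minimal sketch using the standard amp API (the model, optimizer, and loss names are placeholders):

from apex import amp  # NVIDIA Apex: https://github.com/NVIDIA/apex

# Wrap an existing FP32 model and optimizer for mixed-precision training.
model, optimizer = amp.initialize(model, optimizer, opt_level='O1')

# In the training loop, scale the loss before calling backward().
with amp.scale_loss(loss, optimizer) as scaled_loss:
    scaled_loss.backward()
optimizer.step()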

About your discrepancy: there must be some difference in your code. I would cancel your training, git clone a fresh copy, and start from scratch. I've attached the results.txt files; you should be seeing similar results.
results_yolov3-spp-ultralytics131.txt
results_yolov3-tiny_ultralytics132.txt
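To compare those results.txt files against your own run, the repo's plotting helper can overlay them; usage from memory, so treat the exact import path as an assumption:

from utils.utils import plot_results

# Overlays loss and mAP curves from all results*.txt files in the current directory.
plot_results()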

@akshaychawla (Author)

akshaychawla commented May 1, 2020

Sure, I'll pull a fresh copy of the repository and train yolov3-tiny from scratch with mixed precision. However, we're a little bit inflexible on the architecture right now because the code is being modified to support distillation from model A (yolov3) to model B (yolov3-tiny). Will post an update as soon as training finishes.

Also, just to confirm: we're training on COCO2017, right? The one from get_coco2017.sh?

Thanks!

@glenn-jocher (Member)

glenn-jocher commented May 1, 2020

@akshaychawla 2014 and 2017 use the same images, just a different breakdown between train/val. The above plots are for 2014 (to compare to the original YOLOv3 paper results), but you will see identical results when training 2017.

Just make sure that later on you test with the same dataset you trained on. So train with 2017; then, if you want to use test.py later, specify python test.py --data coco2017.data
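For reference, the --data file is a darknet-style config pointing at the class count, image lists, and names file; the paths below are illustrative, so check your own data/coco2017.data:

classes=80
train=../coco/train2017.txt
valid=../coco/val2017.txt
names=data/coco.names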

@akshaychawla (Author)

Dev box: 1x Tesla V100, Ubuntu 18.04

Training & testing on COCO2017 (from data/get_coco2017.sh)

python train.py --weights '' --cfg yolov3-tiny.cfg --epochs 300 --batch-size 64 --data 'data/coco2017.data' --device '0' --img 320 640

[Plot: training curves for the full 300-epoch yolov3-tiny run]

Last epoch's mAP@0.5: 0.339

Link to results and weights: https://drive.google.com/drive/folders/1Q_lUdBiLnh7VNWm8heXf3BJuBFIcAIRM?usp=sharing

@glenn-jocher (Member)

@akshaychawla ah, great! Yes, this all looks correct. When you test this model directly with test.py, pycocotools will give the official COCO mAP, which tends to be a bit higher than our locally produced mAP, i.e.:

python3 test.py --data data/coco2017.data --cfg yolov3-tiny.cfg --weights weights/last.pt
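Under the hood, the official number comes from pycocotools' COCOeval; a minimal standalone sketch (both JSON paths are placeholders):

from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

# Ground-truth annotations and detections in COCO JSON format (paths are placeholders).
coco_gt = COCO('annotations/instances_val2017.json')
coco_dt = coco_gt.loadRes('detections.json')

coco_eval = COCOeval(coco_gt, coco_dt, 'bbox')
coco_eval.evaluate()
coco_eval.accumulate()
coco_eval.summarize()  # prints AP@[0.50:0.95], AP@0.50, etc.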

@akshaychawla (Author)

Thanks Glenn! We were hoping to get the official COCO mAP at the end of training, but it seems that the version of pycocotools installed in our environment has an issue (cocodataset/cocoapi#356) with the installed version of numpy.

I'll be sure to post the cocotools metrics later with the corrected environment, for now we're moving on to test knowledge distillation.

Again, thank you so much for your work!

@glenn-jocher (Member)

glenn-jocher commented May 5, 2020

@akshaychawla ah yes. I believe you need to enforce numpy==1.17 in order for pycocotools to function properly. pycocotools mAP is typically about 1% higher than ours (for unknown reasons), so your result is just about in line with the README. The only other 'catch' is that mAP@0.5 is highest at --conf 0.5, while mAP@0.5:0.95 is highest at --conf 0.7, so the training results show mAP at the middle ground, --conf 0.6. If you really want to maximize one or the other (as in the README table), you should set --conf accordingly (but we are talking about fractions of a percent here, so not a big difference).
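If you hit that numpy incompatibility, pinning numpy before reinstalling pycocotools is the usual workaround (the exact pin follows the comment above):

$ pip install numpy==1.17
$ pip install -U pycocotools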

@github-actions (bot)

github-actions bot commented Jun 5, 2020

This issue is stale because it has been open 30 days with no activity. Remove Stale label or comment or this will be closed in 5 days.
