
Different mAP from cocoAPI #171

Closed
xiao1228 opened this issue Mar 27, 2019 · 23 comments
Labels: bug

@xiao1228

I am training the model from scratch, and the results from best.pt are shown below:

python3 test.py --weights weights/best.pt --save-json --img-size 416
Namespace(batch_size=32, cfg='cfg/yolov3.cfg', conf_thres=0.3, data_cfg='cfg/coco.data', img_size=416, iou_thres=0.5, nms_thres=0.45, save_json=True, weights='weights/best.pt')

      Image      Total          P          R        mAP       Time
         32       5000      0.562      0.545      0.523      0.744s
         64       5000      0.554      0.549      0.522      0.472s
         96       5000      0.546      0.555      0.529      0.461s
        128       5000      0.547      0.558      0.533      0.435s
        160       5000      0.548      0.559      0.537      0.468s
        192       5000      0.554      0.562       0.54      0.431s
        224       5000       0.55      0.552       0.53       0.45s
        256       5000      0.558      0.555      0.535      0.469s
        288       5000      0.568      0.565      0.545      0.408s
        320       5000      0.561      0.559      0.539      0.461s
        352       5000       0.56      0.556      0.535      0.474s
        384       5000      0.565      0.564      0.543      0.417s
        416       5000      0.564      0.566      0.545      0.514s
        448       5000      0.563      0.562      0.542      0.553s
        480       5000      0.558      0.557      0.536      0.431s
        512       5000      0.553      0.553      0.532      0.505s
        544       5000       0.55      0.551       0.53      0.477s
        576       5000      0.554      0.554      0.533      0.489s
        608       5000      0.557      0.556      0.535      0.553s
        640       5000      0.555      0.553      0.532      0.485s
        672       5000      0.551      0.549      0.528      0.556s
        704       5000      0.553       0.55       0.53      0.395s
        736       5000      0.551      0.547      0.526      0.521s
        768       5000      0.552       0.55      0.529      0.498s
        800       5000      0.553       0.55      0.529      0.478s
        832       5000      0.554      0.551      0.529      0.491s
        864       5000      0.552      0.549      0.527      0.493s
        896       5000       0.55      0.547      0.525       0.58s
        928       5000      0.549      0.546      0.524      0.554s
        960       5000      0.552      0.548      0.526      0.448s
        992       5000       0.55      0.546      0.524      0.554s
       1024       5000      0.551      0.547      0.525      0.496s
       1056       5000      0.552      0.547      0.526      0.427s
       1088       5000      0.551      0.546      0.524      0.473s
       1120       5000       0.55      0.546      0.524      0.538s
       1152       5000      0.549      0.544      0.523       0.49s
       1184       5000      0.547      0.542       0.52      0.484s
       1216       5000      0.545       0.54      0.519      0.508s
       1248       5000      0.545      0.539      0.518       0.51s
       1280       5000      0.544      0.539      0.517      0.535s
       1312       5000      0.548      0.541       0.52      0.494s
       1344       5000      0.548      0.542      0.521      0.608s
       1376       5000      0.549      0.544      0.523      0.532s
       1408       5000      0.551      0.546      0.525      0.465s
       1440       5000       0.55      0.546      0.525       0.46s
       1472       5000      0.549      0.546      0.524       0.57s
       1504       5000       0.55      0.546      0.525      0.522s
       1536       5000       0.55      0.547      0.526      0.495s
       1568       5000       0.55      0.547      0.526      0.453s
       1600       5000      0.551      0.547      0.526      0.513s
       1632       5000      0.551      0.548      0.527      0.444s
       1664       5000      0.552      0.549      0.528      0.443s
       1696       5000      0.554       0.55      0.529      0.494s
       1728       5000      0.554      0.551       0.53      0.529s
       1760       5000      0.554       0.55      0.529      0.487s
       1792       5000      0.555      0.551       0.53      0.497s
       1824       5000      0.554      0.551       0.53      0.465s
       1856       5000      0.554       0.55      0.529      0.493s
       1888       5000      0.554       0.55      0.529      0.534s
       1920       5000      0.554       0.55      0.529      0.501s
       1952       5000      0.552      0.548      0.528      0.517s
       1984       5000      0.552      0.548      0.527      0.483s
       2016       5000      0.551      0.547      0.526      0.447s
       2048       5000       0.55      0.546      0.525      0.576s
       2080       5000       0.55      0.546      0.525      0.438s
       2112       5000      0.551      0.547      0.526      0.497s
       2144       5000      0.551      0.547      0.526      0.538s
       2176       5000      0.552      0.548      0.526      0.562s
       2208       5000       0.55      0.546      0.525      0.551s
       2240       5000      0.549      0.545      0.524      0.516s
       2272       5000      0.549      0.545      0.524      0.506s
       2304       5000      0.549      0.545      0.524      0.446s
       2336       5000      0.549      0.545      0.523      0.486s
       2368       5000       0.55      0.546      0.524      0.545s
       2400       5000       0.55      0.546      0.524      0.506s
       2432       5000       0.55      0.546      0.524      0.468s
       2464       5000       0.55      0.546      0.525       0.54s
       2496       5000       0.55      0.547      0.525      0.551s
       2528       5000       0.55      0.546      0.525      0.543s
       2560       5000      0.549      0.546      0.524      0.465s
       2592       5000      0.549      0.546      0.524      0.495s
       2624       5000      0.549      0.546      0.524      0.612s
       2656       5000      0.549      0.546      0.525      0.512s
       2688       5000      0.549      0.546      0.524      0.523s
       2720       5000      0.547      0.545      0.523      0.547s
       2752       5000      0.547      0.545      0.523      0.537s
       2784       5000      0.547      0.545      0.523      0.519s
       2816       5000      0.548      0.545      0.524      0.474s
       2848       5000      0.547      0.545      0.523      0.498s
       2880       5000      0.548      0.545      0.523      0.526s
       2912       5000      0.547      0.545      0.523      0.478s
       2944       5000      0.547      0.544      0.523      0.487s
       2976       5000      0.547      0.545      0.523      0.536s
       3008       5000      0.545      0.543      0.521      0.509s
       3040       5000      0.544      0.541       0.52      0.466s
       3072       5000      0.543      0.541      0.519      0.511s
       3104       5000      0.544      0.541       0.52      0.489s
       3136       5000      0.544      0.542       0.52       0.42s
       3168       5000      0.545      0.543      0.521      0.536s
       3200       5000      0.545      0.543      0.521      0.452s
       3232       5000      0.545      0.543      0.521      0.545s
       3264       5000      0.544      0.542       0.52       0.56s
       3296       5000      0.544      0.542       0.52      0.533s
       3328       5000      0.544      0.542       0.52       0.48s
       3360       5000      0.543      0.541      0.519       0.49s
       3392       5000      0.542      0.541      0.519      0.507s
       3424       5000      0.542      0.541      0.519      0.522s
       3456       5000      0.542      0.542       0.52      0.522s
       3488       5000      0.541      0.541      0.519      0.675s
       3520       5000      0.541       0.54      0.518      0.443s
       3552       5000      0.541       0.54      0.518      0.515s
       3584       5000      0.541      0.541      0.519       0.54s
       3616       5000      0.541       0.54      0.518      0.462s
       3648       5000      0.542       0.54      0.519      0.575s
       3680       5000      0.542      0.541      0.519      0.575s
       3712       5000      0.543      0.541      0.519      0.604s
       3744       5000      0.542       0.54      0.519      0.595s
       3776       5000      0.542      0.541      0.519      0.504s
       3808       5000      0.542      0.541      0.519      0.469s
       3840       5000      0.542      0.541      0.519      0.572s
       3872       5000      0.541       0.54      0.518      0.485s
       3904       5000      0.541       0.54      0.518      0.585s
       3936       5000      0.541       0.54      0.518      0.518s
       3968       5000      0.541      0.539      0.517       0.53s
       4000       5000      0.541      0.539      0.517      0.472s
       4032       5000      0.541      0.539      0.517      0.585s
       4064       5000      0.541      0.539      0.517      0.545s
       4096       5000      0.541      0.539      0.518      0.568s
       4128       5000      0.541      0.539      0.517      0.508s
       4160       5000      0.541      0.539      0.517      0.447s
       4192       5000       0.54      0.538      0.516      0.571s
       4224       5000       0.54      0.538      0.516      0.553s
       4256       5000       0.54      0.538      0.517      0.493s
       4288       5000      0.541      0.539      0.517      0.451s
       4320       5000      0.542      0.539      0.517      0.538s
       4352       5000      0.542      0.539      0.518      0.535s
       4384       5000      0.541      0.539      0.517      0.636s
       4416       5000      0.542       0.54      0.518      0.448s
       4448       5000      0.542      0.539      0.518      0.507s
       4480       5000      0.542       0.54      0.518      0.566s
       4512       5000      0.542       0.54      0.518      0.606s
       4544       5000      0.542      0.539      0.518      0.587s
       4576       5000      0.541      0.539      0.517      0.479s
       4608       5000      0.541      0.539      0.517      0.523s
       4640       5000      0.542      0.539      0.517      0.542s
       4672       5000      0.541      0.539      0.517      0.633s
       4704       5000      0.542      0.539      0.517       0.54s
       4736       5000      0.542      0.539      0.517      0.448s
       4768       5000      0.542      0.539      0.517      0.552s
       4800       5000      0.542      0.539      0.518        0.5s
       4832       5000      0.542      0.539      0.517      0.476s
       4864       5000      0.542      0.539      0.518      0.524s
       4896       5000      0.543       0.54      0.518      0.535s
       4928       5000      0.542      0.539      0.518       0.61s
       4960       5000      0.542      0.539      0.518      0.517s
       4992       5000      0.542      0.539      0.517      0.513s
       5000       5000      0.542      0.538      0.517      0.142s
mAP Per Class:
         person: 0.6545
        bicycle: 0.3390
            car: 0.4087
     motorcycle: 0.5551
       airplane: 0.7695
            bus: 0.6553
          train: 0.7421
          truck: 0.3311
           boat: 0.3917
  traffic light: 0.3546
   fire hydrant: 0.6711
      stop sign: 0.5843
  parking meter: 0.2901
          bench: 0.2239
           bird: 0.4490
            cat: 0.6029
            dog: 0.5982
          horse: 0.5965
          sheep: 0.4772
            cow: 0.4832
       elephant: 0.7780
           bear: 0.6631
          zebra: 0.7580
        giraffe: 0.8608
       backpack: 0.1389
       umbrella: 0.4296
        handbag: 0.1174
            tie: 0.4122
       suitcase: 0.2834
        frisbee: 0.4407
           skis: 0.3457
      snowboard: 0.4403
    sports ball: 0.4385
           kite: 0.5387
   baseball bat: 0.3634
 baseball glove: 0.3577
     skateboard: 0.6406
      surfboard: 0.4961
  tennis racket: 0.6215
         bottle: 0.2793
     wine glass: 0.3463
            cup: 0.3451
           fork: 0.2618
          knife: 0.1097
          spoon: 0.1302
           bowl: 0.2915
         banana: 0.2537
          apple: 0.1686
       sandwich: 0.3510
         orange: 0.2597
       broccoli: 0.2330
         carrot: 0.1589
        hot dog: 0.4160
          pizza: 0.5690
          donut: 0.3708
           cake: 0.3449
          chair: 0.2805
          couch: 0.3294
   potted plant: 0.3229
            bed: 0.3898
   dining table: 0.2956
         toilet: 0.6938
             tv: 0.5832
         laptop: 0.5667
          mouse: 0.4683
         remote: 0.2484
       keyboard: 0.4403
     cell phone: 0.2181
      microwave: 0.5130
           oven: 0.3297
        toaster: 0.0714
           sink: 0.4191
   refrigerator: 0.4699
           book: 0.1062
          clock: 0.5111
           vase: 0.3223
       scissors: 0.1696
     teddy bear: 0.5323
     hair drier: 0.0000
     toothbrush: 0.2262

However, the results from the COCO API are shown below:

 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.100
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.101
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.100
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.064
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.122
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.077
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.117
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.200
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.210
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.232
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.308
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.252

So test.py reports an mAP above 0.50, but the COCO API gives only around 0.10.
Am I missing something?
Thank you very much @glenn-jocher, and sorry for the long results.
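
For reference, a minimal pycocotools evaluation of the JSON written by --save-json might look like the sketch below; the annotation and results file paths here are assumptions, not necessarily the repo's exact ones.

# Minimal pycocotools evaluation sketch; file paths are assumptions.
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

ann_file = 'annotations/instances_val2014.json'  # hypothetical ground-truth path
res_file = 'results.json'                        # detections written by --save-json

coco_gt = COCO(ann_file)             # load ground-truth annotations
coco_dt = coco_gt.loadRes(res_file)  # load detection results

coco_eval = COCOeval(coco_gt, coco_dt, 'bbox')
coco_eval.evaluate()    # per-image, per-category evaluation
coco_eval.accumulate()  # accumulate over images
coco_eval.summarize()   # prints the 12 AP/AR lines shown above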

xiao1228 added the bug label on Mar 27, 2019
@xiao1228 (Author)

Sorry, I think you mentioned in #161 that aligning the results is on the TODO list, right?
I didn't realize the difference was this big.

@glenn-jocher (Member)

@xiao1228 to compute the COCO API mAP properly you need to set --conf-thres 0.001, as shown in the README: https://github.com/ultralytics/yolov3#map

You have yours set to 0.30, which is good for real-world use but produces a lower mAP.

@glenn-jocher (Member)

@xiao1228 also note that the mAP computation in the current repo is not properly aligned with the COCO metric, since it averages over the image dimension rather than the class dimension. We have a branch with modifications that align the two more closely; you may want to use it instead, or wait a few days until we merge it into the master branch: https://github.com/ultralytics/yolov3/tree/map_update
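
To illustrate the distinction (a toy sketch, not the repo's code): averaging per-image AP weights every image equally, whereas the COCO metric averages per-class AP, so rare classes with low AP pull the class-averaged number down much more.

import numpy as np

# Toy example: ap[i, c] = AP of class c on image i (NaN where the class is absent).
ap = np.array([[0.9, np.nan, 0.2],
               [0.8, 0.1,    np.nan],
               [0.7, np.nan, np.nan]])

map_over_images = np.nanmean(np.nanmean(ap, axis=1))   # average per image, then over images
map_over_classes = np.nanmean(np.nanmean(ap, axis=0))  # average per class, then over classes (COCO-style)
print(map_over_images, map_over_classes)  # ~0.57 vs ~0.37 here; the two can differ substantially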

@glenn-jocher (Member)

From #7:

UPDATE: the difference has narrowed to 0.531 (repo calculation) vs 0.551 (pycocotools). The mAP depends on how obj_conf is used: whether it is multiplied by class_conf, and if so, whether that class_conf comes from a sigmoid or a softmax.
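
As a rough sketch of the confidence variants being compared (the tensor layout and names are illustrative, not the repo's exact code):

import torch

# raw: hypothetical [N, 85] prediction slice -> [x, y, w, h, obj, 80 class logits]
raw = torch.randn(8, 85)

obj_conf = torch.sigmoid(raw[:, 4])

# Variant A: obj_conf multiplied by a softmax class_conf
class_conf_softmax, _ = torch.softmax(raw[:, 5:], dim=1).max(1)
score_a = obj_conf * class_conf_softmax

# Variant B: obj_conf multiplied by a per-class sigmoid class_conf
class_conf_sigmoid, _ = torch.sigmoid(raw[:, 5:]).max(1)
score_b = obj_conf * class_conf_sigmoid

# Variant C: obj_conf alone, no multiplication by class_conf
score_c = obj_conf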

rm -rf yolov3 && git clone -b map_update --depth 1 https://github.com/ultralytics/yolov3 yolov3
python3 test.py --conf-thres 0.001 --save-json
Namespace(batch_size=32, cfg='cfg/yolov3.cfg', conf_thres=0.001, data_cfg='cfg/coco.data', img_size=416, iou_thres=0.5, nms_thres=0.5, save_json=True, weights='weights/yolov3.weights')

Using cuda _CudaDeviceProperties(name='Tesla V100-SXM2-16GB', major=7, minor=0, total_memory=16130MB, multi_processor_count=80)
      Image      Total          P          R        mAP
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 157/157 [07:00<00:00,  2.09s/it]
       5000       5000     0.0865      0.727      0.531

 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.308
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.551
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.308
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.143
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.334
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.455
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.267
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.407
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.432
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.240
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.470
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.590

@xiao1228 (Author)

Hi @glenn-jocher
Thank you for the details. I have tested with conf_thres = 0.001:
python3 test.py --weights weights/best.pt --save-json --conf-thres 0.001 --img-size 416
And the COCO API still gives only 0.162:

 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.162
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.162
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.161
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.093
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.156
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.093
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.164
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.455
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.599
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.613
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.652
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.556

@glenn-jocher (Member)

glenn-jocher commented Mar 28, 2019

@xiao1228 did you train to 270 epochs? If not, then yes, of course your mAP will be lower.

@xiao1228 (Author)

OK, this is after 110 epochs. I will update you then.

@glenn-jocher (Member)

Oh, OK. Still, that seems strangely low. We were getting about 0.45 mAP on pycocotools after 70 epochs before, which was the longest we managed to train. It takes so long to train to 270 epochs that we haven't had time to try.

0.162 at epoch 110 seems too low to me. Are you just running default training from the darknet53 backbone (i.e. all settings default)?

@xiao1228 (Author)

xiao1228 commented Mar 28, 2019

I didn't change anything except lowering the learning rate at the 50th epoch, as in your previous code.

@xiao1228 (Author)

If I don't lower the learning rate at the 50th epoch, the results at the 72nd epoch from the COCO API with conf_thres = 0.001 are below. Not much difference.

 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.160
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.161
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.160
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.092
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.151
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.096
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.162
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.464
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.614
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.621
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.658
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.590

@glenn-jocher (Member)

@xiao1228 oh boy, you're going down the rabbit hole now.

The question of self-trained mAP is always a big one, especially since we are not completely sure of the optimal loss function to use (if we use the darknet default loss, we see worse results in our tests over the first 10 epochs). I'll see if I can find one of our old checkpoints. This one is from release v1.0 (https://github.com/ultralytics/yolov3/releases), which is a bit old now; I think this checkpoint was around epoch 65. I tested it using our map_update branch (https://github.com/ultralytics/yolov3/branches), which we will merge with master soon. There have of course been changes since v1.0, but they should not change the mAP as much as you are seeing. You've likely altered some other setting that is causing your mAP drop, or perhaps you did not initialize with the darknet53 backbone.
https://storage.googleapis.com/ultralytics/yolov3/best_v1_0.pt

With this checkpoint, test.py natively returns 0.41 mAP, and pycocotools returns 0.425 mAP on best_v1_0.pt at around epoch 65.

python3 test.py --save-json --weights weights/best_v1_0.pt
Namespace(batch_size=32, cfg='cfg/yolov3.cfg', conf_thres=0.001, data_cfg='cfg/coco.data', img_size=416, iou_thres=0.5, nms_thres=0.5, save_json=True, weights='weights/best_v1_0.pt')
Using cuda _CudaDeviceProperties(name='Tesla V100-SXM2-16GB', major=7, minor=0, total_memory=16130MB, multi_processor_count=80)
      Image      Total          P          R        mAP
100%|████████████████████████████████████████████████████████████████████████████| 157/157 [07:29<00:00,  2.24s/it]
       5000       5000     0.0653      0.654       0.41

 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.213
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.425
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.193
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.090
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.220
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.307
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.216
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.339
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.364
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.197
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.365
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.504

@xiao1228 (Author)

First of all, you don't need to worry now; something is wrong on my side, because I just ran the COCO API on your best_v1_0.pt with
Namespace(batch_size=32, cfg='cfg/yolov3.cfg', conf_thres=0.001, data_cfg='cfg/coco.data', img_size=416, iou_thres=0.5, nms_thres=0.5, save_json=True, weights='weights/best_v1_0.pt')

 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.157
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.158
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.157
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.084
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.164
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.103
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.159
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.431
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.564
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.631
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.612
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.489

I did not change any settings, and I initialized with the darknet53 backbone. I can produce around 0.50 mAP from your test.py at image size 416, but it does not seem aligned with the COCO API.
I really do not know why I am getting only 0.16 from the COCO API.

@glenn-jocher (Member)

@xiao1228 ah, I see. You may want to clone the repo again and try training without modifications, though I would use the map_update branch, since it has all the latest changes.

@glenn-jocher (Member)

@xiao1228 we've made great efforts to align the repo mAP with the COCO mAP. It's not perfect, but the current result seems to steadily track about 2% lower than the COCO mAP as the epochs trend higher.

This plot shows v4.0 training with the repo mAP (blue) overlaid with the pycocotools mAP (orange), using --conf-thres 0.1 and default training settings. We decided to use --conf-thres 0.1 to increase test speed during training, while leaving the test.py default at --conf-thres 0.001 for the highest mAP when run by hand later on.
[Plot: repo mAP (blue) vs pycocotools mAP (orange) over training epochs]
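
As a toy illustration of that trade-off (not the repo's code): a lower confidence threshold keeps many more low-score boxes, which raises recall and COCO mAP but makes NMS and evaluation slower.

def filter_detections(detections, conf_thres):
    # detections: list of (box, score) pairs; keep only boxes above the threshold
    return [(box, score) for box, score in detections if score > conf_thres]

# 'detections' below stands in for a model's raw output for one image
detections = [((0, 0, 10, 10), 0.9), ((5, 5, 20, 20), 0.05), ((1, 1, 3, 3), 0.002)]
print(len(filter_detections(detections, 0.1)))    # 1 box kept: fast, but lower recall/mAP
print(len(filter_detections(detections, 0.001)))  # 3 boxes kept: slower, higher recall/mAP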

@xiao1228 (Author)

xiao1228 commented Mar 29, 2019

Hi @glenn-jocher, thank you for the help.
I cloned your code again from the master branch and tested your best_v1_0.pt; the results from the COCO API are shown below. They are still different from what you get, but better than your numbers...

Namespace(batch_size=32, cfg='cfg/yolov3.cfg', conf_thres=0.001, data_cfg='cfg/coco.data', img_size=416, iou_thres=0.5, nms_thres=0.5, save_json=True, weights='weights/best_v1_0.pt')

 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.226
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.449
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.207
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.096
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.230
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.330
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.219
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.340
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.364
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.196
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.366
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.504

I also tested my trained-from-scratch checkpoint (best.pt) with this branch after 135 epochs; the results are shown below, giving 0.428:

Namespace(batch_size=32, cfg='cfg/yolov3.cfg', conf_thres=0.001, data_cfg='cfg/coco.data', img_size=416, iou_thres=0.5, nms_thres=0.5, save_json=True, weights='weights/best.pt')

 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.210
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.428
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.182
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.067
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.200
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.358
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.212
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.332
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.360
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.158
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.369
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.539

I have also tested the model I trained in February (cloned around 11 Feb). The results at the 70th epoch are shown below:

 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.239
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.460
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.227
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.097
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.247
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.351
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.231
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.357
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.381
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.197
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.391
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.526

No modification was made to the code I cloned from your repo, and training used the darknet53 backbone for weight initialization.

@glenn-jocher (Member)

@xiao1228 mAP seems fully aligned now. The latest comparison is 0.55 (repo calculation) vs 0.549 (pycocotools) in the map_update branch. See #7 for the latest. Will merge the branch later today.

@glenn-jocher (Member)

Final results are in, and PR #176 is complete. The repo mAP now aligns with the COCO mAP to within 1% under most circumstances, and the mAP output now exceeds the published darknet YOLOv3 results.

Model         ultralytics/yolov3 (pycocotools)   darknet/yolov3
YOLOv3-320    51.8                               51.5
YOLOv3-416    55.4                               55.3
YOLOv3-608    58.2                               57.9
sudo rm -rf yolov3 && git clone https://github.com/ultralytics/yolov3
# bash yolov3/data/get_coco_dataset.sh
sudo rm -rf cocoapi && git clone https://github.com/cocodataset/cocoapi && cd cocoapi/PythonAPI && make && cd ../.. && cp -r cocoapi/PythonAPI/pycocotools yolov3
cd yolov3

python3 test.py --save-json --conf-thres 0.001 --img-size 416
Namespace(batch_size=32, cfg='cfg/yolov3.cfg', conf_thres=0.001, data_cfg='cfg/coco.data', img_size=416, iou_thres=0.5, nms_thres=0.5, save_json=True, weights='weights/yolov3.weights')
Using cuda _CudaDeviceProperties(name='Tesla V100-SXM2-16GB', major=7, minor=0, total_memory=16130MB, multi_processor_count=80)
      Image      Total          P          R        mAP
Calculating mAP: 100%|█████████████████████████████████| 157/157 [08:34<00:00,  2.53s/it]
       5000       5000     0.0896      0.756      0.555
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.312
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.554
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.317
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.145
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.343
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.452
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.268
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.411
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.435
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.244
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.477
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.587
 
python3 test.py --save-json --conf-thres 0.001 --img-size 608 --batch-size 16
Namespace(batch_size=16, cfg='cfg/yolov3.cfg', conf_thres=0.001, data_cfg='cfg/coco.data', img_size=608, iou_thres=0.5, nms_thres=0.5, save_json=True, weights='weights/yolov3.weights')
Using cuda _CudaDeviceProperties(name='Tesla V100-SXM2-16GB', major=7, minor=0, total_memory=16130MB, multi_processor_count=80)
      Image      Total          P          R        mAP
Calculating mAP: 100%|█████████████████████████████████| 313/313 [08:54<00:00,  1.55s/it]
       5000       5000     0.0966      0.786      0.579
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.331
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.582
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.344
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.198
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.362
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.427
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.281
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.437
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.463
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.309
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.494
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.577

@xiao1228 (Author)

Hi @glenn-jocher, thank you very much for the mAP update. I suppose the next step will be improving training from scratch? I am training from scratch at the moment as well, and will update the results.

@glenn-jocher (Member)

glenn-jocher commented Mar 31, 2019

@xiao1228 yes, you are correct. Training from scratch (well, from darknet53 backbone) is the final frontier, not just for this repo but for all object detection.

It's also the source of greatest confusion for me, because if I implement the darknet loss function as I understand it, I get very poor results. The current loss function is the result of a hyperparameter search I did a few months ago (#2 (comment)). I ran full COCO training for epoch 0 and tweaked the parameters to get the highest mAP at the end of epoch 0. BUT this used the old mAP code, so it's possible I was optimizing for the wrong metric.

The main area I was changing was the loss function, though the image augmentation could also be a source of investigation. The current loss function has two very odd weightings: k/4 on CE and k*64 on BCE. They are there because the results improved significantly with this change. I believe original darknet essentially uses no weights on the loss components anymore, and also uses BCE for the class_conf loss, but I can't get good results that way.

If you have access to GPU time, it might be useful to test out various loss function changes on the first few epochs, or on a subset of the COCO dataset, and we could implement the best of those changes. I was thinking of creating a subset from the first 1000 images, and training and testing on those to prototype changes more rapidly than with the full dataset, which takes a lot of time.
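
As a rough sketch of building such a subset (the file names follow the repo's convention of one image path per line, but the exact paths here are assumptions):

# Build a small COCO subset list for rapid prototyping; paths are assumptions.
n_subset = 1000

with open('data/trainvalno5k.txt') as f:   # hypothetical full training list
    image_paths = [line.strip() for line in f if line.strip()]

with open('data/coco_1k.txt', 'w') as f:   # hypothetical subset list
    f.write('\n'.join(image_paths[:n_subset]) + '\n')

# Point the train/valid entries of the .data file at data/coco_1k.txt to train
# and test on the same 1000 images for quick loss-function experiments.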

yolov3/utils/utils.py, lines 264–277 at commit 09b02d2:

        # Compute losses
        k = 1  # nT / bs
        if len(b) > 0:
            pi = pi0[b, a, gj, gi]  # predictions closest to anchors
            tconf[b, a, gj, gi] = 1  # conf
            lxy += k * MSE(torch.sigmoid(pi[..., 0:2]), txy[i])  # xy loss
            lwh += k * MSE(pi[..., 2:4], twh[i])  # wh loss
            lcls += (k / 4) * CE(pi[..., 5:], tcls[i])  # class_conf loss
        # pos_weight = FT([gp[i] / min(gp) * 4.])
        # BCE = nn.BCEWithLogitsLoss(pos_weight=pos_weight)
        lconf += (k * 64) * BCE(pi0[..., 4], tconf)  # obj_conf loss
    loss = lxy + lwh + lconf + lcls

@xiao1228 (Author)

xiao1228 commented Apr 1, 2019

@glenn-jocher at least with the current loss function and default settings, after 196 epochs the mAP is only 0.414. What tweaks should I apply to the loss function then?

@glenn-jocher (Member)

@xiao1228 yes, it's probably a good idea to stop training then, and instead do a hyperparameter search. Or you could try the darknet defaults, though as I said those produce worse results over the first 3 epochs. You can see what I was trying before in #2 (comment). But just quickly, off the top of my head, the possible parameters to vary are listed below (a small sketch of a random sweep over the loss constants follows the list). I'm investigating the first few currently, but you can do this all on your own as well.

  • xy loss constant
  • wh loss constant
  • class_conf loss constant
  • obj_conf loss constant
  • iou rejection constant
  • initial learning rate lr0
  • learning rate scheduler
  • --accumulate batches
  • optimizer (SGD, Adam, etc)
  • --freeze-backbone
  • SV Augmentation (currently set at 50%)
  • Translation, Rotation, Reflection and Zoom Augmentation
  • --multi-scale
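
As mentioned above, a minimal random-search sketch over the loss-weight constants might look like this; the parameter names, ranges, and the train_and_eval_one_epoch helper are illustrative assumptions, not the repo's API.

import random

# Toy random search over loss weights and lr0; names and ranges are assumptions.
search_space = {
    'xy_gain':   (1, 16),
    'wh_gain':   (1, 16),
    'cls_gain':  (0.25, 4),
    'conf_gain': (16, 128),
    'lr0':       (1e-4, 1e-2),
}

best_map, best_params = 0.0, None
for trial in range(20):
    params = {k: random.uniform(lo, hi) for k, (lo, hi) in search_space.items()}
    epoch0_map = train_and_eval_one_epoch(params)  # hypothetical helper: short training run + test.py mAP
    if epoch0_map > best_map:
        best_map, best_params = epoch0_map, params
    print(trial, params, epoch0_map)
print('best:', best_map, best_params)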

@xiao1228 (Author)

Hi @glenn-jocher
These are the results after 269 epochs on the whole COCO dataset:
[Plot: training results after 269 epochs]
The loss function I am using is probably not from your latest commit, because I started a few days ago, and I did not use --multi-scale.

import torch
import torch.nn as nn
from collections import defaultdict


def compute_loss(p, targets):  # predictions, targets
    FT = torch.cuda.FloatTensor if p[0].is_cuda else torch.FloatTensor
    lxy, lwh, lcls, lconf = FT([0]), FT([0]), FT([0]), FT([0])
    txy, twh, tcls, indices = targets
    MSE = nn.MSELoss()
    CE = nn.CrossEntropyLoss()
    BCE = nn.BCEWithLogitsLoss()

    # Compute losses
    # gp = [x.numel() for x in tconf]  # grid points
    for i, pi0 in enumerate(p):  # layer i predictions, i
        b, a, gj, gi = indices[i]  # image, anchor, gridx, gridy
        tconf = torch.zeros_like(pi0[..., 0])  # conf

        # Compute losses
        k = 1  # nT / bs
        if len(b) > 0:
            pi = pi0[b, a, gj, gi]  # predictions closest to anchors
            tconf[b, a, gj, gi] = 1  # conf

            lxy += (k * 8) * MSE(torch.sigmoid(pi[..., 0:2]), txy[i])  # xy loss
            lwh += (k * 4) * MSE(pi[..., 2:4], twh[i])  # wh loss
            lcls += (k * 1) * CE(pi[..., 5:], tcls[i])  # class_conf loss

        # pos_weight = FT([gp[i] / min(gp) * 4.])
        # BCE = nn.BCEWithLogitsLoss(pos_weight=pos_weight)
        lconf += (k * 64) * BCE(pi0[..., 4], tconf)  # obj_conf loss
    loss = lxy + lwh + lconf + lcls

    # Add to dictionary
    d = defaultdict(float)
    losses = [loss.item(), lxy.item(), lwh.item(), lconf.item(), lcls.item()]
    for name, x in zip(['total', 'xy', 'wh', 'conf', 'cls'], losses):
        d[name] = x

    return loss, d

@glenn-jocher (Member)

glenn-jocher commented Apr 15, 2019

@xiao1228 ah excellent! This is the first time I've seen results from full training. A few items:

  • #214 (COCO testing --conf-thres 0.1 vs 0.001): the plotted mAP is about 5% lower than the true mAP.
  • #211 (Transfer Learning Performance): we have made improvements to the loss function, resulting in higher mAP.
  • Your code is a bit out of date; you should git pull or git clone again.
  • The latest plots show testing loss, which is also very important for making sure that overfitting is not occurring.
  • We have improved the default model from yolov3 to yolov3-SPP, which results in 2-3% higher mAP.
  • Yes, multi-scale may help, along with a number of other small changes the darknet folks used during training, which you can read about in their three publications: https://pjreddie.com/publications/
