Yolov5s model converging differently between latest yolov5 and dated yolov5 (5 months) #7027
@saumitrabg 👋 hi, thanks for letting us know about this possible problem with YOLOv5 🚀. We've created a few short guidelines below to help users provide what we need in order to get started investigating a possible problem.
How to create a Minimal, Reproducible Example
When asking a question, people will be better able to provide help if you provide code that they can easily understand and use to reproduce the problem. This is referred to by community members as creating a minimum reproducible example. Your code that reproduces the problem should be:
- ✅ Minimal – Use as little code as possible to produce the problem
- ✅ Complete – Provide all parts someone else needs to reproduce the problem
- ✅ Reproducible – Test the code you're about to provide to make sure it reproduces the problem
For Ultralytics to provide assistance your code should also be:
- ✅ Current – Verify that your code is up-to-date with GitHub master, and if necessary git pull or git clone a new copy to ensure your problem has not already been solved in master.
- ✅ Unmodified – Your problem must be reproducible using official YOLOv5 code without changes. Ultralytics does not provide support for custom code.
If you believe your problem meets all the above criteria, please close this issue and raise a new one using the 🐛 Bug Report template with a minimum reproducible example to help us better understand and diagnose your problem. Thank you! 😃 |
FWIW, I see a difference in hyperparameters between my old version and new version of YOLOv5. Old: New: |
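A quick way to surface exactly which defaults changed is to diff the hyperparameter files of the two checkouts. A minimal sketch assuming two side-by-side clones; the file paths below are assumptions, since the hyp files were moved and renamed across releases (locate yours with find . -name 'hyp*.yaml'):

# compare default training hyperparameters between an old and a new clone
diff yolov5-old/data/hyp.scratch.yaml yolov5-new/data/hyps/hyp.scratch-low.yaml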
@saumitrabg judging by your results it seems pretty apparent that, all else being equal, the better-performing model simply started from pretrained weights and the lower one didn't. |
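For reference, the two starting points are chosen on the train.py command line; a minimal sketch where DATA.yaml is a placeholder:

# start from COCO-pretrained weights
python train.py --data DATA.yaml --weights yolov5s.pt
# start from scratch (random initialization) using a model config instead of weights
python train.py --data DATA.yaml --weights '' --cfg yolov5s.yaml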
@glenn-jocher yes, the same datasets. The yolov5m model was trained with the same real datasets, starting from a base model that was trained on a bunch of synthetic data. This is the same mAP curve as the yolov5s model with the new code as well: exact same datasets. However, with the older YOLO code we do the same thing (train with real datasets from a base model trained on synthetic data) and get a much higher mAP score after 300 epochs. We would ideally like to keep adding new incremental data to the previous model, but as the YOLO code changes, that is not possible. So we are keeping versions of datasets where the first model gets trained on the default YOLO model that comes in the repo. What else can we explore? If you look at the charts, the new models off the new YOLO code don't go beyond 0.2 mAP even after training on the same datasets for more epochs. That didn't happen with the older code. The other thing is that the x/lr0 curve is very different with the new YOLO code: it is always a straight line now. |
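A side note on that straight-line x/lr0 curve: it is consistent with a linear learning-rate schedule, and newer YOLOv5 checkouts changed the default scheduler from one-cycle cosine to linear while exposing a flag to opt back into cosine. Treat the flag as an assumption to verify against your checkout (python train.py --help):

# restore the older cosine LR schedule on checkouts that support the flag
python train.py --data DATA.yaml --weights yolov5s.pt --cos-lr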
@saumitrabg I would put any differences down to your implementation or user error. All YOLOv5 models are trained on COCO from scratch on each release and results improve slightly in most cases. |
@glenn-jocher Thanks. To make sure there is no user error, we went back, re-downloaded the yolov5 repo, and retrained, but it still shows the same behavior. We have been training yolov5 models for 1.5 years now (they are really great) and it is quite simple actually: just change the coco128.yaml file with the corresponding train/val datasets, pick the right rect size (640), and things have worked great. Also, our coco128.yaml file was old and had a different format for the train/val datasets, and we fixed that to make sure we start with a clean slate. Not much progress. We will keep inspecting for user error, though there is not much setup needed to train. |
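For completeness, the workflow described above reduces to a single command once the data yaml is edited; a minimal sketch where the yaml name is a placeholder:

# 640px rectangular training on a custom dataset, from pretrained weights
python train.py --img 640 --rect --data custom_data.yaml --weights yolov5s.pt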
@saumitrabg got it. We are training multiple models (i.e. 8+ models in parallel right now) across COCO and VOC both from scratch (COCO) and from pretrained (VOC) as part of our normal R&D, and both are operating and training correctly, so I don't see any sign of training issues today. |
@glenn-jocher any clue how we can make progress? See how we get a higher mAP score with the older yolov5 default weights. A few things:
|
@saumitrabg the only thing I can think of is an AutoAnchor bug which was resolved last week. See #7067 and #7060. If you could provide a fully reproducible example of what you are seeing then we could start debugging it, but lacking that there is nothing for us to do. A reproducible example would be one data.yaml with autodownload capability and two branches that you say perform very differently.
git clone https://github.com/ultralytics/yolov5 yolov5-1 -b BRANCH1
cd yolov5-1
python train.py --data DATA.yaml
cd ..
git clone https://github.com/ultralytics/yolov5 yolov5-2 -b BRANCH2
cd yolov5-2
python train.py --data DATA.yaml |
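For reference, "autodownload capability" here means a data yaml whose download: key fetches the dataset automatically when it is missing locally. A minimal sketch of such a file, written from the shell; every path, class name, and URL below is a placeholder modeled on coco128.yaml:

cat > DATA.yaml <<'EOF'
path: ../datasets/custom        # dataset root directory
train: images/train             # train images, relative to path
val: images/val                 # val images, relative to path
nc: 2                           # number of classes
names: ['class0', 'class1']     # class names
download: https://example.com/custom_dataset.zip  # placeholder URL, fetched if dataset is missing
EOF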
@glenn-jocher sure, we will provide the debug information (2 yolo snapshots). Do you recommend a particular branch from December 2021 that we can use? |
@saumitrabg well if you're saying v5.0 and master are producing different results then:
git clone https://github.com/ultralytics/yolov5 yolov5-1 -b v5.0
cd yolov5-1
python train.py --data DATA.yaml
cd ..
git clone https://github.com/ultralytics/yolov5 yolov5-2 -b master
cd yolov5-2
python train.py --data DATA.yaml |
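One way to get a dated snapshot (per the December 2021 question above) is to pin a clone to the last commit before a given day; a minimal sketch where the cutoff date is just an example:

git clone https://github.com/ultralytics/yolov5 yolov5-dec2021
cd yolov5-dec2021
# check out the last commit on master made before the chosen date
git checkout $(git rev-list -n 1 --before="2021-12-01" master)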
@glenn-jocher We confirmed that your v6.0 branch works well, while master and even a 2-month-old branch (tests/aws) don't work with default settings. We will stay on v6.0 for now and will move from the small to the medium AI weights; however, we would like to understand what you need from us to help debug this. All models are trained on medium weights, and with our data on 4x T4 GPUs it takes 50-60 hrs to train. The red line (the 1st v6.0 model) had mAP go to 0 after the 48th epoch, so we restarted the 2nd v6.0 model using the 48th-epoch best.pt as a baseline and assume they are the same continuation. A few other things that I saw:
|
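For anyone reproducing this workaround, pinning to the v6.0 tag and restarting from a saved checkpoint looks roughly like the following; the checkpoint path is an assumption, since YOLOv5 numbers its runs/train/exp folders per run:

git clone https://github.com/ultralytics/yolov5 -b v6.0
cd yolov5
# continue training from the epoch-48 checkpoint of the earlier run
python train.py --data DATA.yaml --weights ../old-run/weights/best.pt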
@saumitrabg is this just on your dataset? If you train coco128.yaml to 300 epochs do you see the same performance on both branches? |
@glenn-jocher we have only tried on our datasets since we are building custom AI models. Regardless of the number of epochs, the master branch performs worse from the get-go. |
@saumitrabg we need to be able to reproduce this ourselves, otherwise there is nothing for us to investigate. For example, the official v6.0 and v6.1 model records are here, and you can see near-identical performance across all 10 YOLOv5 models on the COCO dataset between the two versions: |
@glenn-jocher if that was your conclusion, we should not have been told to reproduce between v6.0 and latest master :-). |
@saumitrabg yes it's good you've confirmed a difference, but for us to investigate we need to be able to reproduce the difference ourselves, i.e. we would need your dataset and your data.yaml so we can run your same command and then try to figure out where the differences are originating from. It seems the differences appear in less than 10 epochs, so it shouldn't take long, we just need your dataset, or any other dataset that you see is also producing the same behavior. |
👋 Hello, this issue has been automatically marked as stale because it has not had recent activity. Please note it will be closed if no further activity occurs.
Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed! Thank you for your contributions to YOLOv5 🚀 and Vision AI ⭐! |
This also happened in my case. @saumitrabg, is there any solution you used to tackle the problem? Thank you. |
@TimbusCalin 👋 hi, thanks for letting us know about this possible problem with YOLOv5 🚀. We've created a few short guidelines below to help users provide what we need in order to start investigating a possible problem.
How to create a Minimal, Reproducible Example
When asking a question, people will be better able to provide help if you provide code that they can easily understand and use to reproduce the problem. This is referred to by community members as creating a minimum reproducible example. Your code that reproduces the problem should be:
- ✅ Minimal – Use as little code as possible to produce the problem
- ✅ Complete – Provide all parts someone else needs to reproduce the problem
- ✅ Reproducible – Test the code you're about to provide to make sure it reproduces the problem
For Ultralytics to provide assistance your code should also be:
- ✅ Current – Verify that your code is up-to-date with GitHub master, and if necessary git pull or git clone a new copy to ensure your problem has not already been solved in master.
- ✅ Unmodified – Your problem must be reproducible using official YOLOv5 code without changes. Ultralytics does not provide support for custom code.
If you believe your problem meets all the above criteria, please close this issue and raise a new one using the 🐛 Bug Report template with a minimum reproducible example to help us better understand and diagnose your problem. Thank you! 😃 |
We did 2 things:
1. Went back to v6.0 and kept our YOLOv5 code locked at that version.
2. I believe we changed lr0 in the hyperparameters. Make it lower to see if that helps.
|
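Lowering lr0 does not require editing the repo defaults; you can copy a hyperparameter file and pass it with --hyp. A minimal sketch, assuming a v6.0-era checkout where the default hyps live at data/hyps/hyp.scratch.yaml (the file name varies across releases, and the halved value is just an example):

# copy the default hyperparameters and halve the initial learning rate
cp data/hyps/hyp.scratch.yaml hyp.custom.yaml
sed -i 's/^lr0: 0.01/lr0: 0.005/' hyp.custom.yaml
python train.py --data DATA.yaml --weights yolov5s.pt --hyp hyp.custom.yaml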
Question
I have tried the same dataset with both yolov5s and yolov5m. My mAP scores are not converging as well as they used to with the new code. Did I miss any tuned parameters?
