
Train my own dataset #1035 (Closed)

rimu123 opened this issue Apr 10, 2020 · 11 comments
Labels: enhancement (New feature or request), Stale (stale and scheduled for closing soon)

Comments

rimu123 commented Apr 10, 2020

Thank you for your work. If I want to train my own dataset, can you give some advice? I noticed that your hyperparameters were all tuned through a lot of experiments: should I use your hyperparameters directly, or search for my own? Thanks. By the way, why do Conv2d.weight, bias, and BatchNorm parameters have different learning strategies?
Looking forward to your reply!

glenn-jocher commented Apr 10, 2020

@rimu123 use the default hyperparameters and settings. Follow the tutorial: https://docs.ultralytics.com/yolov5/tutorials/train_custom_data
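
For a custom dataset, a typical run looks something like this (a sketch only; the --data/--cfg/--weights/--epochs flags are from this repo's train.py, and the file paths are placeholders for your own files):

python3 train.py --data data/custom.data --cfg cfg/yolov3-spp.cfg --weights weights/yolov3-spp-ultralytics.pt --epochs 100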

Weights have their own parameter group to which weight decay is applied; biases start from a high LR, while everything else starts from a low LR. This is part of our training strategy.
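
A minimal PyTorch sketch of setting up such parameter groups (illustrative only, not the repo's exact code; the 10x bias LR and 5e-4 decay values are assumptions):

import torch
import torch.nn as nn

def build_param_groups(model: nn.Module, lr: float = 0.01, weight_decay: float = 5e-4):
    # Three groups: BatchNorm/1-D weights (no decay), conv/linear weights
    # (decay applied), and biases (own group so they can start from a
    # higher LR, as described above).
    pg_bn, pg_w, pg_b = [], [], []
    for name, p in model.named_parameters():
        if not p.requires_grad:
            continue
        if name.endswith('.bias'):
            pg_b.append(p)        # biases
        elif p.ndim == 1:
            pg_bn.append(p)       # BatchNorm weights and other 1-D params
        else:
            pg_w.append(p)        # conv/linear weights
    return torch.optim.SGD(
        [{'params': pg_bn},                               # no weight decay
         {'params': pg_w, 'weight_decay': weight_decay},  # decay on weights only
         {'params': pg_b, 'lr': lr * 10.0}],              # assumed higher bias LR
        lr=lr, momentum=0.9, nesterov=True)

Keeping decay off BatchNorm weights and biases is a common convention; applying it there tends to hurt accuracy.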


rimu123 commented Apr 11, 2020

Thank you for your reply! I get it.
I used darknet to train my own dataset and got roughly [email protected] = 52.7. Using your code, without changing the hyperparameters, the best result I got is about 51 (fine-tuning for 100 epochs from a pretrained darknet model). Now I want to search hyperparameters with --epochs 10 --evolve for 200 generations, which takes about ten days, which is crazy. I think I can beat the original darknet result by adjusting hyperparameters manually. Can you give me some suggestions? In other words, which hyperparameters are the most important and should be considered? Thank you!
Looking forward to your reply!
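
(For reference, the repo's --evolve flag mutates the hyperparameters once per run, so a 200-generation search is typically a shell loop over short trainings; the dataset path below is a placeholder:

python3 train.py --data data/custom.data --epochs 10 --evolve

repeated for each generation.)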

glenn-jocher commented:

@rimu123 yes evolving does take a lot of time unfortunately. Can you post your results.png and test_batch0.png?

glenn-jocher commented:

@rimu123 one main difference is that multi-scale training is not enabled in this repo by default. It typically bumps mAP a few percentage points. You can enable it as shown below. Also, for best results, I would try training from yolov3-spp-ultralytics.pt if your dataset is on the smaller side.

python3 train.py --multi-scale
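
And to also start from the pretrained checkpoint suggested above (the weights path is a placeholder for wherever you saved it):

python3 train.py --multi-scale --weights weights/yolov3-spp-ultralytics.pt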


rimu123 commented Apr 13, 2020

@glenn-jocher I use my own model, and because the input aspect ratio is different, multi-scale cannot be used in training. In darknet, using multi-scale drops mAP by 1 to 2 points. In addition, data augmentation does not help on my dataset. The attached picture shows my latest training results using your code. The total loss keeps declining, but recall and precision are trending downward, and so are [email protected] and F1. Does this mean overfitting? Thank you!
[attached: results.png showing the training curves]

glenn-jocher commented:

This is a strange result, but yes, when the validation losses increase, this indicates overtraining.


rimu123 commented Apr 13, 2020

It may be that the learning rate is too high, since this is fine-tuning. In addition, batch size has a large effect on the results. Let me adjust some hyperparameters, and I will share the results with you.

glenn-jocher commented:

@rimu123 ok! If in doubt just use the default parameters. This is how we reproduce our results on COCO:
https://github.com/ultralytics/yolov3#reproduce-our-results


rimu123 commented Apr 15, 2020

@glenn-jocher There are two things that confuse me. First, I would expect the total loss to oscillate; why does the loss always decline slowly in your code? Second, where does the 6300 in utils.py come from?
if red == 'sum':
    bs = tobj.shape[0]  # batch size
    lobj *= 3 / (6300 * bs) * 2  # 3 / np * 2
    if ng:
        lcls *= 3 / ng / model.nc
        lbox *= 3 / ng

glenn-jocher commented:

@rimu123 this section of the code is not used in practice, since red = 'mean' by default. To verify your labels are accurate, you should check your train_batch0.png and test_batch0.png images.
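
(As for the 6300 itself, the thread never states its origin, but it matches the total number of predictions per image at a 320×320 input: the three detection scales with strides 32, 16 and 8 give 10×10, 20×20 and 40×40 grids, each with 3 anchors, so 3 × (10² + 20² + 40²) = 6300. A quick check:

# Assumed: 320x320 input, YOLOv3 strides 32/16/8, 3 anchors per scale
img_size, strides, na = 320, (32, 16, 8), 3
n_pred = sum(na * (img_size // s) ** 2 for s in strides)
print(n_pred)  # 6300
)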

github-actions (bot) commented:

This issue is stale because it has been open 30 days with no activity. Remove the Stale label or comment, or this will be closed in 5 days.
