
Access to torchvision models training files? #615

Closed
arnaghizadeh opened this issue Oct 2, 2018 · 14 comments

Comments

@arnaghizadeh

arnaghizadeh commented Oct 2, 2018

Hi, in the torchvision models (https://github.com/pytorch/vision/tree/master/torchvision/models) we can see the model definitions, and we also have the option to download pretrained weights. However, what is missing (if I'm not mistaken) are the training scripts that use those models. I would really like to have access to those files (especially for DenseNet) so I can reproduce the pretrained models myself. Is it possible to get access to these scripts, or at least the exact configurations (e.g., learning rates, epochs) used for training?

@sotte
Contributor

sotte commented Oct 2, 2018

I also think that it would be great if the training procedure was part of the repository. Here are a few advantages I see:

  • people could learn from that code,
  • the training procedure would be transparent, and
  • it would be easy to add new models by reusing the training procedure (PyTorch is starting to lag behind TF, and especially TensorFlow Hub, in the models it offers).

@soumith
Member

soumith commented Oct 3, 2018

@arnaghizadeh
Author

arnaghizadeh commented Oct 3, 2018

@soumith thanks for the info. Is this used for all datasets (including CIFAR-10 and CIFAR-100), without changing any of the default values for those datasets?

@soumith
Member

soumith commented Oct 3, 2018

@arnaghizadeh we don't provide CIFAR-10/100 pre-trained models.

@arnaghizadeh
Author

arnaghizadeh commented Oct 3, 2018

@soumith oh I see: based on the comment in the code, "If True, returns a model pre-trained on ImageNet", the option is only for ImageNet. However, I think the documentation (https://pytorch.org/docs/0.4.0/torchvision/models.html?highlight=densenet) should be a little clearer and emphasize that this feature is exclusive to ImageNet; I automatically assumed that it supported all major datasets.
In the comments at https://github.com/pytorch/examples/tree/master/imagenet you mention that:
"The default learning rate schedule starts at 0.1 and decays by a factor of 10 every 30 epochs. This is appropriate for ResNet and models with batch normalization, but too high for AlexNet and VGG. Use 0.01 as the initial learning rate for AlexNet or VGG."
This covers only three architectures. What about others like DenseNet? Shouldn't anything be changed for them?
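For concreteness, the schedule described in that README is a simple step decay; a minimal sketch in plain Python (the function name is mine, only the 0.1/0.01 starting values and the 30-epoch step come from the README quoted above):

```python
def lr_at_epoch(epoch, base_lr=0.1, decay=0.1, step=30):
    """Step-decay schedule from examples/imagenet: multiply the
    learning rate by `decay` every `step` epochs."""
    return base_lr * decay ** (epoch // step)

# ResNet-style default: 0.1 for epochs 0-29, 0.01 for 30-59, 0.001 for 60-89.
# For AlexNet or VGG the README recommends starting from base_lr=0.01 instead.
```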

@fmassa
Member

fmassa commented Oct 30, 2018

@sotte I agree with you that all models should have a clearly specified training procedure.
We will be adding more models into torchvision very soon, see #645 , and I'm starting to wonder if the default training schedules / procedures from examples/imagenet will be enough to reproduce all the results.

Any suggestions on how to include this training information are more than welcome.

@sotte
Contributor

sotte commented Oct 30, 2018

@fmassa Great, I'm very happy to see #645!

Option 1
Assuming you want to stick to the examples/imagenet schema to create the weights for the models, just having simple bash scripts that specify the training parameters would be great:
examples/imagenet/train_resnet18.sh, examples/imagenet/train_resnet50.sh, and so on. I would actually be very interested in the training parameters :)
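For illustration, such a script could be nothing more than a pinned-down invocation of examples/imagenet/main.py (the file itself is hypothetical; the flags shown are main.py's existing arguments, and the values here are just its defaults, not necessarily the ones used for the released weights):

```shell
#!/usr/bin/env bash
# Hypothetical examples/imagenet/train_resnet18.sh: records the exact
# hyperparameters used to produce the released resnet18 weights.
python main.py \
    --arch resnet18 \
    --epochs 90 \
    --lr 0.1 \
    --batch-size 256 \
    /path/to/imagenet
```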

Option 2
Assuming that the current examples/imagenet schema is not enough to train new models (that's actually what I assume is the case), one could create examples/torchvision_model_training/<my_model>/ (or maybe a sep. repo?) and fully specify the training procedure in there.

Independent of the two options, https://pytorch.org/docs/stable/torchvision/models.html should mention how the models are trained. I can submit a PR for this if you want, once we have decided how to proceed with the training-procedure setup.

@fmassa
Member

fmassa commented Oct 30, 2018

@sotte I totally agree with you, and I actually think that option 2 will be the way to go.

It would probably involve having a separate repo which would contain all the training logic, with a set of configuration files (maybe à-la https://github.com/facebookresearch/maskrcnn-benchmark/tree/master/configs) that entirely specify how to train a model. This way, we have a simple and reproducible way of obtaining the models.
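To sketch what that could look like (the keys and layout below are hypothetical, not an actual torchvision or maskrcnn-benchmark format; PyYAML is assumed to be available):

```python
import yaml  # PyYAML, assumed available

# Hypothetical training config in the spirit of maskrcnn-benchmark's
# configs/ folder; every key below is illustrative only.
CONFIG = """
model: resnet50
solver:
  base_lr: 0.1
  lr_decay: 0.1
  step_epochs: 30
  max_epochs: 90
  batch_size: 256
"""

cfg = yaml.safe_load(CONFIG)
```

A training driver would then read only `cfg` and nothing else, so the file alone fully determines (and reproduces) the run.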

cc @bermanmaxim @soumith for feedback

@bermanmaxim

@fmassa I agree; in fact, I was myself recently wondering about the training specifications of the pretrained models. I think following yaml configs is a good idea. As we improve the training and reproducibility, we can think about serializing other information that is not currently shipped with the models but would be useful or important to have (like the optimizer state_dict at the end of training).
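Serializing the optimizer state alongside the weights is already easy with a dict checkpoint; a minimal sketch (the tiny stand-in model and the checkpoint keys are my own convention, not an existing torchvision format):

```python
import os
import tempfile

import torch
import torch.nn as nn

model = nn.Linear(4, 2)  # stand-in for a real torchvision model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

# Bundle everything needed to resume or audit the training run.
checkpoint = {
    "model": model.state_dict(),
    "optimizer": optimizer.state_dict(),
    "epoch": 90,
}
path = os.path.join(tempfile.gettempdir(), "checkpoint.pth")
torch.save(checkpoint, path)

restored = torch.load(path)
```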

@sotte
Contributor

sotte commented Nov 12, 2018

Can we do anything to help out and speed up the process?

@fmassa
Member

fmassa commented Nov 23, 2018

Hi @sotte

Sorry for the delay in replying, I was pretty busy with other projects.

Yes, having some help would be awesome!

I discussed with @soumith about this some time ago, and he mentioned that the best would be to have, for each model:

  • the training file that was used (if different from examples/imagenet)
  • the command-line arguments that were used to train

For most models, it all boils down to examples/imagenet, with pretty much default command-line arguments I believe. The exception is that for ResNets we recompute the batch norm statistics after training is over.
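The batch-norm recomputation mentioned for the ResNets can be done by resetting the running statistics and passing the training data through the model once more in train mode; a minimal sketch (the helper name and this exact recipe are assumptions on my part, not necessarily what was used for the released weights):

```python
import torch
import torch.nn as nn

@torch.no_grad()
def recompute_bn_stats(model, data_loader):
    """Reset BatchNorm running stats and re-estimate them from the data.

    Setting momentum to None makes BatchNorm accumulate an equal-weight
    (cumulative) average over all batches instead of an exponential one.
    """
    for m in model.modules():
        if isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)):
            m.reset_running_stats()
            m.momentum = None
    model.train()
    for x in data_loader:
        model(x)

# Tiny demo: a lone BN layer re-estimates the statistics of random data.
bn = nn.BatchNorm1d(3)
batches = [torch.randn(8, 3) for _ in range(10)]
recompute_bn_stats(bn, batches)
```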

So, if someone could start organizing such a structure, potentially in a scripts subfolder of the torchvision repo (e.g., vision/scripts, maybe with subfolders like classification/imagenet), copy the imagenet example there as a starting point, and add a few bash scripts with the command-line arguments to train those models, I think that would already be awesome!

And this would also open the door to adding new tasks to torchvision, which is currently very classification-focused.

Thoughts?

@Froskekongen

@fmassa: If you can provide the actual scripts used, it would be easier to refactor into components/scripts suitable for the torchvision repo.

@fmassa
Member

fmassa commented Feb 11, 2019

@Froskekongen for now, most of the training has been done with a variant of https://github.com/pytorch/examples/blob/master/imagenet/main.py

@fmassa
Member

fmassa commented May 24, 2019

We now provide reference training scripts for classification, detection and segmentation under the references/ folder in torchvision.
I will be adding the command-line arguments used to train those models in a follow-up PR.

@fmassa fmassa closed this as completed May 24, 2019