
Access to torchvision models training files? #615

Closed
arnaghizadeh opened this issue Oct 2, 2018 · 14 comments

Comments

@arnaghizadeh

arnaghizadeh commented Oct 2, 2018

Hi, in the torchvision models (https://github.com/pytorch/vision/tree/master/torchvision/models) we can see the model definitions, and we also have the option to download pretrained weights. However, what is missing (if I'm not mistaken) are the training scripts that use those models. I would really like to have access to those files (especially for DenseNet) so I can reproduce the pretrained models myself. Is it possible to get access to these scripts, or at least the exact configurations (e.g., learning rates, epochs) used for training?

@sotte
Contributor

sotte commented Oct 2, 2018

I also think that it would be great if the training procedure was part of the repository. Here are a few advantages I see:

  • people could learn from that code,
  • the training procedure would be transparent, and
  • it would be easy to add new models by reusing the training procedure (PyTorch is starting to lag behind TF, and especially TensorFlow Hub, in the models it offers).

@soumith
Member

soumith commented Oct 3, 2018

@arnaghizadeh
Author

arnaghizadeh commented Oct 3, 2018

@soumith thanks for the info. Is this used for all datasets (including CIFAR-10 and CIFAR-100), without changing any of the default values for those datasets?

@soumith
Member

soumith commented Oct 3, 2018

@arnaghizadeh we don't provide CIFAR-10/100 pre-trained models.

@arnaghizadeh
Author

arnaghizadeh commented Oct 3, 2018

@soumith oh I see: based on the comment in the code, "If True, returns a model pre-trained on ImageNet", the option is only for ImageNet. However, I think the documentation (https://pytorch.org/docs/0.4.0/torchvision/models.html?highlight=densenet) should be a little clearer and emphasize that this feature is exclusive to ImageNet; I automatically assumed that it supported all major datasets.
In the comments at https://github.com/pytorch/examples/tree/master/imagenet you mention that:
"The default learning rate schedule starts at 0.1 and decays by a factor of 10 every 30 epochs. This is appropriate for ResNet and models with batch normalization, but too high for AlexNet and VGG. Use 0.01 as the initial learning rate for AlexNet or VGG."
This covers only three architectures. What about others like DenseNet? Shouldn't anything be changed for them?
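For concreteness, the schedule described in that README is a simple step decay; a minimal sketch in plain Python (the function name is mine, only the 0.1/0.01 starting values and the 30-epoch step come from the README quoted above):

```python
def lr_at_epoch(epoch, base_lr=0.1, decay=0.1, step=30):
    """Step-decay schedule from examples/imagenet: multiply the
    learning rate by `decay` every `step` epochs."""
    return base_lr * decay ** (epoch // step)

# ResNet-style default: 0.1 for epochs 0-29, 0.01 for 30-59, 0.001 for 60-89.
# For AlexNet or VGG the README recommends starting from base_lr=0.01 instead.
```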

@fmassa
Member

fmassa commented Oct 30, 2018

@sotte I agree with you that all models should have a clearly specified training procedure.
We will be adding more models into torchvision very soon, see #645 , and I'm starting to wonder if the default training schedules / procedures from examples/imagenet will be enough to reproduce all the results.

Any suggestions on how to include this training information are more than welcome.

@sotte
Contributor

sotte commented Oct 30, 2018

@fmassa Great, I'm very happy to see #645!

Option 1
Assuming you want to stick to the examples/imagenet schema to create the weights for the models, just having simple bash scripts that specify the training parameters would be great:
examples/imagenet/train_resnet18.sh, examples/imagenet/train_resnet50.sh, and so on. I would actually be very interested in the training parameters :)
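For illustration, such a script could be nothing more than a pinned-down invocation of examples/imagenet/main.py (the file itself is hypothetical; the flags shown are main.py's existing arguments, and the values here are just its defaults, not necessarily the ones used for the released weights):

```shell
#!/usr/bin/env bash
# Hypothetical examples/imagenet/train_resnet18.sh: records the exact
# hyperparameters used to produce the released resnet18 weights.
python main.py \
    --arch resnet18 \
    --epochs 90 \
    --lr 0.1 \
    --batch-size 256 \
    /path/to/imagenet
```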

Option 2
Assuming that the current examples/imagenet schema is not enough to train new models (that's actually what I assume is the case), one could create examples/torchvision_model_training/<my_model>/ (or maybe a sep. repo?) and fully specify the training procedure in there.

Independent of the two options, https://pytorch.org/docs/stable/torchvision/models.html should mention how the models are trained. I can submit a PR for this if you want, once we have decided how to proceed with the training-procedure setup.

@fmassa
Member

fmassa commented Oct 30, 2018

@sotte I totally agree with you, and I actually think that option 2 will be the way to go.

It would probably involve having a separate repo which would contain all the training logic, with a set of configuration files (maybe à-la https://github.com/facebookresearch/maskrcnn-benchmark/tree/master/configs) that entirely specify how to train a model. This way, we have a simple and reproducible way of obtaining the models.
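To sketch what that could look like (the keys and layout below are hypothetical, not an actual torchvision or maskrcnn-benchmark format; PyYAML is assumed to be available):

```python
import yaml  # PyYAML, assumed available

# Hypothetical training config in the spirit of maskrcnn-benchmark's
# configs/ folder; every key below is illustrative only.
CONFIG = """
model: resnet50
solver:
  base_lr: 0.1
  lr_decay: 0.1
  step_epochs: 30
  max_epochs: 90
  batch_size: 256
"""

cfg = yaml.safe_load(CONFIG)
```

A training driver would then read only `cfg` and nothing else, so the file alone fully determines (and reproduces) the run.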

cc @bermanmaxim @soumith for feedback

@bermanmaxim

@fmassa I agree; in fact, I was myself recently wondering about the training specifications of the pretrained models. I think following yaml configs is a good idea. As we improve the training and reproducibility, we can think about serializing other information that is not currently shipped with the models but would be useful or important to have (like the optimizer state_dict at the end of training).
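Serializing the optimizer state alongside the weights is already easy with a dict checkpoint; a minimal sketch (the tiny stand-in model and the checkpoint keys are my own convention, not an existing torchvision format):

```python
import os
import tempfile

import torch
import torch.nn as nn

model = nn.Linear(4, 2)  # stand-in for a real torchvision model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

# Bundle everything needed to resume or audit the training run.
checkpoint = {
    "model": model.state_dict(),
    "optimizer": optimizer.state_dict(),
    "epoch": 90,
}
path = os.path.join(tempfile.gettempdir(), "checkpoint.pth")
torch.save(checkpoint, path)

restored = torch.load(path)
```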

@sotte
Contributor

sotte commented Nov 12, 2018

Can we do anything to help out and speed up the process?

@fmassa
Member

fmassa commented Nov 23, 2018

Hi @sotte

Sorry for the delay in replying, I was pretty busy with other projects.

Yes, having some help would be awesome!

I discussed with @soumith about this some time ago, and he mentioned that the best would be to have, for each model:

  • the training file that was used (if different from examples/imagenet)
  • the command-line arguments that were used to train

For most models, it all boils down to examples/imagenet, with pretty much default command-line arguments I believe. The exception is that for ResNets we recompute the batch norm statistics after training is over.
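The batch-norm recomputation mentioned for the ResNets can be done by resetting the running statistics and passing the training data through the model once more in train mode; a minimal sketch (the helper name and this exact recipe are assumptions on my part, not necessarily what was used for the released weights):

```python
import torch
import torch.nn as nn

@torch.no_grad()
def recompute_bn_stats(model, data_loader):
    """Reset BatchNorm running stats and re-estimate them from the data.

    Setting momentum to None makes BatchNorm accumulate an equal-weight
    (cumulative) average over all batches instead of an exponential one.
    """
    for m in model.modules():
        if isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)):
            m.reset_running_stats()
            m.momentum = None
    model.train()
    for x in data_loader:
        model(x)

# Tiny demo: a lone BN layer re-estimates the statistics of random data.
bn = nn.BatchNorm1d(3)
batches = [torch.randn(8, 3) for _ in range(10)]
recompute_bn_stats(bn, batches)
```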

So, if someone could start organizing such a structure, potentially in a scripts subfolder of the torchvision repo (e.g., vision/scripts, maybe with subfolders like classification/imagenet), copy the imagenet example there as a starting point, and add a few bash scripts with the command-line arguments to train those models, I think that would already be awesome!

And this would also open the door to adding new tasks to torchvision, which is currently very classification-focused.

Thoughts?

@Froskekongen

@fmassa: If you can provide the actual scripts used, it would be easier to refactor into components/scripts suitable for the torchvision repo.

@fmassa
Member

fmassa commented Feb 11, 2019

@Froskekongen for now, most of the training has been done with a variant of https://github.com/pytorch/examples/blob/master/imagenet/main.py

@fmassa
Member

fmassa commented May 24, 2019

We now provide reference training scripts for classification, detection and segmentation under the references/ folder in torchvision.
I will be adding the command-line arguments used to train those models in a follow-up PR.

@fmassa fmassa closed this as completed May 24, 2019