
[Master Issue] Add more models to torchvision #645

Open · 6 of 7 tasks
fmassa opened this issue Oct 30, 2018 · 46 comments
fmassa (Member) commented Oct 30, 2018

This is a master issue to track requests for adding new pre-trained models to torchvision.

Here is the (potentially incomplete) list I compiled:

@Cadene has already implemented a number of these models in his fantastic https://github.com/Cadene/pretrained-models.pytorch. I'll start from there and try to get models trained using pytorch/examples/imagenet, so that the models are reproducible.


Requirements

  • python implementation to live in vision/models
  • pre-trained weights using the same mean / std normalization as in the imagenet example (see the sketch after this list)
  • the script used to train the models, or the command-line arguments used if the script was exactly the one from examples/imagenet
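For clarity, a minimal sketch of the normalization requirement, using the same statistics as pytorch/examples/imagenet (the surrounding transform pipeline is illustrative):

```python
import torchvision.transforms as transforms

# Standard ImageNet mean / std from pytorch/examples/imagenet; submitted
# pre-trained weights are expected to assume this normalization.
normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])

# A typical evaluation pipeline built around it.
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    normalize,
])
```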
gokriznastic commented Nov 4, 2018

@fmassa Hey, I would like to try adding some of these models. Can you tell me which of these you need help with?

fmassa (Member, Author) commented Nov 6, 2018

@gokriznastic ideally we would want to have not only the model implementation, but also the weights and the training code that was used (if different from pytorch/examples/imagenet).

This way, we have a reproducible way of obtaining the models.

I believe @tonylins will be adding support for MobileNetV2. All the others are open, so if you decide to take one, just let us know :-)

JuanFMontesinos commented:

Hi there, I was wondering if you would be interested in a U-Net model. I've developed a very flexible version that allows variable depth, batch norm, etc. I consider it a very important model nowadays in audio and computer vision.
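As an illustration of that kind of flexibility, a hypothetical building block (not the author's actual implementation):

```python
import torch.nn as nn

class DoubleConv(nn.Module):
    """Two 3x3 convolutions with optional batch norm; stacking these blocks
    at different scales is what gives a U-Net its variable depth."""
    def __init__(self, in_ch, out_ch, batch_norm=True):
        super().__init__()
        layers = []
        for cin, cout in [(in_ch, out_ch), (out_ch, out_ch)]:
            layers.append(nn.Conv2d(cin, cout, kernel_size=3, padding=1))
            if batch_norm:
                layers.append(nn.BatchNorm2d(cout))
            layers.append(nn.ReLU(inplace=True))
        self.block = nn.Sequential(*layers)

    def forward(self, x):
        return self.block(x)
```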

fmassa (Member, Author) commented Dec 5, 2018

Hi @JuanFMontesinos
Definitely! Is this a segmentation model or a classification model?

For now, all of the models are for classification tasks, but we would like to extend the collection to other tasks as well (that will require some thought so that we have the proper training / evaluation scripts).

JuanFMontesinos commented:

@fmassa It was originally proposed as a segmentation architecture for biomedical applications. It is basically an encoder-decoder architecture with skip connections, widely used in blind sound source separation when working with spectrograms of the sound. It is also the core of GANs like pix2pix, an image-to-image translation network, and many others. That's why I suggested including it. As for training it, that really depends on the application. Do you require a training framework and weights?
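To make the encoder-decoder and skip-connection structure concrete, a heavily reduced toy sketch (illustrative only, not the implementation under discussion):

```python
import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    """One-level encoder-decoder with a single skip connection."""
    def __init__(self, in_ch=1, out_ch=2):
        super().__init__()
        self.enc = nn.Conv2d(in_ch, 16, 3, padding=1)
        self.down = nn.MaxPool2d(2)
        self.mid = nn.Conv2d(16, 32, 3, padding=1)
        self.up = nn.ConvTranspose2d(32, 16, 2, stride=2)
        self.dec = nn.Conv2d(32, out_ch, 3, padding=1)  # 32 = 16 (skip) + 16 (up)

    def forward(self, x):
        e = torch.relu(self.enc(x))
        m = torch.relu(self.mid(self.down(e)))
        u = self.up(m)
        return self.dec(torch.cat([u, e], dim=1))  # skip connection

out = TinyUNet()(torch.randn(1, 1, 64, 64))  # -> [1, 2, 64, 64]
```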

Regards

fmassa (Member, Author) commented Dec 6, 2018

Hi @JuanFMontesinos ,

I see. We currently require all models in torchvision to have pre-trained weights, and ideally a code base where we can train / evaluate them.
This becomes especially important for some complex models, like detection, where the model alone is generally not enough to be usable and requires a number of helper functions.

JuanFMontesinos commented:

@fmassa Hi, sorry for the late reply. Which task/dataset would you be interested in training U-Net for?

fmassa (Member, Author) commented Dec 11, 2018

U-Nets are usually used for segmentation, so I'd say maybe the Pascal VOC segmentation task or Cityscapes? But there might be newer benchmarks out there that I'm not aware of.
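Both benchmarks are now exposed in torchvision.datasets; a quick loading sketch (the roots and flags below are illustrative, and Cityscapes must be downloaded manually):

```python
from torchvision import datasets

# Pascal VOC 2012 segmentation split (downloads the archive if needed).
voc = datasets.VOCSegmentation("data/voc", year="2012",
                               image_set="train", download=True)

# Cityscapes fine-annotation semantic masks (expects data already on disk).
cityscapes = datasets.Cityscapes("data/cityscapes", split="train",
                                 mode="fine", target_type="semantic")

img, mask = voc[0]  # PIL image and its segmentation mask
```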

guanfuchen commented:

@fmassa The VOC and Cityscapes datasets are large; there is a smaller dataset, CamVid, consisting of 701 labeled images. And if you want better performance from U-Net, I think using the original medical dataset is better. There is a good project named pytorch-semseg that reimplements U-Net for semantic segmentation.

fmassa (Member, Author) commented Dec 11, 2018

VOC and Cityscapes might be large, but there have been a number of publications using them, and they are widely used in the scientific literature. That's why I think providing pre-trained models for one of those tasks might be relevant.

JuanFMontesinos commented:

So, let me evaluate it after Christmas to see which dataset would be better.

varunagrawal (Contributor) commented:

Since we are adding MobileNet, it would be a good idea to add ShuffleNet as well given its improved performance over MobileNet.
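For reference, the channel shuffle operation that gives ShuffleNet its name, in a standard formulation (illustrative, not tied to a particular implementation):

```python
import torch

def channel_shuffle(x, groups):
    """Interleave channels across groups so that grouped convolutions
    can mix information between groups."""
    n, c, h, w = x.shape
    x = x.view(n, groups, c // groups, h, w)
    x = x.transpose(1, 2).contiguous()
    return x.view(n, c, h, w)

x = torch.arange(8.0).view(1, 8, 1, 1)
print(channel_shuffle(x, groups=2).flatten())
# tensor([0., 4., 1., 5., 2., 6., 3., 7.])
```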

setuc commented Feb 4, 2019

Is there a priority among this list of models? I was planning to train a couple of models on the ImageNet dataset from scratch and can contribute here.

Or should we refer to the models from @Cadene?

IgorKasianenko commented:

@setuc I would really appreciate you training ShuffleNet. It is a small model, so I assume it will take the least time to get started with.
Sincerely yours,
Igor

fmassa (Member, Author) commented Feb 11, 2019

Hi @setuc
Sorry for the delay in replying.

I'd say that you can pick whichever you'd prefer, but ShuffleNet would indeed probably be easier because it's a small model.

I think Inception V4 might be quite hard to get to the reported accuracies, so maybe just ShuffleNet would be a great start already!

setuc commented Feb 16, 2019

One more question, @fmassa: there are supposedly different versions of ImageNet. I am currently using the one from Kaggle; I hope that should be sufficient. I have downloaded the images and plan to start the runs over the weekend.

hendrycks (Contributor) commented Feb 16, 2019

There are supposedly different versions of ImageNet.

Nearly everyone else is using ImageNet 2012 data, and most papers use that for comparisons.

setuc commented Feb 17, 2019

@hendrycks I guess I was mistaken... the 2015 dataset is the same as that of 2012. I have started the runs and will do some validation before I share the results. Another 24-48 hours to completion.

fmassa (Member, Author) commented Feb 18, 2019

@setuc cool! Let me know how it goes, and which training script / hyperparameters you used to train it

setuc commented Feb 18, 2019

@fmassa I have used the training script from https://github.com/pytorch/examples/tree/master/imagenet, as mentioned in the requirements in the top post. All the hyperparameters remained the same, except the batch size, which was changed to 1024. I wasn't sure whether we are free to play around with the learning rates (cyclical learning rate, etc.).

I am unable to reproduce the results from the paper for 3 groups and no shuffle (paper error 34.5% vs. mine 39.811%). My results are Acc@1 60.189, Acc@5 82.601.
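For readers unfamiliar with the Acc@k numbers used throughout this thread, a simplified version of the top-k accuracy computation from the examples/imagenet script:

```python
import torch

def accuracy(output, target, topk=(1, 5)):
    """Return top-k accuracies (in %) for the given logits and labels."""
    maxk = max(topk)
    _, pred = output.topk(maxk, dim=1)        # [batch, maxk] predicted classes
    correct = pred.eq(target.unsqueeze(1))    # broadcast against true labels
    return [correct[:, :k].any(dim=1).float().mean().item() * 100
            for k in topk]

logits = torch.randn(8, 1000)                 # fake ImageNet logits
labels = torch.randint(0, 1000, (8,))
top1, top5 = accuracy(logits, labels)
```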

fmassa (Member, Author) commented Feb 18, 2019

@setuc thanks for getting back to me with the results.

I believe we might need to adapt the learning rate / etc in order to reproduce the results for many of those papers.

If you change those, let me know which changes you made, so that we can keep track of everything and I can summarize it afterwards.

setuc commented Feb 23, 2019

Restarting the training. I rewrote ShuffleNet v1 and v2 together with the cyclical learning rate. I think I have it right this time around. Started the training, expecting another 72-80 hours before reporting back.

Edit: The cyclical rates worked. At 120 epochs the results are encouraging. For Shufflenet v2, the Top-1 error is 41.31 compared to 39.70 from the paper.

Edit2: At 220 epochs, the Top-1 error for ShuffleNet v2 is 40.51 compared to 39.70 from the paper.

Edit3: At 272 epochs, the Top-1 error for ShuffleNet v2 is 40.22 compared to 39.70 from the paper.

Edit4: At 320 epochs, the Top-1 error for ShuffleNet v2 is 39.96 compared to 39.70 from the paper.

@fmassa Should I be doing all the groups / scales reported in the paper for v1 and v2?
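A sketch of the cyclical learning-rate setup described above, using the scheduler now available in PyTorch (the hyperparameters below are illustrative, not the ones used in these runs):

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 2)  # stand-in for the ShuffleNet model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
scheduler = torch.optim.lr_scheduler.CyclicLR(
    optimizer, base_lr=1e-3, max_lr=0.5,
    step_size_up=2000)  # iterations per half-cycle

for step in range(10):  # one optimizer/scheduler step per batch
    loss = model(torch.randn(4, 10)).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()
```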

setuc commented Feb 26, 2019

@fmassa I have completed about 400 epochs with a Top-1 error of 39.85 compared to 39.70 from the paper. Should this be sufficient?

ppwwyyxx (Contributor) commented Feb 26, 2019

For your reference, I've reproduced ShuffleNet v1 & v2 at https://github.com/tensorpack/tensorpack/blob/master/examples/ImageNetModels/shufflenet.py.
It follows the paper's schedule (240 epochs, without the cyclic LR trick) and gets the same accuracy.

fmassa (Member, Author) commented Mar 1, 2019

@setuc awesome! Could you check what @ppwwyyxx has sent to see if there is something else you could do to close the remaining gap so that we match the accuracies?

setuc commented Mar 2, 2019

@fmassa I am going over the code line by line and comparing it against @ppwwyyxx's. I had written my code from scratch, so I am checking again to see if I missed anything.

fmassa (Member, Author) commented Mar 6, 2019

@setuc thanks! Did you figure out where the difference was?

hendrycks (Contributor) commented:

(I think it is unlikely the community outside FAIR is going to train various ImageNet models in a timely manner, especially big models such as ResNeXt.)

fmassa (Member, Author) commented Mar 18, 2019

@hendrycks I was planning on getting ResNeXt models trained here

1e100 (Contributor) commented Mar 19, 2019

I have an implementation of MNASNet that I could contribute. Any interest from maintainers? It performs pretty well, and I was able to get close to paper numbers with it, at 1.0 depth multiplier, training with SGD+Nesterov. I think it's currently the best "efficient" model out there.

https://arxiv.org/abs/1807.11626
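MNASNet did eventually land in torchvision (see #829 below); a usage sketch against the current torchvision API:

```python
import torch
from torchvision import models

# Depth multiplier 1.0 variant, with the ImageNet weights discussed here.
model = models.mnasnet1_0(pretrained=True)
model.eval()

with torch.no_grad():
    logits = model(torch.randn(1, 3, 224, 224))  # -> [1, 1000] class scores
```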

fmassa (Member, Author) commented Mar 22, 2019

Hi @1e100

Sure, it would be awesome to have it! Could you send a PR with it, and also point to the training code and hyperparameters that you used to obtain the results?

1e100 (Contributor) commented Mar 23, 2019

Will do. My own training pipeline is far too complicated to be suitable for something like this, so I'll just implement a single-file fast.ai trainer instead, train with it to something close to paper numbers, and then send a PR. In the interest of expediency, I plan to only verify reachable accuracy for depth multiplier 1.0 under this experimental setup.

Let me know if you see any flaws in this plan. Conservative ETA is about 1 week, 90% of which will be GPU time.

1e100 (Contributor) commented Mar 23, 2019

In the interest of not duplicating code, though, it'd be good to know how far along #625 is. MNASNet is basically just a hyperparameter tweak over MobileNetV2 with respect to kernel sizes, layer depths, and block depths. In fact, I implemented both using the exact same module.
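A hypothetical sketch of the kind of shared module meant here: both architectures stack inverted residual blocks and differ mainly in kernel sizes, expansion factors, and repeat counts (details are illustrative):

```python
import torch.nn as nn

class InvertedResidual(nn.Module):
    def __init__(self, in_ch, out_ch, kernel_size, stride, expansion):
        super().__init__()
        mid = in_ch * expansion
        self.use_residual = stride == 1 and in_ch == out_ch
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, mid, 1, bias=False),    # 1x1 pointwise expand
            nn.BatchNorm2d(mid),
            nn.ReLU6(inplace=True),
            nn.Conv2d(mid, mid, kernel_size, stride,
                      padding=kernel_size // 2, groups=mid,
                      bias=False),                    # depthwise
            nn.BatchNorm2d(mid),
            nn.ReLU6(inplace=True),
            nn.Conv2d(mid, out_ch, 1, bias=False),    # 1x1 pointwise project
            nn.BatchNorm2d(out_ch),
        )

    def forward(self, x):
        return x + self.block(x) if self.use_residual else self.block(x)
```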

1e100 (Contributor) commented Apr 2, 2019

OK, after some experimentation I got it to train to the following accuracy numbers: loss=1.076, prec@1=73.512, prec@5=91.544. Still not quite paper numbers, but paper numbers seem achievable with more epochs. I'll be putting together a PR later tonight.

1e100 (Contributor) commented Apr 2, 2019

FYI: the paper number is 74.0% top-1.

1e100 (Contributor) commented Apr 2, 2019

MNASNet: #829
Trainer: https://github.com/1e100/mnasnet_trainer/tree/master

fmassa (Member, Author) commented Apr 2, 2019

Awesome, thanks @1e100 !

I'll check your code and integrate it into references/classification later this week

setuc commented Apr 6, 2019

@setuc thanks! Did you figure out where the difference was?

@fmassa I tried doing the comparison and ran it a couple more times. Unfortunately, I don't quite match the paper numbers. The best Top-1 error for ShuffleNet v2 is 39.89, compared to 39.70 from the paper. Will that be sufficient for the pull request?

soumith (Member) commented Apr 6, 2019

@setuc 39.89 vs. 39.70 sounds close enough. That would be sufficient for sure.

D-X-Y commented May 5, 2019

@setuc Would you mind sharing your training scripts for ShuffleNet-V2? I tried to use the ResNet training scripts but got a very low accuracy.

stigma0617 commented Jul 4, 2019

@fmassa Hi,

May I open a PR for VoVNet?

VoVNet was trained in the same manner as the pytorch/vision models.

To briefly describe it: VoVNet is a backbone network that is more efficient than ResNet and DenseNet in terms of GPU computation and energy use.

I implemented VoVNet classification models and maskrcnn-benchmark models:

classification models: https://github.com/stigma0617/VoVNet.pytorch
maskrcnn-benchmark models: https://github.com/stigma0617/maskrcnn-benchmark-vovnet/tree/vovnet

fmassa (Member, Author) commented Jul 4, 2019

Hi @stigma0617

I think for now it might be better to look into publishing it to torchhub, since it's from a very recent paper.

erichhhhho commented:

@fmassa Hi, may I ask if the ResNet-101 group-norm model pretrained in PyTorch is available now?

fmassa (Member, Author) commented Aug 5, 2019

@erichhhhho not in torchvision, as IIRC it doesn't bring performance improvements over the batch norm version.
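There is no official group-norm checkpoint, but for experimentation the current torchvision ResNet constructor accepts a norm_layer override; a sketch (randomly initialized, since batch-norm weights would not transfer):

```python
import torch.nn as nn
from torchvision import models

def group_norm(num_channels):
    # 32 groups divides every ResNet channel count (64 ... 2048).
    return nn.GroupNorm(num_groups=32, num_channels=num_channels)

# Random init with GroupNorm in place of BatchNorm throughout.
model = models.resnet101(norm_layer=group_norm)
```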

edsgerls commented:

Hi @fmassa

Would it be possible to add the ShuffleNet v2 x1.5 pretrained model, please? I would really appreciate it.

wangg12 (Contributor) commented Feb 22, 2021

Would you like to add ResNeSt models?
