-
Paper: Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
-
Model code: swin.py
-
Validation set preprocessing:

    # Image backend: pil
    # Input image size: 224x224
    transforms = T.Compose([
        T.Resize(248, interpolation='bicubic'),
        T.CenterCrop(224),
        T.ToTensor(),
        T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
    ])

    # Image backend: pil
    # Input image size: 384x384
    transforms = T.Compose([
        T.Resize((384, 384), interpolation='bicubic'),
        T.ToTensor(),
        T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
    ])
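The geometry of the 224x224 pipeline can be worked through by hand: the shorter side is scaled to 248, then the central 224x224 region is kept, so roughly 224/248 ≈ 90% of the shorter side survives the crop. The 384x384 pipeline instead resizes directly to a fixed size with no crop, so the aspect ratio is not preserved. The sketch below is a minimal pure-Python illustration of that arithmetic and of the normalization step. The helper names (`resize_shorter_side`, `center_crop_box`, `normalize_pixel`) are hypothetical and not part of swin.py, and the truncating/floor rounding used here is an assumption; the transforms library may round differently by a pixel.

```python
# Illustrative helpers only; names and rounding are assumptions, not library code.
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def resize_shorter_side(width, height, target):
    """Return (width, height) after scaling the shorter side to `target`,
    keeping the aspect ratio (assumed truncating rounding)."""
    if width <= height:
        return target, int(target * height / width)
    return int(target * width / height), target


def center_crop_box(width, height, crop):
    """Return (left, top, right, bottom) of a centered crop x crop box."""
    left = (width - crop) // 2
    top = (height - crop) // 2
    return left, top, left + crop, top + crop


def normalize_pixel(rgb):
    """Scale an 8-bit RGB pixel to [0, 1], then apply (x - mean) / std per channel."""
    return tuple(
        (c / 255.0 - m) / s
        for c, m, s in zip(rgb, IMAGENET_MEAN, IMAGENET_STD)
    )


# Example: a 500x375 input image going through the 224x224 pipeline.
resized = resize_shorter_side(500, 375, 248)
box = center_crop_box(resized[0], resized[1], 224)
white = normalize_pixel((255, 255, 255))
```

For a 500x375 input, `resized` is the size after the resize step and `box` is the region kept by the center crop; `white` shows the per-channel values a pure white pixel takes after normalization.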
-
Model details:

| Model | Model Name | Params (M) | FLOPs (G) | Top-1 (%) | Top-5 (%) | Pretrained Model |
| --- | --- | --- | --- | --- | --- | --- |
| Swin-tiny | swin_ti | 28 | 4.5 | 81.19 | 95.51 | Download |
| Swin-small | swin_s | 50 | 8.7 | 83.18 | 96.24 | Download |
| Swin-base | swin_b | 88 | 15.4 | 83.42 | 96.45 | Download |
| Swin-base-384 | swin_b_384 | 88 | 47.1 | 84.47 | 96.95 | Download |
-
Citation:

    @article{liu2021Swin,
      title={Swin Transformer: Hierarchical Vision Transformer using Shifted Windows},
      author={Liu, Ze and Lin, Yutong and Cao, Yue and Hu, Han and Wei, Yixuan and Zhang, Zheng and Lin, Stephen and Guo, Baining},
      journal={arXiv preprint arXiv:2103.14030},
      year={2021}
    }