VMamba v2 Classification checkpoints

@MzeroMiko MzeroMiko released this 16 Mar 08:47
· 65 commits to main since this release

Classification on ImageNet-1K

| name | pretrain | resolution | acc@1 | #params | FLOPs | TP. | Train TP. | configs/logs/ckpts |
| :--- | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| VMamba-T[s2l5] | ImageNet-1K | 224x224 | 82.5 | 31M | 4.9G | 1340 | 464 | config/log/ckpt |
| VMamba-S[s2l15] | ImageNet-1K | 224x224 | 83.6 | 50M | 8.7G | 877 | 314 | config/log/ckpt |
| VMamba-B[s2l15] | ImageNet-1K | 224x224 | 83.9 | 89M | 15.4G | 646 | 247 | config/log/ckpt |
| VMamba-T[s1l8] | ImageNet-1K | 224x224 | 82.6 | 30M | 4.9G | 1686 | 571 | config/log/ckpt |
| VMamba-S[s1l20] | ImageNet-1K | 224x224 | 83.3 | 49M | 8.6G | 1106 | 390 | config/log/ckpt |
| VMamba-B[s1l20] | ImageNet-1K | 224x224 | 83.8 | 87M | 15.2G | 827 | 313 | config/log/ckpt |
  • Models in this subsection are trained from scratch with random or manual initialization. The hyper-parameters are inherited from Swin, except for drop_path_rate and EMA. All models are trained with EMA except for Vanilla-VMamba-T.
  • TP. (throughput) and Train TP. (train throughput) are measured in images/s on an A100 GPU paired with an AMD EPYC 7542 CPU, with batch size 128. Train TP. is tested with mixed resolution, excluding the time consumed by the optimizer.
  • FLOPs and parameter counts now include the classification head (previous versions excluded it, so the numbers are slightly higher than before).
  • We calculate FLOPs with the algorithm @albertgu provides, which yields larger numbers than the previous calculation (which was based on the selective_scan_ref function and ignores the hardware-aware algorithm).
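As a rough illustration of the throughput protocol described above (timed forward passes at a fixed batch size, with warm-up iterations excluded), the measurement loop can be sketched as follows. The function name and parameters here are hypothetical, not the repo's actual benchmark script; a real GPU measurement would also synchronize the device before reading the clock.

```python
import time

def measure_throughput(run_batch, batch_size=128, warmup=5, iters=20):
    """Estimate throughput in images/s for a callable that processes one batch.

    run_batch  -- zero-argument callable running one forward pass (hypothetical)
    batch_size -- images per call (the release notes use 128)
    warmup     -- untimed calls to stabilize caches/clocks before measuring
    iters      -- timed calls averaged into the final number
    """
    for _ in range(warmup):          # warm-up passes, excluded from timing
        run_batch()
    start = time.perf_counter()
    for _ in range(iters):           # timed passes
        run_batch()
    elapsed = time.perf_counter() - start
    return iters * batch_size / elapsed
```

On a real model this would wrap the forward pass (e.g. `lambda: model(images)`), with `torch.cuda.synchronize()` before each clock read so asynchronous kernels are fully counted.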
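For intuition on why counting the hardware-aware selective scan raises the FLOPs numbers, here is a back-of-the-envelope sketch. It assumes the commonly cited 9·B·L·D·N estimate for one selective-scan call (B batch, L sequence length, D channels, N state size); the function name and the optional-term bookkeeping are illustrative, not the repo's exact counting code.

```python
def selective_scan_flops(B, L, D, N, with_D=True, with_Z=False):
    """Approximate FLOPs of one selective-scan call (hypothetical helper).

    Core recurrence: ~9 FLOPs per (batch, position, channel, state) element,
    covering discretization and the scan itself under the hardware-aware
    counting. Optional terms add the skip connection and output gating.
    """
    flops = 9 * B * L * D * N
    if with_D:
        flops += B * D * L   # skip connection: y += D * u
    if with_Z:
        flops += B * D * L   # gating: y *= activation(z)
    return flops
```

A reference-style count that ignores the hardware-aware algorithm attributes fewer operations to the same call, which is why the table's FLOPs are larger than in earlier releases.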