Hashmat Shadab Malik, Fahad Shamshad, Muzammal Naseer, Karthik Nandakumar, Fahad Khan, and Salman Khan
MBZUAI, UAE.
Official PyTorch implementation
- (September 17, 2024) Updated the report: Added results obtained on MambaVision family of models, along with model calibration results.
- (June 14, 2024) Code for robust evaluation of models is released.
Abstract: Vision State Space Models (VSSMs), a novel architecture that combines the strengths of recurrent neural networks and latent variable models, have demonstrated remarkable performance in visual perception tasks by efficiently capturing long-range dependencies and modeling complex visual dynamics. However, their robustness under natural and adversarial perturbations remains a critical concern. In this work, we present a comprehensive evaluation of VSSMs' robustness under various perturbation scenarios, including occlusions, image structure, common corruptions, and adversarial attacks, and compare their performance to well-established architectures such as transformers and Convolutional Neural Networks. Furthermore, we investigate the resilience of VSSMs to object-background compositional changes on sophisticated benchmarks designed to test model performance in complex visual scenes. We also assess their robustness on object detection and segmentation tasks using corrupted datasets that mimic real-world scenarios. To gain a deeper understanding of VSSMs' adversarial robustness, we conduct a frequency analysis of adversarial attacks, evaluating their performance against low-frequency and high-frequency perturbations. Our findings highlight the strengths and limitations of VSSMs in handling complex visual corruptions, offering valuable insights for future research and improvements in this promising field.
- Installation
- Available Models
- Robustness against Adversarial attacks
- Robustness against Information Drop
- Robustness against ImageNet corruptions
- Robustness evaluation for Object Detection
- BibTeX
- Contact
- References
conda create -n mamba_robust
conda install pytorch==1.13.0 torchvision==0.14.0 torchaudio==0.13.0 pytorch-cuda=11.7 -c pytorch -c nvidia
pip install -r req.txt
cd kernels/selective_scan && pip install .
Model | Tiny | Small | Base |
---|---|---|---|
VMamba (v0) | vssm_tiny_v0 | vssm_small_v0 | vssm_base_v0 |
VMamba (v2) | vssm_tiny_v2 | vssm_small_v2 | vssm_base_v2 |
Vision Transformer | vit_tiny_patch16_224 | vit_small_patch16_224 | vit_base_patch16_224 |
Swin Transformer | swin_tiny_patch4_window7_224 | swin_small_patch4_window7_224 | swin_base_patch4_window7_224 |
ConvNext | convnext_tiny | convnext_small | convnext_base |
ResNet: resnet18, resnet50
VGG: vgg16_bn, vgg19_bn
Download the VMamba ImageNet pre-trained weights and put them in the pretrained_weights folder.
Download pre-trained weights for object detectors (Link) and segmentation networks (Link).
To craft adversarial examples with the Fast Gradient Sign Method (FGSM) at a perturbation budget of 8/255, run:
cd classification/
python generate_adv_images.py --data_dir <path to dataset> --attack_name fgsm --source_model_name <model_name> --epsilon 8
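As a rough illustration of what an FGSM step does, here is a minimal sketch on a toy logistic model in plain Python. The model, its weights, and the helper names (`loss_and_grad`, `fgsm`) are hypothetical and are not taken from this repository; the sketch also assumes that `--epsilon 8` means 8 on the 0-255 pixel scale, i.e. 8/255 for pixels in [0, 1].

```python
import math

EPS = 8 / 255  # assumed meaning of `--epsilon 8` for pixels in [0, 1]


def sign(v):
    return (v > 0) - (v < 0)


def loss_and_grad(x, y, w, b):
    """Binary cross-entropy of a toy logistic model and its gradient w.r.t. x."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    p = 1.0 / (1.0 + math.exp(-z))
    loss = -(y * math.log(p) + (1 - y) * math.log(1 - p))
    grad = [(p - y) * wi for wi in w]
    return loss, grad


def fgsm(x, y, w, b, eps=EPS):
    """Single step: move every pixel by eps in the direction that raises the loss."""
    _, grad = loss_and_grad(x, y, w, b)
    return [min(1.0, max(0.0, xi + eps * sign(gi))) for xi, gi in zip(x, grad)]


x = [0.2, 0.5, 0.9, 0.4]   # toy "image" with pixels in [0, 1]
w = [1.5, -2.0, 0.5, 1.0]  # hypothetical model weights
b, y = 0.1, 1
x_adv = fgsm(x, y, w, b)
clean_loss, _ = loss_and_grad(x, y, w, b)
adv_loss, _ = loss_and_grad(x_adv, y, w, b)
```

The perturbation never exceeds `EPS` per pixel, and on this toy model the loss on `x_adv` is higher than on `x`.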
To craft adversarial examples with Projected Gradient Descent (PGD) at a perturbation budget of 8/255 and 20 attack steps, run:
cd classification/
python generate_adv_images.py --data_dir <path to dataset> --attack_name pgd --source_model_name <model_name> --epsilon 8 --attack_steps 20
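PGD repeats a small signed-gradient step and projects the result back into the 8/255 ball around the clean input after every step. The sketch below shows this loop on a toy logistic model; the step size `ALPHA`, the model, and the function names are assumptions for illustration, not the settings used by `generate_adv_images.py`.

```python
import math

EPS = 8 / 255    # perturbation budget (L-infinity)
STEPS = 20       # matches `--attack_steps 20`
ALPHA = 2 / 255  # hypothetical step size


def sign(v):
    return (v > 0) - (v < 0)


def grad_x(x, y, w, b):
    """Gradient of the binary cross-entropy of a toy logistic model w.r.t. x."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    p = 1.0 / (1.0 + math.exp(-z))
    return [(p - y) * wi for wi in w]


def pgd(x, y, w, b, eps=EPS, alpha=ALPHA, steps=STEPS):
    x_adv = list(x)
    for _ in range(steps):
        g = grad_x(x_adv, y, w, b)
        # Ascent step on the loss.
        x_adv = [xa + alpha * sign(gi) for xa, gi in zip(x_adv, g)]
        # Project back into the eps-ball around the clean input.
        x_adv = [min(max(xa, xi - eps), xi + eps) for xa, xi in zip(x_adv, x)]
        # Keep pixels in the valid range.
        x_adv = [min(1.0, max(0.0, xa)) for xa in x_adv]
    return x_adv


x = [0.2, 0.5, 0.9, 0.4]   # toy "image"
w = [1.5, -2.0, 0.5, 1.0]  # hypothetical model weights
x_adv = pgd(x, 1, w, 0.1)
```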
Other available attacks: bim, mifgsm, difgsm, tpgd, tifgsm, vmifgsm
The results will be saved in the AdvExamples_results folder with the following structure: AdvExamples_results/pgd_eps_{eps}_steps_{step}/{source_model_name}/accuracy.txt
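For scripting around the results, the path can be assembled from the pattern above. This helper is hypothetical; it assumes that the attack name replaces `pgd` for other attacks and that `eps` and `step` appear exactly as passed on the command line, neither of which is confirmed here.

```python
from pathlib import PurePosixPath


def results_path(attack_name, eps, steps, source_model_name,
                 root="AdvExamples_results"):
    """Build the accuracy file path following the documented folder pattern."""
    run_dir = f"{attack_name}_eps_{eps}_steps_{steps}"
    return PurePosixPath(root) / run_dir / source_model_name / "accuracy.txt"


path = results_path("pgd", 8, 20, "vssm_tiny_v2")
print(path)  # AdvExamples_results/pgd_eps_8_steps_20/vssm_tiny_v2/accuracy.txt
```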
To craft low-frequency adversarial examples with PGD at a perturbation budget of 8/255 and 20 attack steps (the filter preserves only the low-frequency components), run:
cd classification/
python generate_adv_images.py --data_dir <path to dataset> --attack_name pgd --source_model_name <model_name> --epsilon 8 --attack_steps 20 --filter True --filter_preserve low
To craft high-frequency adversarial examples with PGD at a perturbation budget of 8/255 and 20 attack steps (the filter preserves only the high-frequency components), run:
cd classification/
python generate_adv_images.py --data_dir <path to dataset> --attack_name pgd --source_model_name <model_name> --epsilon 8 --attack_steps 20 --filter True --filter_preserve high
The results will be saved in the AdvExamples_freq_results folder.
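One common way to restrict a perturbation to low or high frequencies is to mask its 2-D Fourier spectrum. The sketch below does this with a naive DFT in plain Python on a small toy perturbation. The transform, the square mask, and the `radius` parameter are assumptions for illustration; the filter actually used behind `--filter_preserve` may differ.

```python
import cmath


def dft2(img, inverse=False):
    """Naive 2-D discrete Fourier transform (fine for tiny arrays)."""
    n, m = len(img), len(img[0])
    s = 1 if inverse else -1
    out = [[0j] * m for _ in range(n)]
    for u in range(n):
        for v in range(m):
            acc = 0j
            for r in range(n):
                for c in range(m):
                    acc += img[r][c] * cmath.exp(s * 2j * cmath.pi * (u * r / n + v * c / m))
            out[u][v] = acc / (n * m) if inverse else acc
    return out


def filter_perturbation(delta, radius, preserve="low"):
    """Keep only the low- or high-frequency part of a real 2-D perturbation."""
    n, m = len(delta), len(delta[0])
    spec = dft2(delta)
    for u in range(n):
        for v in range(m):
            # Frequency distance from the DC term, with wrap-around.
            fu, fv = min(u, n - u), min(v, m - v)
            is_low = max(fu, fv) <= radius
            if is_low != (preserve == "low"):
                spec[u][v] = 0j
    return [[z.real for z in row] for row in dft2(spec, inverse=True)]


eps = 8 / 255
# Toy 8x8 perturbation with values in [-eps, eps].
delta = [[eps * (((3 * r + 5 * c) % 7) / 3 - 1) for c in range(8)] for r in range(8)]
low = filter_perturbation(delta, radius=2, preserve="low")
high = filter_perturbation(delta, radius=2, preserve="high")
```

Because the two masks are complementary, `low` and `high` add up to the original perturbation.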
Run the below script to evaluate the robustness across different models against low and high frequency attacks at various perturbation budgets:
cd classification/
bash scripts/get_adv_freq_results.sh <DATA_PATH> <ATTACK_NAME> <BATCH_SIZE>
For evaluating transferability of adversarial examples, first save the generated adversarial examples by running:
cd classification/
python generate_adv_images.py --data_dir <path to dataset> --attack_name fgsm --source_model_name <model_name> --epsilon 8 --save_results_only False
The adversarial examples will be saved in the AdvExamples folder with the following structure: AdvExamples/{attack_name}_eps_{eps}_steps_{step}/{source_model_name}/images_labels.pt
Then run the below script to evaluate transferability of the generated adversarial examples across different models:
cd classification/
python inference.py --dataset imagenet_adv --data_dir <path to adversarial dataset> --batch_size <> --source_model_name <model name>
--source_model_name: name of the model on which the adversarial examples will be evaluated
Furthermore, bash scripts are provided to evaluate transferability of adversarial examples across different models:
cd classification/
# Generate adversarial examples
bash scripts/gen_adv_examples.sh <DATA_PATH> <EPSILON> <ATTACK_NAME> <BATCH_SIZE>
# Evaluate transferability of adversarial examples saved in AdvExamples folder
bash scripts/evaluate_transferability.sh <DATA_PATH> <EPSILON> <ATTACK_NAME> <BATCH_SIZE>
Run the below script to evaluate the robustness of all the models against information drop along scanning lines:
cd classification/
bash scripts/scan_line_info_drop.sh <DATA_PATH> <EXP_NUM> <PATCH_SIZE>
<DATA_PATH>: path to the dataset. <PATCH_SIZE>: number of patches the image is divided into. <EXP_NUM>: selects one of the following experiments:
- 1: linearly increasing the amount of information dropped in each patch along the scanning direction.
- 2: Increasing the amount of information dropped in each patch with maximum at center of the scanning direction.
- 3: Decreasing the amount of information dropped in each patch with minimum at center of the scanning direction.
- 4: Sequentially dropping patches along the scanning directions.
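The four experiments can be pictured as schedules that assign a drop fraction to each patch along the scanning direction. The sketch below is only one plausible reading of the descriptions above: the function names, the linear and triangular shapes, and the `max_drop` parameter are assumptions, not the logic of `scan_line_info_drop.sh`.

```python
def drop_schedule(exp_num, num_patches, max_drop=1.0):
    """Fraction of information dropped per patch, ordered along the scan."""
    n = num_patches
    # Position of each patch along the scan: 0.0 at the start, 1.0 at the end.
    pos = [i / (n - 1) if n > 1 else 0.0 for i in range(n)]
    if exp_num == 1:  # linearly increasing along the scan
        return [max_drop * p for p in pos]
    if exp_num == 2:  # maximum at the center of the scan
        return [max_drop * (1 - abs(2 * p - 1)) for p in pos]
    if exp_num == 3:  # minimum at the center of the scan
        return [max_drop * abs(2 * p - 1) for p in pos]
    raise ValueError("exp_num must be 1, 2 or 3; see sequential_drop for 4")


def sequential_drop(num_patches, num_dropped):
    """Experiment 4: drop whole patches one after another along the scan."""
    return [1.0 if i < num_dropped else 0.0 for i in range(num_patches)]


linear = drop_schedule(1, 5)   # [0.0, 0.25, 0.5, 0.75, 1.0]
peak = drop_schedule(2, 5)     # [0.0, 0.5, 1.0, 0.5, 0.0]
valley = drop_schedule(3, 5)   # [1.0, 0.5, 0.0, 0.5, 1.0]
first_two = sequential_drop(5, 2)
```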
Run the below script to evaluate the robustness of all the models against random drop of patches:
cd classification/
bash scripts/random_patch_drop.sh <DATA_PATH> <PATCH_SIZE>
<DATA_PATH>: path to the dataset. <PATCH_SIZE>: number of patches the image is divided into.
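A minimal sketch of random patch drop on a 2-D image stored as nested lists is given below. It assumes the image is split into a `grid x grid` layout and that dropped patches are zeroed; the names `random_patch_drop`, `grid`, and `drop_ratio` are hypothetical, and how `<PATCH_SIZE>` maps to the grid in the script is not stated here.

```python
import random


def random_patch_drop(image, grid, drop_ratio, seed=0):
    """Zero out a random subset of the grid x grid patches of a 2-D image."""
    h, w = len(image), len(image[0])
    ph, pw = h // grid, w // grid  # assumes h and w are divisible by grid
    rng = random.Random(seed)
    cells = [(r, c) for r in range(grid) for c in range(grid)]
    dropped = rng.sample(cells, round(drop_ratio * len(cells)))
    out = [row[:] for row in image]
    for r, c in dropped:
        for i in range(r * ph, (r + 1) * ph):
            for j in range(c * pw, (c + 1) * pw):
                out[i][j] = 0
    return out, dropped


image = [[1] * 8 for _ in range(8)]  # toy 8x8 image
masked, dropped = random_patch_drop(image, grid=4, drop_ratio=0.5)
```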
Run the below script to evaluate the robustness of all the models against drop of salient patches:
cd classification/
bash scripts/salient_drop.sh <DATA_PATH> <PATCH_SIZE>
<DATA_PATH>: path to the dataset. <PATCH_SIZE>: number of patches the image is divided into.
Run the below script to evaluate the robustness of all the models against shuffling of image patches:
cd classification/
bash scripts/shuffle_image.sh <DATA_PATH>
<DATA_PATH>: path to the dataset
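The script name suggests a test in which the image is cut into patches that are then rearranged, which destroys global structure while keeping local content. A minimal sketch of such a shuffle follows; the function name, the `grid` parameter, and the choice of a uniform random permutation are assumptions, not the behavior of `shuffle_image.sh`.

```python
import random


def shuffle_patches(image, grid, seed=0):
    """Randomly permute the grid x grid patches of a 2-D image."""
    h, w = len(image), len(image[0])
    ph, pw = h // grid, w // grid  # assumes h and w are divisible by grid
    cells = [(r, c) for r in range(grid) for c in range(grid)]
    perm = cells[:]
    random.Random(seed).shuffle(perm)
    out = [row[:] for row in image]
    # Copy the source patch (sr, sc) into the destination slot (dr, dc).
    for (dr, dc), (sr, sc) in zip(cells, perm):
        for i in range(ph):
            for j in range(pw):
                out[dr * ph + i][dc * pw + j] = image[sr * ph + i][sc * pw + j]
    return out


image = [[r * 8 + c for c in range(8)] for r in range(8)]  # toy 8x8 image
shuffled = shuffle_patches(image, grid=4)
```

Every pixel value is preserved; only the positions of whole patches change.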
- ImageNet-B (Object-to-Background Compositional Changes) (Link)
- ImageNet-E (Attribute Editing) (Link)
- ImageNet-V2 (Link)
- ImageNet-A (Natural Adversarial Examples) (Link)
- ImageNet-R (Rendition) (Link)
- ImageNet-S (Sketch) (Link)
- ImageNet-C (Common Corruptions) (Link)
For evaluating on ImageNet-B, ImageNet-E, ImageNet-V2, ImageNet-A, ImageNet-R, ImageNet-S, run:
cd classification/
python inference.py --dataset <dataset name> --data_dir <path to corrupted dataset> --batch_size <> --source_model_name <model name>
--dataset: imagenet-b, imagenet-e, imagenet-v2, imagenet-a, imagenet-r, imagenet-s
--source_model_name: model name to use for inference
For common corruption experiment, instead of saving the corrupted datasets, the corrupted images can be generated during the evaluation by running:
cd classification/
python inference_on_imagenet_c.py --data_dir <path to imagenet validation dataset> --batch_size <> --corruption <>
The following --corruption options are available:
- Noise: gaussian_noise, shot_noise, impulse_noise
- Blur: defocus_blur, glass_blur, motion_blur, zoom_blur
- Weather: snow, frost, fog, brightness
- Digital: contrast, elastic_transform, pixelate, jpeg_compression
- Extra: speckle_noise, gaussian_blur, spatter, saturate
The script evaluates all the models across all the severity levels (1-5) of the given corruption.
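To give a feel for what a severity level means, the sketch below applies Gaussian noise of increasing strength to a flat toy image and measures the resulting distortion. The noise scales in `SIGMA` are illustrative values chosen for this example and are not read from the repository or from the corruption library it uses.

```python
import random

# Illustrative noise scales for severities 1-5 (assumed values).
SIGMA = [0.08, 0.12, 0.18, 0.26, 0.38]


def gaussian_noise(pixels, severity, seed=0):
    """Add zero-mean Gaussian noise to pixels in [0, 1] and clip the result."""
    if severity not in range(1, 6):
        raise ValueError("severity must be an integer from 1 to 5")
    rng = random.Random(seed)
    sigma = SIGMA[severity - 1]
    return [min(1.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in pixels]


pixels = [0.5] * 1000  # flat gray toy image
deviations = []
for severity in range(1, 6):
    noisy = gaussian_noise(pixels, severity)
    # Mean absolute change per pixel at this severity.
    deviations.append(sum(abs(a - b) for a, b in zip(noisy, pixels)) / len(pixels))
```

With the same seed at each level, the mean distortion grows with the severity.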
- COCO-O (Natural Distribution Shifts) (Link)
- COCO-DC (Object-to-Background Compositional Changes) (Link)
- COCO-C (Common Corruptions)
- ADE20K-C (Common Corruptions)
Download COCO val2017 from (here). To generate the common corruptions (COCO-C), run:
python coco_corruptions.py --data_path <path to original dataset> --save_path <path to the output folder>
Download ADE20K from (here). To generate the common corruptions on the validation set (ADE20K-C), run:
python ade_corruptions.py --data_path <path to original dataset> --save_path <path to the output folder>
@article{shadab2024towards,
title={Towards Evaluating the Robustness of Visual State Space Models},
author={Shadab Malik, Hashmat and Shamshad, Fahad and Naseer, Muzammal and Nandakumar, Karthik and Shahbaz Khan, Fahad and Khan, Salman},
journal={arXiv e-prints},
pages={arXiv--2406},
year={2024}
}
Should you have any questions, please create an issue on this repository or contact [email protected]
Our code is based on VMamba, MambaVision, IPViT, On the Adversarial Robustness of Visual Transformer, imagecorruptions, and the timm library. We thank them for open-sourcing their codebases.