Towards Evaluating the Robustness of Visual State Space Models

Hashmat Shadab Malik, Fahad Shamshad, Muzammal Naseer, Karthik Nandakumar, Fahad Khan, and Salman Khan
MBZUAI, UAE.

paper

Official PyTorch implementation


🔥 News

  • (September 17, 2024)
    • Updated the report: added results on the MambaVision family of models, along with model calibration results.
  • (June 14, 2024)
    • Code for robust evaluation of models is released.

Abstract: Vision State Space Models (VSSMs), a novel architecture that combines the strengths of recurrent neural networks and latent variable models, have demonstrated remarkable performance in visual perception tasks by efficiently capturing long-range dependencies and modeling complex visual dynamics. However, their robustness under natural and adversarial perturbations remains a critical concern. In this work, we present a comprehensive evaluation of VSSMs' robustness under various perturbation scenarios, including occlusions, image structure, common corruptions, and adversarial attacks, and compare their performance to well-established architectures such as transformers and Convolutional Neural Networks. Furthermore, we investigate the resilience of VSSMs to object-background compositional changes on sophisticated benchmarks designed to test model performance in complex visual scenes. We also assess their robustness on object detection and segmentation tasks using corrupted datasets that mimic real-world scenarios. To gain a deeper understanding of VSSMs' adversarial robustness, we conduct a frequency analysis of adversarial attacks, evaluating their performance against low-frequency and high-frequency perturbations. Our findings highlight the strengths and limitations of VSSMs in handling complex visual corruptions, offering valuable insights for future research and improvements in this promising field.

Table of Contents

  1. Installation
  2. Available Models
  3. Robustness against Adversarial attacks
  4. Robustness against Information Drop
  5. Robustness against ImageNet corruptions
  6. Robustness evaluation for Object Detection
  7. BibTeX
  8. Contact
  9. References

Installation

conda create -n mamba_robust
conda activate mamba_robust

conda install pytorch==1.13.0 torchvision==0.14.0 torchaudio==0.13.0 pytorch-cuda=11.7 -c pytorch -c nvidia
pip install -r req.txt
cd kernels/selective_scan && pip install .

Available Models

Model                Tiny                            Small                            Base
VMamba (v0)          vssm_tiny_v0                    vssm_small_v0                    vssm_base_v0
VMamba (v2)          vssm_tiny_v2                    vssm_small_v2                    vssm_base_v2
Vision Transformer   vit_tiny_patch16_224            vit_small_patch16_224            vit_base_patch16_224
Swin Transformer     swin_tiny_patch4_window7_224    swin_small_patch4_window7_224    swin_base_patch4_window7_224
ConvNext             convnext_tiny                   convnext_small                   convnext_base

ResNet: resnet18, resnet50

VGG: vgg16_bn, vgg19_bn
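
The Vision Transformer, Swin Transformer, ConvNext, ResNet, and VGG names above follow standard timm naming, so a quick sanity check can typically be done through timm (the VMamba variants are instead built from this repository's own code and pretrained weights). A minimal sketch, assuming timm is installed and these names exist in your timm version:

# Minimal sketch: load one of the baseline models through timm (assumes the
# listed names match your installed timm version; VMamba models come from
# this repository's code instead).
import timm
import torch

model = timm.create_model("vit_tiny_patch16_224", pretrained=True)
model.eval()

x = torch.randn(1, 3, 224, 224)  # dummy ImageNet-sized input
with torch.no_grad():
    logits = model(x)
print(logits.shape)  # expected: torch.Size([1, 1000])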

Download the VMamba ImageNet pre-trained weights and put them in the pretrained_weights folder.

Download pre-trained weights for object detectors (Link) and segmentation networks (Link).

Robustness against Adversarial attacks

To craft adversarial examples with the Fast Gradient Sign Method (FGSM) at a perturbation budget of 8/255, run:

cd  classification/
python generate_adv_images.py --data_dir <path to dataset> --attack_name fgsm  --source_model_name <model_name> --epsilon 8  
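
For reference, FGSM takes a single step in the direction of the sign of the loss gradient and clips the result to the valid image range. A minimal PyTorch sketch of the idea, assuming images scaled to [0, 1] and epsilon on the same scale (illustrative only; generate_adv_images.py is the repository's actual implementation):

# Minimal FGSM sketch (illustrative only, not the repo's implementation).
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps=8 / 255):
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    # one signed-gradient step, then clip back to the valid image range
    x_adv = x_adv + eps * x_adv.grad.sign()
    return x_adv.clamp(0, 1).detach()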

To craft adversarial examples with Projected Gradient Descent (PGD) at a perturbation budget of 8/255 with 20 attack steps, run:

cd  classification/
python generate_adv_images.py --data_dir <path to dataset> --attack_name pgd  --source_model_name <model_name> --epsilon 8 --attack_steps 20 

Other available attacks: bim, mifgsm, difgsm, tpgd, tifgsm, vmifgsm

The results will be saved in the AdvExamples_results folder with the following structure: AdvExamples_results/pgd_eps_{eps}_steps_{step}/{source_model_name}/accuracy.txt
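
PGD repeats the signed-gradient step with a smaller step size and, after each iteration, projects the example back into the epsilon-ball around the clean image. A minimal sketch under the same assumptions as the FGSM example above:

# Minimal PGD sketch (illustrative only; generate_adv_images.py is the
# supported entry point). Assumes images in [0, 1] and eps/alpha on that scale.
import torch
import torch.nn.functional as F

def pgd(model, x, y, eps=8 / 255, alpha=2 / 255, steps=20):
    x = x.clone().detach()
    x_adv = x.clone()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        # signed-gradient step, then project back into the eps-ball and valid range
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = (x + (x_adv - x).clamp(-eps, eps)).clamp(0, 1)
    return x_adv.detach()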

Low-Pass Frequency Attack

To craft adversarial examples with PGD (perturbation budget 8/255, 20 attack steps) while restricting the attack to low-frequency components, run:

cd  classification/
python generate_adv_images.py --data_dir <path to dataset> --attack_name pgd  --source_model_name <model_name> --epsilon 8 --attack_steps 20 --filter True --filter_preserve low 

High-Pass Frequency Attack

To craft adversarial examples with PGD (perturbation budget 8/255, 20 attack steps) while restricting the attack to high-frequency components, run:

cd  classification/
python generate_adv_images.py --data_dir <path to dataset> --attack_name pgd  --source_model_name <model_name> --epsilon 8 --attack_steps 20 --filter True --filter_preserve high

The results will be saved in the AdvExamples_freq_results folder.
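
Conceptually, these frequency-constrained attacks apply a low- or high-pass mask to the perturbation in the Fourier domain. A minimal sketch of that filtering step (the repository's actual filter and cutoff radius may differ):

# Minimal sketch of keeping only low- or high-frequency components of a
# perturbation via a circular FFT mask (illustrative; the repo's filter and
# cutoff may differ).
import torch

def filter_perturbation(delta, keep="low", radius=16):
    # delta: (B, C, H, W) perturbation in pixel space
    _, _, h, w = delta.shape
    spec = torch.fft.fftshift(torch.fft.fft2(delta), dim=(-2, -1))
    yy, xx = torch.meshgrid(
        torch.arange(h, dtype=torch.float32),
        torch.arange(w, dtype=torch.float32),
        indexing="ij",
    )
    dist = ((yy - h // 2) ** 2 + (xx - w // 2) ** 2).sqrt().to(delta.device)
    mask = dist <= radius if keep == "low" else dist > radius
    spec = torch.where(mask, spec, torch.zeros_like(spec))
    return torch.fft.ifft2(torch.fft.ifftshift(spec, dim=(-2, -1))).real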

Run the script below to evaluate robustness across the different models against low- and high-frequency attacks at various perturbation budgets:

cd  classification/
bash scripts/get_adv_freq_results.sh <DATA_PATH> <ATTACK_NAME> <BATCH_SIZE>

To evaluate the transferability of adversarial examples, first save the generated adversarial examples by running:

cd  classification/
python generate_adv_images.py --data_dir <path to dataset> --attack_name fgsm  --source_model_name <model_name> --epsilon 8 --save_results_only False  

The adversarial examples will be saved in the AdvExamples folder with the following structure: AdvExamples/{attack_name}_eps_{eps}_steps_{step}/{source_model_name}/images_labels.pt

Then run the script below to evaluate the transferability of the generated adversarial examples across different models:

cd  classification/
python inference.py --dataset imagenet_adv --data_dir <path to adversarial dataset> --batch_size <> --source_model_name <model name>

--source_model_name: name of the model on which the adversarial examples will be evaluated

Furthermore, bash scripts are provided to evaluate transferability of adversarial examples across different models:

cd  classification/
# Generate adversarial examples
bash scripts/gen_adv_examples.sh <DATA_PATH> <EPSILON> <ATTACK_NAME> <BATCH_SIZE>
# Evaluate transferability of adversarial examples saved in AdvExamples folder
bash scripts/evaluate_transferability.sh <DATA_PATH> <EPSILON> <ATTACK_NAME> <BATCH_SIZE>
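
For illustration only, the transfer evaluation boils down to re-scoring saved adversarial examples with a different model. The sketch below assumes images_labels.pt stores an (images, labels) tensor pair, which is an assumption about the file layout; inference.py with --dataset imagenet_adv is the supported route:

# Minimal transferability sketch (illustrative; assumes images_labels.pt
# stores an (images, labels) tensor pair, which may differ from the repo's
# actual format).
import torch

def evaluate_transfer(target_model, pt_path, batch_size=32):
    images, labels = torch.load(pt_path)  # assumed (images, labels) layout
    target_model.eval()
    correct = 0
    with torch.no_grad():
        for x, y in zip(images.split(batch_size), labels.split(batch_size)):
            correct += (target_model(x).argmax(dim=1) == y).sum().item()
    return correct / len(labels)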

Robustness against Information Drop

Run the script below to evaluate the robustness of all models against information drop along scanning lines:

cd  classification/
bash scripts/scan_line_info_drop.sh <DATA_PATH> <EXP_NUM> <PATCH_SIZE>

<DATA_PATH>: path to the dataset; <PATCH_SIZE>: number of patches the image is divided into; <EXP_NUM>: drop schedule, one of the following (a minimal sketch of the patch-drop operation follows the list):

  • 1: Linearly increasing the amount of information dropped in each patch along the scanning direction.
  • 2: Increasing the amount of information dropped in each patch with maximum at center of the scanning direction.
  • 3: Decreasing the amount of information dropped in each patch with minimum at center of the scanning direction.
  • 4: Sequentially dropping patches along the scanning directions.
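
A minimal sketch of the underlying operation, zeroing a run of patches in one simple row-major scan order (illustrative; the script above implements the four schedules, and the actual scan directions may differ):

# Minimal sketch of dropping (zeroing) patches in scan order
# (illustrative; scripts/scan_line_info_drop.sh implements the schedules above).
import torch

def drop_patches_scan(x, patches_per_side=14, num_dropped=50):
    # x: (B, C, H, W) with H divisible by patches_per_side
    _, _, h, _ = x.shape
    p = h // patches_per_side
    out = x.clone()
    for idx in range(num_dropped):
        row, col = divmod(idx, patches_per_side)
        out[:, :, row * p:(row + 1) * p, col * p:(col + 1) * p] = 0
    return out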

Run the script below to evaluate the robustness of all models against random dropping of patches:

cd  classification/
bash scripts/random_patch_drop.sh <DATA_PATH> <PATCH_SIZE>

<DATA_PATH>: path to the dataset and <PATCH_SIZE>: number of patches the image is divided into.

Run the script below to evaluate the robustness of all models against dropping of salient patches:

cd  classification/
bash scripts/salient_drop.sh <DATA_PATH> <PATCH_SIZE>

<DATA_PATH>: path to the dataset and <PATCH_SIZE>: number of patches the image is divided into.
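
For intuition, the sketch below drops the patches ranked most salient by input-gradient magnitude; this saliency criterion is an assumption for illustration and may differ from the one used in scripts/salient_drop.sh:

# Minimal sketch of dropping the most salient patches, ranking patches by
# input-gradient magnitude (illustrative; the saliency criterion is assumed).
import torch
import torch.nn.functional as F

def drop_salient_patches(model, x, y, patches_per_side=14, num_dropped=50):
    x = x.clone().detach().requires_grad_(True)
    F.cross_entropy(model(x), y).backward()
    p = x.shape[-1] // patches_per_side
    # per-patch saliency: mean absolute input gradient inside each patch
    sal = x.grad.abs().sum(dim=1, keepdim=True)
    sal = F.avg_pool2d(sal, kernel_size=p, stride=p).flatten(1)  # (B, n*n)
    drop = sal.topk(num_dropped, dim=1).indices
    out = x.detach().clone()
    for b in range(out.shape[0]):
        for idx in drop[b].tolist():
            r, c = divmod(idx, patches_per_side)
            out[b, :, r * p:(r + 1) * p, c * p:(c + 1) * p] = 0
    return out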

Run the script below to evaluate the robustness of all models against shuffling of image patches:

cd  classification/
bash scripts/shuffle_image.sh <DATA_PATH> 

<DATA_PATH>: path to the dataset
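
A minimal sketch of the patch-shuffling operation, assuming the image is split into a regular grid whose patches are randomly permuted (illustrative; scripts/shuffle_image.sh drives the actual evaluation):

# Minimal sketch of randomly shuffling the patch grid of a square image.
import torch

def shuffle_patches(x, patches_per_side=4):
    # x: (B, C, H, W) square images with H divisible by patches_per_side
    b, c, h, w = x.shape
    p = h // patches_per_side
    patches = x.unfold(2, p, p).unfold(3, p, p)  # (B, C, n, n, p, p)
    patches = patches.reshape(b, c, -1, p, p)
    patches = patches[:, :, torch.randperm(patches.shape[2])]
    patches = patches.reshape(b, c, patches_per_side, patches_per_side, p, p)
    return patches.permute(0, 1, 2, 4, 3, 5).reshape(b, c, h, w)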

Robustness against ImageNet corruptions

The following corrupted datasets are used for classification evaluation:

  1. ImageNet-B (Object-to-Background Compositional Changes) (Link)
  2. ImageNet-E (Attribute Editing) (Link)
  3. ImageNet-V2 (Link)
  4. ImageNet-A (Natural Adversarial Examples) (Link)
  5. ImageNet-R (Rendition) (Link)
  6. ImageNet-S (Sketch) (Link)
  7. ImageNet-C (Common Corruptions) (Link)

Inference on ImageNet Corrupted datasets

For evaluating on ImageNet-B, ImageNet-E, ImageNet-V2, ImageNet-A, ImageNet-R, ImageNet-S, run:

cd  classification/
python inference.py --dataset <dataset name> --data_dir <path to corrupted dataset> --batch_size <> --source_model_name <model name>

--dataset: imagenet-b, imagenet-e, imagenet-v2, imagenet-a, imagenet-r, imagenet-s

--source_model_name: model name to use for inference

For the common-corruption experiments, the corrupted images can be generated on the fly during evaluation (instead of saving corrupted datasets) by running:

cd  classification/
python inference_on_imagenet_c.py --data_dir <path to imagenet validation dataset> --batch_size <> --corruption <>

The following --corruption options are available:

  1. Noise : gaussian_noise, shot_noise, impulse_noise
  2. Blur : defocus_blur, glass_blur, motion_blur, zoom_blur
  3. Weather : snow, frost, fog, brightness
  4. Digital : contrast, elastic_transform, pixelate, jpeg_compression
  5. Extra: speckle_noise, gaussian_blur, spatter, saturate

The script evaluates all models across all severity levels (1-5) of the given corruption.
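
If you need to corrupt a single image yourself, the imagecorruptions package (credited in the references below) exposes a corrupt() helper that follows the ImageNet-C protocol. A minimal sketch, assuming the package is installed:

# Minimal sketch using the imagecorruptions package on a single image
# (illustrative; inference_on_imagenet_c.py applies corruptions on the fly).
import numpy as np
from imagecorruptions import corrupt

image = np.random.randint(0, 255, (224, 224, 3), dtype=np.uint8)  # stand-in for a real image
corrupted = corrupt(image, corruption_name="gaussian_noise", severity=3)
print(corrupted.shape)  # same HxWxC uint8 array, now corrupted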

Robustness evaluation for Object Detection

The following corrupted datasets are used for detection and segmentation evaluation:

  1. COCO-O (Natural Distribution Shifts) (Link)
  2. COCO-DC (Object-to-Background Compositional Changes) (Link)
  3. COCO-C (Common Corruptions)
  4. ADE20K-C (Common Corruptions)

Download COCO val2017 from (here); to generate the common corruptions (COCO-C), run:

python coco_corruptions.py --data_path <path to original dataset> --save_path <path to the output folder>

Download ADE20K from (here); to generate the common corruptions on the validation set (ADE20K-C), run:

python ade_corruptions.py --data_path <path to original dataset> --save_path <path to the output folder>

BibTeX

@article{shadab2024towards,
  title={Towards Evaluating the Robustness of Visual State Space Models},
  author={Shadab Malik, Hashmat and Shamshad, Fahad and Naseer, Muzammal and Nandakumar, Karthik and Shahbaz Khan, Fahad and Khan, Salman},
  journal={arXiv e-prints},
  pages={arXiv--2406},
  year={2024}
}

Contact

Should you have any questions, please create an issue on this repository or contact us at [email protected].


References

Our code is based on VMamba, MambaVision, IPViT, On the Adversarial Robustness of Visual Transformer, imagecorruptions, and the timm library. We thank them for open-sourcing their codebases.
