# Deep Feature Space Trojan Attack of Neural Networks by Controlled Detoxification

Python 3.8 · PyTorch 2.3.0 · TensorFlow 2.13.1 · CUDA 12.1 · License: MIT

## Table of Contents

- Overview
- Code Architecture
- Environments
- Requirement
- Experiments
- Usage
- Configurations
- Outputs
- Revisions to the Original
- Citation
- Acknowledgement

## Code Architecture

```
.
├── backdoors             # Backdoor functions
├── configs               # Configurations
├── data                  # Data folder
│   └── trigger           # Trigger images (style images)
├── magenta_arbitrary-image-stylization-v1-256_2  # Pre-trained style transfer model
├── models                # Model architectures
│   ├── resnet.py         # ResNet models
│   ├── vgg.py            # VGG models
│   └── genr.py           # Trigger injection function
├── attack.py             # Data poisoning function
├── detoxification.py     # Detoxification function
├── main.py               # Main function
├── train.py              # Training function
└── utils.py              # Utility functions
```

## Environments

```bash
# Create python environment (optional)
conda env create -f environment.yml
source activate dfst
```

## Requirement

Please download the pre-trained model from the following link: Download Pre-trained Model

After downloading, unzip the file in the repository root. This will create a folder named `./magenta_arbitrary-image-stylization-v1-256_2` containing the pre-trained style transfer model.
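As a quick sanity check that the model unzipped correctly, it can be loaded and invoked directly. The sketch below uses `tensorflow_hub`, which accepts a local directory, and illustrates the transparency parameter `alpha` described under Configurations; the repository's actual invocation lives in `./backdoors/dfst.py` and may differ:

```python
import tensorflow as tf
import tensorflow_hub as hub

# hub.load also accepts a local directory, so the unzipped folder can be
# loaded directly; see ./backdoors/dfst.py for the repository's actual usage.
style_model = hub.load("./magenta_arbitrary-image-stylization-v1-256_2")

# Both inputs are float32 tensors in [0, 1], shaped (1, H, W, 3).
content = tf.random.uniform((1, 32, 32, 3))    # stand-in for a CIFAR-10 image
style = tf.random.uniform((1, 256, 256, 3))    # stand-in for a style (trigger) image
stylized = style_model(tf.constant(content), tf.constant(style))[0]

# Hedged illustration of the transparency parameter (see `alpha` under
# Configurations): blend the stylized image back into the original.
alpha = 0.6
poisoned = alpha * stylized + (1 - alpha) * content
```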

## Experiments

We provide example code snippets for CIFAR-10 and ResNet-18. Other datasets and architectures can easily be plugged in by modifying `./utils.py`, specifically the `get_dataset(*)` and `get_model(*)` functions.
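As a hypothetical sketch (the real signatures in `./utils.py` may differ), registering a new dataset or architecture amounts to adding a branch to each function:

```python
# Hypothetical sketch; the actual signatures in ./utils.py may differ.
import torchvision
import torchvision.transforms as transforms

from models.resnet import ResNet18  # assumed export from models/resnet.py

def get_dataset(name, train=True):
    if name == "cifar10":
        transform = transforms.Compose([transforms.ToTensor()])
        return torchvision.datasets.CIFAR10(
            root="./data", train=train, download=True, transform=transform)
    raise ValueError(f"Unsupported dataset: {name}")

def get_model(name, num_classes=10):
    if name == "resnet18":
        return ResNet18(num_classes=num_classes)
    raise ValueError(f"Unsupported network: {name}")
```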

## Usage

- Train a clean model:

  ```bash
  python main.py --gpu 0 --attack clean
  ```

- Train a model attacked by BadNets:

  ```bash
  python main.py --gpu 1 --attack badnet
  ```

- Train a model attacked by DFST:

  ```bash
  python main.py --gpu 2 --attack dfst
  ```

## Configurations

The specific DFST poisoning configurations can be found in `./configs/dfst.json`. The configurations for clean training (`./configs/clean.json`) and BadNets (`./configs/badnet.json`) are similar.

| Hyperparameter | Default Value | Description |
| --- | --- | --- |
| `dataset` | `"cifar10"` | The utilized dataset. |
| `network` | `"resnet18"` | The utilized model architecture. |
| `seed` | `1024` | Random seed for reproducibility. |
| `batch_size` | `128` | Batch size for training. |
| `epochs` | `200` | Total number of training epochs. |
| `attack` | `"dfst"` | Type of backdoor attack employed. |
| `target` | `0` | The attack target label. |
| `poison_rate` | `0.05` | Poisoning rate within a training batch. |
| `style_model_path` | `"magenta_arbitrary-image-stylization-v1-256_2"` | Path to the pre-trained style model. |
| `alpha` | `0.6` | Transparency parameter for poisoned images. |
| `detox_flag` | `true` | Whether the detoxification process is applied. |
| `detox_layers` | `["layer4"]` | The selected layers for detoxification. |
| `detox_neuron_ratio` | `0.01` | Ratio of compromised neurons identified in each layer. |
| `detox_epochs` | `10` | Number of epochs for training the feature injector. |
| `w_ssim` | `0.1` | Weight of the SSIM loss during feature injector training. |
| `w_detox` | `0.3` | Weight of the detoxification loss during feature injector training. |
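Assembled from the defaults above, `./configs/dfst.json` plausibly looks like the following (treat this as a sketch; field names and nesting in the actual file may differ):

```json
{
  "dataset": "cifar10",
  "network": "resnet18",
  "seed": 1024,
  "batch_size": 128,
  "epochs": 200,
  "attack": "dfst",
  "target": 0,
  "poison_rate": 0.05,
  "style_model_path": "magenta_arbitrary-image-stylization-v1-256_2",
  "alpha": 0.6,
  "detox_flag": true,
  "detox_layers": ["layer4"],
  "detox_neuron_ratio": 0.01,
  "detox_epochs": 10,
  "w_ssim": 0.1,
  "w_detox": 0.3
}
```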

## Outputs

An output folder is created upon training completion. For example, after training a DFST-attacked model, the folder `./model_dir_dfst` will be created, containing the following data and logs:

- `./model_dir_dfst/model.pt`: the trained model.
- `./model_dir_dfst/poison_data.pt`: both training and testing poisoned data.
- `./model_dir_dfst/training.log`: logs of the training process.
- `./model_dir_dfst/visual_poison.png`: visualization of a few poisoned images.
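These artifacts can be inspected with `torch.load`. A sketch, assuming `model.pt` stores a pickled model object and `poison_data.pt` a container of tensors (check `./train.py` for the exact serialization format):

```python
import torch

# Assumptions: model.pt holds a pickled model object and poison_data.pt a
# container of poisoned train/test data; see ./train.py for the real format.
model = torch.load("./model_dir_dfst/model.pt", map_location="cpu")
model.eval()

poison_data = torch.load("./model_dir_dfst/poison_data.pt", map_location="cpu")
print(type(poison_data))  # inspect the structure before indexing into it
```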

## Revisions to the Original

- Replaced CycleGAN training with a pre-trained arbitrary-image-stylization model for efficiency (`./backdoors/dfst.py`).
- Dynamically adjusted the intensity of style transfer using the transparency parameter (`alpha`).
- Switched from a UNet to a simplified feature injection function (`./models/genr.py`) to accelerate convergence.
- Identified the top-ranked (1%) compromised neurons (`./detoxification.py`) instead of using multiple thresholds (a minimal sketch follows this list).
- Applied the detoxification process during each training epoch (`./train.py`).
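For intuition on the neuron-selection revision, a minimal sketch: rank the channels of a feature map by how much their mean activation rises on poisoned versus clean inputs, and flag the top `detox_neuron_ratio` fraction as compromised. The actual criterion in `./detoxification.py` may differ:

```python
import torch

def top_compromised_neurons(clean_acts, poison_acts, ratio=0.01):
    """Hypothetical helper: flag the channels whose mean activation rises the
    most on poisoned inputs.

    clean_acts, poison_acts: (N, C, H, W) feature maps from a monitored layer
    (e.g., ResNet-18's layer4). The real criterion in ./detoxification.py
    may differ.
    """
    clean_mean = clean_acts.mean(dim=(0, 2, 3))    # per-channel mean, clean
    poison_mean = poison_acts.mean(dim=(0, 2, 3))  # per-channel mean, poisoned
    diff = poison_mean - clean_mean
    k = max(1, int(ratio * diff.numel()))          # top 1% by default
    return torch.topk(diff, k).indices             # channels flagged as compromised
```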

## Citation

Please cite our paper if you find it useful for your research. 😀

```bibtex
@inproceedings{cheng2021deep,
  title={Deep feature space trojan attack of neural networks by controlled detoxification},
  author={Cheng, Siyuan and Liu, Yingqi and Ma, Shiqing and Zhang, Xiangyu},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  volume={35},
  number={2},
  pages={1148--1156},
  year={2021}
}
```

## Acknowledgement