Wasserstein Smoothing: Certified Robustness against Wasserstein Adversarial Attacks

Code for the paper "Wasserstein Smoothing: Certified Robustness against Wasserstein Adversarial Attacks" by Alexander Levine and Soheil Feizi. It provides a smoothing-based defense against the Wasserstein adversarial attack proposed by Wong et al. (2019), with a robustness certificate for the Wasserstein metric that is nonvacuous when compared with smoothing-based L1 certified defenses.
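To make the distinction between the Wasserstein metric and the L1 metric concrete, here is a toy illustration. It is a one-dimensional sketch written for this README, not the two-dimensional image metric or the projected Sinkhorn code used by the repository; the function names and the example histograms are hypothetical. It relies on the standard fact that, for two histograms of equal total mass on a unit-spaced line, the 1-Wasserstein distance equals the sum of absolute differences of their cumulative sums.

```python
def l1_distance(p, q):
    """Sum of absolute per-bin differences."""
    return sum(abs(a - b) for a, b in zip(p, q))


def wasserstein_1d(p, q):
    """1-Wasserstein distance between equal-mass histograms on a unit-spaced line.

    Computed as the sum of absolute differences between the running
    (cumulative) sums of p and q.
    """
    total = 0.0
    running_gap = 0.0  # cumulative mass of p minus cumulative mass of q
    for a, b in zip(p, q):
        running_gap += a - b
        total += abs(running_gap)
    return total


# One unit of "pixel mass" at position 1, then moved by 1 bin and by 3 bins.
base = [0, 1, 0, 0, 0]
near = [0, 0, 1, 0, 0]
far = [0, 0, 0, 0, 1]

if __name__ == "__main__":
    for name, other in (("near", near), ("far", far)):
        print(name, "L1 =", l1_distance(base, other),
              "W1 =", wasserstein_1d(base, other))
```

In this toy case the L1 distance is the same for both shifts, while the Wasserstein distance grows with how far the mass moves.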

Usage Examples

To train a model with a smoothing standard deviation of 0.01, run python3 wass_smooth_training_mnist.py --stdev 0.01
To compute the accuracy of a trained smoothed model, run python3 mnist_wass_predict.py --stdev 0.01 --model mnist_smooth_base_lr_0.001_stddev_0.01_epoch_199.pth. This will save accuracy information in a .pth file in the accuracies directory.
To compute robustness certificates for a trained smoothed model, run python3 wass_mnist_certify.py --stdev 0.01 --model mnist_smooth_base_lr_0.001_stddev_0.01_epoch_199.pth. This will save the certified robustness of each image in the test set in a .pth file in the radii directory.
To attack a smoothed classifier, run python3 attack_mnist_smoothed.py --stdev 0.01 --checkpoint mnist_smooth_base_lr_0.001_stddev_0.01_epoch_199.pth. This will save the empirical attack radius of each image in the test set in a .pth file in the epsilons directory.
Files with 'laplace' in their names use Laplace smoothing instead of the proposed Wasserstein smoothing, but still compute certificates relative to the Wasserstein metric.
Files with 'cifar' in their names use CIFAR-10 rather than MNIST.
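The MNIST commands above can be assembled programmatically, for example to sweep over several noise levels. The following is a minimal sketch, not part of the repository: the helper names are hypothetical, and the checkpoint naming pattern (including the default learning rate of 0.001 and epoch 199) is an assumption inferred from the single filename shown above. Only the script names and flags that appear in the usage examples are used.

```python
import shlex


def checkpoint_name(stdev, lr=0.001, epoch=199):
    # Assumed pattern, inferred from the one checkpoint filename in the usage
    # examples; check the training script for the name it actually writes.
    return f"mnist_smooth_base_lr_{lr}_stddev_{stdev}_epoch_{epoch}.pth"


def mnist_commands(stdev, lr=0.001, epoch=199):
    """Return the train / predict / certify / attack argument lists for MNIST."""
    ckpt = checkpoint_name(stdev, lr, epoch)
    s = str(stdev)
    return {
        "train": ["python3", "wass_smooth_training_mnist.py", "--stdev", s],
        "predict": ["python3", "mnist_wass_predict.py",
                    "--stdev", s, "--model", ckpt],
        "certify": ["python3", "wass_mnist_certify.py",
                    "--stdev", s, "--model", ckpt],
        # The attack script takes --checkpoint rather than --model.
        "attack": ["python3", "attack_mnist_smoothed.py",
                   "--stdev", s, "--checkpoint", ckpt],
    }


if __name__ == "__main__":
    # Print the commands instead of running them, so they can be inspected first.
    for step, argv in mnist_commands(0.01).items():
        print(f"{step}: {shlex.join(argv)}")
```

To execute a step rather than print it, each argument list can be passed to subprocess.run(argv, check=True) from the repository root.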
