Related materials for robust and explainable machine learning
- Intriguing properties of neural networks
Individual units contain no special semantic information; adversarial examples found by L-BFGS (optimization based; a sketch follows this group).
- Deep Neural Networks are Easily Fooled: High Confidence Predictions for Unrecognizable Images
Fooling images generated by an evolutionary algorithm.
- Universal adversarial perturbations
A single universal perturbation can fool the network on most images.
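The L-BFGS attack mentioned in the first entry is worth making concrete: find a small perturbation r such that the classifier assigns x + r to a chosen target class. Below is a minimal PyTorch sketch of that optimization; the `model`, input tensor `x`, and `target_class` are assumed to exist, and the paper's box-constrained L-BFGS with a line search over the trade-off constant is approximated by `torch.optim.LBFGS` plus clamping, so read it as an illustration rather than a faithful reimplementation.

```python
import torch
import torch.nn.functional as F

def lbfgs_style_attack(model, x, target_class, c=1e-2, steps=20):
    """Sketch of an optimization-based targeted attack.

    Minimizes  c * ||r||^2 + cross_entropy(model(x + r), target)
    over the perturbation r, keeping the adversarial image in [0, 1].
    `model`, `x` (1xCxHxW tensor in [0, 1]) and `target_class` are assumed inputs.
    """
    r = torch.zeros_like(x, requires_grad=True)
    optimizer = torch.optim.LBFGS([r], max_iter=steps)
    target = torch.tensor([target_class])

    def closure():
        optimizer.zero_grad()
        adv = (x + r).clamp(0, 1)                     # keep the image in a valid range
        loss = c * r.pow(2).sum() + F.cross_entropy(model(adv), target)
        loss.backward()
        return loss

    optimizer.step(closure)
    return (x + r).clamp(0, 1).detach()
```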
- Delving into Transferable Adversarial Examples and Black-box Attacks
Examine transferability on the ImageNet dataset and use this property to attack black-box systems.
- Explaining and Harnessing Adversarial Examples
Fast gradient sign method (FGSM); a sketch of FGSM and its iterative extension follows this group.
- Adversarial Examples In The Physical World
Printed photos can also fool the networks; introduces an iterative method (extension of FGSM).
- The Limitations of Deep Learning in Adversarial Settings
Find salient input regions that are useful for crafting adversarial examples.
- Towards Evaluating the Robustness of Neural Networks
Optimization-based approach.
- DeepFool: a simple and accurate method to fool deep neural networks
A new method to generate non-targeted adversarial examples: find the closest decision boundary, again using the gradient.
- Good Word Attacks on Statistical Spam Filters
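Since several entries in this group build on the fast gradient sign method, here is a minimal PyTorch sketch of the single-step attack and its iterative extension. The `model` (returning logits), input batch `x` in [0, 1], and labels `y` are assumed; this is an illustration of the idea, not the authors' reference code.

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps):
    """Fast gradient sign method: one step of size eps along sign(grad_x loss)."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    return (x + eps * x.grad.sign()).clamp(0, 1).detach()

def iterative_fgsm(model, x, y, eps, alpha, steps):
    """Iterative extension: repeat small FGSM steps of size alpha, projecting
    back into the eps-ball around the original image after every step."""
    x_orig = x.clone().detach()
    x_adv = x_orig.clone()
    for _ in range(steps):
        x_adv = fgsm(model, x_adv, y, alpha)
        x_adv = torch.max(torch.min(x_adv, x_orig + eps), x_orig - eps).clamp(0, 1)
    return x_adv
```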
- Practical Black-Box Attacks against Deep Learning Systems using Adversarial Examples
Black-box attack using a substitute network.
- Simple Black-Box Adversarial Perturbations for Deep Networks
Black-box attack using greedy search (a toy sketch follows this group).
- Adversarial Manipulation of Deep Representations
Find an adversarial image whose internal representations are similar to those of a target image (trivial).
- Adversarial Diversity and Hard Positive Generation
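For intuition about the query-only setting behind the two black-box entries above, here is a toy greedy sketch in the spirit of the greedy-search paper: repeatedly perturb a random pixel and keep the change only when the target model's confidence in the true class drops. The `query_model` interface (image in, probability vector out) is an assumption for illustration, not an API from the papers.

```python
import numpy as np

def greedy_blackbox_attack(query_model, x, true_class, eps=0.1, budget=500, rng=None):
    """Toy greedy black-box perturbation.

    query_model: assumed black box mapping an (H, W, C) image in [0, 1]
                 to a probability vector; only its outputs are used.
    A single-pixel change is kept whenever it lowers the probability
    of the true class.
    """
    rng = rng or np.random.default_rng(0)
    x_adv = x.copy()
    best = query_model(x_adv)[true_class]
    for _ in range(budget):
        i = rng.integers(x.shape[0])
        j = rng.integers(x.shape[1])
        candidate = x_adv.copy()
        candidate[i, j] = np.clip(candidate[i, j] + rng.choice([-eps, eps]), 0.0, 1.0)
        p = query_model(candidate)[true_class]
        if p < best:  # keep perturbations that reduce true-class confidence
            x_adv, best = candidate, p
    return x_adv
```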
- Adversarial examples for generative models
Attack VAE and VAE-GAN.
- Adversarial Images for Variational Autoencoders
Attack a VAE through its latent representations (sketched below).
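The latent-space attack can be summarized as: find a small input perturbation whose encoding lands near the latent code of a chosen target image. A hedged PyTorch sketch follows, where `encoder` is an assumed module returning the latent mean; the actual papers use more careful objectives.

```python
import torch

def latent_attack(encoder, x, x_target, c=1.0, lr=0.01, steps=200):
    """Sketch of a latent-space attack on a VAE.

    `encoder` is an assumed module returning the latent mean. The loss pushes
    enc(x + r) toward enc(x_target) while keeping the perturbation r small.
    """
    with torch.no_grad():
        z_target = encoder(x_target)
    r = torch.zeros_like(x, requires_grad=True)
    opt = torch.optim.Adam([r], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        adv = (x + r).clamp(0, 1)
        loss = (encoder(adv) - z_target).pow(2).sum() + c * r.pow(2).sum()
        loss.backward()
        opt.step()
    return (x + r).clamp(0, 1).detach()
```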
- Distillation as a Defense to Adversarial Perturbations against Deep Neural Networks
Train a second network with soft target labels (sketched after this group).
- Robust Convolutional Neural Networks under Adversarial Noise
Improve robustness by injecting noise during training.
- Towards Deep Neural Network Architectures Robust to Adversarial Examples
Use an autoencoder to denoise.
- On Detecting Adversarial Perturbations
Detect adversarial perturbations in intermediate layers with a detector network, and dynamically generate adversarial images during training. They also propose a fast gradient method based on the l2 norm, an extension of the iterative method.
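The distillation defense in the first entry of this group is concrete enough to sketch: train a teacher with softmax at temperature T, then fit a second (distilled) network to the teacher's softened probabilities at the same temperature. The PyTorch fragment below shows one student update step; the `student`, `teacher`, batch `x`, and `optimizer` are assumed to be set up elsewhere, and this is an outline of the idea rather than the paper's exact recipe.

```python
import torch
import torch.nn.functional as F

def distillation_step(student, teacher, x, optimizer, T=20.0):
    """One training step of defensive distillation (sketch).

    The student is fit to the teacher's soft labels, i.e. the softmax of the
    teacher's logits at temperature T.
    """
    with torch.no_grad():
        soft_labels = F.softmax(teacher(x) / T, dim=1)   # soft targets at temperature T
    log_probs = F.log_softmax(student(x) / T, dim=1)
    loss = -(soft_labels * log_probs).sum(dim=1).mean()  # cross-entropy with soft targets
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```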
- Measuring Neural Net Robustness with Constraints
A measurement of robustness.
- A Theoretical Framework for Robustness of (Deep) Classifiers against Adversarial Examples
- Blind Attacks on Machine Learners
- SoK: Towards the Science of Security and Privacy in Machine Learning
- Robustness of classifiers: from adversarial to random noise
- Towards A Rigorous Science of Interpretable Machine Learning
An overview of interpretability.
- Visualizing and Understanding Convolutional Networks
Deconvolution.
- Inverting Visual Representations with Convolutional Networks
Code inversion by learning a decoder network.
- Understanding Deep Image Representations by Inverting Them
Code inversion with priors.
- Synthesizing the preferred inputs for neurons in neural networks via deep generator networks
Synthesize an image from internal representations and use a GAN (deconvolution) to learn image priors (similar to code inversion).
- Visualizing Higher-Layer Features of a Deep Network
Activation maximization.
- Multifaceted Feature Visualization: Uncovering the Different Types of Features Learned By Each Neuron in Deep Neural Networks
Activation maximization for multifaceted features.
- Towards Better Analysis of Deep Convolutional Neural Networks
A useful tool; represent a neuron by the top image patches with the highest activation.
- Object Detectors Emerge in Deep Scene CNNs
Visualize neurons by their most strongly activating images and the corresponding receptive fields.
- Visualizing Deep Neural Network Decisions: Prediction Difference Analysis
A general method to visualize image regions that support or oppose a prediction (attention). It can also be used to visualize neurons.
- Striving for Simplicity: The All Convolutional Net
Guided backpropagation.
- Network Dissection: Quantifying Interpretability of Deep Visual Representations
A new dataset with pixel-level annotations to quantify the interpretability of neurons (using IoU).
- Do semantic parts emerge in Convolutional Neural Networks?
Semantic parts emerge in CNNs trained on detection datasets.
- Learning Deep Features for Discriminative Localization
CAM for weakly supervised localization.
- Grad-CAM: Why did you say that? Visual Explanations from Deep Networks via Gradient-based Localization
Extension of CAM to captioning and VQA.
- Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps
Visualize class-specific representations in the input space (activation maximization) and use gradient information to find saliency maps; gradients represent importance (a sketch follows this group).
- Towards Transparent AI Systems: Interpreting Visual Question Answering Models
Interpret VQA answers by finding important image regions and question words.
- Human Attention in Visual Question Answering: Do Humans and Deep Networks Look at the Same Regions?
Study the attention regions produced by humans and by attention models on the VQA task.
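Several entries in this group (guided backpropagation, CAM, the saliency-map paper) rest on the same primitive: the gradient of a class score with respect to the input or an intermediate activation. Here is a minimal PyTorch sketch of the plain gradient saliency map from the Deep Inside Convolutional Networks entry; `model` (returning logits) and a preprocessed 1xCxHxW image tensor are assumed.

```python
import torch

def gradient_saliency(model, x, class_idx):
    """Gradient saliency map (sketch): the magnitude of the class-score
    gradient with respect to each input pixel, max-pooled over channels."""
    x = x.clone().detach().requires_grad_(True)
    score = model(x)[0, class_idx]          # scalar score for the chosen class
    score.backward()
    saliency = x.grad.abs().max(dim=1)[0]   # (1, H, W) importance map
    return saliency.squeeze(0)
```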
- Generating Visual Explanations
Generate a textual explanation for bird classification.
- Attentive Explanations: Justifying Decisions and Pointing to the Evidence
Justify decisions by generating a natural-language sentence and pointing to important image regions (attention) in the VQA task.
- Inducing Interpretable Representations with Variational Autoencoders
Learn interpretable latent variables in a VAE.
- InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets
Learn interpretable latent variables in a GAN.