Paper Summary: Adversarial Examples Improve Image Recognition


This is a summary of Adversarial Examples Improve Image Recognition (CVPR 2020) for the Vancouver Data Science Reading Group.

The paper's authors are Cihang Xie (Google, Johns Hopkins University), Mingxing Tan (Google), Boqing Gong (Google), Jiang Wang (Google), Alan Yuille (Johns Hopkins University), and Quoc V. Le (Google).

There are pretrained models from the authors in TensorFlow. Our own Ross Wightman has PyTorch pretrained models in timm.

Main Takeaway

The authors set a new state-of-the-art result on the ImageNet classification benchmark. They do so with a very large model, but they also make improvements across the EfficientNet family of models, at various sizes and speeds. The method: adversarial training with separate batch-norm layers for clean and attacked images.


Adversarial examples are otherwise-normal images that have been manipulated by an adversary in order to fool a neural network. Typically, structured noise is added to the image such that a human would report the image, and therefore the correct label, unchanged, but a neural network reliably misclassifies it.
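The classic recipe for generating such noise is the fast gradient sign method (FGSM): perturb each input dimension by a small step in the direction that increases the loss. (The paper itself uses a stronger iterative PGD attacker; this is a minimal single-step sketch on a toy logistic-regression "network", with all weights and data invented for illustration.)

```python
import numpy as np

# Toy stand-in for a network: logistic regression on a flattened "image".
rng = np.random.default_rng(0)
w = rng.normal(size=16)          # model weights (illustrative)
x = rng.normal(size=16)          # a "clean image"
y = 1.0                          # true label

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Gradient of the logistic loss w.r.t. the *input* x is (p - y) * w.
p = sigmoid(w @ x)
grad_x = (p - y) * w

# FGSM: step in the sign of the input gradient, bounded per-pixel by epsilon.
epsilon = 0.1
x_adv = x + epsilon * np.sign(grad_x)

# The perturbation is tiny per pixel, but the loss strictly increases.
p_adv = sigmoid(w @ x_adv)
loss = lambda q: -np.log(q)      # loss for the true label y = 1
assert loss(p_adv) > loss(p)
```

The key property is that the perturbation is bounded in every coordinate (here by 0.1), which is what makes it imperceptible to a human while still moving the model's output.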

Adversarial training is a method where a neural network is trained on adversarial examples rather than clean examples. This produces models that correctly classify adversarial examples much more often, but they typically have worse performance on clean images when compared to models trained in the normal way.
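The training loop is simple: attack the batch first, then take the gradient step on the attacked batch. A hedged sketch, again on a toy logistic-regression model with an FGSM-style attacker (the paper uses PGD on EfficientNets; everything here is illustrative):

```python
import numpy as np

# Toy linearly separable data (invented for illustration).
rng = np.random.default_rng(1)
X = rng.normal(size=(256, 8))
true_w = rng.normal(size=8)
y = (X @ true_w > 0).astype(float)

w = np.zeros(8)
lr, epsilon = 0.5, 0.05
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

for step in range(200):
    # 1. Attack the batch: one FGSM step per example.
    p = sigmoid(X @ w)
    grad_x = (p - y)[:, None] * w          # input gradient per example
    X_adv = X + epsilon * np.sign(grad_x)

    # 2. Gradient update on the *attacked* batch, not the clean one.
    p_adv = sigmoid(X_adv @ w)
    grad_w = X_adv.T @ (p_adv - y) / len(y)
    w -= lr * grad_w

# On this easy toy problem the model still fits the clean data;
# the clean-accuracy penalty the text describes shows up on real images.
acc_clean = float(np.mean((sigmoid(X @ w) > 0.5) == y))
```

Step 1 is the expensive part in practice: a multi-step PGD attacker multiplies the cost of every training batch by the number of attack iterations.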

Why should we expect worse results from adversarial training? An extremely hand-wavy argument goes that under adversarial training we are asking the network to do more with its finite capacity.

Why should we expect better results from adversarial training? When a model incorrectly classifies an adversarial image, it is failing to generalize. A model that correctly classifies adversarial images agrees with humans more often and is therefore a better model. Adversarial images are certainly exploiting overfit features in models, and training on them should act as a valuable regularizer. Finally, adversarial images can be thought of as a form of data augmentation. We use our prior knowledge of the image domain to state that a horizontal flip, a small change of brightness, or a touch of Gaussian noise should not change the label of an ImageNet image. We therefore perform these operations randomly during data augmentation, and our model learns those priors and is less prone to overfitting. Similarly, robustness to structured adversarial noise is a prior that we can leverage in the same way.

Closely related papers:


Across-the-board Top-1 accuracy improvements when training EfficientNet models on the ImageNet benchmark.

The paper is structured in such a way that you might get the impression the authors are the first to show clean-image improvements from adversarial training on ImageNet, but this is not the case: the baseline adversarial training method they compare against also shows improvements for B5, B6, and B7. I wonder if this framing was deliberate. Furthermore, the gains from the authors' method shrink with model size, down to +0.1% for EfficientNet-B7, which may not even be statistically significant. They then report their headline SOTA score for EfficientNet-B8 without any comparison; extrapolating the trend from the smaller models, a reasonable guess is that the two methods would be equal there. Their method is also only a minute improvement over adversarial training followed by clean-image fine-tuning, but that is hard to discover given the incompleteness of the exploration and where in the paper the information sits. I am tempted to conclude that the authors include at least one very canny editor.

Following those harsh words, the impressive result bears repeating: across-the-board Top-1 accuracy improvements when training EfficientNet models on the ImageNet benchmark.

Why does it work?

The authors have one key hypothesis: a network trained on both clean and adversarial images sees inputs from two different distributions, which will have different batch-norm statistics. By providing two batch-norm layers, the model can correctly compute the means and variances of the two distributions separately, rather than incorrectly computing a single mean and variance for their union.
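The statistics problem can be seen with two one-dimensional "activation distributions". This is an illustrative sketch of the hypothesis, not the paper's implementation: normalizing with pooled (union) statistics leaves both subpopulations off-center, while per-distribution statistics (the auxiliary-BN idea) normalize each one exactly.

```python
import numpy as np

# Stand-ins for activations of clean vs. adversarial inputs at one channel.
# The shift/scale values are invented for illustration.
rng = np.random.default_rng(0)
clean = rng.normal(loc=0.0, scale=1.0, size=10_000)
adv   = rng.normal(loc=2.0, scale=0.5, size=10_000)   # shifted distribution

def normalize(x, stats_from):
    mu, sigma = stats_from.mean(), stats_from.std()
    return (x - mu) / sigma

# One shared BN: statistics pooled over the union of both distributions.
union = np.concatenate([clean, adv])
shared_clean = normalize(clean, union)
shared_adv   = normalize(adv, union)

# Two BNs: each distribution normalized with its own statistics.
split_clean = normalize(clean, clean)
split_adv   = normalize(adv, adv)

# With shared stats, neither branch comes out zero-mean/unit-variance;
# with split stats, both do, by construction.
print(shared_clean.mean(), shared_adv.mean())   # far from 0
print(split_clean.mean(),  split_adv.mean())    # ~0
```

At inference time the paper's models keep only the clean-image batch norms, so this disentanglement is purely a training-time device.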

Big Questions

What about Adversarial Examples Are Not Bugs, They Are Features? That paper convincingly demonstrates that the "structured noise" in adversarial examples, which humans dismiss as meaningless, is in fact a set of valid, generalizing features. This aligns with a line of thinking that CNNs see differently from humans and tend to focus on textures more than shapes. After hearing that insight and considering the nature of convolutions, it seems obvious that they find textures easy to learn. Under this logic we might expect an adversarially trained model to perform poorly, because we have corrupted the texture information that it finds easy to learn and that is valid information for prediction. So why are we now seeing gains?

How does this result inform the "How does Batch Norm help?" discussion? The best analysis I have seen suggests that BN smooths the loss landscape for optimization. Nowhere in that account do I see why data from different distributions would be problematic.

Why do three BNs for clean, AutoAugment, and adversarial images help EfficientNet B0, but no experiments are reported on the rest of the family?

Future work

If their disentanglement hypothesis is correct, then this multiple-BN approach should be useful generally when training on data drawn from a mix of distributions. An example would be medical imaging, where one might have training images from a variety of devices, and where, of course, the device is known at inference time.

They find worse results on the ResNet family of models, both for traditional adversarial training and for their own method. What is it about the EfficientNet architecture that changes things?