Robustness and Generalization to Nearest Categories


Abstract in English

Adversarial robustness has emerged as a desirable property for neural networks. Prior work shows that robust networks perform well in some out-of-distribution generalization tasks, such as transfer learning and outlier detection. We uncover a different kind of out-of-distribution generalization property of such networks, and find that they also do well in a task that we call nearest category generalization (NCG) - given an out-of-distribution input, they tend to predict the same label as that of the closest training example. We empirically show that this happens even when the out-of-distribution inputs lie outside the robustness radius of the training data, which suggests that these networks may generalize better along unseen directions on the natural image manifold than arbitrary unseen directions. We examine how performance changes when we change the robustness regions during training. We then design experiments to investigate the connection between out-of-distribution detection and nearest category generalization. Taken together, our work provides evidence that robust neural networks may resemble nearest neighbor classifiers in their behavior on out-of-distribution data. The code is available at https://github.com/yangarbiter/nearest-category-generalization

Download