No Arabic abstract
Randomized smoothing is a recently proposed defense against adversarial attacks that has achieved SOTA provable robustness against $ell_2$ perturbations. A number of publications have extended the guarantees to other metrics, such as $ell_1$ or $ell_infty$, by using different smoothing measures. Although the current framework has been shown to yield near-optimal $ell_p$ radii, the total safety region certified by the current framework can be arbitrarily small compared to the optimal. In this work, we propose a framework to improve the certified safety region for these smoothed classifiers without changing the underlying smoothing scheme. The theoretical contributions are as follows: 1) We generalize the certification for randomized smoothing by reformulating certified radius calculation as a nested optimization problem over a class of functions. 2) We provide a method to calculate the certified safety region using $0^{th}$-order and $1^{st}$-order information for Gaussian-smoothed classifiers. We also provide a framework that generalizes the calculation for certification using higher-order information. 3) We design efficient, high-confidence estimators for the relevant statistics of the first-order information. Combining the theoretical contribution 2) and 3) allows us to certify safety region that are significantly larger than the ones provided by the current methods. On CIFAR10 and Imagenet datasets, the new regions certified by our approach achieve significant improvements on general $ell_1$ certified radii and on the $ell_2$ certified radii for color-space attacks ($ell_2$ restricted to 1 channel) while also achieving smaller improvements on the general $ell_2$ certified radii. Our framework can also provide a way to circumvent the current impossibility results on achieving higher magnitude of certified radii without requiring the use of data-dependent smoothing techniques.
Deep neural networks are vulnerable to input deformations in the form of vector fields of pixel displacements and to other parameterized geometric deformations e.g. translations, rotations, etc. Current input deformation certification methods either (i) do not scale to deep networks on large input datasets, or (ii) can only certify a specific class of deformations, e.g. only rotations. We reformulate certification in randomized smoothing setting for both general vector field and parameterized deformations and propose DeformRS-VF and DeformRS-Par, respectively. Our new formulation scales to large networks on large input datasets. For instance, DeformRS-Par certifies rich deformations, covering translations, rotations, scaling, affine deformations, and other visually aligned deformations such as ones parameterized by Discrete-Cosine-Transform basis. Extensive experiments on MNIST, CIFAR10 and ImageNet show that DeformRS-Par outperforms existing state-of-the-art in certified accuracy, e.g. improved certified accuracy of 6% against perturbed rotations in the set [-10,10] degrees on ImageNet.
Randomized smoothing is a recent technique that achieves state-of-art performance in training certifiably robust deep neural networks. While the smoothing family of distributions is often connected to the choice of the norm used for certification, the parameters of these distributions are always set as global hyper parameters independent of the input data on which a network is certified. In this work, we revisit Gaussian randomized smoothing and show that the variance of the Gaussian distribution can be optimized at each input so as to maximize the certification radius for the construction of the smoothed classifier. This new approach is generic, parameter-free, and easy to implement. In fact, we show that our data dependent framework can be seamlessly incorporated into 3 randomized smoothing approaches, leading to consistent improved certified accuracy. When this framework is used in the training routine of these approaches followed by a data dependent certification, we achieve 9% and 6% improvement over the certified accuracy of the strongest baseline for a radius of 0.5 on CIFAR10 and ImageNet.
We present a scalable technique for upper bounding the Lipschitz constant of generative models. We relate this quantity to the maximal norm over the set of attainable vector-Jacobian products of a given generative model. We approximate this set by layerwise convex approximations using zonotopes. Our approach generalizes and improves upon prior work using zonotope transformers and we extend to Lipschitz estimation of neural networks with large output dimension. This provides efficient and tight bounds on small networks and can scale to generative models on VAE and DCGAN architectures.
In this paper, we investigate the popular deep learning optimization routine, Adam, from the perspective of statistical moments. While Adam is an adaptive lower-order moment based (of the stochastic gradient) method, we propose an extension namely, HAdam, which uses higher order moments of the stochastic gradient. Our analysis and experiments reveal that certain higher-order moments of the stochastic gradient are able to achieve better performance compared to the vanilla Adam algorithm. We also provide some analysis of HAdam related to odd and even moments to explain some intriguing and seemingly non-intuitive empirical results.
Graph neural network models have been extensively used to learn node representations for graph structured data in an end-to-end setting. These models often rely on localized first order approximations of spectral graph convolutions and hence are unable to capture higher-order relational information between nodes. Probabilistic Graphical Models form another class of models that provide rich flexibility in incorporating such relational information but are limited by inefficient approximate inference algorithms at higher order. In this paper, we propose to combine these approaches to learn better node and graph representations. First, we derive an efficient approximate sum-product loopy belief propagation inference algorithm for higher-order PGMs. We then embed the message passing updates into a neural network to provide the inductive bias of the inference algorithm in end-to-end learning. This gives us a model that is flexible enough to accommodate domain knowledge while maintaining the computational advantage. We further propose methods for constructing higher-order factors that are conditioned on node and edge features and share parameters wherever necessary. Our experimental evaluation shows that our model indeed captures higher-order information, substantially outperforming state-of-the-art $k$-order graph neural networks in molecular datasets.