No Arabic abstract
Recent focus on robustness to adversarial attacks for deep neural networks produced a large variety of algorithms for training robust models. Most of the effective algorithms involve solving the min-max optimization problem for training robust models (min step) under worst-case attacks (max step). However, they often suffer from high computational cost from running several inner maximization iterations (to find an optimal attack) inside every outer minimization iteration. Therefore, it becomes difficult to readily apply such algorithms for moderate to large size real world data sets. To alleviate this, we explore the effectiveness of iterative descent-ascent algorithms where the maximization and minimization steps are executed in an alternate fashion to simultaneously obtain the worst-case attack and the corresponding robust model. Specifically, we propose a novel discrete-time dynamical system-based algorithm that aims to find the saddle point of a min-max optimization problem in the presence of uncertainties. Under the assumptions that the cost function is convex and uncertainties enter concavely in the robust learning problem, we analytically show that our algorithm converges asymptotically to the robust optimal solution under a general adversarial budget constraints as induced by $ell_p$ norm, for $1leq pleq infty$. Based on our proposed analysis, we devise a fast robust training algorithm for deep neural networks. Although such training involves highly non-convex robust optimization problems, empirical results show that the algorithm can achieve significant robustness compared to other state-of-the-art robust models on benchmark data sets.
Deep neural networks have been shown to be very powerful modeling tools for many supervised learning tasks involving complex input patterns. However, they can also easily overfit to training set biases and label noises. In addition to various regularizers, example reweighting algorithms are popular solutions to these problems, but they require careful tuning of additional hyperparameters, such as example mining schedules and regularization hyperparameters. In contrast to past reweighting methods, which typically consist of functions of the cost value of each example, in this work we propose a novel meta-learning algorithm that learns to assign weights to training examples based on their gradient directions. To determine the example weights, our method performs a meta gradient descent step on the current mini-batch example weights (which are initialized from zero) to minimize the loss on a clean unbiased validation set. Our proposed method can be easily implemented on any type of deep network, does not require any additional hyperparameter tuning, and achieves impressive performance on class imbalance and corrupted label problems where only a small amount of clean validation data is available.
We study robust distributed learning that involves minimizing a non-convex loss function with saddle points. We consider the Byzantine setting where some worker machines have abnormal or even arbitrary and adversarial behavior. In this setting, the Byzantine machines may create fake local minima near a saddle point that is far away from any true local minimum, even when robust gradient estimators are used. We develop ByzantinePGD, a robust first-order algorithm that can provably escape saddle points and fake local minima, and converge to an approximate true local minimizer with low iteration complexity. As a by-product, we give a simpler algorithm and analysis for escaping saddle points in the usual non-Byzantine setting. We further discuss three robust gradient estimators that can be used in ByzantinePGD, including median, trimmed mean, and iterative filtering. We characterize their performance in concrete statistical settings, and argue for their near-optimality in low and high dimensional regimes.
We consider a variant of regression problem, where the correspondence between input and output data is not available. Such shuffled data is commonly observed in many real world problems. Taking flow cytometry as an example, the measuring instruments may not be able to maintain the correspondence between the samples and the measurements. Due to the combinatorial nature of the problem, most existing methods are only applicable when the sample size is small, and limited to linear regression models. To overcome such bottlenecks, we propose a new computational framework -- ROBOT -- for the shuffled regression problem, which is applicable to large data and complex nonlinear models. Specifically, we reformulate the regression without correspondence as a continuous optimization problem. Then by exploiting the interaction between the regression model and the data correspondence, we develop a hypergradient approach based on differentiable programming techniques. Such a hypergradient approach essentially views the data correspondence as an operator of the regression, and therefore allows us to find a better descent direction for the model parameter by differentiating through the data correspondence. ROBOT can be further extended to the inexact correspondence setting, where there may not be an exact alignment between the input and output data. Thorough numerical experiments show that ROBOT achieves better performance than existing methods in both linear and nonlinear regression tasks, including real-world applications such as flow cytometry and multi-object tracking.
It is common practice in deep learning to use overparameterized networks and train for as long as possible; there are numerous studies that show, both theoretically and empirically, that such practices surprisingly do not unduly harm the generalization performance of the classifier. In this paper, we empirically study this phenomenon in the setting of adversarially trained deep networks, which are trained to minimize the loss under worst-case adversarial perturbations. We find that overfitting to the training set does in fact harm robust performance to a very large degree in adversarially robust training across multiple datasets (SVHN, CIFAR-10, CIFAR-100, and ImageNet) and perturbation models ($ell_infty$ and $ell_2$). Based upon this observed effect, we show that the performance gains of virtually all recent algorithmic improvements upon adversarial training can be matched by simply using early stopping. We also show that effects such as the double descent curve do still occur in adversarially trained models, yet fail to explain the observed overfitting. Finally, we study several classical and modern deep learning remedies for overfitting, including regularization and data augmentation, and find that no approach in isolation improves significantly upon the gains achieved by early stopping. All code for reproducing the experiments as well as pretrained model weights and training logs can be found at https://github.com/locuslab/robust_overfitting.
The implementation of smart building technology in the form of smart infrastructure applications has great potential to improve sustainability and energy efficiency by leveraging humans-in-the-loop strategy. However, human preference in regard to living conditions is usually unknown and heterogeneous in its manifestation as control inputs to a building. Furthermore, the occupants of a building typically lack the independent motivation necessary to contribute to and play a key role in the control of smart building infrastructure. Moreover, true human actions and their integration with sensing/actuation platforms remains unknown to the decision maker tasked with improving operational efficiency. By modeling user interaction as a sequential discrete game between non-cooperative players, we introduce a gamification approach for supporting user engagement and integration in a human-centric cyber-physical system. We propose the design and implementation of a large-scale network game with the goal of improving the energy efficiency of a building through the utilization of cutting-edge Internet of Things (IoT) sensors and cyber-physical systems sensing/actuation platforms. A benchmark utility learning framework that employs robust estimations for classical discrete choice models provided for the derived high dimensional imbalanced data. To improve forecasting performance, we extend the benchmark utility learning scheme by leveraging Deep Learning end-to-end training with Deep bi-directional Recurrent Neural Networks. We apply the proposed methods to high dimensional data from a social game experiment designed to encourage energy efficient behavior among smart building occupants in Nanyang Technological University (NTU) residential housing. Using occupant-retrieved actions for resources such as lighting and A/C, we simulate the game defined by the estimated utility functions.