Over-parameterized Adversarial Training: An Analysis Overcoming the Curse of Dimensionality

Added by Yi Zhang

Publication date: 2020

Fields: Informatics Engineering, Mathematical Statistics

Language: English

Authors: Yi Zhang, Orestis Plevrakis, Simon S. Du

Machine Learning

Adversarial training is a popular method for making neural nets robust against adversarial perturbations. In practice, adversarial training leads to low robust training loss. However, a rigorous explanation for why this happens under natural conditions is still missing. Recently, a convergence theory for standard (non-adversarial) supervised training was developed by various groups for very overparametrized nets. It is unclear how to extend these results to adversarial training because of the min-max objective. A first step in this direction was made by Gao et al. using tools from online learning, but they require the width of the net to be exponential in the input dimension $d$, and they use an unnatural activation function. Our work proves convergence to low robust training loss for polynomial width instead of exponential, under natural assumptions and with the ReLU activation. A key element of our proof is showing that ReLU networks near initialization can approximate the step function, which may be of independent interest.
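The min-max objective mentioned in the abstract can be made concrete on a much simpler model than the ReLU networks the paper analyzes. The sketch below is an illustration only, not the paper's setting or method: it assumes a linear classifier, logistic loss, and an l_inf perturbation budget `eps`, for which the inner maximization has a closed form. All function names are hypothetical.

```python
import numpy as np

def logistic_loss(margin):
    # log(1 + exp(-margin)), computed stably
    return np.logaddexp(0.0, -margin)

def worst_case_perturbation(w, y, eps):
    # Assumption: linear score w.x and an l_inf budget eps. The perturbation
    # that maximizes the loss is then delta = -eps * y * sign(w).
    return -eps * y * np.sign(w)

def robust_loss(w, X, y, eps):
    # Mean over examples of max_{||delta||_inf <= eps} loss(y * w.(x + delta)).
    # For a linear model the worst case lowers every margin by eps * ||w||_1.
    margins = y * (X @ w) - eps * np.abs(w).sum()
    return logistic_loss(margins).mean()

def adversarial_training(X, y, eps, lr=0.1, steps=200):
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        # Inner max: move each example to its worst-case point.
        X_adv = X + worst_case_perturbation(w, y[:, None], eps)
        # Outer min: one (sub)gradient step on the loss at the perturbed points.
        margins = y * (X_adv @ w)
        dloss_dmargin = -0.5 * (1.0 - np.tanh(margins / 2.0))  # = -1 / (1 + exp(margin))
        grad = (dloss_dmargin * y) @ X_adv / len(y)
        w -= lr * grad
    return w

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 5))
    y = np.sign(X @ rng.normal(size=5))
    w = adversarial_training(X, y, eps=0.1)
    print("standard loss:", logistic_loss(y * (X @ w)).mean())
    print("robust loss:  ", robust_loss(w, X, y, eps=0.1))
```

The robust loss is never smaller than the standard loss at the same weights, which is the gap adversarial training tries to close.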

Overcoming the Curse of Dimensionality in Density Estimation with Mixed Sobolev GANs

Liang Ding, Rui Tuo, Shahin Shahrampour (2020)

We propose a novel GAN framework for non-parametric density estimation with high-dimensional data. This framework is based on a novel density estimator, called the hyperbolic cross density estimator, which enjoys nice convergence properties in the mixed Sobolev spaces. As modifications of the usual Sobolev spaces, the mixed Sobolev spaces are more suitable for describing high-dimensional density functions. We prove that, unlike other existing approaches, the proposed GAN framework does not suffer from the curse of dimensionality and can achieve the optimal convergence rate of $O_p(n^{-1/2})$, with $n$ data points in an arbitrary fixed dimension. We also study the universality of GANs in terms of the existence of ReLU networks that can approximate the density functions in the mixed Sobolev spaces up to any accuracy level.

Machine Learning

Orthogonal Over-Parameterized Training

Weiyang Liu, Rongmei Lin, Zhen Liu (2020)

The inductive bias of a neural network is largely determined by the architecture and the training algorithm. To achieve good generalization, how to effectively train a neural network is of great importance. We propose a novel orthogonal over-parameterized training (OPT) framework that can provably minimize the hyperspherical energy which characterizes the diversity of neurons on a hypersphere. By maintaining the minimum hyperspherical energy during training, OPT can greatly improve the empirical generalization. Specifically, OPT fixes the randomly initialized weights of the neurons and learns an orthogonal transformation that applies to these neurons. We consider multiple ways to learn such an orthogonal transformation, including unrolling orthogonalization algorithms, applying orthogonal parameterization, and designing orthogonality-preserving gradient descent. For better scalability, we propose the stochastic OPT which performs orthogonal transformation stochastically for partial dimensions of neurons. Interestingly, OPT reveals that learning a proper coordinate system for neurons is crucial to generalization. We provide some insights on why OPT yields better generalization. Extensive experiments validate the superiority of OPT over the standard training.
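The core idea described above, fixing random neuron weights and learning only an orthogonal transformation of them, can be sketched in a few lines. This is an illustration under assumptions rather than the authors' implementation: the Cayley transform is used as one possible orthogonal parameterization, and the hyperspherical energy is taken to be the sum of inverse pairwise distances between unit-normalized neurons. The function names are hypothetical.

```python
import numpy as np

def cayley(A_raw):
    # Map an unconstrained square matrix to an orthogonal one:
    # A = A_raw - A_raw^T is skew-symmetric, and (I - A)^{-1} (I + A) is orthogonal.
    A = A_raw - A_raw.T
    I = np.eye(A.shape[0])
    return np.linalg.solve(I - A, I + A)

def hyperspherical_energy(W):
    # Rows of W are neurons. Normalize them onto the unit sphere and
    # sum the inverse Euclidean distances over all distinct pairs.
    U = W / np.linalg.norm(W, axis=1, keepdims=True)
    dist = np.linalg.norm(U[:, None, :] - U[None, :, :], axis=-1)
    upper = np.triu_indices(len(U), k=1)
    return (1.0 / dist[upper]).sum()

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W0 = rng.normal(size=(6, 4))            # fixed, randomly initialized neurons
    R = cayley(rng.normal(size=(4, 4)))     # the only trainable object in this sketch
    W = W0 @ R.T                            # every neuron w_i becomes R w_i
    print(hyperspherical_energy(W0), hyperspherical_energy(W))
```

Because an orthogonal map preserves all pairwise angles between neurons, the energy of the transformed neurons equals that of the fixed random initialization, whatever value the learned matrix takes.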

Machine Learning, Computer Vision and Pattern Recognition

Training Over-parameterized Models with Non-decomposable Objectives

Harikrishna Narasimhan, Aditya Krishna Menon (2021)

Many modern machine learning applications come with complex and nuanced design goals such as minimizing the worst-case error, satisfying a given precision or recall target, or enforcing group-fairness constraints. Popular techniques for optimizing such non-decomposable objectives reduce the problem to a sequence of cost-sensitive learning tasks, each of which is then solved by re-weighting the training loss with example-specific costs. We point out that the standard approach of re-weighting the loss to incorporate label costs can produce unsatisfactory results when used to train over-parameterized models. As a remedy, we propose new cost-sensitive losses that extend the classical idea of logit adjustment to handle more general cost matrices. Our losses are calibrated, and can be further improved with distilled labels from a teacher model. Through experiments on benchmark image datasets, we showcase the effectiveness of our approach in training ResNet models with common robust and constrained optimization objectives.
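The two baseline ideas named in this abstract, loss re-weighting and classical logit adjustment, can be contrasted in a short sketch. It does not reproduce the paper's new losses for general cost matrices. It assumes per-class costs for the re-weighted loss and class priors with a temperature `tau` for the logit-adjusted loss, and the function names are hypothetical.

```python
import numpy as np

def log_softmax(logits):
    # Numerically stable log-softmax over the class axis.
    z = logits - logits.max(axis=1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))

def cross_entropy(logits, labels):
    return -log_softmax(logits)[np.arange(len(labels)), labels].mean()

def reweighted_ce(logits, labels, class_costs):
    # Re-weighting: scale each example's loss by the cost of its label.
    nll = -log_softmax(logits)[np.arange(len(labels)), labels]
    return (class_costs[labels] * nll).mean()

def logit_adjusted_ce(logits, labels, class_priors, tau=1.0):
    # Classical logit adjustment: shift the logits by tau * log(prior)
    # before the softmax, instead of scaling the loss afterwards.
    return cross_entropy(logits + tau * np.log(class_priors), labels)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    logits = rng.normal(size=(8, 3))
    labels = rng.integers(0, 3, size=8)
    print(reweighted_ce(logits, labels, np.array([1.0, 2.0, 5.0])))
    print(logit_adjusted_ce(logits, labels, np.array([0.7, 0.2, 0.1])))
```

With unit costs or uniform priors, both variants reduce to the plain cross-entropy, which makes a convenient sanity check.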

Machine Learning, Artificial Intelligence

Memory-efficient training with streaming dimensionality reduction

Siyuan Huang, Brian D. Hoskins, Matthew W. Daniels (2020)

The movement of large quantities of data during the training of a deep neural network presents immense challenges for machine learning workloads. To minimize this overhead, especially on the movement and calculation of gradient information, we introduce streaming batch principal component analysis as an update algorithm. Streaming batch principal component analysis uses stochastic power iterations to generate a stochastic rank-$k$ approximation of the network gradient. We demonstrate that the low-rank updates produced by streaming batch principal component analysis can effectively train convolutional neural networks on a variety of common datasets, with performance comparable to standard mini-batch gradient descent. These results can lead to improvements both in the design of application-specific integrated circuits for deep learning and in the speed of synchronization of machine learning models trained with data parallelism.
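The rank-$k$ approximation by power iterations can be sketched as follows. This is a simplified, deterministic block power iteration on a full matrix, not the streaming and stochastic algorithm of the paper. The matrix `G` stands in for a layer's gradient, and the function name is hypothetical.

```python
import numpy as np

def rank_k_power_iteration(G, k, iters=20, seed=0):
    # Find an orthonormal basis Q (n_cols x k) for the dominant row space of G
    # by repeatedly applying G^T G and re-orthonormalizing with QR.
    rng = np.random.default_rng(seed)
    Q, _ = np.linalg.qr(rng.normal(size=(G.shape[1], k)))
    for _ in range(iters):
        Q, _ = np.linalg.qr(G.T @ (G @ Q))
    P = G @ Q            # n_rows x k coefficients
    return P, Q          # rank-k approximation: G ~ P @ Q.T

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    G = rng.normal(size=(50, 3)) @ rng.normal(size=(3, 40))   # exactly rank 3
    P, Q = rank_k_power_iteration(G, k=3)
    print(np.linalg.norm(G - P @ Q.T) / np.linalg.norm(G))
```

Storing or communicating `P` and `Q` takes `k * (n_rows + n_cols)` numbers instead of `n_rows * n_cols`, which is the saving that low-rank gradient updates aim for.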

Machine Learning

On the computational and statistical complexity of over-parameterized matrix sensing

Jiacheng Zhuo, Jeongyeol Kwon, Nhat Ho (2021)

We consider solving the low-rank matrix sensing problem with the Factorized Gradient Descent (FGD) method when the true rank is unknown and over-specified, which we refer to as over-parameterized matrix sensing. If the ground truth signal $\mathbf{X}^* \in \mathbb{R}^{d \times d}$ is of rank $r$, but we try to recover it using $\mathbf{F} \mathbf{F}^\top$ where $\mathbf{F} \in \mathbb{R}^{d \times k}$ and $k > r$, the existing statistical analysis falls short, due to a flat local curvature of the loss function around the global minima. By decomposing the factorized matrix $\mathbf{F}$ into separate column spaces to capture the effect of extra ranks, we show that $\|\mathbf{F}_t \mathbf{F}_t^\top - \mathbf{X}^*\|_F^2$ converges to a statistical error of $\tilde{\mathcal{O}}(k d \sigma^2 / n)$ after $\tilde{\mathcal{O}}\left(\frac{\sigma_r}{\sigma} \sqrt{\frac{n}{d}}\right)$ iterations, where $\mathbf{F}_t$ is the output of FGD after $t$ iterations, $\sigma^2$ is the variance of the observation noise, $\sigma_r$ is the $r$-th largest eigenvalue of $\mathbf{X}^*$, and $n$ is the number of samples. Our results, therefore, offer a comprehensive picture of the statistical and computational complexity of FGD for the over-parameterized matrix sensing problem.
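A small numerical sketch may help fix the setup: a rank-$r$ ground truth is recovered through a factor with $k > r$ columns by gradient descent on a least-squares loss. Everything specific here is an assumption for illustration and is not taken from the paper: symmetric Gaussian sensing matrices, the loss $\frac{1}{4n}\sum_i (\langle \mathbf{A}_i, \mathbf{F}\mathbf{F}^\top \rangle - y_i)^2$, the step size, the small random initialization, and the function names.

```python
import numpy as np

def make_problem(d=8, r=1, n=500, sigma=0.01, seed=0):
    rng = np.random.default_rng(seed)
    U = np.linalg.qr(rng.normal(size=(d, r)))[0]
    X_star = U @ U.T                                  # rank-r ground truth
    G = rng.normal(size=(n, d, d))
    A = (G + G.transpose(0, 2, 1)) / 2                # symmetric sensing matrices
    y = np.einsum("nij,ij->n", A, X_star) + sigma * rng.normal(size=n)
    return A, y, X_star

def fgd(A, y, k, step=0.1, iters=500, init_scale=0.1, seed=1):
    rng = np.random.default_rng(seed)
    F = init_scale * rng.normal(size=(A.shape[1], k))  # k may exceed the true rank
    for _ in range(iters):
        residual = np.einsum("nij,ij->n", A, F @ F.T) - y
        # Gradient of (1 / 4n) * sum_i residual_i^2 with respect to F,
        # using the symmetry of each A_i.
        grad = np.einsum("n,nij->ij", residual, A) @ F / len(y)
        F -= step * grad
    return F

if __name__ == "__main__":
    A, y, X_star = make_problem()
    F = fgd(A, y, k=3)                                 # over-specified rank: k = 3 > r = 1
    print(np.linalg.norm(F @ F.T - X_star) ** 2)
```

Comparing the printed error for `k = 1` and `k = 3` at a fixed iteration count is one way to observe the slower convergence that the abstract attributes to the flat curvature in the over-specified case.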

Machine Learning
