ﻻ يوجد ملخص باللغة العربية
We consider solving the low rank matrix sensing problem with Factorized Gradient Descend (FGD) method when the true rank is unknown and over-specified, which we refer to as over-parameterized matrix sensing. If the ground truth signal $mathbf{X}^* in mathbb{R}^{d*d}$ is of rank $r$, but we try to recover it using $mathbf{F} mathbf{F}^top$ where $mathbf{F} in mathbb{R}^{d*k}$ and $k>r$, the existing statistical analysis falls short, due to a flat local curvature of the loss function around the global maxima. By decomposing the factorized matrix $mathbf{F}$ into separate column spaces to capture the effect of extra ranks, we show that $|mathbf{F}_t mathbf{F}_t - mathbf{X}^*|_{F}^2$ converges to a statistical error of $tilde{mathcal{O}} ({k d sigma^2/n})$ after $tilde{mathcal{O}}(frac{sigma_{r}}{sigma}sqrt{frac{n}{d}})$ number of iterations where $mathbf{F}_t$ is the output of FGD after $t$ iterations, $sigma^2$ is the variance of the observation noise, $sigma_{r}$ is the $r$-th largest eigenvalue of $mathbf{X}^*$, and $n$ is the number of sample. Our results, therefore, offer a comprehensive picture of the statistical and computational complexity of FGD for the over-parameterized matrix sensing problem.
We consider whether algorithmic choices in over-parameterized linear matrix factorization introduce implicit regularization. We focus on noiseless matrix sensing over rank-$r$ positive semi-definite (PSD) matrices in $mathbb{R}^{n times n}$, with a s
We consider training over-parameterized two-layer neural networks with Rectified Linear Unit (ReLU) using gradient descent (GD) method. Inspired by a recent line of work, we study the evolutions of network prediction errors across GD iterations, whic
Adversarial training is a popular method to give neural nets robustness against adversarial perturbations. In practice adversarial training leads to low robust training loss. However, a rigorous explanation for why this happens under natural conditio
Nesterov SGD is widely used for training modern neural networks and other machine learning models. Yet, its advantages over SGD have not been theoretically clarified. Indeed, as we show in our paper, both theoretically and empirically, Nesterov SGD w
The inductive bias of a neural network is largely determined by the architecture and the training algorithm. To achieve good generalization, how to effectively train a neural network is of great importance. We propose a novel orthogonal over-paramete