No Arabic abstract
In this paper, we propose to adopt the diffusion approximation tools to study the dynamics of Ojas iteration which is an online stochastic gradient descent method for the principal component analysis. Ojas iteration maintains a running estimate of the true principal component from streaming data and enjoys less temporal and spatial complexities. We show that the Ojas iteration for the top eigenvector generates a continuous-state discrete-time Markov chain over the unit sphere. We characterize the Ojas iteration in three phases using diffusion approximation and weak convergence tools. Our three-phase analysis further provides a finite-sample error bound for the running estimate, which matches the minimax information lower bound for principal component analysis under the additional assumption of bounded samples.
Solving statistical learning problems often involves nonconvex optimization. Despite the empirical success of nonconvex statistical optimization methods, their global dynamics, especially convergence to the desirable local minima, remain less well understood in theory. In this paper, we propose a new analytic paradigm based on diffusion processes to characterize the global dynamics of nonconvex statistical optimization. As a concrete example, we study stochastic gradient descent (SGD) for the tensor decomposition formulation of independent component analysis. In particular, we cast different phases of SGD into diffusion processes, i.e., solutions to stochastic differential equations. Initialized from an unstable equilibrium, the global dynamics of SGD transit over three consecutive phases: (i) an unstable Ornstein-Uhlenbeck process slowly departing from the initialization, (ii) the solution to an ordinary differential equation, which quickly evolves towards the desirable local minimum, and (iii) a stable Ornstein-Uhlenbeck process oscillating around the desirable local minimum. Our proof techniques are based upon Stroock and Varadhans weak convergence of Markov chains to diffusion processes, which are of independent interest.
The robust PCA of covariance matrices plays an essential role when isolating key explanatory features. The currently available methods for performing such a low-rank plus sparse decomposition are matrix specific, meaning, those algorithms must re-run for every new matrix. Since these algorithms are computationally expensive, it is preferable to learn and store a function that instantaneously performs this decomposition when evaluated. Therefore, we introduce Denise, a deep learning-based algorithm for robust PCA of covariance matrices, or more generally of symmetric positive semidefinite matrices, which learns precisely such a function. Theoretical guarantees for Denise are provided. These include a novel universal approximation theorem adapted to our geometric deep learning problem, convergence to an optimal solution of the learning problem and convergence of the training scheme. Our experiments show that Denise matches state-of-the-art performance in terms of decomposition quality, while being approximately 2000x faster than the state-of-the-art, PCP, and 200x faster than the current speed optimized method, fast PCP.
Robust principal component analysis (RPCA) is a widely used tool for dimension reduction. In this work, we propose a novel non-convex algorithm, coined Iterated Robust CUR (IRCUR), for solving RPCA problems, which dramatically improves the computational efficiency in comparison with the existing algorithms. IRCUR achieves this acceleration by employing CUR decomposition when updating the low rank component, which allows us to obtain an accurate low rank approximation via only three small submatrices. Consequently, IRCUR is able to process only the small submatrices and avoid expensive computing on the full matrix through the entire algorithm. Numerical experiments establish the computational advantage of IRCUR over the state-of-art algorithms on both synthetic and real-world datasets.
We show how to efficiently project a vector onto the top principal components of a matrix, without explicitly computing these components. Specifically, we introduce an iterative algorithm that provably computes the projection using few calls to any black-box routine for ridge regression. By avoiding explicit principal component analysis (PCA), our algorithm is the first with no runtime dependence on the number of top principal components. We show that it can be used to give a fast iterative method for the popular principal component regression problem, giving the first major runtime improvement over the naive method of combining PCA with regression. To achieve our results, we first observe that ridge regression can be used to obtain a smooth projection onto the top principal components. We then sharpen this approximation to true projection using a low-degree polynomial approximation to the matrix step function. Step function approximation is a topic of long-term interest in scientific computing. We extend prior theory by constructing polynomials with simple iterative structure and rigorously analyzing their behavior under limited precision.
Shot noise processes have been extensively studied due to their mathematical properties and their relevance in several applications. Here, we consider nonnegative shot noise processes and prove their weak convergence to Levy-driven Ornstein-Uhlenbeck (OU), whose features depend on the underlying jump distributions. Among others, we obtain the OU-Gamma and OU-Inverse Gaussian processes, having gamma and inverse gaussian processes as background Levy processes, respectively. Then, we derive the necessary conditions guaranteeing the diffusion limit to a Gaussian OU process, show that they are not met unless allowing for negative jumps happening with probability going to zero, and quantify the error occurred when replacing the shot noise with the OU process and the non-Gaussian OU processes. The results offer a new class of models to be used instead of the commonly applied Gaussian OU processes to approximate synaptic input currents, membrane voltages or conductances modelled by shot noise in single neuron modelling.