We introduce a fast algorithm for entry-wise evaluation of the Gauss-Newton Hessian (GNH) matrix of a fully-connected feed-forward neural network. The algorithm has a precomputation step and a sampling step. While it generally requires $O(Nn)$ work to compute an entry (and the entire column containing it) of the GNH matrix for a neural network with $N$ parameters and $n$ data points, our fast sampling algorithm reduces the cost to $O(n + d/\epsilon^2)$ work, where $d$ is the output dimension of the network and $\epsilon$ is a prescribed accuracy (independent of $N$). One application of our algorithm is constructing the hierarchical-matrix (H-matrix) approximation of the GNH matrix for solving linear systems and eigenvalue problems. It generally requires $O(N^2)$ memory and $O(N^3)$ work to store and factorize the GNH matrix, respectively. The H-matrix approximation requires only an $O(N r_o)$ memory footprint and $O(N r_o^2)$ work to factorize, where $r_o \ll N$ is the maximum rank of the off-diagonal blocks of the GNH matrix. We demonstrate the performance of our fast algorithm and the H-matrix approximation on classification and autoencoder neural networks.
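To make the object concrete: writing $J_k$ for the per-sample Jacobian of the network output with respect to the parameters and $Q_k$ for the output-space loss Hessian, the GNH is $G = \sum_k J_k^\top Q_k J_k$. The following minimal sketch (not the paper's algorithm) estimates a single entry of $G$ by sampling data points; it assumes a least-squares loss, so $Q_k = I$, and that Jacobian blocks are available as explicit arrays, a simplification the paper's precomputation step is designed to avoid. Names such as `jac_blocks` are illustrative.

```python
import numpy as np

def gnh_entry_sampled(jac_blocks, i, j, num_samples, rng=None):
    """Monte Carlo estimate of entry (i, j) of the Gauss-Newton Hessian
    G = sum_k J_k^T J_k (least-squares loss, so Q_k is the identity).
    jac_blocks[k] is the d x N Jacobian of the network output at data
    point k, stored explicitly here for simplicity."""
    rng = np.random.default_rng(rng)
    n = len(jac_blocks)
    samples = rng.integers(0, n, size=num_samples)  # uniform over data points
    est = 0.0
    for k in samples:
        # each sampled term costs O(d), matching the d-dependence of the cost
        est += jac_blocks[k][:, i] @ jac_blocks[k][:, j]
    return est * n / num_samples  # unbiased estimate of the sum over k
```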
This paper is concerned with the introduction of Tikhonov regularization into the least squares approximation scheme by orthonormal polynomials on $[-1,1]$, in order to handle noisy data. This scheme includes interpolation and hyperinterpolation as special cases. With Gauss quadrature points employed as nodes, the coefficients of the approximation polynomial with respect to a given basis are derived in an entry-wise closed form. Under interpolatory conditions, the solution to the regularized approximation problem is rewritten in the form of two kinds of barycentric interpolation formulae, by introducing only a multiplicative correction factor into both classical barycentric formulae. An $L_2$ error bound and a uniform error bound are derived; both show that Tikhonov regularization is able to reduce the operator norm (Lebesgue constant) and the error term related to the level of noise, in each case by a multiplicative correction factor less than one. Numerical examples show the benefits of Tikhonov regularization when the data is noisy or the data size is relatively small.
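A minimal sketch of the entry-wise closed form, under the assumptions that the basis is the orthonormal Legendre basis, the penalty is a uniform $\lambda$ on all coefficients, and the Gauss quadrature is exact for products of basis polynomials: the normal equations then decouple, and each plain quadrature coefficient is damped by the factor $1/(1+\lambda)$, a simple instance of the multiplicative correction factor described above. Function and variable names are illustrative.

```python
import numpy as np
from numpy.polynomial import legendre

def regularized_coefficients(f_vals, nodes, weights, degree, lam):
    """Tikhonov-regularized least-squares coefficients w.r.t. the
    orthonormal Legendre basis at Gauss nodes.  Because the weighted
    basis columns are orthonormal (Gauss quadrature is exact for
    products of basis polynomials), the normal equations decouple and
    each quadrature coefficient is simply damped by 1/(1 + lam)."""
    coeffs = np.empty(degree + 1)
    for ell in range(degree + 1):
        c = np.zeros(ell + 1)
        c[ell] = np.sqrt((2 * ell + 1) / 2)   # orthonormal P_ell on [-1, 1]
        p_vals = legendre.legval(nodes, c)
        coeffs[ell] = (weights * f_vals * p_vals).sum() / (1.0 + lam)
    return coeffs

# usage: nodes, weights = np.polynomial.legendre.leggauss(32)
```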
We analyze the Lanczos method for matrix function approximation (Lanczos-FA), an iterative algorithm for computing $f(\mathbf{A})\mathbf{b}$ when $\mathbf{A}$ is a Hermitian matrix and $\mathbf{b}$ is a given vector. Assuming that $f : \mathbb{C} \rightarrow \mathbb{C}$ is piecewise analytic, we give a framework, based on the Cauchy integral formula, which can be used to derive \emph{a priori} and \emph{a posteriori} error bounds for Lanczos-FA in terms of the error of Lanczos used to solve linear systems. Unlike many error bounds for Lanczos-FA, these bounds account for fine-grained properties of the spectrum of $\mathbf{A}$, such as clustered or isolated eigenvalues. Our results are derived assuming exact arithmetic, but we show that they are easily extended to finite precision computations using existing theory about the Lanczos algorithm in finite precision. We also provide generalized bounds for the Lanczos method used to approximate quadratic forms $\mathbf{b}^{\mathsf{H}} f(\mathbf{A}) \mathbf{b}$, and demonstrate the effectiveness of our bounds with numerical experiments.
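For readers unfamiliar with Lanczos-FA itself: after $k$ Lanczos steps with starting vector $\mathbf{b}$, one has an orthonormal Krylov basis $Q$ and a tridiagonal $T = Q^{\mathsf{H}} \mathbf{A} Q$, and the approximation is $\lVert\mathbf{b}\rVert\, Q f(T) e_1$. The following is a standard dense sketch (real symmetric case, full reorthogonalization), not code from the paper.

```python
import numpy as np

def lanczos_fa(A, b, k, f):
    """Lanczos-FA approximation of f(A) b for symmetric A: run k Lanczos
    steps, then return ||b|| * Q @ f(T) @ e1, with f applied to T via an
    eigendecomposition.  Full reorthogonalization; no breakdown check."""
    n = b.shape[0]
    Q = np.zeros((n, k))
    alpha, beta = np.zeros(k), np.zeros(max(k - 1, 0))
    Q[:, 0] = b / np.linalg.norm(b)
    for j in range(k):
        w = A @ Q[:, j]
        alpha[j] = Q[:, j] @ w
        w -= Q[:, :j + 1] @ (Q[:, :j + 1].T @ w)  # full reorthogonalization
        if j < k - 1:
            beta[j] = np.linalg.norm(w)
            Q[:, j + 1] = w / beta[j]
    T = np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)
    evals, evecs = np.linalg.eigh(T)
    fT = evecs @ np.diag(f(evals)) @ evecs.T
    return np.linalg.norm(b) * Q @ fT[:, 0]    # ||b|| * Q * f(T) * e1

# usage: x = lanczos_fa(A, b, 30, np.exp)  # approximates expm(A) @ b
```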
We consider a scalar function of the numerical solution of an initial value problem, together with its second-derivative (Hessian) matrix with respect to the initial value. The need to extract information from the Hessian, or to solve a linear system having the Hessian as its coefficient matrix, arises in many research fields such as optimization, Bayesian estimation, and uncertainty quantification. For memory efficiency, these tasks often employ a Krylov subspace method, which does not need to hold the Hessian matrix explicitly and only requires the multiplication of the Hessian with a given vector. One way to approximate such a Hessian-vector product is to integrate the so-called second-order adjoint system numerically. However, the error in this approximation can be significant even when the numerical integration of the second-order adjoint system is sufficiently accurate. This paper presents a novel algorithm that computes the intended Hessian-vector product exactly and efficiently. To this end, we give a new concise derivation of the second-order adjoint system and show that the intended product can be computed exactly by applying a particular numerical method to that system. In the discussion, symplectic partitioned Runge--Kutta methods play an essential role.
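As a minimal illustration of what "the exact Hessian-vector product of the discrete solution" means (not the paper's second-order adjoint construction, which handles general nonlinear systems): for a linear system $x' = Ax$, one RK4 step is a fixed matrix $R$, the discrete solution map after $m$ steps is $M = R^m$, and for the quadratic cost $C(x_0) = \tfrac12\lVert M x_0 - x_\ast\rVert^2$ the Hessian with respect to $x_0$ is exactly $M^\top M$. All names below are illustrative.

```python
import numpy as np

def rk4_step_matrix(A, dt):
    """One classical RK4 step for the linear system x' = A x is the
    linear map x -> R x with R the degree-4 Taylor polynomial of
    exp(dt * A)."""
    I = np.eye(A.shape[0])
    R, term = I.copy(), I.copy()
    for k in range(1, 5):
        term = term @ (dt * A) / k   # accumulates (dt*A)^k / k!
        R += term
    return R

def hessian_vector_product(A, dt, steps, v):
    """Exact Hessian-vector product of C(x0) = 0.5*||M x0 - x_target||^2
    w.r.t. the initial value x0, where M is the discrete RK4 solution
    map.  C is quadratic with Hessian M^T M (independent of the target),
    so the product is M^T (M v); the paper achieves analogous exactness
    for nonlinear systems via the second-order adjoint integrated with a
    symplectic partitioned Runge--Kutta method."""
    M = np.linalg.matrix_power(rk4_step_matrix(A, dt), steps)
    return M.T @ (M @ v)
```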
In this article, we present an $O(N \log N)$ rapidly convergent algorithm for the numerical approximation of the convolution integral with radially symmetric, weakly singular kernels and compactly supported densities. To achieve the reduced computational complexity, we utilize the Fast Fourier Transform (FFT) on a uniform grid of size $N$ to approximate the convolution. To facilitate this and maintain accuracy, we rely primarily on a periodic Fourier extension of the density, with a suitably large period depending on the support of the density. The rate of convergence of the method increases with increasing smoothness of the periodic extension; in fact, the approximations exhibit super-algebraic convergence when the extension is infinitely differentiable. Furthermore, when the density has jump discontinuities, we utilize a certain Fourier smoothing technique to accelerate convergence and achieve a quadratic rate in the overall approximation. Finally, we apply the integration scheme to the numerical solution of certain partial differential equations; in particular, we use the quadrature to obtain a fast and high-order Nyström solver for the Lippmann-Schwinger integral equation. We validate the performance of the proposed scheme in terms of accuracy as well as computational efficiency through a variety of numerical experiments.
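The $O(N \log N)$ mechanism is the circular convolution theorem on the periodized grid, as in the sketch below. The sketch takes the kernel's Fourier multipliers as an input array (`kernel_hat`, an illustrative name): in the paper these would be computed so as to absorb the weak singularity, whereas naively sampling a singular kernel on the grid would forfeit the stated accuracy.

```python
import numpy as np

def periodic_convolution_fft(kernel_hat, density_samples, h):
    """O(N log N) evaluation of the periodic convolution
    (K * rho)(x_j) ~ h * sum_k K(x_j - x_k) rho(x_k) on a uniform grid
    with spacing h, via the circular convolution theorem.  kernel_hat
    holds the Fourier multipliers of the (truncated) kernel; the density
    samples are those of its periodic Fourier extension."""
    density_hat = np.fft.fft(density_samples)
    return h * np.real(np.fft.ifft(kernel_hat * density_hat))
```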
Let $A$ be a square complex matrix; let $z_1, \dots, z_{N} \in \mathbb{C}$ be arbitrary (possibly repeated) points of interpolation; let $f$ be an analytic function defined on a neighborhood of the convex hull of the union of the spectrum $\sigma(A)$ of the matrix $A$ and the points $z_1, \dots, z_{N}$; and let the rational function $r=\frac{u}{v}$ (with the degree of the numerator $u$ less than $N$) interpolate $f$ at these points (counted according to their multiplicities). Under these assumptions, estimates of the kind $$ \bigl\lVert f(A)-r(A)\bigr\rVert \le \max_{t\in[0,1];\,\mu\in\text{convex hull}\{z_1,z_{2},\dots,z_{N}\}} \biggl\lVert \Omega(A)\,[v(A)]^{-1}\, \frac{\bigl(vf\bigr)^{(N)}\bigl((1-t)\mu\mathbf{1}+tA\bigr)}{N!} \biggr\rVert, $$ where $\Omega(z)=\prod_{k=1}^N(z-z_k)$, are proposed. As an example illustrating the accuracy of such estimates, an approximation of the impulse response of a dynamical system obtained using the reduced-order Arnoldi method is considered, and the actual accuracy of the approximation is compared with the estimate based on this paper.
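To see the shape of the estimate in the simplest case, take $v \equiv 1$ (polynomial interpolation), $f = \exp$, a symmetric $A$ with spectrum in $[-1,1]$, and Chebyshev nodes: then $(vf)^{(N)} = \exp$ and, since the relevant convex hull lies in $[-1,1]$, the right-hand side is bounded by the slightly relaxed quantity $\lVert\Omega(A)\rVert \, e / N!$. The sketch below checks this numerically; the matrix size and node count are illustrative, not taken from the paper.

```python
import math
import numpy as np
from scipy.linalg import expm

def polyval_matrix(coeffs, A):
    """Horner's rule with matrix products; coeffs ordered highest first."""
    P, I = np.zeros_like(A), np.eye(A.shape[0])
    for c in coeffs:
        P = P @ A + c * I
    return P

# Symmetric test matrix with spectrum in [-1, 1]; Chebyshev nodes z_k.
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((50, 50)))
A = Q @ np.diag(np.linspace(-1.0, 1.0, 50)) @ Q.T
N = 8
nodes = np.cos((2 * np.arange(N) + 1) * np.pi / (2 * N))

p = np.polyfit(nodes, np.exp(nodes), N - 1)   # interpolant of exp at the nodes
error = np.linalg.norm(expm(A) - polyval_matrix(p, A), 2)

Omega = np.eye(50)
for z in nodes:
    Omega = Omega @ (A - z * np.eye(50))      # Omega(A) = prod_k (A - z_k I)
bound = np.linalg.norm(Omega, 2) * math.e / math.factorial(N)
print(f"error = {error:.2e} <= bound = {bound:.2e}: {error <= bound}")
```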