No Arabic abstract
Randomized algorithms provide solutions to two ubiquitous problems: (1) the distributed calculation of a principal component analysis or singular value decomposition of a highly rectangular matrix, and (2) the distributed calculation of a low-rank approximation (in the form of a singular value decomposition) to an arbitrary matrix. Carefully honed algorithms yield results that are uniformly superior to those of the stock, deterministic implementations in Spark (the popular platform for distributed computation); in particular, whereas the stock software will without warning return left singular vectors that are far from numerically orthonormal, a significantly burnished randomized implementation generates left singular vectors that are numerically orthonormal to nearly the machine precision.
This chapter describes gene expression analysis by Singular Value Decomposition (SVD), emphasizing initial characterization of the data. We describe SVD methods for visualization of gene expression data, representation of the data using a smaller number of variables, and detection of patterns in noisy gene expression data. In addition, we describe the precise relation between SVD analysis and Principal Component Analysis (PCA) when PCA is calculated using the covariance matrix, enabling our descriptions to apply equally well to either method. Our aim is to provide definitions, interpretations, examples, and references that will serve as resources for understanding and extending the application of SVD and PCA to gene expression analysis.
Quaternion matrix approximation problems construct the approximated matrix via the quaternion singular value decomposition (SVD) by selecting some singular value decomposition (SVD) triplets of quaternion matrices. In applications such as color image processing and recognition problems, only a small number of dominant SVD triplets are selected, while in some applications such as quaternion total least squares problem, small SVD triplets (small singular values and associated singular vectors) and numerical rank with respect to a small threshold are required. In this paper, we propose a randomized quaternion SVD (verbrandsvdQ) method to compute a small number of SVD triplets of a large-scale quaternion matrix. Theoretical results are given about approximation errors and the corresponding algorithm adapts to the low-rank matrix approximation problem. When the restricted rank increases, it might lead to information loss of small SVD triplets. The blocked quaternion randomized SVD algorithm is then developed when the numerical rank and information about small singular values are required. For color face recognition problems, numerical results show good performance of the developed quaternion randomized SVD method for low-rank approximation of a large-scale quaternion matrix. The blocked randomized SVD algorithm is also shown to be more robust than unblocked method through several experiments, and approximation errors from the blocked scheme are very close to the optimal error obtained by truncating a full SVD.
We introduce PyParSVDfootnote{https://github.com/Romit-Maulik/PyParSVD}, a Python library that implements a streaming, distributed and randomized algorithm for the singular value decomposition. To demonstrate its effectiveness, we extract coherent structures from scientific data. Futhermore, we show weak scaling assessments on up to 256 nodes of the Theta machine at Argonne Leadership Computing Facility, demonstrating potential for large-scale data analyses of practical data sets.
In this paper we study randomized optimal stopping problems and consider corresponding forward and backward Monte Carlo based optimisation algorithms. In particular we prove the convergence of the proposed algorithms and derive the corresponding convergence rates.
The hierarchical SVD provides a quasi-best low rank approximation of high dimensional data in the hierarchical Tucker framework. Similar to the SVD for matrices, it provides a fundamental but expensive tool for tensor computations. In the present work we examine generalizations of randomized matrix decomposition methods to higher order tensors in the framework of the hierarchical tensors representation. In particular we present and analyze a randomized algorithm for the calculation of the hierarchical SVD (HSVD) for the tensor train (TT) format.