Imputation of mixed data with multilevel singular value decomposition

61 0 0.0 ( 0 )

Download Cite

Added by Genevieve Robin

Publication date 2018

fields Mathematical Statistics

and research's language is English

Authors Franc{c}ois Husson

Applications

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

Statistical analysis of large data sets offers new opportunities to better understand many processes. Yet, data accumulation often implies relaxing acquisition procedures or compounding diverse sources. As a consequence, such data sets often contain mixed data, i.e. both quantitative and qualitative and many missing values. Furthermore, aggregated data present a natural textit{multilevel} structure, where individuals or samples are nested within different sites, such as countries or hospitals. Imputation of multilevel data has therefore drawn some attention recently, but current solutions are not designed to handle mixed data, and suffer from important drawbacks such as their computational cost. In this article, we propose a single imputation method for multilevel data, which can be used to complete either quantitative, categorical or mixed data. The method is based on multilevel singular value decomposition (SVD), which consists in decomposing the variability of the data into two components, the between and within groups variability, and performing SVD on both parts. We show on a simulation study that in comparison to competitors, the method has the great advantages of handling data sets of various size, and being computationally faster. Furthermore, it is the first so far to handle mixed data. We apply the method to impute a medical data set resulting from the aggregation of several data sets coming from different hospitals. This application falls in the framework of a larger project on Trauma patients. To overcome obstacles associated to the aggregation of medical data, we turn to distributed computation. The method is implemented in an R package.

rate research

Tensor Renormalization Group with Randomized Singular Value Decomposition

150 - Satoshi Morita , Ryo Igarashi , Hui-Hai Zhao 2017

An algorithm of the tensor renormalization group is proposed based on a randomized algorithm for singular value decomposition. Our algorithm is applicable to a broad range of two-dimensional classical models. In the case of a square lattice, its computational complexity and memory usage are proportional to the fifth and the third power of the bond dimension, respectively, whereas those of the conventional implementation are of the sixth and the fourth power. The oversampling parameter larger than the bond dimension is sufficient to reproduce the same result as full singular value decomposition even at the critical point of the two-dimensional Ising model.

Statistical Mechanics Computational Physics

Guaranteed Functional Tensor Singular Value Decomposition

152 - Rungang Han , Pixu Shi , Anru R. Zhang 2021

This paper introduces the functional tensor singular value decomposition (FTSVD), a novel dimension reduction framework for tensors with one functional mode and several tabular modes. The problem is motivated by high-order longitudinal data analysis. Our model assumes the observed data to be a random realization of an approximate CP low-rank functional tensor measured on a discrete time grid. Incorporating tensor algebra and the theory of Reproducing Kernel Hilbert Space (RKHS), we propose a novel RKHS-based constrained power iteration with spectral initialization. Our method can successfully estimate both singular vectors and functions of the low-rank structure in the observed data. With mild assumptions, we establish the non-asymptotic contractive error bounds for the proposed algorithm. The superiority of the proposed framework is demonstrated via extensive experiments on both simulated and real data.

Methodology Statistics Theory Applications

Optimal Sparse Singular Value Decomposition for High-dimensional High-order Data

84 - Anru Zhang , Rungang Han 2018

In this article, we consider the sparse tensor singular value decomposition, which aims for dimension reduction on high-dimensional high-order data with certain sparsity structure. A method named Sparse Tensor Alternating Thresholding for Singular Value Decomposition (STAT-SVD) is proposed. The proposed procedure features a novel double projection & thresholding scheme, which provides a sharp criterion for thresholding in each iteration. Compared with regular tensor SVD model, STAT-SVD permits more robust estimation under weaker assumptions. Both the upper and lower bounds for estimation accuracy are developed. The proposed procedure is shown to be minimax rate-optimal in a general class of situations. Simulation studies show that STAT-SVD performs well under a variety of configurations. We also illustrate the merits of the proposed procedure on a longitudinal tensor dataset on European country mortality rates.

Statistics Theory Methodology Machine Learning

A Randomized Tensor Train Singular Value Decomposition

83 - Benjamin Huber , Reinhold Schneider , Sebastian Wolf 2017

The hierarchical SVD provides a quasi-best low rank approximation of high dimensional data in the hierarchical Tucker framework. Similar to the SVD for matrices, it provides a fundamental but expensive tool for tensor computations. In the present work we examine generalizations of randomized matrix decomposition methods to higher order tensors in the framework of the hierarchical tensors representation. In particular we present and analyze a randomized algorithm for the calculation of the hierarchical SVD (HSVD) for the tensor train (TT) format.

Numerical Analysis

Singular Value Decomposition and Principal Component Analysis

381 - Michael E. Wall 2002

This chapter describes gene expression analysis by Singular Value Decomposition (SVD), emphasizing initial characterization of the data. We describe SVD methods for visualization of gene expression data, representation of the data using a smaller number of variables, and detection of patterns in noisy gene expression data. In addition, we describe the precise relation between SVD analysis and Principal Component Analysis (PCA) when PCA is calculated using the covariance matrix, enabling our descriptions to apply equally well to either method. Our aim is to provide definitions, interpretations, examples, and references that will serve as resources for understanding and extending the application of SVD and PCA to gene expression analysis.

Biological Physics Data Analysis Statistics and Probability Quantitative Methods