ﻻ يوجد ملخص باللغة العربية
Estimating the rank of a corrupted data matrix is an important task in data science, most notably for choosing the number of components in principal component analysis. Significant progress on this task has been made using random matrix theory by characterizing the spectral properties of large noise matrices. However, utilizing such tools is not straightforward when the data matrix consists of count random variables, such as Poisson or binomial, in which case the noise can be heteroskedastic with an unknown variance in each entry. In this work, focusing on a Poisson random matrix with independent entries, we propose a simple procedure termed textit{biwhitening} that makes it possible to estimate the rank of the underlying data matrix (i.e., the Poisson parameter matrix) without any prior knowledge on its structure. Our approach is based on the key observation that one can scale the rows and columns of the data matrix simultaneously so that the spectrum of the corresponding noise agrees with the standard Marchenko-Pastur (MP) law, justifying the use of the MP upper edge as a threshold for rank selection. Importantly, the required scaling factors can be estimated directly from the observations by solving a matrix scaling problem via the Sinkhorn-Knopp algorithm. Aside from the Poisson distribution, we extend our biwhitening approach to other discrete distributions, such as the generalized Poisson, binomial, multinomial, and negative binomial. We conduct numerical experiments that corroborate our theoretical findings, and demonstrate our approach on real single-cell RNA sequencing (scRNA-seq) data, where we show that our results agree with a slightly overdispersed generalized Poisson model.
In many applications it is useful to replace the Moore-Penrose pseudoinverse (MPP) by a different generalized inverse with more favorable properties. We may want, for example, to have many zero entries, but without giving up too much of the stability
We study the problem of reconstructing an unknown matrix M of rank r and dimension d using O(rd poly log d) Pauli measurements. This has applications in quantum state tomography, and is a non-commutative analogue of a well-known problem in compressed
We study a prototypical problem in empirical Bayes. Namely, consider a population consisting of $k$ individuals each belonging to one of $k$ types (some types can be empty). Without any structural restrictions, it is impossible to learn the compositi
In this paper we prove the concavity of the $k$-trace functions, $Amapsto (text{Tr}_k[exp(H+ln A)])^{1/k}$, on the convex cone of all positive definite matrices. $text{Tr}_k[A]$ denotes the $k_{mathrm{th}}$ elementary symmetric polynomial of the eige
The task of reconstructing a matrix given a sample of observedentries is known as the matrix completion problem. It arises ina wide range of problems, including recommender systems, collaborativefiltering, dimensionality reduction, image processing,