No Arabic abstract
This paper considers the problem of recovery of a low-rank matrix in the situation when most of its entries are not observed and a fraction of observed entries are corrupted. The observations are noisy realizations of the sum of a low rank matrix, which we wish to recover, with a second matrix having a complementary sparse structure such as element-wise or column-wise sparsity. We analyze a class of estimators obtained by solving a constrained convex optimization problem that combines the nuclear norm and a convex relaxation for a sparse constraint. Our results are obtained for the simultaneous presence of random and deterministic patterns in the sampling scheme. We provide guarantees for recovery of low-rank and sparse components from partial and corrupted observations in the presence of noise and show that the obtained rates of convergence are minimax optimal.
This paper examines the properties of real symmetric square matrices with a constant value for the main diagonal elements and another constant value for all off-diagonal elements. This matrix form is a simple subclass of circulant matrices, which is a subclass of Toeplitz matrices. It encompasses other useful matrices such as the centering matrix and the equicorrelation matrix, which arise in statistical applications. We examine the general form of this class of matrices and derive its eigendecomposition and other important properties. We use this as a basis to look at the properties of the centering matrix and the equicorrelation matrix, and various statistics that use these matrices.
The task of reconstructing a matrix given a sample of observedentries is known as the matrix completion problem. It arises ina wide range of problems, including recommender systems, collaborativefiltering, dimensionality reduction, image processing, quantum physics or multi-class classificationto name a few. Most works have focused on recovering an unknown real-valued low-rankmatrix from randomly sub-sampling its entries.Here, we investigate the case where the observations take a finite number of values, corresponding for examples to ratings in recommender systems or labels in multi-class classification.We also consider a general sampling scheme (not necessarily uniform) over the matrix entries.The performance of a nuclear-norm penalized estimator is analyzed theoretically.More precisely, we derive bounds for the Kullback-Leibler divergence between the true and estimated distributions.In practice, we have also proposed an efficient algorithm based on lifted coordinate gradient descent in order to tacklepotentially high dimensional settings.
In this paper we study methods for estimating causal effects in settings with panel data, where some units are exposed to a treatment during some periods and the goal is estimating counterfactual (untreated) outcomes for the treated unit/period combinations. We propose a class of matrix completion estimators that uses the observed elements of the matrix of control outcomes corresponding to untreated unit/periods to impute the missing elements of the control outcome matrix, corresponding to treated units/periods. This leads to a matrix that well-approximates the original (incomplete) matrix, but has lower complexity according to the nuclear norm for matrices. We generalize results from the matrix completion literature by allowing the patterns of missing data to have a time series dependency structure that is common in social science applications. We present novel insights concerning the connections between the matrix completion literature, the literature on interactive fixed effects models and the literatures on program evaluation under unconfoundedness and synthetic control methods. We show that all these estimators can be viewed as focusing on the same objective function. They differ solely in the way they deal with identification, in some cases solely through regularization (our proposed nuclear norm matrix completion estimator) and in other cases primarily through imposing hard restrictions (the unconfoundedness and synthetic control approaches). The proposed method outperforms unconfoundedness-based or synthetic control estimators in simulations based on real data.
We consider the matrix completion problem of recovering a structured low rank matrix with partially observed entries with mixed data types. Vast majority of the solutions have proposed computationally feasible estimators with strong statistical guarantees for the case where the underlying distribution of data in the matrix is continuous. A few recent approaches have extended using similar ideas these estimators to the case where the underlying distributions belongs to the exponential family. Most of these approaches assume that there is only one underlying distribution and the low rank constraint is regularized by the matrix Schatten Norm. We propose a computationally feasible statistical approach with strong recovery guarantees along with an algorithmic framework suited for parallelization to recover a low rank matrix with partially observed entries for mixed data types in one step. We also provide extensive simulation evidence that corroborate our theoretical results.
Given independent samples from P and Q, two-sample permutation tests allow one to construct exact level tests when the null hypothesis is P=Q. On the other hand, when comparing or testing particular parameters $theta$ of P and Q, such as their means or medians, permutation tests need not be level $alpha$, or even approximately level $alpha$ in large samples. Under very weak assumptions for comparing estimators, we provide a general test procedure whereby the asymptotic validity of the permutation test holds while retaining the exact rejection probability $alpha$ in finite samples when the underlying distributions are identical. The ideas are broadly applicable and special attention is given to the k-sample problem of comparing general parameters, whereby a permutation test is constructed which is exact level $alpha$ under the hypothesis of identical distributions, but has asymptotic rejection probability $alpha$ under the more general null hypothesis of equality of parameters. A Monte Carlo simulation study is performed as well. A quite general theory is possible based on a coupling construction, as well as a key contiguity argument for the multinomial and multivariate hypergeometric distributions.