No Arabic abstract
The trace regression model, a direct extension of the well-studied linear regression model, allows one to map matrices to real-valued outputs. We here introduce an even more general model, namely the partial-trace regression model, a family of linear mappings from matrix-valued inputs to matrix-valued outputs; this model subsumes the trace regression model and thus the linear regression model. Borrowing tools from quantum information theory, where partial trace operators have been extensively studied, we propose a framework for learning partial trace regression models from data by taking advantage of the so-called low-rank Kraus representation of completely positive maps. We show the relevance of our framework with synthetic and real-world experiments conducted for both i) matrix-to-matrix regression and ii) positive semidefinite matrix completion, two tasks which can be formulated as partial trace regression problems.
A growing number of modern statistical learning problems involve estimating a large number of parameters from a (smaller) number of noisy observations. In a subset of these problems (matrix completion, matrix compressed sensing, and multi-task learning) the unknown parameters form a high-dimensional matrix B*, and two popular approaches for the estimation are convex relaxation of rank-penalized regression or non-convex optimization. It is also known that these estimators satisfy near optimal error bounds under assumptions on rank, coherence, or spikiness of the unknown matrix. In this paper, we introduce a unifying technique for analyzing all of these problems via both estimators that leads to short proofs for the existing results as well as new results. Specifically, first we introduce a general notion of spikiness for B* and consider a general family of estimators and prove non-asymptotic error bounds for the their estimation error. Our approach relies on a generic recipe to prove restricted strong convexity for the sampling operator of the trace regression. Second, and most notably, we prove similar error bounds when the regularization parameter is chosen via K-fold cross-validation. This result is significant in that existing theory on cross-validated estimators do not apply to our setting since our estimators are not known to satisfy their required notion of stability. Third, we study applications of our general results to four subproblems of (1) matrix completion, (2) multi-task learning, (3) compressed sensing with Gaussian ensembles, and (4) compressed sensing with factored measurements. For (1), (3), and (4) we recover matching error bounds as those found in the literature, and for (2) we obtain (to the best of our knowledge) the first such error bound. We also demonstrate how our frameworks applies to the exact recovery problem in (3) and (4).
Tensor completion refers to the task of estimating the missing data from an incomplete measurement or observation, which is a core problem frequently arising from the areas of big data analysis, computer vision, and network engineering. Due to the multidimensional nature of high-order tensors, the matrix approaches, e.g., matrix factorization and direct matricization of tensors, are often not ideal for tensor completion and recovery. In this paper, we introduce a unified low-rank and sparse enhanced Tucker decomposition model for tensor completion. Our model possesses a sparse regularization term to promote a sparse core tensor of the Tucker decomposition, which is beneficial for tensor data compression. Moreover, we enforce low-rank regularization terms on factor matrices of the Tucker decomposition for inducing the low-rankness of the tensor with a cheap computational cost. Numerically, we propose a customized ADMM with enough easy subproblems to solve the underlying model. It is remarkable that our model is able to deal with different types of real-world data sets, since it exploits the potential periodicity and inherent correlation properties appeared in tensors. A series of computational experiments on real-world data sets, including internet traffic data sets, color images, and face recognition, demonstrate that our model performs better than many existing state-of-the-art matricization and tensorization approaches in terms of achieving higher recovery accuracy.
We propose a sparse and low-rank tensor regression model to relate a univariate outcome to a feature tensor, in which each unit-rank tensor from the CP decomposition of the coefficient tensor is assumed to be sparse. This structure is both parsimonious and highly interpretable, as it implies that the outcome is related to the features through a few distinct pathways, each of which may only involve subsets of feature dimensions. We take a divide-and-conquer strategy to simplify the task into a set of sparse unit-rank tensor regression problems. To make the computation efficient and scalable, for the unit-rank tensor regression, we propose a stagewise estimation procedure to efficiently trace out its entire solution path. We show that as the step size goes to zero, the stagewise solution paths converge exactly to those of the corresponding regularized regression. The superior performance of our approach is demonstrated on various real-world and synthetic examples.
Multitask learning, i.e. taking advantage of the relatedness of individual tasks in order to improve performance on all of them, is a core challenge in the field of machine learning. We focus on matrix regression tasks where the rank of the weight matrix is constrained to reduce sample complexity. We introduce the common mechanism regression (CMR) model which assumes a shared left low-rank component across all tasks, but allows an individual per-task right low-rank component. This dramatically reduces the number of samples needed for accurate estimation. The problem of jointly recovering the common and the local components has a non-convex bi-linear structure. We overcome this hurdle and provide a provably beneficial non-iterative spectral algorithm. Appealingly, the solution has favorable behavior as a function of the number of related tasks and the small number of samples available for each one. We demonstrate the efficacy of our approach for the challenging task of remote river discharge estimation across multiple river sites, where data for each task is naturally scarce. In this scenario sharing a low-rank component between the tasks translates to a shared spectral reflection of the water, which is a true underlying physical model. We also show the benefit of the approach on the markedly different setting of image classification where the common component can be interpreted as the shared convolution filters.
This paper studies a tensor-structured linear regression model with a scalar response variable and tensor-structured predictors, such that the regression parameters form a tensor of order $d$ (i.e., a $d$-fold multiway array) in $mathbb{R}^{n_1 times n_2 times cdots times n_d}$. It focuses on the task of estimating the regression tensor from $m$ realizations of the response variable and the predictors where $mll n = prod olimits_{i} n_i$. Despite the seeming ill-posedness of this problem, it can still be solved if the parameter tensor belongs to the space of sparse, low Tucker-rank tensors. Accordingly, the estimation procedure is posed as a non-convex optimization program over the space of sparse, low Tucker-rank tensors, and a tensor variant of projected gradient descent is proposed to solve the resulting non-convex problem. In addition, mathematical guarantees are provided that establish the proposed method linearly converges to an appropriate solution under a certain set of conditions. Further, an upper bound on sample complexity of tensor parameter estimation for the model under consideration is characterized for the special case when the individual (scalar) predictors independently draw values from a sub-Gaussian distribution. The sample complexity bound is shown to have a polylogarithmic dependence on $bar{n} = max big{n_i: iin {1,2,ldots,d } big}$ and, orderwise, it matches the bound one can obtain from a heuristic parameter counting argument. Finally, numerical experiments demonstrate the efficacy of the proposed tensor model and estimation method on a synthetic dataset and a collection of neuroimaging datasets pertaining to attention deficit hyperactivity disorder. Specifically, the proposed method exhibits better sample complexities on both synthetic and real datasets, demonstrating the usefulness of the model and the method in settings where $n gg m$.