No Arabic abstract
This paper considers the problem of matrix-variate logistic regression. The fundamental error threshold on estimating coefficient matrices in the logistic regression problem is found by deriving a lower bound on the minimax risk. The focus of this paper is on derivation of a minimax risk lower bound for low-rank coefficient matrices. The bound depends explicitly on the dimensions and distribution of the covariates, the rank and energy of the coefficient matrix, and the number of samples. The resulting bound is proportional to the intrinsic degrees of freedom in the problem, which suggests the sample complexity of the low-rank matrix logistic regression problem can be lower than that for vectorized logistic regression. color{red}color{black} The proof techniques utilized in this work also set the stage for development of minimax lower bounds for tensor-variate logistic regression problems.
This paper studies a tensor-structured linear regression model with a scalar response variable and tensor-structured predictors, such that the regression parameters form a tensor of order $d$ (i.e., a $d$-fold multiway array) in $mathbb{R}^{n_1 times n_2 times cdots times n_d}$. It focuses on the task of estimating the regression tensor from $m$ realizations of the response variable and the predictors where $mll n = prod olimits_{i} n_i$. Despite the seeming ill-posedness of this problem, it can still be solved if the parameter tensor belongs to the space of sparse, low Tucker-rank tensors. Accordingly, the estimation procedure is posed as a non-convex optimization program over the space of sparse, low Tucker-rank tensors, and a tensor variant of projected gradient descent is proposed to solve the resulting non-convex problem. In addition, mathematical guarantees are provided that establish the proposed method linearly converges to an appropriate solution under a certain set of conditions. Further, an upper bound on sample complexity of tensor parameter estimation for the model under consideration is characterized for the special case when the individual (scalar) predictors independently draw values from a sub-Gaussian distribution. The sample complexity bound is shown to have a polylogarithmic dependence on $bar{n} = max big{n_i: iin {1,2,ldots,d } big}$ and, orderwise, it matches the bound one can obtain from a heuristic parameter counting argument. Finally, numerical experiments demonstrate the efficacy of the proposed tensor model and estimation method on a synthetic dataset and a collection of neuroimaging datasets pertaining to attention deficit hyperactivity disorder. Specifically, the proposed method exhibits better sample complexities on both synthetic and real datasets, demonstrating the usefulness of the model and the method in settings where $n gg m$.
For even $k$, the matchings connectivity matrix $mathbf{M}_k$ encodes which pairs of perfect matchings on $k$ vertices form a single cycle. Cygan et al. (STOC 2013) showed that the rank of $mathbf{M}_k$ over $mathbb{Z}_2$ is $Theta(sqrt 2^k)$ and used this to give an $O^*((2+sqrt{2})^{mathsf{pw}})$ time algorithm for counting Hamiltonian cycles modulo $2$ on graphs of pathwidth $mathsf{pw}$. The same authors complemented their algorithm by an essentially tight lower bound under the Strong Exponential Time Hypothesis (SETH). This bound crucially relied on a large permutation submatrix within $mathbf{M}_k$, which enabled a pattern propagation commonly used in previous related lower bounds, as initiated by Lokshtanov et al. (SODA 2011). We present a new technique for a similar pattern propagation when only a black-box lower bound on the asymptotic rank of $mathbf{M}_k$ is given; no stronger structural insights such as the existence of large permutation submatrices in $mathbf{M}_k$ are needed. Given appropriate rank bounds, our technique yields lower bounds for counting Hamiltonian cycles (also modulo fixed primes $p$) parameterized by pathwidth. To apply this technique, we prove that the rank of $mathbf{M}_k$ over the rationals is $4^k / mathrm{poly}(k)$. We also show that the rank of $mathbf{M}_k$ over $mathbb{Z}_p$ is $Omega(1.97^k)$ for any prime $p eq 2$ and even $Omega(2.15^k)$ for some primes. As a consequence, we obtain that Hamiltonian cycles cannot be counted in time $O^*((6-epsilon)^{mathsf{pw}})$ for any $epsilon>0$ unless SETH fails. This bound is tight due to a $O^*(6^{mathsf{pw}})$ time algorithm by Bodlaender et al. (ICALP 2013). Under SETH, we also obtain that Hamiltonian cycles cannot be counted modulo primes $p eq 2$ in time $O^*(3.97^mathsf{pw})$, indicating that the modulus can affect the complexity in intricate ways.
We propose a new Riemannian geometry for fixed-rank matrices that is specifically tailored to the low-rank matrix completion problem. Exploiting the degree of freedom of a quotient space, we tune the metric on our search space to the particular least square cost function. At one level, it illustrates in a novel way how to exploit the versatile framework of optimization on quotient manifold. At another level, our algorithm can be considered as an improved version of LMaFit, the state-of-the-art Gauss-Seidel algorithm. We develop necessary tools needed to perform both first-order and second-order optimization. In particular, we propose gradient descent schemes (steepest descent and conjugate gradient) and trust-region algorithms. We also show that, thanks to the simplicity of the cost function, it is numerically cheap to perform an exact linesearch given a search direction, which makes our algorithms competitive with the state-of-the-art on standard low-rank matrix completion instances.
Multitask learning, i.e. taking advantage of the relatedness of individual tasks in order to improve performance on all of them, is a core challenge in the field of machine learning. We focus on matrix regression tasks where the rank of the weight matrix is constrained to reduce sample complexity. We introduce the common mechanism regression (CMR) model which assumes a shared left low-rank component across all tasks, but allows an individual per-task right low-rank component. This dramatically reduces the number of samples needed for accurate estimation. The problem of jointly recovering the common and the local components has a non-convex bi-linear structure. We overcome this hurdle and provide a provably beneficial non-iterative spectral algorithm. Appealingly, the solution has favorable behavior as a function of the number of related tasks and the small number of samples available for each one. We demonstrate the efficacy of our approach for the challenging task of remote river discharge estimation across multiple river sites, where data for each task is naturally scarce. In this scenario sharing a low-rank component between the tasks translates to a shared spectral reflection of the water, which is a true underlying physical model. We also show the benefit of the approach on the markedly different setting of image classification where the common component can be interpreted as the shared convolution filters.
Matrix completion is a modern missing data problem where both the missing structure and the underlying parameter are high dimensional. Although missing structure is a key component to any missing data problems, existing matrix completion methods often assume a simple uniform missing mechanism. In this work, we study matrix completion from corrupted data under a novel low-rank missing mechanism. The probability matrix of observation is estimated via a high dimensional low-rank matrix estimation procedure, and further used to complete the target matrix via inverse probabilities weighting. Due to both high dimensional and extreme (i.e., very small) nature of the true probability matrix, the effect of inverse probability weighting requires careful study. We derive optimal asymptotic convergence rates of the proposed estimators for both the observation probabilities and the target matrix.