No Arabic abstract
Motivated by establishing theoretical foundations for various manifold learning algorithms, we study the problem of Mahalanobis distance (MD), and the associated precision matrix, estimation from high-dimensional noisy data. By relying on recent transformative results in covariance matrix estimation, we demonstrate the sensitivity of MD~and the associated precision matrix to measurement noise, determining the exact asymptotic signal-to-noise ratio at which MD fails, and quantifying its performance otherwise. In addition, for an appropriate loss function, we propose an asymptotically optimal shrinker, which is shown to be beneficial over the classical implementation of the MD, both analytically and in simulations. The result is extended to the manifold setup, where the nonlinear interaction between curvature and high-dimensional noise is taken care of. The developed solution is applied to study a multiscale reduction problem in the dynamical system analysis.
We consider high-dimensional measurement errors with high-frequency data. Our focus is on recovering the covariance matrix of the random errors with optimality. In this problem, not all components of the random vector are observed at the same time and the measurement errors are latent variables, leading to major challenges besides high data dimensionality. We propose a new covariance matrix estimator in this context with appropriate localization and thresholding. By developing a new technical device integrating the high-frequency data feature with the conventional notion of $alpha$-mixing, our analysis successfully accommodates the challenging serial dependence in the measurement errors. Our theoretical analysis establishes the minimax optimal convergence rates associated with two commonly used loss functions. We then establish cases when the proposed localized estimator with thresholding achieves the minimax optimal convergence rates. Considering that the variances and covariances can be small in reality, we conduct a second-order theoretical analysis that further disentangles the dominating bias in the estimator. A bias-corrected estimator is then proposed to ensure its practical finite sample performance. We illustrate the promising empirical performance of the proposed estimator with extensive simulation studies and a real data analysis.
In this paper, we estimate the high dimensional precision matrix under the weak sparsity condition where many entries are nearly zero. We study a Lasso-type method for high dimensional precision matrix estimation and derive general error bounds under the weak sparsity condition. The common irrepresentable condition is relaxed and the results are applicable to the weak sparse matrix. As applications, we study the precision matrix estimation for the heavy-tailed data, the non-paranormal data, and the matrix data with the Lasso-type method.
Distance correlation has become an increasingly popular tool for detecting the nonlinear dependence between a pair of potentially high-dimensional random vectors. Most existing works have explored its asymptotic distributions under the null hypothesis of independence between the two random vectors when only the sample size or the dimensionality diverges. Yet its asymptotic null distribution for the more realistic setting when both sample size and dimensionality diverge in the full range remains largely underdeveloped. In this paper, we fill such a gap and develop central limit theorems and associated rates of convergence for a rescaled test statistic based on the bias-corrected distance correlation in high dimensions under some mild regularity conditions and the null hypothesis. Our new theoretical results reveal an interesting phenomenon of blessing of dimensionality for high-dimensional distance correlation inference in the sense that the accuracy of normal approximation can increase with dimensionality. Moreover, we provide a general theory on the power analysis under the alternative hypothesis of dependence, and further justify the capability of the rescaled distance correlation in capturing the pure nonlinear dependency under moderately high dimensionality for a certain type of alternative hypothesis. The theoretical results and finite-sample performance of the rescaled statistic are illustrated with several simulation examples and a blockchain application.
In this work we construct an optimal shrinkage estimator for the precision matrix in high dimensions. We consider the general asymptotics when the number of variables $prightarrowinfty$ and the sample size $nrightarrowinfty$ so that $p/nrightarrow cin (0, +infty)$. The precision matrix is estimated directly, without inverting the corresponding estimator for the covariance matrix. The recent results from the random matrix theory allow us to find the asymptotic deterministic equivalents of the optimal shrinkage intensities and estimate them consistently. The resulting distribution-free estimator has almost surely the minimum Frobenius loss. Additionally, we prove that the Frobenius norms of the inverse and of the pseudo-inverse sample covariance matrices tend almost surely to deterministic quantities and estimate them consistently. At the end, a simulation is provided where the suggested estimator is compared with the estimators for the precision matrix proposed in the literature. The optimal shrinkage estimator shows significant improvement and robustness even for non-normally distributed data.
The purpose of this thesis is to develop new theories on high-dimensional structured signal recovery under a rather weak assumption on the measurements that only a finite number of moments exists. High-dimensional recovery has been one of the emerging topics in the last decade partly due to the celebrated work of Candes, Romberg and Tao (e.g. [CRT06, CRT04]). The original analysis there (and the works thereafter) necessitates a strong concentration argument (namely, the restricted isometry property), which only holds for a rather restricted class of measurements with light-tailed distributions. It had long been conjectured that high-dimensional recovery is possible even if restricted isometry type conditions do not hold, but the general theory was beyond the grasp until very recently, when the works [Men14a, KM15] propose a new small-ball method. In these two papers, the authors initiated a new analysis framework for general empirical risk minimization (ERM) problems with respect to the square loss, which is robust and can potentially allow heavy-tailed loss functions. The materials in this thesis are partly inspired by [Men14a], but are of a different mindset: rather than directly analyzing the existing ERMs for signal recovery for which it is difficult to avoid strong moment assumptions, we show that, in many circumstances, by carefully re-designing the ERMs to start with, one can still achieve the minimax optimal statistical rate of signal recovery with very high probability under much weaker assumptions than existing works.