
Optimal Linear Shrinkage Estimator for Large Dimensional Precision Matrix

Published by Dr. Nestor Parolya
Publication date: 2013
Research language: English





In this work we construct an optimal shrinkage estimator for the precision matrix in high dimensions. We consider the general asymptotics when the number of variables $p\rightarrow\infty$ and the sample size $n\rightarrow\infty$ so that $p/n\rightarrow c\in (0, +\infty)$. The precision matrix is estimated directly, without inverting the corresponding estimator for the covariance matrix. The recent results from the random matrix theory allow us to find the asymptotic deterministic equivalents of the optimal shrinkage intensities and estimate them consistently. The resulting distribution-free estimator has almost surely the minimum Frobenius loss. Additionally, we prove that the Frobenius norms of the inverse and of the pseudo-inverse sample covariance matrices tend almost surely to deterministic quantities and estimate them consistently. At the end, a simulation is provided where the suggested estimator is compared with the estimators for the precision matrix proposed in the literature. The optimal shrinkage estimator shows significant improvement and robustness even for non-normally distributed data.
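To make the idea of Frobenius-optimal shrinkage intensities concrete, here is a minimal oracle sketch. It assumes a linear combination of the form alpha * S^{-1} + beta * Pi0 with the identity as target matrix Pi0, and it uses the true precision matrix, which is unavailable in practice. The abstract states that the intensities are instead estimated consistently through their asymptotic deterministic equivalents; those formulas are not reproduced here. The function names, the population spectrum, and the choices of p and n are all hypothetical.

```python
import numpy as np


def oracle_shrinkage_intensities(S_inv, Pi0, Sigma_inv):
    """Return (alpha, beta) minimizing ||alpha*S_inv + beta*Pi0 - Sigma_inv||_F^2.

    Oracle version: it needs the true precision matrix Sigma_inv (assumption
    made for illustration only; not the estimator from the paper).
    """
    # Frobenius inner product <A, B> = tr(A^T B) = sum of elementwise products.
    gram = np.array([
        [np.sum(S_inv * S_inv), np.sum(S_inv * Pi0)],
        [np.sum(Pi0 * S_inv), np.sum(Pi0 * Pi0)],
    ])
    rhs = np.array([np.sum(S_inv * Sigma_inv), np.sum(Pi0 * Sigma_inv)])
    # Normal equations of a two-parameter least-squares problem.
    alpha, beta = np.linalg.solve(gram, rhs)
    return alpha, beta


def frobenius_loss(estimate, target):
    """Squared Frobenius distance between two matrices."""
    return np.linalg.norm(estimate - target, ord="fro") ** 2


def demo(p=50, n=200, seed=0):
    """Compare the raw inverse sample covariance with its oracle shrinkage."""
    rng = np.random.default_rng(seed)
    eigenvalues = np.linspace(1.0, 5.0, p)  # hypothetical population spectrum
    Sigma_inv = np.diag(1.0 / eigenvalues)
    # Rows are independent N(0, Sigma) draws with Sigma = diag(eigenvalues).
    X = rng.standard_normal((n, p)) * np.sqrt(eigenvalues)
    S_inv = np.linalg.inv(np.cov(X, rowvar=False))
    Pi0 = np.eye(p)  # assumed shrinkage target
    alpha, beta = oracle_shrinkage_intensities(S_inv, Pi0, Sigma_inv)
    shrunk = alpha * S_inv + beta * Pi0
    return frobenius_loss(S_inv, Sigma_inv), frobenius_loss(shrunk, Sigma_inv)


raw_loss, shrunk_loss = demo()
print(f"raw inverse loss: {raw_loss:.3f}, oracle shrinkage loss: {shrunk_loss:.3f}")
```

Because alpha = 1, beta = 0 recovers the plain inverse sample covariance, the oracle combination can never have a larger Frobenius loss than the raw inverse; the gap between the two indicates how much a good estimate of the intensities could gain.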




Read also

Motivated by establishing theoretical foundations for various manifold learning algorithms, we study the problem of Mahalanobis distance (MD), and the associated precision matrix, estimation from high-dimensional noisy data. By relying on recent transformative results in covariance matrix estimation, we demonstrate the sensitivity of MD and the associated precision matrix to measurement noise, determining the exact asymptotic signal-to-noise ratio at which MD fails, and quantifying its performance otherwise. In addition, for an appropriate loss function, we propose an asymptotically optimal shrinker, which is shown to be beneficial over the classical implementation of the MD, both analytically and in simulations. The result is extended to the manifold setup, where the nonlinear interaction between curvature and high-dimensional noise is taken care of. The developed solution is applied to study a multiscale reduction problem in the dynamical system analysis.
In this paper new tests for the independence of two high-dimensional vectors are investigated. We consider the case where the dimension of the vectors increases with the sample size and propose multivariate analysis of variance-type statistics for the hypothesis of a block diagonal covariance matrix. The asymptotic properties of the new test statistics are investigated under the null hypothesis and the alternative hypothesis using random matrix theory. For this purpose we study the weak convergence of linear spectral statistics of central and (conditionally) non-central Fisher matrices. In particular, a central limit theorem for linear spectral statistics of large dimensional (conditionally) non-central Fisher matrices is derived which is then used to analyse the power of the tests under the alternative. The theoretical results are illustrated by means of a simulation study where we also compare the new tests with several alternatives, in particular with the commonly used corrected likelihood ratio test. It is demonstrated that the latter test does not keep its nominal level if the dimension of one sub-vector is relatively small compared to the dimension of the other sub-vector. On the other hand, the tests proposed in this paper provide a reasonable approximation of the nominal level in such situations. Moreover, we observe that one of the proposed tests is most powerful under a variety of correlation scenarios.
Consider a random vector $\mathbf{y}=\mathbf{\Sigma}^{1/2}\mathbf{x}$, where the $p$ elements of the vector $\mathbf{x}$ are i.i.d. real-valued random variables with zero mean and finite fourth moment, and $\mathbf{\Sigma}^{1/2}$ is a deterministic $p\times p$ matrix such that the spectral norm of the population correlation matrix $\mathbf{R}$ of $\mathbf{y}$ is uniformly bounded. In this paper, we find that the log determinant of the sample correlation matrix $\hat{\mathbf{R}}$ based on a sample of size $n$ from the distribution of $\mathbf{y}$ satisfies a CLT (central limit theorem) for $p/n\to \gamma\in (0, 1]$ and $p\leq n$. Explicit formulas for the asymptotic mean and variance are provided. In case the mean of $\mathbf{y}$ is unknown, we show that after recentering by the empirical mean the obtained CLT holds with a shift in the asymptotic mean. This result is of independent interest in both large dimensional random matrix theory and high-dimensional statistical literature of large sample correlation matrices for non-normal data. At last, the obtained findings are applied for testing of uncorrelatedness of $p$ random variables. Surprisingly, in the null case $\mathbf{R}=\mathbf{I}$, the test statistic becomes completely pivotal and the extensive simulations show that the obtained CLT also holds if the moments of order four do not exist at all, which conjectures a promising and robust test statistic for heavy-tailed high-dimensional data.
Zeyu Wu, Cheng Wang, Weidong Liu (2021)
In this paper, we estimate the high dimensional precision matrix under the weak sparsity condition where many entries are nearly zero. We study a Lasso-type method for high dimensional precision matrix estimation and derive general error bounds under the weak sparsity condition. The common irrepresentable condition is relaxed and the results are applicable to the weak sparse matrix. As applications, we study the precision matrix estimation for the heavy-tailed data, the non-paranormal data, and the matrix data with the Lasso-type method.
We prove the consistency of the Power-Law Fit PLFit method proposed by Clauset et al. (2009) to estimate the power-law exponent in data coming from a distribution function with regularly-varying tail. In the complex systems community, PLFit has emerged as the method of choice to estimate the power-law exponent. Yet, its mathematical properties are still poorly understood. The difficulty in PLFit is that it is a minimum-distance estimator. It first chooses a threshold that minimizes the Kolmogorov-Smirnov distance between the data points larger than the threshold and the Pareto tail, and then applies the Hill estimator to this restricted data. Since the number of order statistics used is random, the general theory of consistency of power-law exponents from extreme value theory does not apply. Our proof consists in first showing that the Hill estimator is consistent for general intermediate sequences for the number of order statistics used, even when that number is random. Here, we call a sequence intermediate when it grows to infinity, while remaining much smaller than the sample size. The second, and most involved, step is to prove that the optimizer in PLFit is with high probability an intermediate sequence, unless the distribution has a Pareto tail above a certain value. For the latter special case, we give a separate proof.