ترغب بنشر مسار تعليمي؟ اضغط هنا

New challenges in covariance estimation: multiple structures and coarse quantization

104   0   0.0 ( 0 )
 نشر من قبل Johannes Maly
 تاريخ النشر 2021
  مجال البحث الاحصاء الرياضي
والبحث باللغة English




اسأل ChatGPT حول البحث

In this self-contained chapter, we revisit a fundamental problem of multivariate statistics: estimating covariance matrices from finitely many independent samples. Based on massive Multiple-Input Multiple-Output (MIMO) systems we illustrate the necessity of leveraging structure and considering quantization of samples when estimating covariance matrices in practice. We then provide a selective survey of theoretical advances of the last decade focusing on the estimation of structured covariance matrices. This review is spiced up by some yet unpublished insights on how to benefit from combined structural constraints. Finally, we summarize the findings of our recently published preprint Covariance estimation under one-bit quantization to show how guaranteed covariance estimation is possible even under coarse quantization of the samples.



قيم البحث

اقرأ أيضاً

427 - Clifford Lam , Jianqing Fan 2009
This paper studies the sparsistency and rates of convergence for estimating sparse covariance and precision matrices based on penalized likelihood with nonconvex penalty functions. Here, sparsistency refers to the property that all parameters that ar e zero are actually estimated as zero with probability tending to one. Depending on the case of applications, sparsity priori may occur on the covariance matrix, its inverse or its Cholesky decomposition. We study these three sparsity exploration problems under a unified framework with a general penalty function. We show that the rates of convergence for these problems under the Frobenius norm are of order $(s_nlog p_n/n)^{1/2}$, where $s_n$ is the number of nonzero elements, $p_n$ is the size of the covariance matrix and $n$ is the sample size. This explicitly spells out the contribution of high-dimensionality is merely of a logarithmic factor. The conditions on the rate with which the tuning parameter $lambda_n$ goes to 0 have been made explicit and compared under different penalties. As a result, for the $L_1$-penalty, to guarantee the sparsistency and optimal rate of convergence, the number of nonzero elements should be small: $s_n=O(p_n)$ at most, among $O(p_n^2)$ parameters, for estimating sparse covariance or correlation matrix, sparse precision or inverse correlation matrix or sparse Cholesky factor, where $s_n$ is the number of the nonzero elements on the off-diagonal entries. On the other hand, using the SCAD or hard-thresholding penalty functions, there is no such a restriction.
We consider the problem of estimating a low rank covariance function $K(t,u)$ of a Gaussian process $S(t), tin [0,1]$ based on $n$ i.i.d. copies of $S$ observed in a white noise. We suggest a new estimation procedure adapting simultaneously to the lo w rank structure and the smoothness of the covariance function. The new procedure is based on nuclear norm penalization and exhibits superior performances as compared to the sample covariance function by a polynomial factor in the sample size $n$. Other results include a minimax lower bound for estimation of low-rank covariance functions showing that our procedure is optimal as well as a scheme to estimate the unknown noise variance of the Gaussian process.
Fan et al. [$mathit{Annals}$ $mathit{of}$ $mathit{Statistics}$ $textbf{47}$(6) (2019) 3009-3031] proposed a distributed principal component analysis (PCA) algorithm to significantly reduce the communication cost between multiple servers. In this pape r, we robustify their distributed algorithm by using robust covariance matrix estimators respectively proposed by Minsker [$mathit{Annals}$ $mathit{of}$ $mathit{Statistics}$ $textbf{46}$(6A) (2018) 2871-2903] and Ke et al. [$mathit{Statistical}$ $mathit{Science}$ $textbf{34}$(3) (2019) 454-471] instead of the sample covariance matrix. We extend the deviation bound of robust covariance estimators with bounded fourth moments to the case of the heavy-tailed distribution under only bounded $2+epsilon$ moments assumption. The theoretical results show that after the shrinkage or truncation treatment for the sample covariance matrix, the statistical error rate of the final estimator produced by the robust algorithm is the same as that of sub-Gaussian tails, when $epsilon geq 2$ and the sampling distribution is symmetric innovation. While $2 > epsilon >0$, the rate with respect to the sample size of each server is slower than that of the bounded fourth moment assumption. Extensive numerical results support the theoretical analysis, and indicate that the algorithm performs better than the original distributed algorithm and is robust to heavy-tailed data and outliers.
In this paper we study covariance estimation with missing data. We consider missing data mechanisms that can be independent of the data, or have a time varying dependency. Additionally, observed variables may have arbitrary (non uniform) and dependen t observation probabilities. For each mechanism, we construct an unbiased estimator and obtain bounds for the expected value of their estimation error in operator norm. Our bounds are equivalent, up to constant and logarithmic factors, to state of the art bounds for complete and uniform missing observations. Furthermore, for the more general non uniform and dependent cases, the proposed bounds are new or improve upon previous results. Our error estimates depend on quantities we call scaled effective rank, which generalize the effective rank to account for missing observations. All the estimators studied in this work have the same asymptotic convergence rate (up to logarithmic factors).
We propose and analyze a new estimator of the covariance matrix that admits strong theoretical guarantees under weak assumptions on the underlying distribution, such as existence of moments of only low order. While estimation of covariance matrices c orresponding to sub-Gaussian distributions is well-understood, much less in known in the case of heavy-tailed data. As K. Balasubramanian and M. Yuan write, data from real-world experiments oftentimes tend to be corrupted with outliers and/or exhibit heavy tails. In such cases, it is not clear that those covariance matrix estimators .. remain optimal and ..what are the other possible strategies to deal with heavy tailed distributions warrant further studies. We make a step towards answering this question and prove tight deviation inequalities for the proposed estimator that depend only on the parameters controlling the intrinsic dimension associated to the covariance matrix (as opposed to the dimension of the ambient space); in particular, our results are applicable in the case of high-dimensional observations.
التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا