ترغب بنشر مسار تعليمي؟ اضغط هنا

Limiting laws and consistent estimation criteria for fixed and diverging number of spiked eigenvalues

66   0   0.0 ( 0 )
 نشر من قبل Emma Jingfei Zhang
 تاريخ النشر 2020
  مجال البحث الاحصاء الرياضي
والبحث باللغة English




اسأل ChatGPT حول البحث

In this paper, we study limiting laws and consistent estimation criteria for the extreme eigenvalues in a spiked covariance model of dimension $p$. Firstly, for fixed $p$, we propose a generalized estimation criterion that can consistently estimate, $k$, the number of spiked eigenvalues. Compared with the existing literature, we show that consistency can be achieved under weaker conditions on the penalty term. Next, allowing both $p$ and $k$ to diverge, we derive limiting distributions of the spiked sample eigenvalues using random matrix theory techniques. Notably, our results do not require the spiked eigenvalues to be uniformly bounded from above or tending to infinity, as have been assumed in the existing literature. Based on the above derived results, we formulate a generalized estimation criterion and show that it can consistently estimate $k$, while $k$ can be fixed or grow at an order of $k=o(n^{1/3})$. We further show that the results in our work continue to hold under a general population distribution without assuming normality. The efficacy of the proposed estimation criteria is illustrated through comparative simulation studies.

قيم البحث

اقرأ أيضاً

We study the asymptotic distributions of the spiked eigenvalues and the largest nonspiked eigenvalue of the sample covariance matrix under a general covariance matrix model with divergent spiked eigenvalues, while the other eigenvalues are bounded bu t otherwise arbitrary. The limiting normal distribution for the spiked sample eigenvalues is established. It has distinct features that the asymptotic mean relies on not only the population spikes but also the nonspikes and that the asymptotic variance in general depends on the population eigenvectors. In addition, the limiting Tracy-Widom law for the largest nonspiked sample eigenvalue is obtained. Estimation of the number of spikes and the convergence of the leading eigenvectors are also considered. The results hold even when the number of the spikes diverges. As a key technical tool, we develop a Central Limit Theorem for a type of random quadratic forms where the random vectors and random matrices involved are dependent. This result can be of independent interest.
106 - Yuhao Wang , Xinran Li 2021
Completely randomized experiments have been the gold standard for drawing causal inference because they can balance all potential confounding on average. However, they can often suffer from unbalanced covariates for realized treatment assignments. Re randomization, a design that rerandomizes the treatment assignment until a prespecified covariate balance criterion is met, has recently got attention due to its easy implementation, improved covariate balance and more efficient inference. Researchers have then suggested to use the assignments that minimize the covariate imbalance, namely the optimally balanced design. This has caused again the long-time controversy between two philosophies for designing experiments: randomization versus optimal and thus almost deterministic designs. Existing literature argued that rerandomization with overly balanced observed covariates can lead to highly imbalanced unobserved covariates, making it vulnerable to model misspecification. On the contrary, rerandomization with properly balanced covariates can provide robust inference for treatment effects while sacrificing some efficiency compared to the ideally optimal design. In this paper, we show it is possible that, by making the covariate imbalance diminishing at a proper rate as the sample size increases, rerandomization can achieve its ideally optimal precision that one can expect with perfectly balanced covariates while still maintaining its robustness. In particular, we provide the sufficient and necessary condition on the number of covariates for achieving the desired optimality. Our results rely on a more dedicated asymptotic analysis for rerandomization. The derived theory for rerandomization provides a deeper understanding of its large-sample property and can better guide its practical implementation. Furthermore, it also helps reconcile the controversy between randomized and optimal designs.
153 - Yejiong Zhu , Hao Chen 2021
Two-sample tests utilizing a similarity graph on observations are useful for high-dimensional data and non-Euclidean data due to their flexibility and good performance under a wide range of alternatives. Existing works mainly focused on sparse graphs , such as graphs with the number of edges in the order of the number of observations. However, the tests have better performance with denser graphs under many settings. In this work, we establish the theoretical ground for graph-based tests with graphs that are much denser than those in existing works.
130 - Xiao Fang , David Siegmund 2020
We study the maximum score statistic to detect and estimate local signals in the form of change-points in the level, slope, or other property of a sequence of observations, and to segment the sequence when there appear to be multiple changes. We find that when observations are serially dependent, the change-points can lead to upwardly biased estimates of autocorrelations, resulting in a sometimes serious loss of power. Examples involving temperature variations, the level of atmospheric greenhouse gases, suicide rates and daily incidence of COVID-19 illustrate the general theory.
Fields like public health, public policy, and social science often want to quantify the degree of dependence between variables whose relationships take on unknown functional forms. Typically, in fact, researchers in these fields are attempting to eva luate causal theories, and so want to quantify dependence after conditioning on other variables that might explain, mediate or confound causal relations. One reason conditional mutual information is not more widely used for these tasks is the lack of estimators which can handle combinations of continuous and discrete random variables, common in applications. This paper develops a new method for estimating mutual and conditional mutual information for data samples containing a mix of discrete and continuous variables. We prove that this estimator is consistent and show, via simulation, that it is more accurate than similar estimators.
التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا