We present simulated standard curves for the calibration of empirical likelihood ratio (ELR) tests of means. With the help of these curves, the nominal significance level of the ELR test can be adjusted to achieve (quasi-)exact type I error rate control for a given, finite sample size. Through theoretical considerations and computer simulations, we demonstrate that the adjusted significance level depends most crucially on the skewness and kurtosis of the parent distribution. For practical purposes, we tabulate adjusted critical values under several prototypical statistical models.
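A minimal Monte Carlo sketch of the calibration idea follows, assuming an Exp(1) parent distribution, n = 20 observations, and a nominal 5% level; the helper names `elr_statistic` and `adjusted_critical_value` are illustrative, not the authors' code.

```python
# Illustrative sketch (not the authors' code): Monte Carlo estimation of an
# adjusted ELR critical value for a skewed parent distribution.
import numpy as np
from scipy.optimize import brentq
from scipy.stats import chi2

def elr_statistic(x, mu0):
    """-2 log empirical likelihood ratio for H0: E[X] = mu0."""
    d = x - mu0
    if d.min() >= 0 or d.max() <= 0:          # mu0 outside the convex hull
        return np.inf
    # The Lagrange multiplier solves sum d_i / (1 + lam * d_i) = 0 on the
    # interval where all EL weights remain positive.
    lam = brentq(lambda t: np.sum(d / (1.0 + t * d)),
                 -1.0 / d.max() + 1e-10, -1.0 / d.min() - 1e-10)
    return 2.0 * np.sum(np.log1p(lam * d))

def adjusted_critical_value(sampler, mu0, n, alpha=0.05, B=20000, seed=0):
    """Empirical (1 - alpha) quantile of the ELR statistic under H0."""
    rng = np.random.default_rng(seed)
    stats = [elr_statistic(sampler(rng, n), mu0) for _ in range(B)]
    return np.quantile(stats, 1.0 - alpha)

# Example: Exp(1) parent (mean 1, strongly skewed), n = 20, nominal 5% level.
c_adj = adjusted_critical_value(lambda rng, n: rng.exponential(1.0, n), 1.0, 20)
print("adjusted critical value:", round(c_adj, 3),
      "vs asymptotic chi2(1):", round(chi2.ppf(0.95, 1), 3))
```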
High-dimensional statistical inference with general estimating equations is challenging and remains relatively unexplored. In this paper, we study two problems in this area: confidence set estimation for multiple components of the model parameters, and model specification testing. For the first problem, we propose to construct a new set of estimating equations such that the impact of estimating the high-dimensional nuisance parameters becomes asymptotically negligible. The new construction enables us to estimate a valid confidence region by the empirical likelihood ratio. For the second, we propose a test statistic defined as the maximum of the marginal empirical likelihood ratios to quantify the data evidence against the model specification. Our theory establishes the validity of the proposed empirical likelihood approaches, accommodating over-identification and exponentially growing data dimensionality. The numerical studies demonstrate promising performance and the potential practical benefits of the new methods.
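The sketch below illustrates the maximum-of-marginal-ELR idea under simplifying assumptions; the sign-flip calibration and the helper names `elr_mean_zero`, `max_marginal_elr`, and `specification_test` are illustrative stand-ins rather than the calibration derived in the paper.

```python
# Schematic illustration (not the paper's implementation): specification test
# based on the maximum of marginal empirical likelihood ratios, calibrated here
# by a simple sign-flip reference distribution.
import numpy as np
from scipy.optimize import brentq

def elr_mean_zero(g):
    """-2 log empirical likelihood ratio for H0: E[g] = 0 (univariate)."""
    if g.min() >= 0 or g.max() <= 0:
        return np.inf
    lam = brentq(lambda t: np.sum(g / (1.0 + t * g)),
                 -1.0 / g.max() + 1e-10, -1.0 / g.min() - 1e-10)
    return 2.0 * np.sum(np.log1p(lam * g))

def max_marginal_elr(G):
    """G is n x r: estimated moment functions g_j(X_i, theta_hat)."""
    return max(elr_mean_zero(G[:, j]) for j in range(G.shape[1]))

def specification_test(G, B=500, seed=0):
    rng = np.random.default_rng(seed)
    t_obs = max_marginal_elr(G)
    ref = [max_marginal_elr(G * rng.choice([-1.0, 1.0], size=(G.shape[0], 1)))
           for _ in range(B)]
    return t_obs, np.mean(np.array(ref) >= t_obs)

# Toy example: five moment functions, the last one misspecified.
rng = np.random.default_rng(1)
G = rng.standard_normal((200, 5))
G[:, 4] += 0.4
print(specification_test(G))
```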
The likelihood ratio test (LRT) based on the asymptotic chi-squared distribution of the log likelihood ratio is one of the fundamental tools of statistical inference. A recent universal LRT approach based on sample splitting provides valid hypothesis tests and confidence sets in any setting for which we can compute the split likelihood ratio statistic (or, more generally, an upper bound on the null maximum likelihood). The universal LRT is valid in finite samples and without regularity conditions. This test empowers statisticians to construct tests in settings for which no valid hypothesis test previously existed. For the simple but fundamental case of testing the population mean of d-dimensional Gaussian data, the usual LRT itself applies and thus serves as a perfect test bed against which to compare the universal LRT. This work presents the first in-depth exploration of the size and power of, and the relationships between, several universal LRT variants. We show that a repeated subsampling approach is the best choice in terms of size and power. We observe reasonable performance even in a high-dimensional setting, where the expected squared radius of the best universal LRT confidence set is approximately 3/2 times the squared radius of the standard LRT-based set. We illustrate the benefits of the universal LRT by testing a non-convex doughnut-shaped null hypothesis, where a universal inference procedure can have higher power than a standard approach.
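A minimal sketch of the split and repeatedly subsampled universal LRT for the Gaussian mean problem is given below, assuming identity covariance and B = 100 random splits; `split_lrt_stat` and `subsampled_universal_lrt` are hypothetical helper names.

```python
# Minimal sketch of the split / repeatedly subsampled universal LRT for
# H0: mu = mu0 with N(mu, I) data (identity covariance assumed for simplicity).
import numpy as np

def split_lrt_stat(x0, x1, mu0):
    """Split LR statistic: the MLE from x1 is plugged into the x0 likelihood."""
    mu_hat1 = x1.mean(axis=0)
    loglr = 0.5 * (np.sum((x0 - mu0) ** 2) - np.sum((x0 - mu_hat1) ** 2))
    return np.exp(loglr)

def subsampled_universal_lrt(x, mu0, alpha=0.05, B=100, seed=0):
    """Reject H0 if the average of B split statistics exceeds 1/alpha."""
    rng = np.random.default_rng(seed)
    n, half = x.shape[0], x.shape[0] // 2
    stats = []
    for _ in range(B):
        perm = rng.permutation(n)
        stats.append(split_lrt_stat(x[perm[:half]], x[perm[half:]], mu0))
    return np.mean(stats) > 1.0 / alpha       # valid by Markov's inequality

# Example: n = 100 observations in d = 10 dimensions, true mean 0.3 per coordinate.
rng = np.random.default_rng(2)
x = rng.standard_normal((100, 10)) + 0.3
print("reject H0: mu = 0 ?", subsampled_universal_lrt(x, np.zeros(10)))
```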
The complexity underlying real-world systems implies that standard statistical hypothesis testing methods may not be adequate for these peculiar applications. Specifically, we show that the null distribution of the likelihood-ratio test needs to be modified to accommodate the complexity found in multi-edge network data. When working with independent observations, the p-values of likelihood-ratio tests are approximated using a $\chi^2$ distribution. However, such an approximation should not be used when dealing with multi-edge network data. This type of data is characterized by multiple correlations and competitions that make the standard approximation unsuitable. We address this problem by providing a better approximation of the likelihood-ratio test's null distribution through a Beta distribution. Finally, we empirically show that even for a small multi-edge network, the standard $\chi^2$ approximation provides erroneous results, while the proposed Beta approximation yields the correct p-value estimation.
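The following sketch reproduces the comparison in spirit, assuming a toy multinomial null model for a tiny multi-edge network and a method-of-moments Beta fit to parametrically bootstrapped likelihood ratios; it is not the authors' exact procedure.

```python
# Toy reproduction of the comparison (not the paper's exact model): for a tiny
# multinomial "multi-edge network", a Beta fit to bootstrapped likelihood
# ratios Lambda is compared with the chi^2 approximation to -2 log Lambda.
import numpy as np
from scipy.stats import chi2, beta

K, m = 6, 30                          # node pairs and total number of edges
p0 = np.full(K, 1.0 / K)              # null: edges spread uniformly over pairs

def log_lambda(counts):
    """Log LR of the uniform null against the saturated multinomial."""
    p_hat = counts / counts.sum()
    mask = counts > 0
    return np.sum(counts[mask] * (np.log(p0[mask]) - np.log(p_hat[mask])))

rng = np.random.default_rng(3)
log_lam = np.array([log_lambda(c) for c in rng.multinomial(m, p0, size=50000)])

# Method-of-moments Beta fit to the bootstrapped Lambda values.
lam = np.exp(log_lam)
mu, var = lam.mean(), lam.var()
common = mu * (1.0 - mu) / var - 1.0
a_hat, b_hat = mu * common, (1.0 - mu) * common

observed = np.array([12, 6, 4, 4, 2, 2])          # a mildly heterogeneous network
ll_obs = log_lambda(observed)
print("chi2 p-value:", chi2.sf(-2.0 * ll_obs, df=K - 1))
print("Beta p-value:", beta.cdf(np.exp(ll_obs), a_hat, b_hat))
print("Monte Carlo reference:", np.mean(log_lam <= ll_obs))
```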
The celebrated Bernstein-von Mises theorem ensures that credible regions derived from the Bayesian posterior are well calibrated when the model is correctly specified, in the frequentist sense that their coverage probabilities tend to the nominal values as data accrue. However, this conventional Bayesian framework is known to lack robustness when the model is misspecified or only partly specified, as in quantile regression, risk-minimization-based supervised/unsupervised learning, and robust estimation. To overcome this difficulty, we propose a new Bayesian inferential approach that substitutes the (misspecified or partly specified) likelihoods with proper exponentially tilted empirical likelihoods plus a regularization term. Our surrogate empirical likelihood is carefully constructed by using the first-order optimality condition of the empirical risk minimization as the moment condition. We show that the Bayesian posterior obtained by combining this surrogate empirical likelihood and the prior is asymptotically close to a normal distribution centered at the empirical risk minimizer, with a covariance matrix of the appropriate sandwich form. Consequently, the resulting Bayesian credible regions are automatically calibrated to deliver valid uncertainty quantification. Computationally, the proposed method can be easily implemented by Markov chain Monte Carlo sampling algorithms. Our numerical results show that the proposed method tends to be more accurate than existing state-of-the-art competitors.
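A bare-bones sketch of sampling from a posterior built on an exponentially tilted empirical likelihood surrogate is shown below for the simplest risk minimizer, the mean (moment condition g(x, theta) = x - theta); the random-walk Metropolis settings and the N(0, 10^2) prior are illustrative, and the regularization term mentioned above is omitted.

```python
# Bare-bones sketch (illustrative only): random-walk Metropolis sampling from a
# posterior that combines a prior with an exponentially tilted empirical
# likelihood (ETEL) surrogate, for the moment condition g(x, theta) = x - theta.
# The regularization term discussed in the paper is omitted here.
import numpy as np
from scipy.optimize import minimize
from scipy.special import logsumexp

def log_etel(theta, x):
    """Log ETEL at theta (up to an additive constant)."""
    if theta <= x.min() or theta >= x.max():   # outside the convex hull
        return -np.inf
    g = x - theta
    # Inner problem: lambda minimizes the empirical mean of exp(lambda * g).
    lam = minimize(lambda l: logsumexp(l[0] * g), x0=[0.0], method="BFGS").x[0]
    logw = lam * g - logsumexp(lam * g)        # tilted weights on the log scale
    return logw.sum()

def etel_posterior(x, n_iter=5000, prop_sd=0.2, seed=0):
    """Random-walk Metropolis targeting prior(theta) * ETEL(theta)."""
    rng = np.random.default_rng(seed)
    log_prior = lambda t: -0.5 * t ** 2 / 100.0    # N(0, 10^2) prior (illustrative)
    theta = x.mean()
    logp = log_prior(theta) + log_etel(theta, x)
    draws = []
    for _ in range(n_iter):
        prop = theta + prop_sd * rng.standard_normal()
        logp_prop = log_prior(prop) + log_etel(prop, x)
        if np.log(rng.uniform()) < logp_prop - logp:
            theta, logp = prop, logp_prop
        draws.append(theta)
    return np.array(draws)

# Example: skewed data with true mean 2; report the posterior mean and 95% CI.
rng = np.random.default_rng(4)
x = rng.exponential(scale=2.0, size=100)
post = etel_posterior(x)[1000:]
print(post.mean(), np.quantile(post, [0.025, 0.975]))
```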
We consider two-sample mean testing for high-dimensional populations by thresholding. Two tests are investigated, which are designed for better power performance when the two population mean vectors differ only in sparsely populated coordinates. The first test is constructed by carrying out thresholding to remove the non-signal-bearing dimensions. The second test combines data transformation via the precision matrix with the thresholding. The benefits of the thresholding and the data transformation are shown through a reduced variance of the thresholding test statistics, improved power, and a wider detection region of the tests. Simulation experiments and an empirical study are performed to confirm the theoretical findings and to demonstrate the practical implementation.
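A generic sketch of a thresholding-based two-sample mean test is given below; the threshold level 2s log p and the permutation calibration are illustrative choices rather than the exact constructions analysed in the paper.

```python
# Illustrative sketch (not the paper's exact construction): a thresholding
# two-sample mean test for sparse signals, calibrated by permutation.
import numpy as np

def threshold_statistic(x, y, s=0.6):
    """Sum of squared standardized mean differences exceeding 2 * s * log(p)."""
    n1, n2, p = x.shape[0], y.shape[0], x.shape[1]
    se2 = x.var(axis=0, ddof=1) / n1 + y.var(axis=0, ddof=1) / n2
    t2 = (x.mean(axis=0) - y.mean(axis=0)) ** 2 / se2
    return np.sum(t2[t2 > 2.0 * s * np.log(p)])

def permutation_pvalue(x, y, B=500, seed=0):
    rng = np.random.default_rng(seed)
    pooled, n1 = np.vstack([x, y]), x.shape[0]
    t_obs = threshold_statistic(x, y)
    ref = []
    for _ in range(B):
        idx = rng.permutation(pooled.shape[0])
        ref.append(threshold_statistic(pooled[idx[:n1]], pooled[idx[n1:]]))
    return np.mean(np.array(ref) >= t_obs)

# Example: p = 500 coordinates, only the first 5 carry a signal.
rng = np.random.default_rng(5)
x = rng.standard_normal((60, 500))
y = rng.standard_normal((60, 500))
y[:, :5] += 0.8
print("permutation p-value:", permutation_pvalue(x, y))
```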