Do you want to publish a course? Click here

Large-scale simultaneous inference under dependence

111   0   0.0 ( 0 )
 Added by Jinjin Tian
 Publication date 2021
and research's language is English




Ask ChatGPT about the research

Simultaneous, post-hoc inference is desirable in large-scale hypotheses testing as it allows for exploration of data while deciding on criteria for proclaiming discoveries. It was recently proved that all admissible post-hoc inference methods for the number of true discoveries must be based on closed testing. In this paper we investigate tractable and efficient closed testing with local tests of different properties, such as monotonicty, symmetry and separability, meaning that the test thresholds a monotonic or symmetric function or a function of sums of test scores for the individual hypotheses. This class includes well-known global null tests by Fisher, Stouffer and Ruschendorf, as well as newly proposed ones based on harmonic means and Cauchy combinations. Under monotonicity, we propose a new linear time statistic (coma) that quantifies the cost of multiplicity adjustments. If the tests are also symmetric and separable, we develop several fast (mostly linear-time) algorithms for post-hoc inference, making closed testing tractable. Paired with recent advances in global null tests based on generalized means, our work immediately instantiates a series of simultaneous inference methods that can handle many complex dependence structures and signal compositions. We provide guidance on choosing from these methods via theoretical investigation of the conservativeness and sensitivity for different local tests, as well as simulations that find analogous behavior for local tests and full closed testing. One result of independent interest is the following: if $P_1,dots,P_d$ are $p$-values from a multivariate Gaussian with arbitrary covariance, then their arithmetic average P satisfies $Pr(P leq t) leq t$ for $t leq frac{1}{2d}$.



rate research

Read More

103 - Meng Yuan , Pengfei Li , 2021
The density ratio model (DRM) provides a flexible and useful platform for combining information from multiple sources. In this paper, we consider statistical inference under two-sample DRMs with additional parameters defined through and/or additional auxiliary information expressed as estimating equations. We examine the asymptotic properties of the maximum empirical likelihood estimators (MELEs) of the unknown parameters in the DRMs and/or defined through estimating equations, and establish the chi-square limiting distributions for the empirical likelihood ratio (ELR) statistics. We show that the asymptotic variance of the MELEs of the unknown parameters does not decrease if one estimating equation is dropped. Similar properties are obtained for inferences on the cumulative distribution function and quantiles of each of the populations involved. We also propose an ELR test for the validity and usefulness of the auxiliary information. Simulation studies show that correctly specified estimating equations for the auxiliary information result in more efficient estimators and shorter confidence intervals. Two real-data examples are used for illustrations.
The Gini index is a popular inequality measure with many applications in social and economic studies. This paper studies semiparametric inference on the Gini indices of two semicontinuous populations. We characterize the distribution of each semicontinuous population by a mixture of a discrete point mass at zero and a continuous skewed positive component. A semiparametric density ratio model is then employed to link the positive components of the two distributions. We propose the maximum empirical likelihood estimators of the two Gini indices and their difference, and further investigate the asymptotic properties of the proposed estimators. The asymptotic results enable us to construct confidence intervals and perform hypothesis tests for the two Gini indices and their difference. We show that the proposed estimators are more efficient than the existing fully nonparametric estimators. The proposed estimators and the asymptotic results are also applicable to cases without excessive zero values. Simulation studies show the superiority of our proposed method over existing methods. Two real-data applications are presented using the proposed methods.
A general class of time-varying regression models is considered in this paper. We estimate the regression coefficients by using local linear M-estimation. For these estimators, weak Bahadur representations are obtained and are used to construct simultaneous confidence bands. For practical implementation, we propose a bootstrap based method to circumvent the slow logarithmic convergence of the theoretical simultaneous bands. Our results substantially generalize and unify the treatments for several time-varying regression and auto-regression models. The performance for ARCH and GARCH models is studied in simulations and a few real-life applications of our study are presented through analysis of some popular financial datasets.
In this paper, we develop uniform inference methods for the conditional mode based on quantile regression. Specifically, we propose to estimate the conditional mode by minimizing the derivative of the estimated conditional quantile function defined by smoothing the linear quantile regression estimator, and develop two bootstrap methods, a novel pivotal bootstrap and the nonparametric bootstrap, for our conditional mode estimator. Building on high-dimensional Gaussian approximation techniques, we establish the validity of simultaneous confidence rectangles constructed from the two bootstrap methods for the conditional mode. We also extend the preceding analysis to the case where the dimension of the covariate vector is increasing with the sample size. Finally, we conduct simulation experiments and a real data analysis using U.S. wage data to demonstrate the finite sample performance of our inference method.
Distance correlation has become an increasingly popular tool for detecting the nonlinear dependence between a pair of potentially high-dimensional random vectors. Most existing works have explored its asymptotic distributions under the null hypothesis of independence between the two random vectors when only the sample size or the dimensionality diverges. Yet its asymptotic null distribution for the more realistic setting when both sample size and dimensionality diverge in the full range remains largely underdeveloped. In this paper, we fill such a gap and develop central limit theorems and associated rates of convergence for a rescaled test statistic based on the bias-corrected distance correlation in high dimensions under some mild regularity conditions and the null hypothesis. Our new theoretical results reveal an interesting phenomenon of blessing of dimensionality for high-dimensional distance correlation inference in the sense that the accuracy of normal approximation can increase with dimensionality. Moreover, we provide a general theory on the power analysis under the alternative hypothesis of dependence, and further justify the capability of the rescaled distance correlation in capturing the pure nonlinear dependency under moderately high dimensionality for a certain type of alternative hypothesis. The theoretical results and finite-sample performance of the rescaled statistic are illustrated with several simulation examples and a blockchain application.
comments
Fetching comments Fetching comments
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا