No Arabic abstract
We address the issue of performing testing inference in generalized linear models when the sample size is small. This class of models provides a straightforward way of modeling normal and non-normal data and has been widely used in several practical situations. The likelihood ratio, Wald and score statistics, and the recently proposed gradient statistic provide the basis for testing inference on the parameters in these models. We focus on the small-sample case, where the reference chi-squared distribution gives a poor approximation to the true null distribution of these test statistics. We derive a general Bartlett-type correction factor in matrix notation for the gradient test which reduces the size distortion of the test, and numerically compare the proposed test with the usual likelihood ratio, Wald, score and gradient tests, and with the Bartlett-corrected likelihood ratio and score tests. Our simulation results suggest that the corrected test we propose can be an interesting alternative to the other tests since it leads to very accurate inference even for very small samples. We also present an empirical application for illustrative purposes.
The Birnbaum-Saunders regression model is commonly used in reliability studies. We address the issue of performing inference in this class of models when the number of observations is small. We show that the likelihood ratio test tends to be liberal when the sample size is small, and we obtain a correction factor which reduces the size distortion of the test. The correction makes the error rate of he test vanish faster as the sample size increases. The numerical results show that the modified test is more reliable in finite samples than the usual likelihood ratio test. We also present an empirical application.
Linear mixed-effects models are widely used in analyzing clustered or repeated measures data. We propose a quasi-likelihood approach for estimation and inference of the unknown parameters in linear mixed-effects models with high-dimensional fixed effects. The proposed method is applicable to general settings where the dimension of the random effects and the cluster sizes are possibly large. Regarding the fixed effects, we provide rate optimal estimators and valid inference procedures that do not rely on the structural information of the variance components. We also study the estimation of variance components with high-dimensional fixed effects in general settings. The algorithms are easy to implement and computationally fast. The proposed methods are assessed in various simulation settings and are applied to a real study regarding the associations between body mass index and genetic polymorphic markers in a heterogeneous stock mice population.
There have been controversies among statisticians on (i) what to model and (ii) how to make inferences from models with unobservables. One such controversy concerns the difference between estimation methods for the marginal means not necessarily having a probabilistic basis and statistical models having unobservables with a probabilistic basis. Another concerns likelihood-based inference for statistical models with unobservables. This needs an extended-likelihood framework, and we show how one such extension, hierarchical likelihood, allows this to be done. Modeling of unobservables leads to rich classes of new probabilistic models from which likelihood-type inferences can be made naturally with hierarchical likelihood.
A maximum likelihood methodology for the parameters of models with an intractable likelihood is introduced. We produce a likelihood-free version of the stochastic approximation expectation-maximization (SAEM) algorithm to maximize the likelihood function of model parameters. While SAEM is best suited for models having a tractable complete likelihood function, its application to moderately complex models is a difficult or even impossible task. We show how to construct a likelihood-free version of SAEM by using the synthetic likelihood paradigm. Our method is completely plug-and-play, requires almost no tuning and can be applied to both static and dynamic models. Four simulation studies illustrate the method, including a stochastic differential equation model, a stochastic Lotka-Volterra model and data from $g$-and-$k$ distributions. MATLAB code is available as supplementary material.
Modern data sets in various domains often include units that were sampled non-randomly from the population and have a latent correlation structure. Here we investigate a common form of this setting, where every unit is associated with a latent variable, all latent variables are correlated, and the probability of sampling a unit depends on its response. Such settings often arise in case-control studies, where the sampled units are correlated due to spatial proximity, family relations, or other sources of relatedness. Maximum likelihood estimation in such settings is challenging from both a computational and statistical perspective, necessitating approximations that take the sampling scheme into account. We propose a family of approximate likelihood approaches which combine composite likelihood and expectation propagation. We demonstrate the efficacy of our solutions via extensive simulations. We utilize them to investigate the genetic architecture of several complex disorders collected in case-control genetic association studies, where hundreds of thousands of genetic variants are measured for every individual, and the underlying disease liabilities of individuals are correlated due to genetic similarity. Our work is the first to provide a tractable likelihood-based solution for case-control data with complex dependency structures.