There have been controversies among statisticians on (i) what to model and (ii) how to make inferences from models with unobservables. One such controversy concerns the difference between estimation methods for the marginal means not necessarily having a probabilistic basis and statistical models having unobservables with a probabilistic basis. Another concerns likelihood-based inference for statistical models with unobservables. This needs an extended-likelihood framework, and we show how one such extension, hierarchical likelihood, allows this to be done. Modeling of unobservables leads to rich classes of new probabilistic models from which likelihood-type inferences can be made naturally with hierarchical likelihood.
We address the issue of performing testing inference in generalized linear models when the sample size is small. This class of models provides a straightforward way of modeling normal and non-normal data and has been widely used in several practical situations. The likelihood ratio, Wald and score statistics, and the recently proposed gradient statistic provide the basis for testing inference on the parameters in these models. We focus on the small-sample case, where the reference chi-squared distribution gives a poor approximation to the true null distribution of these test statistics. We derive a general Bartlett-type correction factor in matrix notation for the gradient test which reduces the size distortion of the test, and numerically compare the proposed test with the usual likelihood ratio, Wald, score and gradient tests, and with the Bartlett-corrected likelihood ratio and score tests. Our simulation results suggest that the corrected test we propose can be an interesting alternative to the other tests since it leads to very accurate inference even for very small samples. We also present an empirical application for illustrative purposes.
A maximum likelihood methodology for the parameters of models with an intractable likelihood is introduced. We produce a likelihood-free version of the stochastic approximation expectation-maximization (SAEM) algorithm to maximize the likelihood function of model parameters. While SAEM is best suited for models having a tractable complete likelihood function, its application to moderately complex models is a difficult or even impossible task. We show how to construct a likelihood-free version of SAEM by using the synthetic likelihood paradigm. Our method is completely plug-and-play, requires almost no tuning and can be applied to both static and dynamic models. Four simulation studies illustrate the method, including a stochastic differential equation model, a stochastic Lotka-Volterra model and data from $g$-and-$k$ distributions. MATLAB code is available as supplementary material.
Nonparametric maximum likelihood (NPML) for mixture models is a technique for estimating mixing distributions that has a long and rich history in statistics going back to the 1950s, and is closely related to empirical Bayes methods. Historically, NPML-based methods have been considered to be relatively impractical because of computational and theoretical obstacles. However, recent work focusing on approximate NPML methods suggests that these methods may have great promise for a variety of modern applications. Building on this recent work, a class of flexible, scalable, and easy to implement approximate NPML methods is studied for problems with multivariate mixing distributions. Concrete guidance on implementing these methods is provided, with theoretical and empirical support; topics covered include identifying the support set of the mixing distribution, and comparing algorithms (across a variety of metrics) for solving the simple convex optimization problem at the core of the approximate NPML problem. Additionally, three diverse real data applications are studied to illustrate the methods performance: (i) A baseball data analysis (a classical example for empirical Bayes methods), (ii) high-dimensional microarray classification, and (iii) online prediction of blood-glucose density for diabetes patients. Among other things, the empirical results demonstrate the relative effectiveness of using multivariate (as opposed to univariate) mixing distributions for NPML-based approaches.
Linear mixed-effects models are widely used in analyzing clustered or repeated measures data. We propose a quasi-likelihood approach for estimation and inference of the unknown parameters in linear mixed-effects models with high-dimensional fixed effects. The proposed method is applicable to general settings where the dimension of the random effects and the cluster sizes are possibly large. Regarding the fixed effects, we provide rate optimal estimators and valid inference procedures that do not rely on the structural information of the variance components. We also study the estimation of variance components with high-dimensional fixed effects in general settings. The algorithms are easy to implement and computationally fast. The proposed methods are assessed in various simulation settings and are applied to a real study regarding the associations between body mass index and genetic polymorphic markers in a heterogeneous stock mice population.