No Arabic abstract
The random coefficients model $Y_i={beta_0}_i+{beta_1}_i {X_1}_i+{beta_2}_i {X_2}_i+ldots+{beta_d}_i {X_d}_i$, with $mathbf{X}_i$, $Y_i$, $mathbf{beta}_i$ i.i.d, and $mathbf{beta}_i$ independent of $X_i$ is often used to capture unobserved heterogeneity in a population. We propose a quasi-maximum likelihood method to estimate the joint density distribution of the random coefficient model. This method implicitly involves the inversion of the Radon transformation in order to reconstruct the joint distribution, and hence is an inverse problem. Nonparametric estimation for the joint density of $mathbf{beta}_i=({beta_0}_i,ldots, {beta_d}_i)$ based on kernel methods or Fourier inversion have been proposed in recent years. Most of these methods assume a heavy tailed design density $f_mathbf{X}$. To add stability to the solution, we apply regularization methods. We analyze the convergence of the method without assuming heavy tails for $f_mathbf{X}$ and illustrate performance by applying the method on simulated and real data. To add stability to the solution, we apply a Tikhonov-type regularization method.
We consider the problem of estimating parameters of stochastic differential equations (SDEs) with discrete-time observations that are either completely or partially observed. The transition density between two observations is generally unknown. We propose an importance sampling approach with an auxiliary parameter when the transition density is unknown. We embed the auxiliary importance sampler in a penalized maximum likelihood framework which produces more accurate and computationally efficient parameter estimates. Simulation studies in three different models illustrate promising improvements of the new penalized simulated maximum likelihood method. The new procedure is designed for the challenging case when some state variables are unobserved and moreover, observed states are sparse over time, which commonly arises in ecological studies. We apply this new approach to two epidemics of chronic wasting disease in mule deer.
We derive Laplace-approximated maximum likelihood estimators (GLAMLEs) of parameters in our Graph Generalized Linear Latent Variable Models. Then, we study the statistical properties of GLAMLEs when the number of nodes $n_V$ and the observed times of a graph denoted by $K$ diverge to infinity. Finally, we display the estimation results in a Monte Carlo simulation considering different numbers of latent variables. Besides, we make a comparison between Laplace and variational approximations for inference of our model.
A maximum likelihood methodology for a general class of models is presented, using an approximate Bayesian computation (ABC) approach. The typical target of ABC methods are models with intractable likelihoods, and we combine an ABC-MCMC sampler with so-called data cloning for maximum likelihood estimation. Accuracy of ABC methods relies on the use of a small threshold value for comparing simulations from the model and observed data. The proposed methodology shows how to use large threshold values, while the number of data-clones is increased to ease convergence towards an approximate maximum likelihood estimate. We show how to exploit the methodology to reduce the number of iterations of a standard ABC-MCMC algorithm and therefore reduce the computational effort, while obtaining reasonable point estimates. Simulation studies show the good performance of our approach on models with intractable likelihoods such as g-and-k distributions, stochastic differential equations and state-space models.
CRISPR technology has enabled large-scale cell lineage tracing for complex multicellular organisms by mutating synthetic genomic barcodes during organismal development. However, these sophisticated biological tools currently use ad-hoc and outmoded computational methods to reconstruct the cell lineage tree from the mutated barcodes. Because these methods are agnostic to the biological mechanism, they are unable to take full advantage of the datas structure. We propose a statistical model for the mutation process and develop a procedure to estimate the tree topology, branch lengths, and mutation parameters by iteratively applying penalized maximum likelihood estimation. In contrast to existing techniques, our method estimates time along each branch, rather than number of mutation events, thus providing a detailed account of tissue-type differentiation. Via simulations, we demonstrate that our method is substantially more accurate than existing approaches. Our reconstructed trees also better recapitulate known aspects of zebrafish development and reproduce similar results across fish replicates.
Statistical models with latent structure have a history going back to the 1950s and have seen widespread use in the social sciences and, more recently, in computational biology and in machine learning. Here we study the basic latent class model proposed originally by the sociologist Paul F. Lazarfeld for categorical variables, and we explain its geometric structure. We draw parallels between the statistical and geometric properties of latent class models and we illustrate geometrically the causes of many problems associated with maximum likelihood estimation and related statistical inference. In particular, we focus on issues of non-identifiability and determination of the model dimension, of maximization of the likelihood function and on the effect of symmetric data. We illustrate these phenomena with a variety of synthetic and real-life tables, of different dimension and complexity. Much of the motivation for this work stems from the 100 Swiss Francs problem, which we introduce and describe in detail.