Many applications aim to assess the causal effect of treatments assigned at the community level, while data are still collected at the individual level among individuals within those communities. In many cases, one wants to evaluate the effect of a stochastic intervention on the community, in which all communities in the target population receive probabilistically assigned treatments according to a known, pre-specified mechanism (e.g., a community-level intervention policy that induces stochastic changes in the behavior of a target population of communities). The recently developed tmleCommunity package implements targeted minimum loss-based estimation (TMLE) of the effect of community-level intervention(s) at a single time point on an individual-based outcome of interest, including the average causal effect. Implementations of inverse-probability-of-treatment weighting (IPTW) and the G-computation formula (GCOMP) are also provided. The package supports multivariate arbitrary (i.e., static, dynamic, or stochastic) interventions with a binary or continuous outcome. In addition, it allows user-specified data-adaptive machine learning algorithms through the SuperLearner, sl3, and h2oEnsemble packages. This paper describes the usage of the tmleCommunity package and illustrates it with a few examples.
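As a rough illustration of the G-computation estimand discussed above, the following base-R sketch computes the mean individual-level outcome under a community-level stochastic intervention that treats each community with probability 0.8. It does not call tmleCommunity; the data layout, the working outcome regression, and the intervention probability are hypothetical choices made only for the example.

## Minimal G-computation (GCOMP) sketch for a community-level stochastic
## intervention; purely illustrative and independent of tmleCommunity.
set.seed(1)
J <- 100; n_j <- 50                               # communities and individuals per community
comm <- rep(seq_len(J), each = n_j)               # community identifiers
E <- rnorm(J)[comm]                               # community-level covariate
W <- rnorm(J * n_j)                               # individual-level covariate
A <- rbinom(J, 1, 0.5)[comm]                      # observed community-level treatment
Y <- rbinom(J * n_j, 1, plogis(-1 + A + 0.5 * E + 0.3 * W))
dat <- data.frame(Y, A, E, W, comm)

## Pooled individual-level outcome regression (a working model choice)
Qbar <- glm(Y ~ A + E + W, data = dat, family = binomial())

## Stochastic intervention g*: each community is treated with probability 0.8
p_star <- 0.8
pred1 <- predict(Qbar, newdata = transform(dat, A = 1), type = "response")
pred0 <- predict(Qbar, newdata = transform(dat, A = 0), type = "response")

## GCOMP estimate of the mean outcome under g*
psi_gcomp <- mean(p_star * pred1 + (1 - p_star) * pred0)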
Unlike commonly used parametric regression models, such as mixed models, whose restrictive statistical assumptions are easily violated and then yield invalid statistical inference, targeted maximum likelihood estimation allows more realistic data-generating models and provides doubly robust, semi-parametric, efficient estimators. Targeted maximum likelihood estimators (TMLEs) for the causal effect of a community-level static exposure were previously proposed by Balzer et al. In this manuscript, we build on this work: we present identifiability results and develop two semi-parametric efficient TMLEs for estimating the causal effect of a single-time-point community-level stochastic intervention whose assignment mechanism can depend on measured and unmeasured environmental factors and on individual-level covariates. The first, community-level TMLE is developed under a general hierarchical non-parametric structural equation model and can incorporate pooled individual-level regressions for estimating the outcome mechanism. The second, individual-level TMLE is developed under a restricted hierarchical model in which the additional assumption of no covariate interference within communities holds. The proposed TMLEs have several crucial advantages. First, both TMLEs can make use of individual-level data in the hierarchical setting, potentially reducing finite-sample bias and improving estimator efficiency. Second, the stochastic intervention framework provides a natural way to define and estimate causal effects when the exposure variables are continuous or discrete with multiple levels, or even cannot be intervened on directly. Finally, the positivity assumption needed for the proposed causal parameters can be weaker than the version of positivity required for other causal parameters.
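The targeting step of an individual-level TMLE of this kind can be sketched as a one-dimensional logistic fluctuation of the initial outcome fit, with a clever covariate given by the ratio of the intervention mechanism g* to the estimated treatment mechanism g. The sketch below reuses the objects dat, Qbar, p_star, pred1, and pred0 from the GCOMP example above; the treatment model A ~ E is an arbitrary working choice, not the estimator proposed in the manuscript.

## Logistic fluctuation (targeting) step of a TMLE sketch for the stochastic
## intervention g*; reuses dat, Qbar, p_star, pred1 and pred0 from above.
gfit <- glm(A ~ E, data = dat, family = binomial())   # working treatment mechanism
g1   <- predict(gfit, type = "response")
gA   <- ifelse(dat$A == 1, g1, 1 - g1)
gstarA <- ifelse(dat$A == 1, p_star, 1 - p_star)
H <- gstarA / gA                                      # clever covariate

## Fluctuate the initial outcome fit on the logit scale
QA  <- predict(Qbar, type = "response")
eps <- coef(glm(Y ~ -1 + H, offset = qlogis(QA), data = dat, family = binomial()))

## Updated counterfactual predictions and the TMLE point estimate
pred1_star <- plogis(qlogis(pred1) + eps * p_star / g1)
pred0_star <- plogis(qlogis(pred0) + eps * (1 - p_star) / (1 - g1))
psi_tmle <- mean(p_star * pred1_star + (1 - p_star) * pred0_star)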
A maximum likelihood methodology for a general class of models is presented, using an approximate Bayesian computation (ABC) approach. ABC methods typically target models with intractable likelihoods, and we combine an ABC-MCMC sampler with so-called data cloning for maximum likelihood estimation. The accuracy of ABC methods relies on using a small threshold value when comparing simulations from the model with the observed data. The proposed methodology shows how to use large threshold values while the number of data clones is increased to ease convergence towards an approximate maximum likelihood estimate. We show how to exploit the methodology to reduce the number of iterations of a standard ABC-MCMC algorithm, and therefore the computational effort, while obtaining reasonable point estimates. Simulation studies show the good performance of our approach on models with intractable likelihoods, such as g-and-k distributions, stochastic differential equations, and state-space models.
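A toy version of the idea, for data from a Normal(mu, 1) model whose likelihood we pretend is intractable, is sketched below: an ABC-MCMC random-walk sampler in which each proposal is checked against K cloned pseudo-datasets, so that the chain approximately targets the ABC likelihood raised to the K-th power and concentrates near the MLE. The summary statistic, threshold, clone number, and proposal scale are arbitrary choices for illustration, not the settings used in the paper.

## Toy ABC-MCMC with data cloning for the mean of a Normal(mu, 1) sample.
set.seed(2)
y     <- rnorm(200, mean = 3)          # observed data
s_obs <- mean(y)                       # summary statistic
K     <- 20                            # number of data clones
eps   <- 0.5                           # ABC threshold (deliberately not small)
n_iter <- 5000

mu    <- s_obs                         # initialise where acceptance is possible
chain <- numeric(n_iter)
for (t in seq_len(n_iter)) {
  mu_prop <- mu + rnorm(1, sd = 0.3)   # random-walk proposal (flat prior assumed)
  ## Simulate K cloned pseudo-datasets and keep their summary statistics
  s_sim <- replicate(K, mean(rnorm(length(y), mean = mu_prop)))
  ## Accept only if every cloned summary is within eps of the observed one,
  ## i.e. the ABC indicator kernel raised to the K-th power equals one
  if (all(abs(s_sim - s_obs) < eps)) mu <- mu_prop
  chain[t] <- mu
}
mean(chain[-(1:1000)])                 # approximate MLE of mu as K grows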
Modeling the diameter distribution of trees in forest stands is a common forestry task that supports key biologically and economically relevant management decisions. The choice of model used to represent the diameter distribution, and how to estimate its parameters, has received much attention in the forestry literature; however, accessible software that facilitates comprehensive comparison of the myriad modeling approaches is not available. To this end, we developed an R package called ForestFit that simplifies estimation of the probability distributions commonly used to model tree diameter distributions, including the two- and three-parameter Weibull distributions, Johnson's SB distribution, the Birnbaum-Saunders distribution, and finite mixture distributions. Frequentist and Bayesian techniques are provided for individual-tree diameter data as well as grouped data. Additional functionality facilitates fitting growth curves to height-diameter data. The package also provides a set of functions for computing probability distributions and simulating random realizations from common finite mixture models.
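The short sketch below fits a two-parameter Weibull distribution to simulated diameter data by maximum likelihood using MASS::fitdistr; it is only meant to show the kind of task ForestFit streamlines, and does not use ForestFit's own fitting functions.

## Fit a two-parameter Weibull to simulated tree diameters by maximum likelihood.
library(MASS)
set.seed(3)
dbh <- rweibull(500, shape = 2.3, scale = 25)   # simulated diameters at breast height (cm)
fit <- fitdistr(dbh, densfun = "weibull")
fit$estimate                                    # shape and scale estimates
fit$sd                                          # their standard errors

## Compare the fitted density with the empirical distribution
hist(dbh, freq = FALSE, xlab = "DBH (cm)", main = "Diameter distribution")
curve(dweibull(x, fit$estimate["shape"], fit$estimate["scale"]), add = TRUE)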
This paper introduces the R package slm, which stands for Stationary Linear Models. The package contains a set of statistical procedures for linear regression in the general context where the error process is strictly stationary with short memory. We work in the setting of Hannan (1973), who proved the asymptotic normality of the (normalized) least squares estimators (LSE) under very mild conditions on the error process. We propose different ways to estimate the asymptotic covariance matrix of the LSE, and then to correct the type I error rates of the usual tests on the parameters, as well as the coverage of confidence intervals. The procedures are evaluated through different sets of simulations, and two real-data examples are studied.
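The practical issue the package addresses can be illustrated without slm itself: under stationary, short-memory errors the usual OLS covariance matrix is wrong, and replacing it with a heteroskedasticity-and-autocorrelation-consistent (HAC) estimate corrects the test sizes. The sketch below uses the sandwich and lmtest packages for this purpose; it is a related correction, not the estimators implemented in slm.

## OLS on a regression with AR(1) errors, then tests with a HAC covariance.
library(sandwich)
library(lmtest)
set.seed(4)
n   <- 500
x   <- rnorm(n)
err <- as.numeric(arima.sim(list(ar = 0.6), n))   # stationary short-memory errors
y   <- 1 + 2 * x + err
fit <- lm(y ~ x)

coeftest(fit)                          # naive i.i.d. standard errors
coeftest(fit, vcov = NeweyWest(fit))   # HAC-corrected standard errors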
Let X_1, ..., X_n be independent and identically distributed random vectors with a log-concave (Lebesgue) density f. We first prove that, with probability one, there exists a unique maximum likelihood estimator of f. The use of this estimator is attractive because, unlike kernel density estimation, the method is fully automatic, with no smoothing parameters to choose. Although the existence proof is non-constructive, we are able to reformulate the computation as a non-differentiable convex optimisation problem, and thus combine techniques from computational geometry with Shor's r-algorithm to produce a sequence that converges to the maximum likelihood estimate. For the moderate and large sample sizes in our simulations, the maximum likelihood estimator is shown to improve on kernel-based methods, even when the kernel estimator is allowed a theoretical, optimal fixed bandwidth that would not be available in practice. We also present a real-data clustering example, which shows that our methodology can be used in conjunction with the Expectation-Maximisation (EM) algorithm to fit finite mixtures of log-concave densities. An R implementation of the algorithm is available in the package LogConcDEAD (Log-Concave Density Estimation in Arbitrary Dimensions).
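A minimal usage sketch follows, assuming the mlelcd() fitting function and its plot method as described in the package documentation; the simulated bivariate Gaussian data are chosen only for illustration.

## Log-concave maximum likelihood estimation in two dimensions with LogConcDEAD.
library(LogConcDEAD)
set.seed(5)
x <- matrix(rnorm(2 * 100), ncol = 2)   # 100 bivariate Gaussian observations
fit <- mlelcd(x)                        # maximum likelihood log-concave density
plot(fit)                               # contour plot of the fitted density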