Wildlife monitoring for open populations can be performed using a number of different survey methods. Each survey method gives rise to a type of data and, in the last five decades, a large number of associated statistical models have been developed for analysing these data. Although these models have been parameterised and fitted using different approaches, they have all been designed to model the pattern with which individuals enter and exit the population and to estimate the population size. However, existing approaches rely on a predefined model structure and complexity, either by assuming that parameters are specific to sampling occasions, or by employing parametric curves. Instead, we propose a novel Bayesian nonparametric framework for modelling entry and exit patterns based on the Polya Tree (PT) prior for densities. Our Bayesian nonparametric approach avoids overfitting when inferring entry and exit patterns while simultaneously allowing more flexibility than is possible using parametric curves. We apply our new framework to capture-recapture, count, and ring-recovery data, and we introduce the replicated PT prior for defining classes of models for these data. Additionally, we define the Hierarchical Logistic PT prior for jointly modelling related data, and we consider the Optional PT prior for modelling long time series of data. We demonstrate our new approach using five different case studies on birds, amphibians, and insects.
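To make the idea of a Polya Tree prior for densities concrete, the following is a minimal sketch of drawing one random discrete distribution from a generic finite Polya tree on a dyadic partition. It is not the replicated, Hierarchical Logistic, or Optional PT prior introduced above. The function name `sample_polya_tree_probs`, the concentration parameter `c`, and the common choice of Beta(c * level^2, c * level^2) splitting variables are assumptions made for illustration only.

```python
import numpy as np

def sample_polya_tree_probs(depth, c=1.0, rng=None):
    """Draw bin probabilities from a finite Polya tree on 2**depth dyadic bins.

    Assumption: at tree level `level`, the mass of each bin is split between
    its two children by an independent Beta(c * level**2, c * level**2) draw.
    """
    rng = np.random.default_rng() if rng is None else rng
    probs = np.array([1.0])  # all mass sits in the root bin
    for level in range(1, depth + 1):
        a = c * level**2
        # Fraction of each parent's mass that goes to its left child.
        left = rng.beta(a, a, size=probs.size)
        # Interleave children so that bins stay in left-to-right order.
        probs = np.column_stack([probs * left, probs * (1.0 - left)]).ravel()
    return probs

if __name__ == "__main__":
    p = sample_polya_tree_probs(depth=4, c=1.0, rng=np.random.default_rng(1))
    print(p.round(3), p.sum())
```

In a monitoring setting, the bins could be read as consecutive time intervals and the probabilities as the share of individuals entering in each interval; larger values of `c` pull draws towards the uniform pattern.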
Regression trees and their ensemble methods are popular methods for nonparametric regression: they combine strong predictive performance with interpretable estimators. To improve their utility for locally smooth response surfaces, we study regression trees and random forests with linear aggregation functions. We introduce a new algorithm that finds the best axis-aligned split to fit linear aggregation functions on the corresponding nodes, and we offer a quasilinear time implementation. We demonstrate the algorithm's favorable performance on real-world benchmarks and in an extensive simulation study, and we demonstrate its improved interpretability using a large get-out-the-vote experiment. We provide an open-source software package that implements several tree-based estimators with linear aggregation functions.
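The splitting criterion described above can be illustrated with a naive sketch: for every candidate axis-aligned split, fit a linear model on each side and keep the split with the smallest total squared error. This brute-force version refits from scratch at every threshold and is therefore far slower than the quasilinear implementation the abstract refers to. The function names, the small ridge term, and the `min_leaf` parameter are assumptions made for illustration, not the package's API.

```python
import numpy as np

def linear_sse(X, y, ridge=1e-8):
    """Sum of squared errors of a linear fit with intercept.

    A tiny ridge penalty (an assumption of this sketch) keeps the normal
    equations solvable when a node contains collinear features.
    """
    A = np.column_stack([np.ones(len(X)), X])
    gram = A.T @ A + ridge * np.eye(A.shape[1])
    beta = np.linalg.solve(gram, A.T @ y)
    resid = y - A @ beta
    return float(resid @ resid)

def best_linear_split(X, y, min_leaf=5):
    """Return (sse, feature_index, threshold) of the best axis-aligned split."""
    n, d = X.shape
    best = (np.inf, None, None)
    for j in range(d):
        order = np.argsort(X[:, j])
        xs = X[order, j]
        for i in range(min_leaf, n - min_leaf + 1):
            if xs[i - 1] == xs[i]:
                continue  # cannot split between tied feature values
            left, right = order[:i], order[i:]
            sse = linear_sse(X[left], y[left]) + linear_sse(X[right], y[right])
            if sse < best[0]:
                best = (sse, j, 0.5 * (xs[i - 1] + xs[i]))
    return best

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.uniform(-1, 1, size=(200, 2))
    # Piecewise-linear response with a break at X[:, 0] = 0.3.
    y = np.where(X[:, 0] < 0.3, 1 + 2 * X[:, 0], 4 - 3 * X[:, 0])
    print(best_linear_split(X, y))
```

A constant-leaf tree would need many splits to approximate each linear piece in this example, whereas a single split with linear leaves recovers the response almost exactly.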
There has been increased interest in using prior information in statistical analyses. For example, in rare diseases, it can be difficult to establish treatment efficacy based solely on data from a prospective study due to low sample sizes. To overcome this issue, an informative prior for the treatment effect may be elicited. We develop a novel extension of the conjugate prior of Chen and Ibrahim (2003) that enables practitioners to elicit a prior prediction for the mean response for generalized linear models, treating the prediction as random. We refer to the hierarchical prior as the hierarchical prediction prior (HPP). For i.i.d. settings and the normal linear model, we derive cases for which the hyperprior is a conjugate prior. We also develop an extension of the HPP for situations where summary statistics from a previous study are available, drawing comparisons with the power prior. The HPP allows for discounting based on the quality of individual-level predictions, having the potential to provide efficiency gains (e.g., lower MSE) where predictions are incompatible with the data. An efficient Markov chain Monte Carlo algorithm is developed. Applications illustrate that inferences under the HPP are more robust to prior-data conflict than those under selected non-hierarchical priors.
Characterizing the wind speed distribution properly is essential for the satisfactory production of potential energy in wind farms, and mixture models are usually employed to describe such data. However, some mixture models have the undesirable property of non-identifiability. In this work, we present an alternative distribution that is able to fit wind speed data adequately. The new model, called Normal-Weibull-Weibull, is identifiable, and its cumulative distribution function is written as a composition of two baseline functions. We discuss structural properties of the class that generates the proposed model, such as the linear representation of the probability density function, moments, and moment generating function. We perform a Monte Carlo simulation study to investigate the behavior of the maximum likelihood estimates of the parameters. Finally, we present applications of the new distribution for modelling wind speed data measured in five different cities of the Northeastern Region of Brazil.
A small n, sequential, multiple assignment, randomized trial (snSMART) is a small sample, two-stage design where participants receive up to two treatments sequentially, but the second treatment depends on response to the first treatment. The treatment effect of interest in an snSMART is the first-stage response rate, but outcomes from both stages can be used to obtain more information from a small sample. A novel way to incorporate the outcomes from both stages applies power prior models, in which first-stage outcomes from an snSMART are regarded as the primary data and second-stage outcomes are regarded as supplemental. We apply existing power prior models to snSMART data, and we also develop new extensions of power prior models. All methods are compared to each other and to the Bayesian joint stage model (BJSM) via simulation studies. By comparing the biases and the efficiency of the response rate estimates among all proposed power prior methods, we suggest applying Fisher's exact test or Bhattacharyya's overlap measure to estimate the treatment effect in an snSMART; both perform mostly as well as or better than the BJSM. We describe the situations where each of these suggested approaches is preferred.
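As a rough illustration of how a power prior lets supplemental second-stage outcomes inform the first-stage response rate, the sketch below uses the standard conjugate beta-binomial power prior with a fixed discounting weight `a0`. Fixing `a0` by hand is an assumption of this sketch; the methods above instead derive the amount of borrowing from the data (for example via Fisher's exact test or Bhattacharyya's overlap measure), which is not shown here. The function names and the Beta(1, 1) initial prior are likewise illustrative.

```python
def power_prior_posterior(y1, n1, y2, n2, a0, a=1.0, b=1.0):
    """Beta posterior parameters for a response rate under a fixed power prior.

    y1, n1: responders and sample size in the primary (first-stage) data.
    y2, n2: responders and sample size in the supplemental (second-stage) data.
    a0:     discounting weight in [0, 1]; 0 ignores the supplemental data,
            1 pools it fully with the primary data.
    a, b:   parameters of the initial Beta prior (assumed Beta(1, 1) by default).
    """
    if not 0.0 <= a0 <= 1.0:
        raise ValueError("a0 must lie in [0, 1]")
    alpha = a + y1 + a0 * y2
    beta = b + (n1 - y1) + a0 * (n2 - y2)
    return alpha, beta

def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)

if __name__ == "__main__":
    for a0 in (0.0, 0.5, 1.0):
        alpha, beta = power_prior_posterior(y1=6, n1=10, y2=3, n2=8, a0=a0)
        print(a0, alpha, beta, round(beta_mean(alpha, beta), 4))
```

With these made-up counts, the posterior mean moves from the first-stage-only estimate towards the pooled estimate as `a0` increases, which is the borrowing behaviour a power prior is designed to control.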
Environmental health studies are increasingly measuring multiple pollutants to characterize the joint health effects attributable to exposure mixtures. However, the underlying dose-response relationship between toxicants and health outcomes of interest may be highly nonlinear, with possible nonlinear interaction effects. Existing penalized regression methods that account for exposure interactions either cannot accommodate nonlinear interactions while maintaining strong heredity or are computationally unstable in applications with limited sample size. In this paper, we propose a general shrinkage and selection framework to identify noteworthy nonlinear main and interaction effects among a set of exposures. We design hierarchical integrative group LASSO (HiGLASSO) to (a) impose strong heredity constraints on two-way interaction effects (hierarchical), (b) incorporate adaptive weights without necessitating initial coefficient estimates (integrative), and (c) induce sparsity for variable selection while respecting group structure (group LASSO). We prove sparsistency of the proposed method and apply HiGLASSO to an environmental toxicants dataset from the LIFECODES birth cohort, where the investigators are interested in understanding the joint effects of 21 urinary toxicant biomarkers on urinary 8-isoprostane, a measure of oxidative stress. An implementation of HiGLASSO is available in the higlasso R package, accessible through the Comprehensive R Archive Network.
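The group-level sparsity mentioned in (c) can be illustrated with the proximal operator of the plain group LASSO penalty, which shrinks each coefficient block towards zero and sets the whole block to zero when its Euclidean norm falls below the threshold. This is a sketch of the generic group LASSO building block only; it does not implement the heredity constraints or adaptive weights of HiGLASSO, and it is unrelated to the interface of the higlasso R package. The function name and arguments are assumptions made for illustration.

```python
import numpy as np

def group_soft_threshold(beta, groups, lam):
    """Block soft-thresholding: proximal operator of lam * sum_g ||beta_g||_2.

    beta:   1-D coefficient vector.
    groups: list of index lists, one per non-overlapping group.
    lam:    non-negative threshold applied to every group (assumed equal
            across groups in this sketch).
    """
    beta = np.asarray(beta, dtype=float)
    out = np.zeros_like(beta)
    for idx in groups:
        block = beta[idx]
        norm = np.linalg.norm(block)
        if norm > lam:
            # Shrink the whole block towards zero by a common factor.
            out[idx] = (1.0 - lam / norm) * block
        # Otherwise the block stays at zero: the group is dropped entirely.
    return out

if __name__ == "__main__":
    beta = np.array([3.0, 4.0, 0.1, -0.1])
    print(group_soft_threshold(beta, groups=[[0, 1], [2, 3]], lam=1.0))
```

Because a group is either kept or removed as a unit, all basis coefficients that represent one nonlinear exposure effect enter or leave the model together.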