Much is now known about the consistency of Bayesian updating on infinite-dimensional parameter spaces with independent or Markovian data. Necessary conditions for consistency include the prior putting enough weight on the correct neighborhoods of the data-generating distribution; various sufficient conditions further restrict the prior in ways analogous to capacity control in frequentist nonparametrics. The asymptotics of Bayesian updating with mis-specified models or priors, or non-Markovian data, are far less well explored. Here I establish sufficient conditions for posterior convergence when all hypotheses are wrong, and the data have complex dependencies. The main dynamical assumption is the asymptotic equipartition (Shannon-McMillan-Breiman) property of information theory. This, along with Egorov's Theorem on uniform convergence, lets me build a sieve-like structure for the prior. The main statistical assumption, also a form of capacity control, concerns the compatibility of the prior and the data-generating process, controlling the fluctuations in the log-likelihood when averaged over the sieve-like sets. In addition to posterior convergence, I derive a kind of large deviations principle for the posterior measure, extending in some cases to rates of convergence, and discuss the advantages of predicting using a combination of models known to be wrong. An appendix sketches connections between these results and the replicator dynamics of evolutionary theory.
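To fix notation (a gloss on the abstract, not part of it): the Shannon-McMillan-Breiman property asserts that, for each hypothesis $\theta$, the per-observation log-likelihood settles down almost surely to a constant,
\[
-\frac{1}{n}\log f_\theta(X_1,\dots,X_n) \;\longrightarrow\; h(\theta) \quad \text{a.s.},
\]
where $h(\theta)$ exceeds the entropy rate $h$ of the data-generating process by a divergence rate $d(\theta) = h(\theta) - h$. Under the paper's conditions, the posterior then concentrates on the hypotheses minimizing $d(\theta)$, even though no hypothesis attains $d(\theta) = 0$.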
The treatment effects of the same therapy observed in multiple clinical trials can often differ substantially. Yet the patient characteristics accounting for these differences may not be identifiable in real-world practice. An unbiased way to combine the results from multiple trials and report the overall treatment effect for the general population is therefore needed during the development and validation of a new therapy. The non-linear structure of the maximum partial likelihood estimate of the (log) hazard ratio defined by a Cox proportional hazards model creates challenges for the statistical analysis of such combined trials. In this paper, we formulate the expected overall treatment effect under various modeling assumptions, and propose efficient estimators and a version of the Wald test for the combined hazard ratio using only aggregate data. The methods are interpreted in the framework of robust data analysis with misspecified models.
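For orientation only (the paper's estimators and test are its own): below is a minimal sketch of the familiar fixed-effect, inverse-variance combination of per-trial log hazard ratios from aggregate data, with a Wald test of HR = 1, a natural baseline in this setting. The function name and inputs are illustrative.

```python
import numpy as np
from scipy import stats

def pool_log_hazard_ratios(log_hr, se):
    """Fixed-effect inverse-variance pooling of per-trial log hazard
    ratios (aggregate data only), with a two-sided Wald test of HR = 1."""
    log_hr = np.asarray(log_hr, dtype=float)
    w = 1.0 / np.asarray(se, dtype=float) ** 2   # inverse-variance weights
    pooled = np.sum(w * log_hr) / np.sum(w)      # pooled log hazard ratio
    pooled_se = np.sqrt(1.0 / np.sum(w))
    z = pooled / pooled_se                       # Wald statistic
    p = 2 * stats.norm.sf(abs(z))                # two-sided p-value
    return np.exp(pooled), z, p

# Example: three trials reporting (log HR, standard error)
hr, z, p = pool_log_hazard_ratios([-0.22, -0.10, -0.35], [0.11, 0.09, 0.15])
```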
Background and Aims: Prediction of phenotypic traits from new genotypes under untested environmental conditions is crucial to build simulations of breeding strategies to improve target traits. Although the plant response to environmental stresses is characterized by both architectural and functional plasticity, recent attempts to integrate biological knowledge into genetics models have mainly concerned specific physiological processes or crop models without architecture, and thus may prove limited when studying genotype × environment interactions. Consequently, this paper presents a simulation study introducing genetics into a functional-structural growth model, which gives access to more fundamental traits for quantitative trait loci (QTL) detection and thus to promising tools for yield optimization. Methods: The GreenLab model was selected as a reasonable choice to link growth model parameters to QTL. Virtual genes and virtual chromosomes were defined to build a simple genetic model that drove the settings of the species-specific parameters of the model. The QTL Cartographer software was used to study QTL detection of simulated plant traits. A genetic algorithm was implemented to define the ideotype for yield maximization based on the model parameters and the associated allelic combination. Key Results and Conclusions: By keeping the environmental factors constant and using a virtual population with a large number of individuals generated by a Mendelian genetic model, results for an ideal case could be simulated. Virtual QTL detection was compared for phenotypic traits, such as cob weight, and for traits that were model parameters, and was found to be more accurate in the latter case. The practical interest of this approach is illustrated by calculating the parameters (and the corresponding genotype) associated with yield optimization of a GreenLab maize model. The paper discusses the potential of GreenLab to represent genotype × environment interactions, in particular through its main state variable, the ratio of biomass supply over demand.
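As a sketch of the optimization step only (GreenLab itself is not reproduced here): a toy genetic algorithm searching over binary allelic vectors with tournament selection, one-point crossover, and per-locus mutation. The fitness function standing in for a simulated trait such as cob weight is invented for illustration.

```python
import random

def evolve(fitness, genome_len=20, pop_size=100, generations=200,
           mutation_rate=0.01):
    """Toy genetic algorithm over binary allelic vectors."""
    pop = [[random.randint(0, 1) for _ in range(genome_len)]
           for _ in range(pop_size)]
    for _ in range(generations):
        def tournament():
            a, b = random.sample(pop, 2)
            return a if fitness(a) >= fitness(b) else b
        nxt = []
        while len(nxt) < pop_size:
            p1, p2 = tournament(), tournament()
            cut = random.randrange(1, genome_len)
            child = p1[:cut] + p2[cut:]                      # one-point crossover
            child = [g ^ (random.random() < mutation_rate)   # per-locus mutation
                     for g in child]
            nxt.append(child)
        pop = nxt
    return max(pop, key=fitness)

# Stand-in fitness: a fictitious yield score computed from the allelic vector.
best_genotype = evolve(lambda g: sum(g))
```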
Inference of evolutionary trees and rates from biological sequences is commonly performed using continuous-time Markov models of character change. The Markov process evolves along an unknown tree while observations arise only from the tips of the tree. Rate heterogeneity is present in most real data sets and is accounted for by the use of flexible mixture models where each site is allowed its own rate. Very little has been rigorously established concerning the identifiability of the models currently in common use in data analysis, although non-identifiability was proven for a semi-parametric model and an incorrect proof of identifiability was published for a general parametric model (GTR+Gamma+I). Here we prove that one of the most widely used models (GTR+Gamma) is identifiable for generic parameters, and for all parameter choices in the case of 4-state (DNA) models. This is the first proof of identifiability of a phylogenetic model with a continuous distribution of rates.
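In standard notation (a gloss, not from the abstract), the GTR+Gamma site-pattern probabilities at the tips are a continuous mixture over rates,
\[
\mathbb{P}(\text{pattern}) \;=\; \int_0^\infty p_{T}\!\left(\text{pattern} \mid rQ\right) f_\alpha(r)\, dr ,
\]
where $Q$ is the general time-reversible rate matrix, $T$ the tree with its edge lengths, and $f_\alpha$ the mean-one Gamma density with shape $\alpha$. Identifiability asks whether $(T, Q, \alpha)$ can be recovered from this joint distribution, up to the usual symmetries.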
In this article, I investigate the use of Bayesian updating rules applied to modeling social agents in the case of continuous opinion models. Given another agent's statement about the continuous value of a variable $x$, we will see that interesting dynamics emerge when an agent assigns a likelihood to that value that is a mixture of a Gaussian and a Uniform distribution. This represents the idea that the other agent might have no idea what he is talking about. The effect of updating only the first moment of the distribution will be studied, and we will see that this generates results similar to those of the Bounded Confidence models. By also updating the second moment, several different opinions always survive in the long run. However, depending on the probability of error and the initial uncertainty, those opinions might be clustered around a central value.
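A minimal sketch of this kind of moment-matched update, under an assumed parameterization (not necessarily the paper's): the agent's belief about $x$ is $N(\mu, \sigma^2)$, the statement $x_j$ has likelihood $p\,N(x_j \mid x, \tau^2) + (1-p)\,U$, and only the first two posterior moments are retained. The names p, tau2, and u below are illustrative.

```python
import math

def mixture_update(mu, var, x_j, p=0.9, tau2=0.1, u=1.0):
    """Moment-matched Bayesian update of a belief N(mu, var) about x,
    given a statement x_j, under the mixture likelihood
    p * N(x_j | x, tau2) + (1 - p) * Uniform (constant density u)."""
    s2 = var + tau2                                   # predictive variance of x_j
    gauss = math.exp(-(x_j - mu) ** 2 / (2 * s2)) / math.sqrt(2 * math.pi * s2)
    q = p * gauss / (p * gauss + (1 - p) * u)         # weight of informative part
    mu_g = (tau2 * mu + var * x_j) / s2               # conjugate Gaussian mean
    var_g = var * tau2 / s2                           # conjugate Gaussian variance
    new_mu = q * mu_g + (1 - q) * mu                  # first-moment update
    new_var = (q * (var_g + mu_g ** 2)                # second-moment update by
               + (1 - q) * (var + mu ** 2)            # moment matching the
               - new_mu ** 2)                         # two-component mixture
    return new_mu, new_var
```

Updating only `new_mu` while holding the variance fixed corresponds to the first-moment-only dynamics; carrying `new_var` forward as well gives the second-moment dynamics discussed above.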
We consider inference about the history of a sample of DNA sequences, conditional upon the haplotype counts and the number of segregating sites observed at the present time. After deriving some theoretical results in the coalescent setting, we implement rejection sampling and importance sampling schemes to perform the inference. The importance sampling scheme addresses an extension of the Ewens Sampling Formula for a configuration of haplotypes and the number of segregating sites in the sample. The implementations include both constant and variable population size models. The methods are illustrated by two human Y chromosome data sets.
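A minimal sketch of the rejection-sampling idea in the constant-population-size case, conditioning here only on the number of segregating sites rather than the full haplotype configuration: simulate coalescent genealogies, overlay infinite-sites mutations, and keep the draws matching the observed count. Names and defaults are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def rejection_sample_tmrca(n, theta, s_obs, n_draws=100_000):
    """Rejection sampler for the time to the most recent common ancestor
    (TMRCA) of n sequences, conditioned on s_obs segregating sites under
    the infinite-sites model with scaled mutation rate theta and constant
    population size."""
    accepted = []
    for _ in range(n_draws):
        t_total = tmrca = 0.0
        for k in range(n, 1, -1):
            t_k = rng.exponential(2.0 / (k * (k - 1)))  # epoch with k lineages
            tmrca += t_k
            t_total += k * t_k                          # total branch length
        s = rng.poisson(theta * t_total / 2.0)          # infinite-sites mutations
        if s == s_obs:                                  # keep matching genealogies
            accepted.append(tmrca)
    return accepted                                     # posterior draws of TMRCA
```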