For in vivo research experiments with small sample sizes and available historical data, we propose a sequential Bayesian method for the Behrens-Fisher problem. We treat it as a model choice question with two models in competition: one for which the two expectations are equal and one for which they are different. The choice between the two models is performed through a Bayesian analysis, based on a robust choice of combined objective and subjective priors, set on the parameter space and on the model space. Three steps are necessary to evaluate the posterior probability of each model using two historical datasets similar to the one of interest. Starting from the Jeffreys prior, a posterior computed from the first historical dataset is used to calibrate the Normal-Gamma informative priors for the analysis of the second historical dataset, in addition to a uniform prior on the model space. From this second step, a new posterior on the parameter space and the model space can be used as the objective informative prior for the last Bayesian analysis. Bayesian and frequentist methods have been compared on simulated and real data. In accordance with FDA recommendations, control of the type I and type II error rates has been evaluated. The proposed method controls both rates even if the historical experiments are not completely similar to the one of interest.
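The sequential calibration described above rests on standard Normal-Gamma conjugacy for the mean and precision of a normal sample. A minimal sketch of that mechanism, using textbook update formulas rather than the paper's full three-step procedure, and with hypothetical data:

```python
import numpy as np

def normal_gamma_update(mu0, kappa0, a0, b0, x):
    """Conjugate Normal-Gamma posterior update for the (mean, precision)
    of a normal sample x. Standard textbook formulas, shown here only to
    illustrate how a posterior from one historical dataset can serve as
    the informative prior for the next analysis."""
    x = np.asarray(x, dtype=float)
    n, xbar = len(x), x.mean()
    ss = ((x - xbar) ** 2).sum()
    kappa_n = kappa0 + n
    mu_n = (kappa0 * mu0 + n * xbar) / kappa_n
    a_n = a0 + n / 2.0
    b_n = b0 + 0.5 * ss + kappa0 * n * (xbar - mu0) ** 2 / (2.0 * kappa_n)
    return mu_n, kappa_n, a_n, b_n

# Sequential use: the posterior from historical dataset 1 becomes the
# prior for historical dataset 2 (both datasets are made up).
hist1 = np.array([4.8, 5.1, 5.3, 4.9])
hist2 = np.array([5.0, 5.2, 4.7, 5.4, 5.1])
prior = normal_gamma_update(0.0, 1e-3, 1e-3, 1e-3, hist1)  # near-flat start
posterior = normal_gamma_update(*prior, hist2)
```

The same chaining would continue once more in the paper's scheme, feeding the second-step posterior into the analysis of the experiment of interest.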
This paper is concerned with the problem of comparing the population means of two groups of independent observations. An approximate randomization test procedure based on the test statistic of Chen & Qin (2010) is proposed. The asymptotic behavior of the test statistic as well as the randomized statistic is studied under weak conditions. In our theoretical framework, observations are not assumed to be identically distributed even within groups, no condition on the eigenstructure of the covariance is imposed, and the sample sizes of the two groups are allowed to be unbalanced. Under general conditions, all possible asymptotic distributions of the test statistic are obtained. We derive the asymptotic level and local power of the proposed test procedure. Our theoretical results show that the proposed test procedure can adapt to all possible asymptotic distributions of the test statistic and always has the correct test level asymptotically. Moreover, the proposed test procedure has good power behavior. Our numerical experiments show that the proposed test procedure performs favorably compared with several alternative test procedures.
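The general shape of a two-sample randomization test can be sketched as follows. This is only an illustration with the plain difference-of-means statistic, not the Chen & Qin (2010) high-dimensional statistic the paper builds on:

```python
import numpy as np

def randomization_test(x, y, n_perm=500, rng=None):
    """Two-sample randomization test for equality of means.
    Illustrative only: uses the plain difference-of-means statistic,
    not the Chen & Qin (2010) statistic."""
    rng = np.random.default_rng(rng)
    x, y = np.asarray(x, float), np.asarray(y, float)
    pooled = np.concatenate([x, y])
    n = len(x)
    observed = abs(x.mean() - y.mean())
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)  # randomize group labels
        if abs(perm[:n].mean() - perm[n:].mean()) >= observed:
            count += 1
    return (count + 1) / (n_perm + 1)  # add-one to avoid a zero p-value

# Unbalanced groups with unequal variances (synthetic data).
rng = np.random.default_rng(0)
x1 = rng.normal(0.0, 1.0, 30)
y1 = rng.normal(0.0, 2.0, 20)
same = randomization_test(x1, y1, rng=1)          # means equal
shifted = randomization_test(x1, y1 + 2.0, rng=1)  # means differ by 2
```

The paper's contribution lies in the choice of statistic and in proving that the randomized version tracks all possible asymptotic distributions; the resampling loop itself is the generic part.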
The use of entropy-related concepts ranges from physics, such as statistical mechanics, to evolutionary biology. The Shannon entropy is a measure used to quantify the amount of information in a system, and its estimation is usually carried out under the frequentist approach. In the present paper, we introduce a fully objective Bayesian analysis to obtain the posterior distribution of this measure. Notably, we consider the Gamma distribution, which describes many natural phenomena in physics, engineering, and biology. We reparametrize the model in terms of entropy, and different objective priors are derived, such as the Jeffreys prior, reference prior, and matching priors. Since the obtained priors are improper, we prove that the resulting posterior distributions are proper and that their respective posterior means are finite. An intensive simulation study is conducted to select the prior that returns the best results in terms of bias, mean squared error, and coverage probabilities. The proposed approach is illustrated on two datasets: the first is related to the reign periods of the Achaemenid dynasty, and the second describes the time to failure of an electronic component in a sugarcane harvest machine.
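The reparametrization is possible because the Gamma distribution's Shannon (differential) entropy has a closed form: for shape $\alpha$ and rate $\beta$, $H = \alpha - \ln\beta + \ln\Gamma(\alpha) + (1-\alpha)\psi(\alpha)$, where $\psi$ is the digamma function. A quick numerical cross-check against SciPy's generic entropy routine (SciPy assumed available; it parametrizes by scale $= 1/\beta$):

```python
from math import lgamma, log
from scipy.special import digamma
from scipy.stats import gamma

def gamma_entropy(alpha, beta):
    """Closed-form differential entropy of Gamma(shape=alpha, rate=beta):
    H = alpha - ln(beta) + ln Gamma(alpha) + (1 - alpha) * psi(alpha)."""
    return alpha - log(beta) + lgamma(alpha) + (1.0 - alpha) * digamma(alpha)

# Cross-check the closed form against scipy's entropy (scale = 1/rate).
h_closed = gamma_entropy(2.0, 3.0)
h_scipy = gamma(a=2.0, scale=1.0 / 3.0).entropy()
```

This closed form is what lets the model be rewritten with the entropy as an explicit parameter, so that priors and posteriors can be placed on it directly.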
In this paper we provide a provably convergent algorithm for the multivariate Gaussian Maximum Likelihood version of the Behrens--Fisher problem. Our work builds upon a formulation of the log-likelihood function proposed by Buot and Richards. Instead of focusing on the first-order optimality conditions, the algorithm aims directly at the maximization of the log-likelihood function itself to achieve a global solution. A convergence proof and complexity estimates are provided for the algorithm. Computational experiments illustrate the applicability of such methods to high-dimensional data. We also discuss how to extend the proposed methodology to a broader class of problems. We establish a systematic algebraic relation between the Wald, Likelihood Ratio, and Lagrange Multiplier tests ($W \geq \mathit{LR} \geq \mathit{LM}$) in the context of the Behrens--Fisher problem. Moreover, we use our algorithm to computationally investigate the finite-sample size and power of the Wald, Likelihood Ratio, and Lagrange Multiplier tests, which previously were only available through asymptotic results. The methods developed here are applicable to much higher-dimensional settings than the ones available in the literature. This allows us to better capture the role of high dimensionality on the actual size and power of the tests for finite samples.
The main goal of this paper is to study the parameter estimation problem, using the Bayesian methodology, for the drift coefficient of some linear (parabolic) SPDEs driven by a multiplicative noise of special structure. We take the spectral approach by assuming that one path of the first $N$ Fourier modes of the solution is continuously observed over a finite time interval. First, we show that the model is regular and fits into the classical local asymptotic normality framework, and thus the MLE and the Bayesian estimators are weakly consistent, asymptotically normal, efficient, and asymptotically equivalent in the class of loss functions with polynomial growth. Second, and mainly, we prove a Bernstein--von Mises type result that strengthens the existing results in the literature and also allows us to investigate Bayesian-type estimators with respect to a larger class of priors and loss functions than that covered by classical asymptotic theory. In particular, we prove strong consistency and asymptotic normality of Bayesian estimators in the class of loss functions of at most exponential growth. Finally, we present some numerical examples that illustrate the obtained theoretical results.
Efficient automatic protein classification is of central importance in genomic annotation. As an independent way to check the reliability of the classification, we propose a statistical approach to test whether two sets of protein domain sequences coming from two families of the Pfam database are significantly different. We model protein sequences as realizations of Variable Length Markov Chains (VLMC) and we use the context trees as a signature of each protein family. Our approach is based on a Kolmogorov--Smirnov-type goodness-of-fit test proposed by Balding et al. [Limit theorems for sequences of random trees (2008), DOI: 10.1007/s11749-008-0092-z]. The test statistic is a supremum over the space of trees of a function of the two samples; its computation grows, in principle, exponentially fast with the maximal number of nodes of the potential trees. We show how to transform this problem into a max-flow problem over a related graph, which can be solved using a Ford--Fulkerson algorithm in polynomial time in that number. We apply the test to 10 randomly chosen protein domain families from the seed of the Pfam-A database (high-quality, manually curated families). The test shows that the distributions of context trees coming from different families are significantly different. We emphasize that this is a novel mathematical approach to validate the automatic clustering of sequences in any context. We also study the performance of the test via simulations on Galton--Watson related processes.
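Once the supremum computation is reduced to a max-flow instance, any augmenting-path solver applies. A self-contained Edmonds-Karp (BFS-based Ford-Fulkerson) sketch on a toy graph; the construction of the specific graph from the tree statistic is particular to the paper and not reproduced here:

```python
from collections import deque

def max_flow(capacity, source, sink):
    """Edmonds-Karp maximum flow on a dict-of-dicts adjacency map with
    nonnegative capacities. Runs in polynomial time in the graph size."""
    # Build a residual graph, adding zero-capacity reverse edges.
    residual = {u: dict(nbrs) for u, nbrs in capacity.items()}
    for u, nbrs in capacity.items():
        for v in nbrs:
            residual.setdefault(v, {}).setdefault(u, 0)
    flow = 0
    while True:
        # BFS for a shortest augmenting path in the residual graph.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow  # no augmenting path left: flow is maximal
        # Recover the path, find its bottleneck capacity, and augment.
        path, v = [], sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(residual[u][v] for u, v in path)
        for u, v in path:
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
        flow += bottleneck

# Toy network (hypothetical capacities): maximum s-t flow is 5.
demo = max_flow({'s': {'a': 3, 'b': 2}, 'a': {'b': 1, 't': 2},
                 'b': {'t': 3}, 't': {}}, 's', 't')
```

The polynomial-time claim in the abstract is exactly what such augmenting-path algorithms provide, in contrast to the exponential cost of enumerating all candidate trees.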