In this paper, we draw attention to a problem that is often overlooked or ignored by companies practicing hypothesis testing (A/B testing) in online environments. We show that conducting experiments on limited inventory that is shared between variants in the experiment can lead to high false-positive rates, since the core assumption of independence between the groups is violated. We provide a detailed analysis of the problem in a simplified setting whose parameters are informed by realistic scenarios. The setting we consider is a $2$-dimensional random walk in a semi-infinite strip. It is rich enough to take a finite inventory into account, but at the same time simple enough to allow for a closed form of the false-positive probability. We prove that high false-positive rates can occur, and develop tools suitable for designing adequate tests in follow-up work. Our results also show that high false-negative rates may occur. The proofs rely on a functional limit theorem for the $2$-dimensional random walk in a semi-infinite strip.
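The dependence mechanism can be illustrated with a minimal Monte Carlo sketch. This is not the paper's random-walk model: the shared-stock dynamics, parameter values, and the use of a standard two-proportion z-test are all illustrative assumptions. Both arms convert with the same probability (the null holds), but every conversion consumes a unit of one common inventory, so the arms are dependent through the stock.

```python
import math
import random

def simulate_ab_shared_inventory(n_users=1000, stock=100, p=0.2,
                                 n_trials=200, seed=0):
    """Monte Carlo estimate of the rejection rate of a two-sided
    two-proportion z-test (nominal level 5%) when both variants draw
    conversions from a single shared, finite inventory. Under the null
    both arms convert with the same probability p, but once the common
    stock is depleted no further conversions occur in either arm, which
    violates the independence assumption behind the z-test."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_trials):
        sales = [0, 0]   # conversions per variant
        users = [1, 1]   # users per variant (start at 1 to avoid 0-division)
        remaining = stock
        for _ in range(n_users):
            arm = rng.randrange(2)                 # 50/50 assignment
            users[arm] += 1
            if remaining > 0 and rng.random() < p:
                sales[arm] += 1
                remaining -= 1
        p1, p2 = sales[0] / users[0], sales[1] / users[1]
        pooled = (sales[0] + sales[1]) / (users[0] + users[1])
        se = math.sqrt(pooled * (1 - pooled) * (1 / users[0] + 1 / users[1]))
        if se > 0 and abs(p1 - p2) / se > 1.959964:  # z_{0.975}
            rejections += 1
    return rejections / n_trials
```

When the stock binds (here the expected demand of `n_users * p = 200` conversions exceeds `stock = 100`), the arms compete for the same units and the z-test's variance formula no longer describes the true variability of the observed difference.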
We develop a unified approach to hypothesis testing for various types of widely used functional linear models, such as scalar-on-function, function-on-function and function-on-scalar models. In addition, the proposed test applies to models of mixed types, such as models with both functional and scalar predictors. In contrast to most existing methods, which rest on the large-sample distributions of test statistics, the proposed method leverages the technique of bootstrapping max statistics and exploits the variance decay property inherent to functional data, improving the empirical power of tests, especially when the sample size is limited and the signal is relatively weak. Theoretical guarantees on the validity and consistency of the proposed test are provided uniformly for a class of test statistics.
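The core idea of bootstrapping max statistics can be sketched in a finite-dimensional surrogate. Everything here is an illustrative assumption rather than the paper's exact procedure: the data are taken as $d$-dimensional score vectors (e.g. projections of functional observations onto $d$ basis functions), and a Gaussian-multiplier bootstrap calibrates the maximum absolute standardized mean.

```python
import random
import statistics

def max_stat_bootstrap_pvalue(scores, n_boot=500, seed=0):
    """Multiplier-bootstrap p-value for H0: E[score] = 0 componentwise,
    using the maximum absolute standardized sample mean as the statistic.
    `scores` is a list of n observations, each a list of d components."""
    rng = random.Random(seed)
    n, d = len(scores), len(scores[0])
    means = [sum(x[j] for x in scores) / n for j in range(d)]
    sds = [statistics.pstdev([x[j] for x in scores]) or 1.0 for j in range(d)]
    t_obs = max(abs(means[j]) / sds[j] for j in range(d)) * n ** 0.5

    exceed = 0
    for _ in range(n_boot):
        g = [rng.gauss(0, 1) for _ in range(n)]          # multipliers
        boot = [sum(g[i] * (scores[i][j] - means[j]) for i in range(n)) / n
                for j in range(d)]                        # centered bootstrap mean
        t_b = max(abs(boot[j]) / sds[j] for j in range(d)) * n ** 0.5
        if t_b >= t_obs:
            exceed += 1
    return (exceed + 1) / (n_boot + 1)
```

Taking the maximum over components (rather than, say, a sum) is what makes the statistic sensitive to a weak signal concentrated in a few directions, which is the regime the abstract targets.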
Skepticism of the building block hypothesis (BBH) has previously been expressed on account of the weak theoretical foundations of this hypothesis and the anomalies in the empirical record of the simple genetic algorithm. In this paper we home in on a more fundamental cause for skepticism--the extraordinary strength of some of the assumptions that undergird the BBH. Specifically, we focus on assumptions made about the distribution of fitness over the genome set, and argue that these assumptions are unacceptably strong. As most of these assumptions have been embraced by the designers of so-called competent genetic algorithms, our critique is relevant to an appraisal of such algorithms as well.
Consider an $N\times n$ random matrix $Y_n=(Y_{ij}^{n})$ where the entries are given by $Y_{ij}^{n}=\frac{\sigma(i/N,j/n)}{\sqrt{n}} X_{ij}^{n}$, the $X_{ij}^{n}$ being centered i.i.d. and $\sigma:[0,1]^2 \to (0,\infty)$ being a continuous function called a variance profile. Consider now a deterministic $N\times n$ matrix $\Lambda_n=(\Lambda_{ij}^{n})$ whose off-diagonal elements are zero. Denote by $\Sigma_n$ the non-centered matrix $Y_n + \Lambda_n$. Then, under the assumptions that $\lim_{n\to \infty} \frac{N}{n} = c>0$ and $$ \frac{1}{N} \sum_{i=1}^{N} \delta_{(\frac{i}{N}, (\Lambda_{ii}^n)^2)} \xrightarrow[n\to \infty]{} H(dx,d\lambda), $$ where $H$ is a probability measure, it is proven that the empirical distribution of the eigenvalues of $\Sigma_n \Sigma_n^T$ converges almost surely in distribution to a nonrandom probability measure. This measure is characterized in terms of its Stieltjes transform, which is obtained with the help of an auxiliary system of equations. Results of this kind are of interest in the field of wireless communication.
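The model can be simulated directly. The following sketch builds one realization of $\Sigma_n = Y_n + \Lambda_n$ and returns the eigenvalues of $\Sigma_n \Sigma_n^T$; the specific variance profile $\sigma(x,y) = 1 + xy$ and the diagonal values $\Lambda_{ii} = i/N$ are illustrative choices, not taken from the paper.

```python
import numpy as np

def esd_variance_profile(N=200, n=400, seed=0):
    """Eigenvalues of Sigma Sigma^T for a random matrix with a continuous
    variance profile plus a deterministic diagonal perturbation.
    Illustrative choices: sigma(x, y) = 1 + x*y and Lambda_ii = i/N."""
    rng = np.random.default_rng(seed)
    x = (np.arange(1, N + 1) / N)[:, None]          # row coordinate i/N
    y = (np.arange(1, n + 1) / n)[None, :]          # column coordinate j/n
    sigma = 1.0 + x * y                             # variance profile on [0,1]^2
    Y = sigma * rng.standard_normal((N, n)) / np.sqrt(n)

    Lam = np.zeros((N, n))                          # diagonal perturbation
    k = min(N, n)
    Lam[np.arange(k), np.arange(k)] = np.arange(1, k + 1) / N

    S = Y + Lam
    return np.sort(np.linalg.eigvalsh(S @ S.T))     # real spectrum, ascending
```

A histogram of the returned eigenvalues approximates the limiting measure whose Stieltjes transform the abstract characterizes; rerunning with a different seed leaves the histogram essentially unchanged, consistent with the almost-sure convergence to a nonrandom limit.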
We propose a new adaptive empirical Bayes framework, the Bag-Of-Null-Statistics (BONuS) procedure, for multiple testing where each hypothesis testing problem is itself multivariate or nonparametric. BONuS is an adaptive and interactive knockoff-type method that helps improve the testing power while controlling the false discovery rate (FDR), and is closely connected to the counting knockoffs procedure analyzed in Weinstein et al. (2017). Contrary to procedures that start with a $p$-value for each hypothesis, our method analyzes the entire data set to adaptively estimate an optimal $p$-value transform based on an empirical Bayes model. Despite the extra adaptivity, our method controls FDR in finite samples even if the empirical Bayes model is incorrect or the estimation is poor. An extension, the Double BONuS procedure, validates the empirical Bayes model to guard against power loss due to model misspecification.
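The knockoff-type calibration the abstract alludes to can be illustrated with the generic knockoff-filter threshold of Barber and Candes. This is a deliberate simplification for intuition, not the BONuS procedure: given signed statistics $W_j$ (large positive values favoring a signal, with sign symmetry under the null), the threshold is chosen so that an estimate of the FDR stays below the target level.

```python
def knockoff_threshold(W, q=0.1):
    """Data-dependent threshold of the knockoff filter: choose the
    smallest t among the |W_j| such that the FDR estimate
        (1 + #{j : W_j <= -t}) / max(1, #{j : W_j >= t})
    is at most q, and report the indices with W_j >= t.
    Returns an empty list if no threshold achieves the bound."""
    for t in sorted({abs(w) for w in W if w != 0}):
        neg = sum(1 for w in W if w <= -t)      # proxy count for false positives
        pos = sum(1 for w in W if w >= t)       # number of discoveries at level t
        if pos > 0 and (1 + neg) / pos <= q:
            return [j for j, w in enumerate(W) if w >= t]
    return []
```

The "+1" in the numerator is what yields finite-sample FDR control regardless of how the statistics were estimated, mirroring the abstract's claim that FDR control survives a poor or misspecified empirical Bayes model.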
Efficient automatic protein classification is of central importance in genomic annotation. As an independent way to check the reliability of the classification, we propose a statistical approach to test whether two sets of protein domain sequences coming from two families of the Pfam database are significantly different. We model protein sequences as realizations of Variable Length Markov Chains (VLMC) and we use the context trees as a signature of each protein family. Our approach is based on a Kolmogorov--Smirnov-type goodness-of-fit test proposed by Balding et al. [Limit theorems for sequences of random trees (2008), DOI: 10.1007/s11749-008-0092-z]. The test statistic is a supremum over the space of trees of a function of the two samples; its computation grows, in principle, exponentially fast with the maximal number of nodes of the potential trees. We show how to transform this problem into a max-flow problem over a related graph, which can be solved using a Ford--Fulkerson algorithm in time polynomial in that number. We apply the test to 10 randomly chosen protein domain families from the seed of the Pfam-A database (high-quality, manually curated families). The test shows that the distributions of context trees coming from different families are significantly different. We emphasize that this is a novel mathematical approach to validate the automatic clustering of sequences in any context. We also study the performance of the test via simulations on Galton--Watson related processes.
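The max-flow step can be sketched with a generic Edmonds--Karp implementation (the BFS-based variant of Ford--Fulkerson that runs in polynomial time). The reduction from the supremum over trees to a concrete flow network is not reproduced here; the code below only illustrates the flow computation on an arbitrary capacitated graph.

```python
from collections import defaultdict, deque

def max_flow(cap, s, t):
    """Edmonds-Karp maximum flow. `cap` maps directed edges (u, v) to
    nonnegative integer capacities; returns the max s-t flow value."""
    flow = defaultdict(int)
    adj = defaultdict(set)
    for u, v in cap:                     # residual graph needs both directions
        adj[u].add(v)
        adj[v].add(u)

    total = 0
    while True:
        # BFS for a shortest augmenting path in the residual graph.
        parent = {s: None}
        q = deque([s])
        while q and t not in parent:
            u = q.popleft()
            for v in adj[u]:
                residual = cap.get((u, v), 0) - flow[(u, v)] + flow[(v, u)]
                if v not in parent and residual > 0:
                    parent[v] = u
                    q.append(v)
        if t not in parent:
            return total                 # no augmenting path left

        path, v = [], t                  # recover the path t -> s
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        aug = min(cap.get((u, v), 0) - flow[(u, v)] + flow[(v, u)]
                  for u, v in path)      # bottleneck residual capacity
        for u, v in path:
            flow[(u, v)] += aug          # push along the augmenting path
        total += aug
```

With integer capacities the loop terminates, and the BFS choice of shortest augmenting paths bounds the number of augmentations polynomially in the graph size, matching the complexity claim in the abstract.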