No Arabic abstract
One possible explanation for the substantial organismal differences between humans and chimpanzees is that there have been changes in gene regulation. Given what is known about transcription factor binding sites, this motivates the following probability question: given a 1000 nucleotide region in our genome, how long does it take for a specified six to nine letter word to appear in that region in some individual? Stone and Wray [Mol. Biol. Evol. 18 (2001) 1764--1770] computed 5,950 years as the answer for six letter words. Here, we will show that for words of length 6, the average waiting time is 100,000 years, while for words of length 8, the waiting time has mean 375,000 years when there is a 7 out of 8 letter match in the population consensus sequence (an event of probability roughly 5/16) and has mean 650 million years when there is not. Fortunately, in biological reality, the match to the target word does not have to be perfect for binding to occur. If we model this by saying that a 7 out of 8 letter match is good enough, the mean reduces to about 60,000 years.
We consider a model of a population of fixed size N in which each individual gets replaced at rate one and each individual experiences a mutation at rate mu. We calculate the asymptotic distribution of the time that it takes before there is an individual in the population with m mutations. Several different behaviors are possible, depending on how mu changes with N. These results have applications to the problem of determining the waiting time for regulatory sequences to appear and to models of cancer development.
We consider self-averaging sequences in which each term is a weighted average over previous terms. For several sequences of this kind it is known that they do not converge to a limit. These sequences share the property that $n$th term is mainly based on terms around a fixed fraction of $n$. We give a probabilistic interpretation to such sequences and give weak conditions under which it is natural to expect non-convergence. Our methods are illustrated by application to the group Russian roulette problem.
Let X^{(k)}(t) = (X_1(t), ..., X_k(t)) denote a k-vector of i.i.d. random variables, each taking the values 1 or 0 with respective probabilities p and 1-p. As a process indexed by non-negative t, $X^{(k)}(t)$ is constructed--following Benjamini, Haggstrom, Peres, and Steif (2003)--so that it is strong Markov with invariant measure ((1-p)delta_0+pdelta_1)^k. We derive sharp estimates for the probability that ``X_1(t)+...+X_k(t)=k-ell for some t in F, where F subset [0,1] is nonrandom and compact. We do this in two very different settings: (i) Where ell is a constant; and (ii) Where ell=k/2, k is even, and p=q=1/2. We prove that the probability is described by the Kolmogorov capacitance of F for case (i) and Howroyds 1/2-dimensional box-dimension profiles for case (ii). We also present sample-path consequences, and a connection to capacities that answers a question of Benjamini et. al. (2003)
We investigate various geometric and functional inequalities for the class of log-concave probability sequences. We prove dilation inequalities for log-concave probability measures on the integers. A functional analog of this geometric inequality is derived, giving large and small deviation inequalities from a median, in terms of a modulus of regularity. Our methods are of independent interest, we find that log-affine sequences are the extreme points of the set of log-concave sequences belonging to a half-space slice of the simplex. This amounts to a discrete analog of the localization lemma of Lovasz and Simonovits. Further applications of this lemma are used to produce a discrete version of the Prekopa-Leindler inequality, large deviation inequalities for log-concave measures about their mean, and provide insight on the stability of generalized log-concavity under convolution.
We empirically study sorting in the evolving data model. In this model, a sorting algorithm maintains an approximation to the sorted order of a list of data items while simultaneously, with each comparison made by the algorithm, an adversary randomly swaps the order of adjacent items in the true sorted order. Previous work studies only t