We consider a Markov chain that iteratively generates a sequence of random finite words in such a way that the $n^{\mathrm{th}}$ word is uniformly distributed over the set of words of length $2n$ in which $n$ letters are $a$ and $n$ letters are $b$: at each step an $a$ and a $b$ are shuffled in uniformly at random among the letters of the current word. We obtain a concrete characterization of the Doob-Martin boundary of this Markov chain. Writing $N(u)$ for the number of letters $a$ (equivalently, $b$) in the finite word $u$, we show that a sequence $(u_n)_{n \in \mathbb{N}}$ of finite words converges to a point in the boundary if, for an arbitrary word $v$, the probability that selecting $N(v)$ letters $a$ and $N(v)$ letters $b$ uniformly at random from $u_n$, while maintaining their relative order, results in $v$ converges as $n$ tends to infinity. We exhibit a bijective correspondence between the points in the boundary and ergodic random total orders on the set $\{a_1, b_1, a_2, b_2, \ldots\}$ whose distributions are separately invariant under finite permutations of the indices of the $a$s and of the $b$s. We establish a further bijective correspondence between the set of such random total orders and the set of pairs $(\mu, \nu)$ of diffuse probability measures on $[0,1]$ such that $\frac{1}{2}(\mu+\nu)$ is Lebesgue measure: the restriction of the random total order to $\{a_1, b_1, \ldots, a_n, b_n\}$ is obtained by taking $X_1, \ldots, X_n$ (resp. $Y_1, \ldots, Y_n$) i.i.d. with common distribution $\mu$ (resp. $\nu$), letting $(Z_1, \ldots, Z_{2n})$ be $\{X_1, Y_1, \ldots, X_n, Y_n\}$ arranged in increasing order, and declaring that the $k^{\mathrm{th}}$ smallest element in the restricted total order is $a_i$ (resp. $b_j$) if $Z_k = X_i$ (resp. $Z_k = Y_j$).
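To make the final construction concrete, here is a minimal Python sketch (ours, not from the paper) that samples the restriction of the random total order to $\{a_1, b_1, \ldots, a_n, b_n\}$ for one admissible pair $(\mu, \nu)$; the particular densities $2x$ and $2-2x$ are an illustrative assumption, chosen only because their average is the Lebesgue density.

```python
import math
import random

def sample_restricted_order(n, seed=None):
    """Sample the restriction of the random total order to {a_1,...,a_n, b_1,...,b_n}
    for one illustrative pair (mu, nu): mu has density 2x and nu has density 2-2x
    on [0,1], so (mu + nu)/2 is Lebesgue measure, as the correspondence requires."""
    rng = random.Random(seed)
    # X_i ~ mu via the inverse CDF of F(x) = x^2; Y_j ~ nu via the inverse CDF of G(x) = 1-(1-x)^2.
    xs = [("a", i + 1, math.sqrt(rng.random())) for i in range(n)]
    ys = [("b", j + 1, 1.0 - math.sqrt(1.0 - rng.random())) for j in range(n)]
    # The k-th smallest value in the merged sample determines the k-th element of the order.
    merged = sorted(xs + ys, key=lambda t: t[2])
    return [f"{letter}{index}" for letter, index, _ in merged]

if __name__ == "__main__":
    # Prints the sampled total order on {a_1,...,a_4, b_1,...,b_4} as a list of labels.
    print(sample_restricted_order(4, seed=0))
```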
RNA motifs typically consist of short, modular patterns that include base pairs formed within and between modules. Estimating the abundance of these patterns is of fundamental importance for assessing the statistical significance of matches in genomewide searches, and for predicting whether a given function has evolved many times in different species or arose from a single common ancestor. In this manuscript, we review in an integrated and self-contained manner some basic concepts of automata theory, generating functions and transfer matrix methods that are relevant to pattern analysis in biological sequences. We formalize, in a general framework, the concept of Markov chain embedding to analyze patterns in random strings produced by a memoryless source. This conceptualization, together with the capability of automata to recognize complicated patterns, allows a systematic analysis of problems related to the occurrence and frequency of patterns in random strings. The applications we present focus on the concept of synchronization of automata, as well as automata used to search for a finite number of keywords (including sets of patterns generated according to base pairing rules) in a general text.
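As a hedged illustration of the Markov chain embedding described above, the sketch below (ours, far simpler than the general automata treated in the review) embeds the state of a single-keyword matching automaton as a Markov chain driven by a memoryless source and computes the expected number of (possibly overlapping) occurrences of the keyword in a random string of length $n$.

```python
def kmp_automaton(pattern, alphabet):
    """Deterministic automaton whose state is the longest prefix of `pattern`
    that is a suffix of the text read so far (a brute-force construction)."""
    m = len(pattern)
    delta = [dict() for _ in range(m + 1)]
    for q in range(m + 1):
        for c in alphabet:
            k = min(m, q + 1)
            while k > 0 and pattern[:k] != (pattern[:q] + c)[-k:]:
                k -= 1
            delta[q][c] = k
    return delta

def expected_occurrences(pattern, letter_probs, n):
    """Markov chain embedding: push the state distribution through n letters of a
    memoryless source and accumulate the mass entering the accepting state, which
    equals the expected number of (overlapping) occurrences in a length-n string."""
    alphabet = list(letter_probs)
    delta = kmp_automaton(pattern, alphabet)
    m = len(pattern)
    dist = [0.0] * (m + 1)
    dist[0] = 1.0
    expected = 0.0
    for _ in range(n):
        new = [0.0] * (m + 1)
        for q, p in enumerate(dist):
            if p == 0.0:
                continue
            for c, pc in letter_probs.items():
                new[delta[q][c]] += p * pc
        expected += new[m]  # probability that an occurrence ends at this position
        dist = new
    return expected

if __name__ == "__main__":
    print(expected_occurrences("aba", {"a": 0.5, "b": 0.5}, 10))
```

For a single keyword $W$ of length $m$ this agrees with the linearity-of-expectation value $(n-m+1)\prod_i p_{W_i}$, which is a convenient sanity check; the point of the automaton-based embedding is that it extends to waiting times, occurrence counts of whole keyword sets, and patterns defined by base-pairing rules.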
From the perspective of probability, the stability of growing networks is studied in the present paper. Using the DMS model as an example, we establish a relation between growing networks and Markov processes. Based on the concept and technique of first-passage probability in Markov chain theory, we provide a rigorous proof of the existence of the steady-state degree distribution and mathematically re-derive its exact formula. The approach based on Markov chain theory is universal and performs well for a large class of growing networks.
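The paper's argument is analytic, via first-passage probabilities; purely as an illustration, the following sketch simulates a growing network with linear preferential attachment plus an additive attractiveness term (the parameters `m` and `attractiveness` below are our assumptions, not taken from the paper) and reads off the empirical degree distribution that the steady-state result describes.

```python
import random
from collections import Counter

def weighted_choice(weights, rng):
    """Pick a key of `weights` with probability proportional to its value."""
    total = sum(weights.values())
    r = rng.random() * total
    acc = 0.0
    for node, w in weights.items():
        acc += w
        if r <= acc:
            return node
    return node  # guard against floating-point roundoff

def simulate_growing_network(num_nodes, m=2, attractiveness=1.0, seed=1):
    """Grow a network node by node; each new node attaches m edges to existing
    nodes chosen with probability proportional to (degree + attractiveness).
    Returns the final degree of every node. Illustrative simulation only."""
    rng = random.Random(seed)
    degree = {v: m for v in range(m + 1)}  # small seed clique on m+1 nodes
    for new in range(m + 1, num_nodes):
        weights = {v: degree[v] + attractiveness for v in degree}
        chosen = set()
        while len(chosen) < m:
            candidates = {u: w for u, w in weights.items() if u not in chosen}
            chosen.add(weighted_choice(candidates, rng))
        for v in chosen:
            degree[v] += 1
        degree[new] = m
    return degree

if __name__ == "__main__":
    deg = simulate_growing_network(2000)
    hist = Counter(deg.values())
    for k in sorted(hist)[:10]:
        print(k, hist[k] / len(deg))  # empirical degree distribution P(k) for small k
```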
Let $W^{(n)}$ be the $n$-letter word obtained by repeating a fixed word $W$, and let $R_n$ be a random $n$-letter word over the same alphabet. We show several results about the length of the longest common subsequence (LCS) between $W^{(n)}$ and $R_n$; in particular, we show that its expectation is $\gamma_W n - O(\sqrt{n})$ for an efficiently computable constant $\gamma_W$. This is done by relating the problem to a new interacting particle system, which we dub frog dynamics. In this system, the particles (`frogs') hop over one another in the order given by their labels. Stripped of the labeling, the frog dynamics reduces to a variant of the PushTASEP. In the special case when all symbols of $W$ are distinct, we obtain an explicit formula for the constant $\gamma_W$ and a closed-form expression for the stationary distribution of the associated frog dynamics. In addition, we propose new conjectures about the asymptotics of the LCS of a pair of random words. These conjectures are informed by computer experiments using a new heuristic algorithm to compute the LCS. Through our computations, we found periodic words that are more random-like than a random word, as measured by the LCS.
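A crude Monte Carlo sketch (ours; it is not the paper's heuristic LCS algorithm) estimates $\gamma_W$ by averaging $\mathrm{LCS}(W^{(n)}, R_n)/n$ over independent random words, using the standard quadratic dynamic program; by the $\gamma_W n - O(\sqrt{n})$ expansion, the estimate undershoots $\gamma_W$ by $O(1/\sqrt{n})$.

```python
import random

def lcs_length(s, t):
    """Standard O(|s||t|) dynamic program for the length of the longest common subsequence."""
    prev = [0] * (len(t) + 1)
    for a in s:
        cur = [0]
        for j, b in enumerate(t, 1):
            cur.append(prev[j - 1] + 1 if a == b else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]

def estimate_gamma(word, n=1000, trials=10, seed=0):
    """Monte Carlo estimate of gamma_W: average LCS(W^(n), R_n)/n, where W^(n)
    repeats `word` to length n and R_n is a uniform random word over the same alphabet."""
    rng = random.Random(seed)
    alphabet = sorted(set(word))
    periodic = (word * (n // len(word) + 1))[:n]
    total = 0
    for _ in range(trials):
        rand_word = "".join(rng.choice(alphabet) for _ in range(n))
        total += lcs_length(periodic, rand_word)
    return total / (trials * n)

if __name__ == "__main__":
    # Modest n already gives a rough estimate of gamma_W for W = "ab".
    print(estimate_gamma("ab"))
```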
We present a Markov chain on the $n$-dimensional hypercube $\{0,1\}^n$ which satisfies $t_{\mathrm{mix}}(\epsilon) = n[1 + o(1)]$. This Markov chain alternates between random and deterministic moves, and we prove that the chain has cut-off with a window of size at most $O(n^{0.5+\delta})$ where $\delta>0$. The deterministic moves correspond to a linear shift register.
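The abstract does not pin down the moves, so the following sketch is only one plausible instantiation: the random move flips a uniformly chosen coordinate with probability $1/2$, and the deterministic move applies a linear shift register update with placeholder feedback taps; both choices are assumptions, not the specific chain analysed in the paper.

```python
import random

def lfsr_step(state, taps=(0, 2)):
    """Deterministic move: one step of a Fibonacci linear shift register over GF(2).
    The feedback taps are placeholders; the paper's register may differ."""
    feedback = 0
    for t in taps:
        feedback ^= state[t]
    return state[1:] + [feedback]

def chain_step(state, rng):
    """One round of the chain: an assumed random move (flip a uniformly chosen
    coordinate, or do nothing, each with probability 1/2) followed by the
    deterministic shift-register move."""
    state = state[:]
    if rng.random() < 0.5:
        state[rng.randrange(len(state))] ^= 1
    return lfsr_step(state)

if __name__ == "__main__":
    rng = random.Random(0)
    x = [0] * 16
    for _ in range(100):
        x = chain_step(x, rng)
    print(x)  # the state of the chain on {0,1}^16 after 100 rounds
```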
We study a simple Markov chain, the switch chain, on the set of all perfect matchings in a bipartite graph. This Markov chain was proposed by Diaconis, Graham and Holmes as a possible approach to a sampling problem arising in Statistics. We ask: for which classes of graphs is the Markov chain ergodic and for which is it rapidly mixing? We provide a precise answer to the ergodicity question and close bounds on the mixing question. We show for the first time that the mixing time of the switch chain is polynomial in the case of monotone graphs, a class that includes examples of interest in the statistical setting.
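For readers who want to see the dynamics, here is a minimal sketch of the switch move on a toy bipartite graph; details of the published chain such as laziness or the exact proposal probabilities are glossed over.

```python
import random

def switch_chain_step(matching, edges, rng):
    """One proposed move of the switch chain: pick two matched pairs (u1, v1), (u2, v2)
    uniformly at random and replace them by (u1, v2), (u2, v1) when both of the latter
    are edges of the graph; otherwise stay put. `matching` maps each left vertex to its
    matched right vertex, and `edges` is the set of allowed (left, right) pairs."""
    u1, u2 = rng.sample(list(matching), 2)
    v1, v2 = matching[u1], matching[u2]
    if (u1, v2) in edges and (u2, v1) in edges:
        matching[u1], matching[u2] = v2, v1
    return matching

if __name__ == "__main__":
    # Toy bipartite graph: K_{3,3} with the edge (2, 1) removed.
    edges = {(l, r) for l in range(3) for r in range(3)} - {(2, 1)}
    matching = {0: 0, 1: 1, 2: 2}  # a perfect matching to start from
    rng = random.Random(0)
    for _ in range(10):
        matching = switch_chain_step(matching, edges, rng)
    print(matching)  # the perfect matching reached after 10 switch moves
```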