No Arabic abstract
We revisit staircases for words and prove several exact as well as asymptotic results for longest left-most staircase subsequences and subwords and staircase separation number, the latter being defined as the number of consecutive maximal staircase subwords packed in a word. We study asymptotic properties of the sequence $h_{r,k}(n),$ the number of $n$-array words with $r$ separations over alphabet $[k]$ and show that for any $rgeq 0,$ the growth sequence $big(h_{r,k}(n)big)^{1/n}$ converges to a characterized limit, independent of $r.$ In addition, we study the asymptotic behavior of the random variable $mathcal{S}_k(n),$ the number of staircase separations in a random word in $[k]^n$ and obtain several limit theorems for the distribution of $mathcal{S}_k(n),$ including a law of large numbers, a central limit theorem, and the exact growth rate of the entropy of $mathcal{S}_k(n).$ Finally, we obtain similar results, including growth limits, for longest $L$-staircase subwords and subsequences.
Let $W^{(n)}$ be the $n$-letter word obtained by repeating a fixed word $W$, and let $R_n$ be a random $n$-letter word over the same alphabet. We show several results about the length of the longest common subsequence (LCS) between $W^{(n)}$ and $R_n$; in particular, we show that its expectation is $gamma_W n-O(sqrt{n})$ for an efficiently-computable constant $gamma_W$. This is done by relating the problem to a new interacting particle system, which we dub frog dynamics. In this system, the particles (`frogs) hop over one another in the order given by their labels. Stripped of the labeling, the frog dynamics reduces to a variant of the PushTASEP. In the special case when all symbols of $W$ are distinct, we obtain an explicit formula for the constant $gamma_W$ and a closed-form expression for the stationary distribution of the associated frog dynamics. In addition, we propose new conjectures about the asymptotic of the LCS of a pair of random words. These conjectures are informed by computer experiments using a new heuristic algorithm to compute the LCS. Through our computations, we found periodic words that are more random-like than a random word, as measured by the LCS.
When considering binary strings, its natural to wonder how many distinct subsequences might exist in a given string. Given that there is an existing algorithm which provides a straightforward way to compute the number of distinct subsequences in a fixed string, we might next be interested in the expected number of distinct subsequences in random strings. This expected value is already known for random binary strings where each letter in the string is, independently, equally likely to be a 1 or a 0. We generalize this result to random strings where the letter 1 appears independently with probability $alpha in [0,1]$. Also, we make some progress in the case of random strings from an arbitrary alphabet as well as when the string is generated by a two-state Markov chain.
We determine the average number of distinct subsequences in a random binary string, and derive an estimate for the average number of distinct subsequences of a particular length.
We show that, in an alphabet of $n$ symbols, the number of words of length $n$ whose number of different symbols is away from $(1-1/e)n$, which is the value expected by the Poisson distribution, has exponential decay in $n$. We use Laplaces method for sums and known bounds of Stirling numbers of the second kind. We express our result in terms of inequalities.
We consider the expected length of the longest common subsequence between two random words of lengths $n$ and $(1-varepsilon)kn$ over $k$-symbol alphabet. It is well-known that this quantity is asymptotic to $gamma_{k,varepsilon} n$ for some constant $gamma_{k,varepsilon}$. We show that $gamma_{k,varepsilon}$ is of the order $1-cvarepsilon^2$ uniformly in $k$ and $varepsilon$. In addition, for large $k$, we give evidence that $gamma_{k,varepsilon}$ approaches $1-tfrac{1}{4}varepsilon^2$, and prove a matching lower bound.