An efficient dual sampling algorithm with Hamming distance filtration


Abstract in English

Recently, a framework that treats RNA sequences and their secondary structures as pairs has led to information-theoretic perspectives on how the semantics encoded in RNA sequences can be inferred. In this context, the pairing arises naturally from the energy model of RNA secondary structures. Fixing the sequence yields the RNA energy landscape, whose partition function can be computed with McCaskill's algorithm. Dually, fixing the structure induces the energy landscape of sequences, which has been considered for designing more efficient inverse folding algorithms. We present the Hamming-distance-filtered dual partition function, together with a Boltzmann sampler based on novel dynamic programming routines for the loop-based energy model. The algorithm runs in $O(h^2n)$ time, where $h$ and $n$ are the Hamming distance and the sequence length, respectively, improving on the time complexity of samplers reported in the literature by a factor of $O(n^2)$. We then present two applications: the first concerns the evolution of natural microRNA sequence-structure pairs, and the second the construction of neutral paths. The former studies the inverse fold rate (IFR) of sequence-structure pairs, filtered by Hamming distance, and observes that such pairs evolve towards higher robustness, i.e., increasing IFR. The latter is an algorithm that, given two sequences in a neutral network, uses the sampler to construct short paths connecting them that consist entirely of sequences in the neutral network.
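To make the idea of a Hamming-distance-filtered dual partition function and its Boltzmann sampler concrete, here is a minimal Python sketch under a deliberately simplified, hypothetical energy model: each base pair of the fixed structure contributes a toy pair-type energy and unpaired bases contribute nothing. This is not the loop-based model or the $O(h^2n)$ routine described above. Because the toy energy is additive over independent units (unpaired sites and base pairs), a forward table `Z[k][d]` over the first `k` units and exact distances `d <= h` from a reference sequence gives the filtered partition function, and stochastic backtracking through that table draws sequences with probability proportional to their Boltzmann weight. The names `PAIR_ENERGY`, `filtered_partition` and `sample`, as well as the energy values, are illustrative assumptions.

```python
import math
import random

# Toy, pair-additive energies in kcal/mol (hypothetical values, NOT the
# loop-based Turner parameters). Unpaired bases contribute energy 0.
PAIR_ENERGY = {("G", "C"): -3.0, ("C", "G"): -3.0,
               ("A", "U"): -2.0, ("U", "A"): -2.0,
               ("G", "U"): -1.0, ("U", "G"): -1.0}
BASES = "ACGU"
KT = 0.6163  # RT in kcal/mol at 37 degrees C


def parse_pairs(structure):
    """Map each opening position of a dot-bracket string to its partner."""
    stack, pairs = [], {}
    for i, c in enumerate(structure):
        if c == "(":
            stack.append(i)
        elif c == ")":
            pairs[stack.pop()] = i
    if stack:
        raise ValueError("unbalanced structure")
    return pairs


def units(structure):
    """Split the fixed structure into independent units: unpaired sites and base pairs."""
    pairs = parse_pairs(structure)
    return [(i,) if c == "." else (i, pairs[i])
            for i, c in enumerate(structure) if c == "." or i in pairs]


def options(unit, ref):
    """Yield (bases, Boltzmann weight, Hamming cost w.r.t. ref) for one unit."""
    if len(unit) == 1:
        for b in BASES:
            yield (b,), 1.0, int(b != ref[unit[0]])
    else:
        i, j = unit
        for (a, b), e in PAIR_ENERGY.items():
            yield (a, b), math.exp(-e / KT), int(a != ref[i]) + int(b != ref[j])


def filtered_partition(structure, ref, h):
    """Z[k][d]: total weight of assignments to the first k units at exact distance d <= h."""
    if len(structure) != len(ref):
        raise ValueError("structure and reference must have equal length")
    us = units(structure)
    Z = [[0.0] * (h + 1) for _ in range(len(us) + 1)]
    Z[0][0] = 1.0
    for k, unit in enumerate(us):
        opts = list(options(unit, ref))
        for d in range(h + 1):
            if Z[k][d] == 0.0:
                continue
            for _, w, c in opts:
                if d + c <= h:
                    Z[k + 1][d + c] += Z[k][d] * w
    return us, Z


def sample(structure, ref, h, rng=random):
    """Draw one sequence compatible with `structure` within distance h of `ref`."""
    us, Z = filtered_partition(structure, ref, h)
    if sum(Z[-1]) == 0.0:
        raise ValueError("no compatible sequence within Hamming distance h")
    # Pick the exact distance d in proportion to Z[-1][d], then backtrack unit by unit.
    d = rng.choices(range(h + 1), weights=Z[-1])[0]
    seq = list(ref)
    for k in range(len(us), 0, -1):
        # Candidate weights sum to Z[k][d] by the forward recurrence.
        cands = [(bases, w * Z[k - 1][d - c], c)
                 for bases, w, c in options(us[k - 1], ref) if c <= d]
        bases, _, c = rng.choices(cands, weights=[x[1] for x in cands])[0]
        for pos, base in zip(us[k - 1], bases):
            seq[pos] = base
        d -= c
    return "".join(seq)


if __name__ == "__main__":
    rng = random.Random(0)
    print(sample("((((....))))", "GGGGAAAACCCC", h=3, rng=rng))
```

Under this toy model the table has $O(mh)$ entries for $m$ units, each updated with at most six options; the loop-based energy model requires the more elaborate recursions the abstract refers to. For long structures the raw Boltzmann weights can overflow floating point, so a practical variant would work in log space or rescale.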
