We consider DNA codes based on the nearest-neighbor (stem) similarity model, which adequately reflects the hybridization potential of two DNA sequences. Our aim is to survey bounds on the rate of DNA codes with respect to a thermodynamically motivated similarity measure called the additive stem similarity. These results yield a method to analyze and compare known samples of the nearest-neighbor thermodynamic weights associated with stacked pairs that occur in DNA secondary structures.
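As an illustration, the sketch below computes one common form of additive stem similarity: the sum, over consecutive positions where two equal-length sequences share the same stacked pair (2-letter block), of a nearest-neighbor weight assigned to that pair. The weight table `NN_WEIGHT` is a placeholder introduced here, not the thermodynamic values discussed in the text.

```python
# Sketch of an additive stem similarity, assuming it sums a nearest-neighbor
# weight over every position where two equal-length sequences share the same
# stacked pair (a common 2-letter block). The weights below are placeholders.
NN_WEIGHT = {
    "AA": 1.0, "AC": 1.4, "AG": 1.3, "AT": 0.9,
    "CA": 1.5, "CC": 1.8, "CG": 2.2, "CT": 1.3,
    "GA": 1.3, "GC": 2.4, "GG": 1.8, "GT": 1.4,
    "TA": 0.6, "TC": 1.3, "TG": 1.5, "TT": 1.0,
}

def additive_stem_similarity(x: str, y: str) -> float:
    """Sum of nearest-neighbor weights over common stacked pairs of x and y."""
    assert len(x) == len(y)
    return sum(
        NN_WEIGHT[x[i:i + 2]]
        for i in range(len(x) - 1)
        if x[i:i + 2] == y[i:i + 2]
    )

# Matching stacked pairs here are AC, CG, and the final AC.
print(additive_stem_similarity("ACGTAC", "ACGAAC"))
```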
DNA code design aims to generate a set of DNA sequences (codewords) with minimum likelihood of undesired hybridizations among sequences and their reverse-complement (RC) pairs (cross-hybridization). Inspired by the distinct hybridization affinities (or stabilities) of the perfect double helix formed by an individual single-stranded DNA (ssDNA) sequence and its RC pair, we propose a novel similarity significance (SS) model to measure the similarity between DNA sequences. In particular, instead of directly measuring the similarity of two sequences by some metric, the proposed SS model evaluates how much more likely undesirable hybridizations are to occur than desirable hybridizations in the presence of the two measured sequences and their RC pairs. With this SS model, we construct thermodynamically stable DNA codes subject to several combinatorial constraints using a sorting-based algorithm. The proposed scheme results in DNA codes with larger code sizes and wider free-energy gaps (hence better cross-hybridization performance) than existing methods.
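For concreteness, a minimal sketch of the RC relation and of comparing desirable versus undesirable duplex scores is given below; `duplex_score` is a placeholder stand-in for a free-energy estimate, and the specific comparison rule is an assumption made for illustration, not the paper's exact SS model.

```python
# Sketch of the reverse-complement (RC) relation and a toy comparison of
# desirable vs. undesirable hybridization scores. duplex_score is a placeholder
# for a free-energy estimate; the SS comparison below is only illustrative.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def reverse_complement(s: str) -> str:
    return "".join(COMPLEMENT[b] for b in reversed(s))

def duplex_score(a: str, b: str) -> float:
    """Placeholder duplex stability: count of complementary aligned bases
    when the two strands anneal antiparallel."""
    return sum(1.0 for x, y in zip(a, reversed(b)) if COMPLEMENT[x] == y)

def similarity_significance(x: str, y: str) -> float:
    """Toy SS value: strongest undesirable (cross) duplex score minus the
    weakest desirable (sequence with its own RC) duplex score."""
    desirable = min(duplex_score(x, reverse_complement(x)),
                    duplex_score(y, reverse_complement(y)))
    undesirable = max(duplex_score(x, y),
                      duplex_score(x, reverse_complement(y)),
                      duplex_score(reverse_complement(x), y))
    return undesirable - desirable

print(similarity_significance("ACGTGC", "TTGCAA"))
```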
The well-known Plotkin construction is generalized in this paper and used to yield new families of $Z_2Z_4$-additive codes, whose length, dimension, and minimum distance are studied. These new constructions enable us to obtain families of $Z_2Z_4$-additive codes such that, under the Gray map, the corresponding binary codes have the same parameters and properties as the usual binary linear Reed-Muller codes. Moreover, the first family is the usual binary linear Reed-Muller family.
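As background, the sketch below shows the standard Gray map from $Z_4$ to $Z_2^2$ and the classical binary Plotkin $(u \mid u+v)$ construction; the $Z_2Z_4$-additive generalization developed in the paper is not reproduced here.

```python
# Sketch of the standard Gray map Z_4 -> Z_2^2 and of the classical binary
# Plotkin (u | u+v) construction; the Z_2Z_4-additive generalization from the
# paper is not reproduced here.
GRAY = {0: (0, 0), 1: (0, 1), 2: (1, 1), 3: (1, 0)}

def gray_map(quaternary_word):
    """Map a tuple over Z_4 to a binary tuple of twice the length."""
    return tuple(bit for q in quaternary_word for bit in GRAY[q % 4])

def plotkin(code_u, code_v):
    """Classical (u | u+v) construction for two binary codes of equal length."""
    return {
        u + tuple((a + b) % 2 for a, b in zip(u, v))
        for u in code_u for v in code_v
    }

# Example: RM(1,1) = {00,01,10,11} and RM(0,1) = {00,11} give RM(1,2).
rm_1_1 = {(0, 0), (0, 1), (1, 0), (1, 1)}
rm_0_1 = {(0, 0), (1, 1)}
print(sorted(plotkin(rm_1_1, rm_0_1)))  # 8 codewords of length 4
```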
In this paper, two different Gray-like maps from $Z_p^\alpha \times Z_{p^k}^\beta$, where $p$ is prime, to $Z_p^n$, $n=\alpha+\beta p^{k-1}$, denoted by $\phi$ and $\Phi$, respectively, are presented. We determine the connection between the weight enumerators of the image codes under these two mappings. We show that if $C$ is a $Z_pZ_{p^k}$-additive code and $C^\perp$ is its dual, then the weight enumerators of the image $p$-ary codes $\phi(C)$ and $\Phi(C^\perp)$ are formally dual. This is a partial generalization of [On $Z_{2^k}$-dual binary codes, arXiv:math/0509325], and the result is generalized to odd characteristic $p$ and mixed alphabets. Additionally, a construction of $1$-perfect additive codes in the mixed alphabet $Z_pZ_{p^2}\dots Z_{p^k}$ is given.
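For orientation, the sketch below implements one standard generalized Gray map from $Z_{p^k}$ to $Z_p^{p^{k-1}}$ (Carlet-style); the paper's maps $\phi$ and $\Phi$ may be defined differently, so this is shown only as an assumption-laden reference point for the target length $p^{k-1}$ per $Z_{p^k}$ coordinate.

```python
# One standard generalized Gray map Z_{p^k} -> Z_p^{p^(k-1)} (Carlet-style),
# shown for orientation only; the paper's phi and Phi may differ in detail.
from itertools import product

def gray_pk(x: int, p: int, k: int):
    """Map x in Z_{p^k} to a vector of length p^(k-1) over Z_p.

    Write x = x_0 + p*x_1 + ... + p^(k-1)*x_{k-1} in base p.  The image is the
    evaluation of the affine function y -> x_{k-1} + sum_i x_i*y_i over all
    y in Z_p^(k-1).
    """
    digits = [(x // p**i) % p for i in range(k)]
    lows, top = digits[:k - 1], digits[k - 1]
    return tuple(
        (top + sum(c * yi for c, yi in zip(lows, y))) % p
        for y in product(range(p), repeat=k - 1)
    )

# p = 2, k = 2 recovers the usual Z_4 -> Z_2^2 Gray map: 0,1,2,3 -> 00,01,11,10.
print([gray_pk(x, 2, 2) for x in range(4)])
```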
The Doob graph $D(m,n)$ is the Cartesian product of $m>0$ copies of the Shrikhande graph and $n$ copies of the complete graph of order $4$. Naturally, $D(m,n)$ can be represented as a Cayley graph on the additive group $(Z_4^2)^m \times (Z_2^2)^{n'} \times Z_4^{n''}$, where $n'+n''=n$. A set of vertices of $D(m,n)$ is called an additive code if it forms a subgroup of this group. We construct a $3$-parameter class of additive perfect codes in Doob graphs and show that the known necessary conditions for the existence of additive $1$-perfect codes in $D(m,n'+n'')$ are sufficient. Additionally, two quasi-cyclic additive $1$-perfect codes are constructed in $D(155,0+31)$ and $D(2667,0+127)$.
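For orientation, the sketch below builds the Shrikhande graph as a Cayley graph on $Z_4^2$ with the usual connection set $\{\pm(0,1),\pm(1,0),\pm(1,1)\}$ and includes a generic check of whether a vertex set is $1$-perfect, i.e., whether the closed radius-$1$ balls around its elements partition the vertex set; the construction of the codes themselves is not reproduced.

```python
# Sketch: the Shrikhande graph as a Cayley graph on Z_4^2 with connection set
# {+-(0,1), +-(1,0), +-(1,1)}, plus a generic test of whether a vertex set is
# a 1-perfect code (closed balls of radius 1 partition the vertex set).
from itertools import product

CONN = {(0, 1), (0, 3), (1, 0), (3, 0), (1, 1), (3, 3)}

def shrikhande():
    """Adjacency lists of the Shrikhande graph on the 16 elements of Z_4^2."""
    verts = list(product(range(4), repeat=2))
    return {
        v: [((v[0] + d[0]) % 4, (v[1] + d[1]) % 4) for d in CONN]
        for v in verts
    }

def is_one_perfect(adj, code):
    """True iff every vertex lies in exactly one closed ball of radius 1."""
    covered = {}
    for c in code:
        for v in [c, *adj[c]]:
            if v in covered:
                return False
            covered[v] = c
    return len(covered) == len(adj)

adj = shrikhande()
print(all(len(nbrs) == 6 for nbrs in adj.values()))  # 6-regular, as expected
```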
Decoding sequences that stem from multiple transmissions of a codeword over an insertion, deletion, and substitution channel is a critical component of efficient deoxyribonucleic acid (DNA) data storage systems. In this paper, we consider a concatenated coding scheme with an outer low-density parity-check code and either an inner convolutional code or an inner block code. We propose two new decoding algorithms for inference from multiple received sequences, both combining the inner code and the channel into a joint hidden Markov model to infer symbolwise a posteriori probabilities (APPs). The first decoder computes the exact APPs by jointly decoding the received sequences, whereas the second decoder approximates the APPs by combining the results of separately decoded received sequences. Using the proposed algorithms, we evaluate the performance of decoding multiple received sequences by means of achievable information rates and Monte Carlo simulations. We show significant performance gains compared to decoding a single received sequence.
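To illustrate the second (separate-decoding) approach, the sketch below combines symbolwise APPs obtained independently from each received sequence by multiplying the per-symbol distributions and renormalizing; whether the paper uses exactly this combination rule is an assumption made here for illustration.

```python
# Sketch: combining symbolwise APPs from separately decoded received sequences
# by multiplying per-symbol posterior distributions and renormalizing.  Whether
# this is the exact combination rule used in the paper is an assumption.
import numpy as np

def combine_apps(apps_per_sequence):
    """apps_per_sequence: array of shape (num_sequences, num_symbols, alphabet).

    Returns combined APPs of shape (num_symbols, alphabet).
    """
    log_apps = np.log(np.asarray(apps_per_sequence) + 1e-300)
    combined = np.exp(log_apps.sum(axis=0))                # product of posteriors
    return combined / combined.sum(axis=1, keepdims=True)  # renormalize per symbol

# Two noisy per-symbol estimates of a 3-symbol binary word reinforce each other.
seq1 = [[0.7, 0.3], [0.4, 0.6], [0.9, 0.1]]
seq2 = [[0.6, 0.4], [0.3, 0.7], [0.8, 0.2]]
print(combine_apps([seq1, seq2]))
```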