Restriction enzymes recognize and bind to specific sequences on invading bacteriophage DNA. Like a key in a lock, these proteins require many contacts to specify the correct DNA sequence. Using information theory, we develop an equation that defines the number of independent contacts, which is the dimensionality of the binding. We show that EcoRI, which binds to the sequence GAATTC, functions in 24 dimensions. Information theory represents messages as spheres in high-dimensional spaces. Better sphere packing leads to better communication systems. The densest known packing of hyperspheres occurs on the Leech lattice in 24 dimensions. We suggest that the single protein EcoRI molecule employs a Leech lattice in its operation. Optimizing the density of sphere packing explains why 6-base restriction enzymes are so common.
Non-coding RNA molecules fold into precise base pairing patterns to carry out critical roles in genetic regulation and protein synthesis. We show here that coupling systematic mutagenesis with high-throughput SHAPE chemical mapping enables accurate base pair inference of domains from ribosomal RNA, ribozymes, and riboswitches. For a six-RNA benchmark that challenged prior chemical/computational methods, this mutate-and-map strategy gives secondary structures in agreement with crystallographic data (2% error rates), including a blind test on a double-glycine riboswitch. Through modeling of partially ordered RNA states, the method enables the first test of an interdomain helix-swap hypothesis for ligand-binding cooperativity in a glycine riboswitch. Finally, the mutate-and-map data report on tertiary contacts within non-coding RNAs; coupled with the Rosetta/FARFAR algorithm, these data give nucleotide-resolution three-dimensional models (5.7 Å helix RMSD) of an adenine riboswitch. These results highlight the promise of a two-dimensional chemical strategy for inferring the secondary and tertiary structures that underlie non-coding RNA behavior.
It is generally accepted that, when moving in groups, animals process information to coordinate their motion. Recent studies have begun to apply rigorous methods based on information theory to quantify such distributed computation. Following this perspective, we use transfer entropy to quantify dynamic information flows locally in space and time across a school of fish during directional changes around a circular tank, i.e., U-turns. This analysis reveals peaks in information flows during collective U-turns and identifies two different flows: an informative flow (positive transfer entropy) based on fish that have already turned about fish that are turning, and a misinformative flow (negative transfer entropy) based on fish that have not yet turned about fish that are turning. We also reveal that the information flows are related to the relative position and alignment between fish, and identify spatial patterns of information and misinformation cascades. This study offers several methodological contributions, and we expect further application of these methodologies to reveal intricacies of self-organisation in other animal groups and active matter in general.
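The transfer entropy measure named above can be illustrated with a minimal sketch. It assumes two equal-length discrete time series, a history length of one step, and simple plug-in (frequency count) probability estimates; the function names are hypothetical, and none of these choices is taken from the study itself. Local values are the per-time-step log ratios, which may be negative (the "misinformative" case), while their average is non-negative under this estimator.

```python
from collections import Counter
from math import log2


def local_transfer_entropy(source, target):
    """Local transfer entropy (bits) from source to target at each time step.

    Assumptions: discrete symbols, history length 1, plug-in estimates.
    """
    # Each observation is (next target state, current target state, current source state).
    triples = [(target[t + 1], target[t], source[t]) for t in range(len(target) - 1)]
    n_xyz = Counter(triples)
    n_next_cur = Counter((x1, x0) for x1, x0, _ in triples)
    n_cur_src = Counter((x0, y0) for _, x0, y0 in triples)
    n_cur = Counter(x0 for _, x0, _ in triples)

    local = []
    for x1, x0, y0 in triples:
        # p(x1 | x0, y0): prediction using the source as well as the target's past.
        p_with_source = n_xyz[(x1, x0, y0)] / n_cur_src[(x0, y0)]
        # p(x1 | x0): prediction using the target's past alone.
        p_without_source = n_next_cur[(x1, x0)] / n_cur[x0]
        local.append(log2(p_with_source / p_without_source))
    return local


def transfer_entropy(source, target):
    """Average transfer entropy (bits): the mean of the local values."""
    local = local_transfer_entropy(source, target)
    return sum(local) / len(local)


# Toy data: the target copies the source with a one-step delay.
source = [0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1]
target = [0] + source[:-1]
te_forward = transfer_entropy(source, target)
```

In this toy case the source fully determines the next target state, so the average transfer entropy from source to target is positive. Real trajectory data would first need to be discretised, and that step is not shown here.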
Power spectra of human DNA base C+G frequency distribution in all available contiguous sections exhibit the universal inverse power law form of the statistical normal distribution for the 24 chromosomes. The inverse power law form for power spectra of space-time fluctuations is generic to dynamical systems in nature and indicates long-range space-time correlations. A recently developed general systems theory predicts the observed non-local connections as intrinsic to quantum-like chaos governing space-time fluctuations of dynamical systems. The model predicts the following: (1) the quasiperiodic Penrose tiling pattern for the nested coiled structure of the DNA molecule in the chromosome, resulting in maximum packing efficiency; (2) the DNA molecule functions as a unified whole fuzzy logic network with ordered two-way signal transmission between the coding and non-coding regions. Recent studies indicate an influence of non-coding regions on the functions of coding regions in the DNA molecule.
Physical-layer Network Coding (PNC) can significantly improve the throughput of a wireless two-way relay channel (TWRC) by allowing the two end nodes to transmit messages to the relay simultaneously. To achieve reliable communication, channel coding can be applied on top of PNC. This paper investigates link-by-link channel-coded PNC, in which a critical process at the relay is to transform the superimposed channel-coded packets received from the two end nodes plus noise, Y3=X1+X2+W3, to the network-coded combination of the source packets, S1 XOR S2. This is distinct from the traditional multiple-access problem, in which the goal is to obtain S1 and S2 separately. The transformation from Y3 to (S1 XOR S2) is referred to as the Channel-decoding-Network-Coding process (CNC) in that it involves both channel decoding and network coding operations. A contribution of this paper is the insight that in designing CNC, we should first (i) channel-decode Y3 to the superimposed source symbols S1+S2 before (ii) transforming S1+S2 to the network-coded packets (S1 XOR S2). Compared with previously proposed strategies for CNC, this strategy reduces the channel-coding network-coding mismatch. It is not obvious, however, that an efficient decoder for step (i) exists. A second contribution of this paper is to provide an explicit construction of such a decoder based on the use of the Repeat Accumulate (RA) code. Specifically, we redesign the belief propagation algorithm of the RA code for the traditional point-to-point channel to suit the needs of the PNC multiple-access channel. Simulation results show that our new scheme outperforms the previously proposed schemes significantly in terms of BER without added complexity.
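Step (ii) above, the mapping from the superimposed source symbols S1+S2 to S1 XOR S2, can be illustrated with a small noise-free sketch. It assumes binary source symbols and an arithmetic (integer) sum, and it presumes that step (i) has already recovered S1+S2 exactly; the RA-code belief propagation decoder of step (i) is not reproduced here, and the function names are hypothetical.

```python
def superimpose(s1, s2):
    """Arithmetic sum of two equal-length binary packets; values lie in {0, 1, 2}."""
    return [a + b for a, b in zip(s1, s2)]


def sum_to_xor(summed):
    """Map superimposed symbols S1+S2 to the network-coded symbols S1 XOR S2.

    For binary inputs, a sum of 0 or 2 means the bits agree (XOR = 0) and a
    sum of 1 means they differ (XOR = 1), so XOR is the sum modulo 2.
    """
    return [v % 2 for v in summed]


# Toy packets from the two end nodes (illustrative values only).
s1 = [0, 0, 1, 1, 1, 0]
s2 = [0, 1, 0, 1, 1, 1]
summed = superimpose(s1, s2)
network_coded = sum_to_xor(summed)
```

A sum of 1 does not reveal which end node sent the 1, yet the XOR is still fully determined. This is why the relay can produce the network-coded packet without recovering S1 and S2 separately.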
Based on the BioBricks standard, restriction synthesis is a novel catabolic iterative DNA synthesis method that utilizes endonucleases to synthesize a query sequence from a reference sequence. In this work, the reference sequence is built from shorter subsequences by classifying them as applicable or inapplicable for the synthesis method using three different machine learning methods: Support Vector Machines (SVMs), random forest, and Convolutional Neural Networks (CNNs). Before applying these methods to the data, a series of feature selection, curation, and reduction steps are applied to create an accurate and representative feature space. Following these preprocessing steps, three different pipelines are proposed to classify subsequences based on their nucleotide sequence and other relevant features corresponding to the restriction sites of over 200 endonucleases. The sensitivities of SVMs, random forest, and CNNs are 94.9%, 92.7%, and 91.4%, respectively. Each method scores lower in specificity, with SVMs, random forest, and CNNs reaching 77.4%, 85.7%, and 82.4%, respectively. In addition to analyzing these results, the misclassifications in SVMs and CNNs are investigated. Across these two models, different features with a derived nucleotide specificity visually contribute more to classification than other features. This observation is an important factor when considering new nucleotide sensitivity features for future studies.
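For reference, the two reported metrics follow the standard definitions: sensitivity is the fraction of truly applicable subsequences that are classified as applicable, and specificity is the fraction of truly inapplicable subsequences that are classified as inapplicable. The sketch below assumes labels encoded as 1 (applicable) and 0 (inapplicable); the function name and the toy labels are hypothetical and are not data from the study.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, with 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity


# Toy labels (illustrative only): 4 applicable and 4 inapplicable subsequences.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
```

A classifier with sensitivity above specificity, as in this toy case, misses few applicable subsequences but accepts more inapplicable ones.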