No Arabic abstract
In several previous works, I presented the mirror symmetry in the set of protein amino acids, expressed through the number of atoms. Here, however, the same thing is shown but over the number of nucleons and molecules mass. Compared to the previous version of the paper, minimal changes have been made, and Display 2 as well as Figures 3 and 4 have been added.
This paper presents, for the first time, four diversity types of protein amino acids. The first type includes two amino acids (G, P), both without standard hydrocarbon side chains; the second one four amino acids, as two pairs [(A, L), (V, I)], all with standard hydrocarbon side chains; the third type comprises the six amino acids, as three pairs [(F, Y), (H, W), (C, M)], two aromatic, two hetero aromatic and two hetero non-aromatic); finally, the fourth type consists of eight amino acids, as four pairs [(S, T), (D, E), (N, Q), (K, R)], all with a functional group which also exists in amino acid functional group (wholly presented: H2N-.CH-COOH; separately: OH, COOH, CONH2, NH2). The insight into existence of four types of diversity was possible only after an insight into the existence of some very new arithmetical regularities, which were so far unknown. Also, as for showing these four types was necessary to reveal the relationships between several key harmonic structures of the genetic code (which we presented in our previous works), this paper is also a review article of the authors researches of the genetic code. By this, the review itself shows that the said harmonic structures are connected through the same (or near the same) chemically determined amino acid pairs, 10 pairs out of the 190 possible.
We present a structural data set of the 20 proteinogenic amino acids and their amino-methylated and acetylated (capped) dipeptides. Different protonation states of the backbone (uncharged and zwitterionic) were considered for the amino acids as well as varied side chain protonation states. Furthermore, we studied amino acids and dipeptides in complex with divalent cations (Ca2+, Ba2+, Sr2+, Cd2+, Pb2+, and Hg2+). The database covers the conformational hierarchies of 280 systems in a wide relative energy range of up to 4 eV (390 kJ/mol), summing up to an overall of 45,892 stationary points on the respective potential-energy surfaces. All systems were calculated on equal first-principles footing, applying density-functional theory in the generalized gradient approximation corrected for long-range van der Waals interactions. We show good agreement to available experimental data for gas-phase ion affinities. Our curated data can be utilized, for example, for a wide comparison across chemical space of the building blocks of life, for the parametrization of protein force fields, and for the calculation of reference spectra for biophysical applications.
The correlations of primary and secondary structures were analyzed using proteins with known structure from Protein Data Bank. The correlation values of amino acid type and the eight secondary structure types at distant position were calculated for distances between -25 and 25. Shapes of the diagrams indicate that amino acids polarity and capability for hydrogen bonding have influence on the secondary structure at some distances. Clear preference of most of the amino acids towards certain secondary structure type classifies amino acids into four groups: alpha-helix admirers, strand admirers, turn and bend admirers and the others. Group four consists of His and Cis, the amino acids that do not show clear preference for any secondary structure. Amino acids from a group have similar physicochemical properties, and the same structural characteristics. The results suggest that amino acid preference for secondary structure type is based on the structural characteristics at Cb and Cg atoms of amino acid. alpha-helix admirers do not have polar heteroatoms on Cb and Cg atoms, nor branching or aromatic group on Cb atom. Amino acids that have aromatic groups or branching on Cb atom are strand admirers. Turn and bend admirers have polar heteroatom on Cb or Cg atoms or do not have Cb atom at all. Our results indicate that polarity and capability for hydrogen bonding have influence on the secondary structure at some distance, and that amino acid preference for secondary structure is caused by structural properties at Cb or Cg atoms.
A general strategy is described for finding which amino acid sequences have native states in a desired conformation (inverse design). The approach is used to design sequences of 48 hydrophobic and polar aminoacids on three-dimensional lattice structures. Previous studies employing a sequence-space Monte-Carlo technique resulted in the successful design of one sequence in ten attempts. The present work also entails the exploration of conformations that compete significantly with the target structure for being its ground state. The design procedure is successful in all the ten cases.
The amino acid sequences of proteins provide rich information for inferring distant phylogenetic relationships and for predicting protein functions. Estimating the rate matrix of residue substitutions from amino acid sequences is also important because the rate matrix can be used to develop scoring matrices for sequence alignment. Here we use a continuous time Markov process to model the substitution rates of residues and develop a Bayesian Markov chain Monte Carlo method for rate estimation. We validate our method using simulated artificial protein sequences. Because different local regions such as binding surfaces and the protein interior core experience different selection pressures due to functional or stability constraints, we use our method to estimate the substitution rates of local regions. Our results show that the substitution rates are very different for residues in the buried core and residues on the solvent exposed surfaces. In addition, the rest of the proteins on the binding surfaces also have very different substitution rates from residues. Based on these findings, we further develop a method for protein function prediction by surface matching using scoring matrices derived from estimated substitution rates for residues located on the binding surfaces. We show with examples that our method is effective in identifying functionally related proteins that have overall low sequence identity, a task known to be very challenging.