This paper presents, for the first time, four diversity types of protein amino acids. The first type includes two amino acids (G, P), both without standard hydrocarbon side chains; the second includes four amino acids, as two pairs [(A, L), (V, I)], all with standard hydrocarbon side chains; the third comprises six amino acids, as three pairs [(F, Y), (H, W), (C, M)], two aromatic, two heteroaromatic and two hetero non-aromatic; finally, the fourth consists of eight amino acids, as four pairs [(S, T), (D, E), (N, Q), (K, R)], all with a functional group that also occurs in the amino acid functional group (as a whole: H2N-.CH-COOH; separately: OH, COOH, CONH2, NH2). Recognizing these four types of diversity became possible only after the discovery of some new, previously unknown arithmetical regularities. Moreover, because showing these four types required revealing the relationships between several key harmonic structures of the genetic code (presented in our previous works), this paper is also a review of the authors' research on the genetic code. The review itself shows that these harmonic structures are connected through the same (or nearly the same) chemically determined amino acid pairs, 10 pairs out of the 190 possible.
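The counts quoted in this abstract can be checked with a few lines of arithmetic. The sketch below is only an illustration: the dictionary name `DIVERSITY_TYPES` and the helper `summarize` are hypothetical, and treating (G, P) as a pair is our reading of the statement that there are 10 pairs in total.

```python
from math import comb

# Pairs as listed in the abstract. Treating (G, P) as one pair is an
# assumption inferred from the stated total of 10 pairs.
DIVERSITY_TYPES = {
    1: [("G", "P")],
    2: [("A", "L"), ("V", "I")],
    3: [("F", "Y"), ("H", "W"), ("C", "M")],
    4: [("S", "T"), ("D", "E"), ("N", "Q"), ("K", "R")],
}


def summarize(types):
    """Return (amino acids per type, number of pairs, number of distinct amino acids)."""
    pairs = [pair for group in types.values() for pair in group]
    residues = {aa for pair in pairs for aa in pair}
    sizes = [2 * len(group) for group in types.values()]
    return sizes, len(pairs), len(residues)


sizes, n_pairs, n_residues = summarize(DIVERSITY_TYPES)

# Number of unordered pairs that can be formed from the 20 amino acids.
possible_pairs = comb(n_residues, 2)
```

Under these assumptions the four types contain 2, 4, 6 and 8 amino acids, which together cover all 20 canonical amino acids in 10 pairs, and the number of unordered pairs of 20 items is C(20, 2) = 190.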
This work shows that the 20 canonical amino acids (AAs) within the genetic code appear to form a whole system with strict AA positions; more precisely, with AA ordinal numbers in three variants: the first 00-19, the second 00-21 and the third 00-20. The ordinal number follows from the positions of the corresponding codons, i.e. their digrams (or doublets). The reading itself is a reading in the quaternary numbering system, provided the four bases take the values within a specific logical square: A = 0, C = 1, G = 2, U = 3. With this, all splittings, distinctions and classifications of AAs appear to be in accordance with atom and nucleon number balance as well as with other physico-chemical properties, such as hydrophobicity and polarity.
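The quaternary reading described above can be sketched as follows. The base values A = 0, C = 1, G = 2, U = 3 come from the abstract; the function name `quaternary_value` is hypothetical, and taking the first base as the most significant digit is an assumption, since the abstract does not state the digit order. The sketch does not reproduce the three ordinal-number variants (00-19, 00-21, 00-20), whose construction is not given here.

```python
from itertools import product

# Base values as given in the abstract.
BASE_VALUE = {"A": 0, "C": 1, "G": 2, "U": 3}


def quaternary_value(bases: str) -> int:
    """Read a string of bases as a base-4 number.

    Assumption: the first base is the most significant digit.
    """
    value = 0
    for base in bases.upper():
        value = value * 4 + BASE_VALUE[base]
    return value


# All 16 doublets (digrams), ordered by their quaternary value.
DOUBLETS = sorted(
    ("".join(pair) for pair in product("ACGU", repeat=2)),
    key=quaternary_value,
)
```

With this digit order, the doublet GU reads as 2*4 + 3 = 11, and the 16 doublets run from AA (0) to UU (15).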
The correlations between primary and secondary structures were analyzed using proteins with known structure from the Protein Data Bank. The correlation values between amino acid type and the eight secondary structure types at distant positions were calculated for distances between -25 and 25. The shapes of the diagrams indicate that amino acid polarity and capability for hydrogen bonding influence the secondary structure at some distances. The clear preference of most amino acids for a certain secondary structure type classifies amino acids into four groups: alpha-helix admirers, strand admirers, turn and bend admirers, and the others. Group four consists of His and Cys, the amino acids that do not show a clear preference for any secondary structure. Amino acids within a group have similar physicochemical properties and the same structural characteristics. The results suggest that amino acid preference for secondary structure type is based on the structural characteristics at the Cb and Cg atoms of the amino acid. Alpha-helix admirers have neither polar heteroatoms on the Cb and Cg atoms nor branching or an aromatic group on the Cb atom. Amino acids that have aromatic groups or branching on the Cb atom are strand admirers. Turn and bend admirers have a polar heteroatom on the Cb or Cg atom or do not have a Cb atom at all. Our results indicate that polarity and capability for hydrogen bonding influence the secondary structure at some distance, and that amino acid preference for secondary structure is caused by structural properties at the Cb or Cg atoms.
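One way to picture a correlation between amino acid type and secondary structure at a distance is sketched below. This is a minimal illustration and not the authors' procedure: the abstract does not name the correlation measure, so a Pearson correlation between two 0/1 indicator variables is assumed, the function names are hypothetical, and the sequence and structure strings are toy data rather than Protein Data Bank entries.

```python
import math


def offset_correlation(seq, ss, aa, ss_type, d):
    """Pearson correlation between 'residue i is aa' and
    'position i + d has secondary structure ss_type'.

    Assumption: the measure is a plain Pearson coefficient on indicators.
    """
    xs, ys = [], []
    for i in range(len(seq)):
        j = i + d
        if 0 <= j < len(ss):
            xs.append(1.0 if seq[i] == aa else 0.0)
            ys.append(1.0 if ss[j] == ss_type else 0.0)
    n = len(xs)
    if n == 0:
        return 0.0
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation is undefined for a constant indicator
    return cov / math.sqrt(var_x * var_y)


def correlation_profile(seq, ss, aa, ss_type, max_distance=25):
    """Correlation for every distance from -max_distance to +max_distance."""
    return {
        d: offset_correlation(seq, ss, aa, ss_type, d)
        for d in range(-max_distance, max_distance + 1)
    }


# Toy data (hypothetical): alternating residues and structure labels.
toy_seq = "AGAGAGAG"
toy_ss = "HCHCHCHC"
profile = correlation_profile(toy_seq, toy_ss, "A", "H")
```

Plotting such a profile against the distance would give one diagram per pair of amino acid and secondary structure type, which is the kind of shape the abstract refers to.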
The twenty protein-coding amino acids are found in proteomes with different relative abundances. The most abundant amino acid, leucine, is nearly an order of magnitude more prevalent than the least abundant amino acid, cysteine. Amino acid metabolic costs differ similarly, constraining their incorporation into proteins. On the other hand, sequence diversity is necessary for protein folding, function and evolution. Here we present a simple model for a cost-diversity trade-off postulating that natural proteomes minimize amino acid metabolic flux while maximizing sequence entropy. The model explains the relative abundances of amino acids across a diverse set of proteomes. We found that the data are remarkably well explained when the cost function accounts for amino acid chemical decay. More than one hundred proteomes reach comparable solutions to the trade-off by different combinations of cost and diversity. Quantifying the interplay between proteome size and entropy shows that proteomes can become optimally large and diverse.
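A cost-diversity trade-off of this kind can be illustrated with a standard maximum-entropy calculation. The sketch below is an assumption about the general form, not the authors' model: maximizing Shannon entropy minus a multiplier `lam` times the mean cost yields abundances proportional to exp(-lam * cost). The function names, the multiplier `lam` and the cost values are all hypothetical and are not taken from the paper.

```python
import math


def tradeoff_distribution(costs, lam):
    """Abundances maximizing entropy - lam * mean cost.

    Assumption: the standard solution p_i proportional to exp(-lam * c_i).
    """
    weights = {name: math.exp(-lam * cost) for name, cost in costs.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


def entropy(p):
    """Shannon entropy in nats."""
    return -sum(q * math.log(q) for q in p.values() if q > 0)


def mean_cost(p, costs):
    """Average cost per residue under abundances p."""
    return sum(p[name] * costs[name] for name in p)


# Made-up costs in arbitrary units, for illustration only.
toy_costs = {"cheap": 0.5, "medium": 1.0, "expensive": 3.0}

uniform = tradeoff_distribution(toy_costs, lam=0.0)  # diversity only
biased = tradeoff_distribution(toy_costs, lam=1.0)   # cost penalized
```

In this toy setting, raising `lam` lowers the mean cost at the price of lower entropy, so cheaper residues become more abundant than expensive ones.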
The genetic code is the set of rules by which information encoded in genetic material (DNA or RNA sequences) is translated into proteins (amino acid sequences) by living cells. The code defines a mapping between tri-nucleotide sequences, called codons, and amino acids. Since there are 20 amino acids and 64 possible tri-nucleotide sequences, more than one of these 64 triplets can code for a single amino acid, which gives rise to the problem of degeneracy. This manuscript explains the underlying logic of the degeneracy of the genetic code from a mathematical point of view, using a parameter named Impression. Classification of protein families is also a long-standing problem in biochemistry and genomics. Proteins belonging to a particular class have similar biochemical properties, which are of utmost importance for new drug design. Using the same parameter, Impression, together with graph-theoretic properties, we have also devised a new way of classifying a protein family.
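The degeneracy mentioned above can be made concrete by counting how many codons map to each amino acid. The sketch below uses the standard genetic code in the DNA alphabet (bases ordered T, C, A, G, with `*` marking stop codons); it only tabulates degeneracy and does not implement the Impression parameter, which is not defined in this abstract. The variable names are illustrative.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)

# Map each of the 4**3 = 64 codons to its amino acid (or '*' for stop).
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Degeneracy: number of codons per amino acid, stop codons excluded.
DEGENERACY = Counter(aa for aa in CODON_TABLE.values() if aa != "*")

for aa, n_codons in sorted(DEGENERACY.items(), key=lambda item: -item[1]):
    print(aa, n_codons)
```

Of the 64 codons, 61 code for the 20 amino acids and 3 are stop codons; leucine, serine and arginine each have six codons, while methionine and tryptophan have one each.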
In several previous works, I presented the mirror symmetry in the set of protein amino acids, expressed through the number of atoms. Here, the same symmetry is shown in terms of the number of nucleons and the molecular mass. Compared to the previous version of the paper, minimal changes have been made, and Display 2 as well as Figures 3 and 4 have been added.