The genetic code is the set of rules by which living cells translate information encoded in genetic material (DNA or RNA sequences) into proteins (amino acid sequences). The code defines a mapping between trinucleotide sequences, called codons, and amino acids. Since there are 20 amino acids but 64 possible trinucleotide sequences, more than one of these 64 triplets can code for a single amino acid, a property known as degeneracy. This manuscript explains the underlying logic of the degeneracy of the genetic code from a mathematical point of view, using a parameter named Impression. Classification of protein families is also a long-standing problem in biochemistry and genomics. Proteins belonging to a particular class share biochemical properties that are of utmost importance for new drug design. Using the same parameter, Impression, together with graph-theoretic properties, we have also devised a new way of classifying a protein family.
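The counting behind degeneracy can be illustrated with a minimal sketch. It assumes the standard genetic code written in the DNA alphabet (T in place of U), with codons enumerated in the conventional T, C, A, G order. The names `BASES`, `AMINO_ACIDS`, `CODON_TABLE` and `degeneracy` are illustrative only. The sketch does not compute the Impression parameter, which is not defined in this abstract.

```python
from collections import Counter
from itertools import product

# Assumption: standard genetic code, DNA alphabet, codons ordered T, C, A, G.
BASES = "TCAG"
# One letter per codon in that order; "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each of the 64 triplets to its amino acid (or stop signal).
CODON_TABLE = {
    "".join(triplet): aa
    for triplet, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Degeneracy: the number of codons that code for each amino acid.
degeneracy = Counter(aa for aa in CODON_TABLE.values() if aa != "*")

if __name__ == "__main__":
    for aa, count in sorted(degeneracy.items()):
        print(aa, count)
```

In this table, 61 of the 64 triplets code for amino acids and three are stop signals, so the 20 amino acids necessarily share codons.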
It is shown that it makes sense to split the Genetic Code Table (GCT) into three parts using the harmonic mean, calculated by the formula H(a, b) = 2ab / (a + b), where a = 63 and b = 31.5. Within these three parts, the amino acids (AAs) are positioned according to evident regularities in key parameters, such as polarity, hydrophobicity and enzyme-mediated amino acid classification. In addition, there are evident balances in the number of atoms in the nucleotide triplets and the corresponding amino acid groups and/or classes.
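As a quick check of the arithmetic, the formula can be evaluated for the stated values. This is a minimal sketch; the function name `harmonic_mean` is illustrative, and the sketch only computes the value of H, not how the abstract's authors use it to place the boundaries of the three parts.

```python
def harmonic_mean(a: float, b: float) -> float:
    """Harmonic mean of two positive numbers: H(a, b) = 2ab / (a + b)."""
    return 2 * a * b / (a + b)


# Values stated in the abstract.
a, b = 63, 31.5
h = harmonic_mean(a, b)  # 2 * 63 * 31.5 / 94.5 = 3969 / 94.5

if __name__ == "__main__":
    print(h)
```

For a = 63 and b = 31.5 the result is H = 42.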
In this work it is shown that the 20 canonical amino acids (AAs) within the genetic code appear to form a whole system, with a strict distinction in the Genetic Code Table (GCT) into several different quanta: 20, 23 and 61 amino acid molecules. This distinction among molecules is accompanied by specific balanced distinctions in atom number and/or nucleon number within those molecules. In this second version two appendices are added, as well as a new version of the Periodic system of numbers, whose first version is given in arXiv:1107.1998 [q-bio.OT].
This paper presents, for the first time, four diversity types of protein amino acids. The first type includes two amino acids (G, P), both without standard hydrocarbon side chains; the second comprises four amino acids, as two pairs [(A, L), (V, I)], all with standard hydrocarbon side chains; the third comprises six amino acids, as three pairs [(F, Y), (H, W), (C, M)] (two aromatic, two heteroaromatic and two hetero non-aromatic); finally, the fourth type consists of eight amino acids, as four pairs [(S, T), (D, E), (N, Q), (K, R)], all with a functional group which also exists in the amino acid functional group (wholly presented: H2N-.CH-COOH; separately: OH, COOH, CONH2, NH2). Insight into the existence of these four types of diversity became possible only after the discovery of some new, previously unknown arithmetical regularities. Also, because showing these four types required revealing the relationships between several key harmonic structures of the genetic code (which we presented in our previous works), this paper is also a review of the authors' research on the genetic code. The review itself shows that these harmonic structures are connected through the same (or nearly the same) chemically determined amino acid pairs, 10 pairs out of the 190 possible.
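The grouping described above can be written down and checked with a short sketch. The pairs are taken directly from the abstract; the names `DIVERSITY_TYPES`, `pairs`, `type_sizes` and `possible_pairs` are illustrative, and the sketch verifies only the counting (2 + 4 + 6 + 8 = 20 amino acids, 10 pairs, 190 possible pairs), not the chemical or arithmetical regularities themselves.

```python
from math import comb

# Four diversity types and their amino acid pairs, as listed in the abstract
# (one-letter amino acid codes).
DIVERSITY_TYPES = {
    1: [("G", "P")],
    2: [("A", "L"), ("V", "I")],
    3: [("F", "Y"), ("H", "W"), ("C", "M")],
    4: [("S", "T"), ("D", "E"), ("N", "Q"), ("K", "R")],
}

# Flatten to the full list of pairs and the set of amino acids they cover.
pairs = [pair for group in DIVERSITY_TYPES.values() for pair in group]
amino_acids = {aa for pair in pairs for aa in pair}

# Number of amino acids in each type.
type_sizes = {t: 2 * len(group) for t, group in DIVERSITY_TYPES.items()}

# Number of unordered pairs that can be formed from 20 amino acids.
possible_pairs = comb(len(amino_acids), 2)

if __name__ == "__main__":
    print(type_sizes, len(pairs), possible_pairs)
```

Each of the 20 canonical amino acids appears in exactly one pair, so the 10 pairs partition the whole set, and 190 is the number of ways to choose 2 amino acids out of 20.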
The paper comprises three supplements to the source paper, q-bio/0610044 [q-bio.OT], with three new series of harmonic structures of the genetic code, determined by Gauss's arithmetical algorithm and by the Table of Minimal Adding, as in (Rakocevic, 2011a: Table 4; 2011b: Table 4); all structures are in relation to the Binary-code tree (Rakocevic, 1998). The determination itself is realized through atom and nucleon number balancing and nuancing of molecular polarity. The first supplement deals with some additional harmonic structures in relation to our previous paper (Rakocevic, 2004); the second deals with the relation of these structures to the polarity of protein amino acids. In the third supplement we give new ideas about the genetic code by introducing the notions of the cipher of the genetic code and the key of that cipher.