Selection originating from protein foldability: I. A new method to estimate selection temperature


Abstract in English

The probability distribution of sequences with maximum entropy that satisfies a given amino acid composition at each site and a given pairwise amino acid frequency at each site pair is a Boltzmann distribution with $\exp(-\psi_N)$, where the total interaction $\psi_N$ is represented as the sum of one-body and pairwise interactions. A protein folding theory based on the random energy model (REM) indicates that the equilibrium ensemble of natural protein sequences is a canonical ensemble characterized by $\exp(-\Delta G_{ND}/k_B T_s)$, or by $\exp(-G_N/k_B T_s)$ if the amino acid composition is kept constant. This means $\psi_N = \Delta G_{ND}/k_B T_s + \text{constant}$, where $\Delta G_{ND} \equiv G_N - G_D$, $G_N$ and $G_D$ are the native and denatured free energies, and $T_s$ is the effective temperature of natural selection. Here, we examine interaction changes ($\Delta\psi_N$) due to single-nucleotide nonsynonymous mutations and find that the variance of $\Delta\psi_N$ over all sites hardly depends on the $\psi_N$ of each homologous sequence. This indicates that the variance of $\Delta G_N$ ($= k_B T_s \Delta\psi_N$) is nearly constant irrespective of protein family. As a result, $T_s$ can be estimated from the ratio of the variance of $\Delta\psi_N$ to that of a reference protein, whose $T_s$ is determined by a direct comparison between $\Delta\Delta\psi_{ND}$ ($\simeq \Delta\psi_N$) and experimental $\Delta\Delta G_{ND}$. Based on the REM, the glass transition temperature $T_g$ and $\Delta G_{ND}$ are estimated from $T_s$ and experimental melting temperatures ($T_m$) for 14 protein domains. The estimates of $\Delta G_{ND}$ agree well with their experimental values for 5 proteins, and those of $T_s$ and $T_g$ all fall within a reasonable range. This method is coarse-grained but much simpler than previous methods for estimating $T_s$, $T_g$ and $\Delta\Delta G_{ND}$.
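The variance-ratio estimate follows from two relations stated above. If $\Delta G_N = k_B T_s \Delta\psi_N$ and the variance of $\Delta G_N$ is nearly the same for every family, then $T_s^2\,\mathrm{Var}(\Delta\psi_N) \simeq T_{s,\mathrm{ref}}^2\,\mathrm{Var}(\Delta\psi_{N,\mathrm{ref}})$, that is, $T_s \simeq T_{s,\mathrm{ref}}\sqrt{\mathrm{Var}(\Delta\psi_{N,\mathrm{ref}})/\mathrm{Var}(\Delta\psi_N)}$. The Python sketch below illustrates this pipeline under stated assumptions and is not the authors' implementation. The one-body fields `h` and pairwise couplings `J` are random placeholders; in practice they would come from a maximum-entropy fit to a sequence alignment, which is not shown. The sequence is assumed to be available as sense codons, so that single-nucleotide nonsynonymous mutations can be enumerated with the standard genetic code. `fit_kTs` is a hypothetical least-squares reading of the "direct comparison" with experimental $\Delta\Delta G_{ND}$. All function names are illustrative.

```python
import numpy as np

# Standard genetic code (NCBI translation table 1); codons are enumerated
# in TCAG order at each of the three positions, "*" marks stop codons.
BASES = "TCAG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(
    zip(CODONS, "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG")
)
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: k for k, aa in enumerate(AMINO_ACIDS)}


def nonsynonymous_neighbors(codon):
    """Amino acids reachable from `codon` by one nucleotide substitution,
    excluding synonymous changes and stop codons."""
    wild = CODON_TABLE[codon]
    reachable = set()
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                aa = CODON_TABLE[codon[:pos] + base + codon[pos + 1:]]
                if aa not in ("*", wild):
                    reachable.add(aa)
    return reachable


def psi(seq, h, J):
    """Hypothetical psi_N(seq) = sum_i h[i, a_i] + sum_{i<j} J[i, j, a_i, a_j].

    seq: integer array of amino acid indices; h: (L, 20); J: (L, L, 20, 20),
    of which only the i < j entries are used."""
    L = len(seq)
    i, j = np.triu_indices(L, k=1)
    return h[np.arange(L), seq].sum() + J[i, j, seq[i], seq[j]].sum()


def delta_psi_all(codons, h, J):
    """Delta psi_N for every single-nucleotide nonsynonymous mutation of a
    coding sequence (a list of sense codons; stop codons are not allowed)."""
    wild = np.array([AA_INDEX[CODON_TABLE[c]] for c in codons])
    psi_wild = psi(wild, h, J)
    deltas = []
    for site, codon in enumerate(codons):
        for aa in sorted(nonsynonymous_neighbors(codon)):
            mutant = wild.copy()
            mutant[site] = AA_INDEX[aa]
            deltas.append(psi(mutant, h, J) - psi_wild)
    return np.array(deltas)


def fit_kTs(dpsi, ddg_exp):
    """Assumed reading of the 'direct comparison': least-squares slope through
    the origin of experimental DeltaDelta G_ND against Delta psi_N, giving
    k_B T_s in the energy units of ddg_exp."""
    dpsi, ddg_exp = np.asarray(dpsi), np.asarray(ddg_exp)
    return float(dpsi @ ddg_exp / (dpsi @ dpsi))


def estimate_Ts(var_dpsi, var_dpsi_ref, Ts_ref):
    """(k_B T_s)^2 Var(Delta psi_N) = Var(Delta G_N) is taken to be equal for
    the target and the reference protein, so T_s scales as 1/sqrt(Var)."""
    return Ts_ref * np.sqrt(var_dpsi_ref / var_dpsi)


# Toy usage with random (made-up) interactions for a 30-residue sequence.
rng = np.random.default_rng(0)
L = 30
sense = [c for c in CODONS if CODON_TABLE[c] != "*"]
codons = [str(c) for c in rng.choice(sense, size=L)]
h_ref = rng.normal(size=(L, 20))
J_ref = rng.normal(scale=0.1, size=(L, L, 20, 20))

var_ref = np.var(delta_psi_all(codons, h_ref, J_ref))
# A hypothetical family whose interactions are uniformly twice as strong:
var_b = np.var(delta_psi_all(codons, 2 * h_ref, 2 * J_ref))
Ts_b = estimate_Ts(var_b, var_ref, Ts_ref=1.0)  # in units of the reference T_s
```

Because every interaction in the toy family is doubled, $\mathrm{Var}(\Delta\psi_N)$ grows fourfold and the estimated $T_s$ halves relative to the reference. In a real application, `fit_kTs` would supply $k_B T_{s,\mathrm{ref}}$ for the reference protein from paired $\Delta\psi_N$ and measured $\Delta\Delta G_{ND}$ values.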
