Protein conformational dynamics in genomic analysis

Description
Proteins are essential for most biological processes that constitute life. The function of a protein is encoded in its 3D folded structure, which is determined by its amino acid sequence. A nonsynonymous single nucleotide variant (nSNV), a change of a single nucleotide in the protein-coding DNA, changes the amino acid sequence (i.e., it mutates the protein sequence), which can adversely affect protein function and sometimes cause disease. These mutations are the most prevalent form of variation in humans, and each individual genome harbors tens of thousands of nSNVs that can be benign (neutral) or lead to disease. The primary way to assess the impact of nSNVs on function is through evolutionary approaches based on positional amino acid conservation. However, these approaches are largely inadequate for positions that evolve at a fast rate. We developed a metric, the dynamic flexibility index (DFI), that measures the site-specific conformational dynamics of a protein, which are central to exploring the mechanisms by which nSNVs affect function. In this thesis, we demonstrate that DFI can distinguish disease-associated from neutral nSNVs, particularly at fast-evolving positions where evolutionary approaches lack predictive power. We also describe an additional dynamics-based metric, the dynamic coupling index (DCI), which measures the dynamic allosteric coupling of distal sites on the protein with its functionally critical (i.e., active) sites. Using DCI, we analyzed 200 disease mutations of a specific enzyme, GCase, and performed a proteome-wide analysis of 75 human enzymes containing 323 neutral and 362 disease mutations. In both cases, we observed that sites with high dynamic allosteric residue coupling to the functional sites (i.e., DARC spots) have an increased susceptibility to harboring disease nSNVs.
Overall, our comprehensive proteome-wide analysis suggests that incorporating these novel position-specific, conformational-dynamics-based metrics into genomics can complement current approaches and increase the accuracy of diagnosing disease nSNVs. Furthermore, the metrics provide mechanistic insights into disease development. Lastly, we introduce a new, purely sequence-based model that estimates the dynamics profile of a protein using only coevolution information, eliminating the need for a 3D structure to determine dynamics.
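
The abstract does not state how DFI and DCI are computed. As an illustration only, the sketch below assumes one common formulation: an elastic network model built on C-alpha coordinates, in which each residue is perturbed by unit forces and the displacement response of every other residue is recorded. DFI is then taken as each site's share of the total response, and DCI as a site's response to perturbations at the functional sites relative to its response to perturbations at all sites. The function names, the 10 Å cutoff, the number of force directions, the toy helical coordinates, and the choice of functional sites are all hypothetical and are not taken from the thesis.

```python
import numpy as np


def anm_hessian(coords, cutoff=10.0, gamma=1.0):
    """Anisotropic network model Hessian (3N x 3N) for N nodes (assumed model)."""
    n = len(coords)
    hess = np.zeros((3 * n, 3 * n))
    for i in range(n):
        for j in range(i + 1, n):
            d = coords[j] - coords[i]
            r2 = d @ d
            if r2 > cutoff ** 2:
                continue  # nodes farther apart than the cutoff are not connected
            block = -gamma * np.outer(d, d) / r2
            hess[3 * i:3 * i + 3, 3 * j:3 * j + 3] = block
            hess[3 * j:3 * j + 3, 3 * i:3 * i + 3] = block
            hess[3 * i:3 * i + 3, 3 * i:3 * i + 3] -= block
            hess[3 * j:3 * j + 3, 3 * j:3 * j + 3] -= block
    return hess


def response_matrix(coords, n_dirs=7, cutoff=10.0, seed=0):
    """resp[i, j]: mean displacement magnitude of site i when site j is perturbed."""
    n = len(coords)
    # The pseudo-inverse discards the zero-frequency rigid-body modes.
    cov = np.linalg.pinv(anm_hessian(coords, cutoff), rcond=1e-8)
    rng = np.random.default_rng(seed)
    dirs = rng.normal(size=(n_dirs, 3))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)  # unit force directions
    resp = np.zeros((n, n))
    for j in range(n):
        disp = cov[:, 3 * j:3 * j + 3] @ dirs.T          # (3N, n_dirs)
        mags = np.linalg.norm(disp.reshape(n, 3, n_dirs), axis=1)
        resp[:, j] = mags.mean(axis=1)                   # average over directions
    return resp


def dfi(resp):
    """Each site's total response, normalized so the profile sums to 1."""
    per_site = resp.sum(axis=1)
    return per_site / per_site.sum()


def dci(resp, functional_sites):
    """Response to functional-site perturbations relative to all-site perturbations."""
    return resp[:, functional_sites].mean(axis=1) / resp.mean(axis=1)


# Toy input: an idealized 20-residue helix, not a real protein structure.
idx = np.arange(20)
angle = np.deg2rad(100.0) * idx
coords = np.column_stack([2.3 * np.cos(angle), 2.3 * np.sin(angle), 1.5 * idx])

resp = response_matrix(coords)
profile = dfi(resp)                              # one flexibility value per site
coupling = dci(resp, functional_sites=[9, 10])   # hypothetical functional sites
```

In this sketch, a low value in `profile` marks a rigid, hinge-like site and a high value marks a flexible one, while values of `coupling` above 1 mark sites that respond more strongly to perturbations at the chosen functional sites than to perturbations on average. The quantitative behavior depends on the assumed cutoff and force sampling.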