Physiological and Genetic Mechanisms Underlying Variation in Anoxia Tolerance in Drosophila melanogaster

The ability to tolerate bouts of oxygen deprivation varies tremendously across the animal kingdom. Adult humans from different regions show large variation in tolerance to hypoxia; additionally, it is widely known that neonatal mammals, including humans, are much more tolerant of anoxia than their adult counterparts. Drosophila melanogaster are very anoxia-tolerant relative to mammals, with adults able to survive 12 h of anoxia, and they are a well-suited model for studying anoxia tolerance. Drosophila larvae live in rotting, fermenting media and as a result are more likely to experience environmental hypoxia; therefore, they could be expected to be more tolerant of anoxia than adults. However, adults can survive anoxic exposures ~8 times longer than larvae can. This dissertation investigates the mechanisms responsible for variation in survival of anoxic exposure in the genetic model organism Drosophila melanogaster, with particular attention to the effects of developmental stage (larvae vs. adults) and within-population variation among individuals.

Vertebrate studies suggest that surviving anoxia requires maintaining ATP levels despite the loss of aerobic metabolism, in a manner that prevents disruption of ionic homeostasis. In Drosophila, by contrast, the abilities to maintain a hypometabolic state with low ATP and to tolerate large disturbances in ionic status appear to contribute to the higher anoxia tolerance of adults. Metabolomics experiments support this notion, showing that larvae had higher metabolic rates during the initial 30 min of anoxia and that protective metabolites were upregulated in adults but not in larvae. Lastly, I investigated genetic variation in anoxia tolerance using a genome-wide association study (GWAS) to identify genes associated with anoxia tolerance. The GWAS results also point to mechanisms related to protection from ionic and oxidative stress, as well as a protective role for immune function.
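The abstract does not describe the association model used in the GWAS. The sketch below assumes a simple single-marker linear regression of an anoxia-survival phenotype on genotype, fitted separately for each variant, and returns each variant's effect size and t-statistic. The genotype coding (for example 0/1 for inbred lines) and the function name are assumptions. Real GWAS pipelines also adjust for covariates and relatedness and correct for multiple testing, none of which is shown here.

```python
import numpy as np

def single_marker_scan(genotypes, phenotype):
    """Fit phenotype ~ intercept + genotype separately for each variant.

    genotypes: (n_individuals, n_variants) array of allele codes
               (hypothetical coding, e.g. 0/1 for inbred lines).
    phenotype: (n_individuals,) array, e.g. survival after anoxia.
    Every variant must be polymorphic (non-zero genotype variance).
    Returns per-variant OLS slopes and t-statistics.
    """
    g = np.asarray(genotypes, dtype=float)
    y = np.asarray(phenotype, dtype=float)
    n = y.size
    gc = g - g.mean(axis=0)              # center each variant's genotype column
    yc = y - y.mean()                    # center the phenotype
    sxx = (gc ** 2).sum(axis=0)          # genotype sum of squares per variant
    slope = (gc * yc[:, None]).sum(axis=0) / sxx
    resid = yc[:, None] - gc * slope     # residuals of each single-variant fit
    sigma2 = (resid ** 2).sum(axis=0) / (n - 2)
    t_stat = slope / np.sqrt(sigma2 / sxx)
    return slope, t_stat
```

For example, `single_marker_scan([[0], [0], [1], [1]], [1, 2, 3, 4])` yields a slope of 2 for the single variant. Ranking variants by the magnitude of `t_stat` is one simple way to shortlist candidate genes.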
Date Created

Tracing the Evolutionary Histories of Leprosy and Tuberculosis using Ancient DNA and Phylogenomics Methods

Leprosy and tuberculosis are age-old diseases that have tormented humankind and left behind a legacy of fear, mutilation, and social stigmatization. Today, leprosy is considered a Neglected Tropical Disease due to its high prevalence in developing countries, while tuberculosis is highly endemic in developing countries and rapidly re-emerging in several developed countries. To eradicate these diseases effectively, it is necessary to understand how they first originated in humans and whether they are prevalent in nonhuman hosts that can serve as sources of zoonotic transmission. Through three related studies, this dissertation uses a phylogenomics approach to elucidate the evolutionary histories of the pathogens that cause leprosy and tuberculosis, Mycobacterium leprae and the M. tuberculosis complex, respectively. In the first study, genomes of M. leprae strains that infect nonhuman primates were sequenced and compared to human M. leprae strains to determine their genetic relationships. This study assesses whether nonhuman primates serve as a reservoir for M. leprae and whether there is potential for transmission of M. leprae between humans and nonhuman primates. In the second study, the genome of M. lepraemurium (which causes leprosy in mice, rats, and cats) was sequenced to clarify its genetic relationship to M. leprae and other mycobacterial species. This study is the first to sequence the M. lepraemurium genome and also describes genes that may be important for virulence in this pathogen. In the third study, an ancient DNA approach was used to recover M. tuberculosis genomes from human skeletal remains from the North American archaeological record. This study sheds light on the types of M. tuberculosis strains present in post-contact era North America.
Overall, this dissertation informs us about the evolutionary histories of these pathogens and their prevalence in nonhuman hosts, which is not only important in an anthropological context but also has significant implications for disease eradication and wildlife conservation.
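Comparing strain genomes typically starts from a whole-genome alignment or a SNP matrix. The abstract does not specify the comparison method. The sketch below counts pairwise SNP differences between aligned genome sequences; the resulting distances are the kind of input that distance-based tree building can use. The function name and the rule of skipping gap ('-') and ambiguous ('N') sites are assumptions.

```python
from itertools import combinations

def pairwise_snp_distances(alignment):
    """Count differing sites between every pair of aligned sequences.

    alignment: dict mapping strain name -> aligned sequence (all equal length).
    Sites where either sequence has a gap ('-') or an ambiguous base ('N')
    are skipped (an assumed convention).
    Returns a dict keyed by (name_a, name_b) in insertion order.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned to the same length")
    skip = {"-", "N"}
    distances = {}
    for (name_a, seq_a), (name_b, seq_b) in combinations(alignment.items(), 2):
        diffs = sum(
            1
            for a, b in zip(seq_a.upper(), seq_b.upper())
            if a not in skip and b not in skip and a != b
        )
        distances[(name_a, name_b)] = diffs
    return distances
```

With hypothetical strains such as `{"human_1": "ACGTN", "primate_1": "ACCTA"}`, the single G/C mismatch gives a distance of 1. The trailing N/A site is ignored.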
Date Created

Spatial genetic structure under limited dispersal: theory, methods and consequences of isolation-by-distance

Isolation-by-distance is a specific type of spatial genetic structure that arises when parent-offspring dispersal is limited. Many natural populations exhibit localized dispersal, and as a result, individuals that are geographically near each other tend to be more genetically similar than individuals that are farther apart. It is important to identify isolation-by-distance because it can affect the statistical analysis of population samples and can help us better understand evolutionary dynamics. For this dissertation, I investigated several aspects of isolation-by-distance. First, I examined how the shape of the dispersal distribution affects the observed pattern of isolation-by-distance. If, as theory predicts, the shape of the distribution has little effect, then it would be more practical to model isolation-by-distance using a simple dispersal distribution rather than replicating the complexities of more realistic distributions. Therefore, I developed an efficient algorithm to simulate dispersal based on a simple triangular distribution and, using simulations, confirmed that the resulting pattern of isolation-by-distance was similar to those produced by more realistic distributions. Second, I developed a Bayesian method to quantify isolation-by-distance from genetic data by estimating Wright’s neighborhood size parameter. I analyzed the performance of this method using simulated data and a microsatellite data set from two populations of maritime pine, and found that the neighborhood size estimates had good coverage and low error. Finally, one of the major consequences of isolation-by-distance is an increase in inbreeding. Plants are often particularly susceptible to inbreeding and, as a result, have evolved many inbreeding avoidance mechanisms. Using simulations, I determined which of these mechanisms are most successful at preventing the inbreeding associated with isolation-by-distance.
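The abstract does not give the dispersal algorithm or the exact neighborhood-size formula it targets. This sketch assumes that each axial displacement follows a symmetric triangular distribution on [-a, a]. Such a draw is cheap to produce as the sum of two independent uniforms on [-a/2, a/2], and its per-axis variance is σ² = a²/6. Wright's neighborhood size in two dimensions is commonly written N_b = 4πDσ², where D is the density of individuals. The function names and parameter values are illustrative only.

```python
import math
import random

def triangular_step(a, rng=random):
    """Draw one axial displacement from a symmetric triangular distribution on [-a, a].

    The sum of two independent Uniform(-a/2, a/2) draws has exactly this
    triangular density, so no rejection sampling is needed.
    """
    return rng.uniform(-a / 2, a / 2) + rng.uniform(-a / 2, a / 2)

def disperse(x, y, a, rng=random):
    """Place an offspring relative to its parent at (x, y) using independent axial steps."""
    return x + triangular_step(a, rng), y + triangular_step(a, rng)

def axial_variance(a):
    """Per-axis variance of the triangular kernel: sigma^2 = a^2 / 6."""
    return a * a / 6.0

def neighborhood_size(density, sigma2):
    """Wright's two-dimensional neighborhood size, N_b = 4 * pi * D * sigma^2."""
    return 4.0 * math.pi * density * sigma2
```

For instance, with a = 3 the axial variance is 1.5, so at one individual per unit area `neighborhood_size(1.0, axial_variance(3.0))` equals 6π. Drawing each axis independently keeps the kernel separable. The resulting two-dimensional shape is therefore square-based rather than radially symmetric, which is one of the simplifications such a model accepts.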
Date Created

Methods in the assessment of genotype-phenotype correlations in rare childhood disease through orthogonal multi-omics, high-throughput sequencing approaches

Rapid advances in genomic technologies have increased our understanding of rare human disease. It is now feasible to generate multiple types of biological data, including genetic variation from the genome or exome, expression from the transcriptome, methylation patterns from the epigenome, protein complexity from the proteome, and metabolite information from the metabolome. "Omics" tools provide a comprehensive view of the biological mechanisms that affect disease traits and risk. Despite the availability of these data types and the ability to collect them simultaneously from patients, researchers still tend to analyze each one independently. Combining information from multiple biological data types can reduce missing information, increase confidence in findings from a single data type, and provide a more complete view of genotype-phenotype correlations. Although rare disease genetics has been greatly improved by exome sequencing, a substantial portion of clinical patients remain undiagnosed. Multiple frameworks for integrative analysis of genomic and transcriptomic data are presented, with a focus on identifying functional genetic variants in patients with undiagnosed, rare childhood conditions. A method for directly quantifying the X-inactivation ratio from genomic and transcriptomic data was developed, using allele-specific expression and segregation analysis to determine the magnitude and inheritance mode of X inactivation. This approach was applied to two families, revealing non-random X inactivation in female patients. Expression-based analysis of X inactivation showed high correlation with the standard clinical assay. These findings improved understanding of the molecular mechanisms underlying X-linked disorders. In addition, multivariate outlier analysis of gene- and exon-level RNA-seq data using Mahalanobis distance, together with integration of the distance scores with genomic data, revealed genotype-phenotype correlations during variant prioritization in 25 families.
Mahalanobis distance scores revealed variants with large transcriptional impact in patients. In this dataset, frameshift variants were more likely to result in outlier expression signatures than other types of functional variants. Integrating outlier estimates with genetic variants corroborated previously identified, presumed causal variants and highlighted a new candidate in a previously undiagnosed case. Integrative genomic approaches in easily attainable tissue will facilitate the search for biomarkers that affect disease traits, uncover pharmacogenomic targets, provide novel insight into the molecular underpinnings of uncharacterized conditions, and help improve analytical approaches that use large datasets.
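Both measures described above reduce to short calculations, although the abstract gives neither formula. The first function estimates the X-inactivation ratio from allele-specific read counts at heterozygous X-linked sites whose parental origin is known from segregation analysis. The second computes each sample's squared Mahalanobis distance, D² = (x − μ)ᵀ Σ⁻¹ (x − μ), from the cohort mean μ using the cohort covariance Σ. The function names, the read-count inputs, and the 80:20 skewing cutoff (a commonly used convention) are assumptions. A pseudo-inverse is used because RNA-seq cohorts often have fewer samples than features, which makes Σ singular.

```python
import numpy as np

def x_inactivation_ratio(maternal_reads, paternal_reads, skew_cutoff=0.8):
    """Estimate the fraction of X-linked expression coming from the maternal X.

    maternal_reads / paternal_reads: read counts per heterozygous X-linked site,
    already assigned to a parent via segregation analysis.
    Returns (ratio, is_skewed); skewing means either parental X contributes
    at least `skew_cutoff` of the expression (assumed 80:20 convention).
    """
    m = np.asarray(maternal_reads, dtype=float).sum()
    p = np.asarray(paternal_reads, dtype=float).sum()
    ratio = m / (m + p)
    is_skewed = bool(max(ratio, 1.0 - ratio) >= skew_cutoff)
    return ratio, is_skewed

def mahalanobis_scores(expression):
    """Squared Mahalanobis distance of each sample (row) from the cohort mean.

    expression: (n_samples, n_features) matrix, e.g. exon-level values for one gene.
    """
    x = np.asarray(expression, dtype=float)
    centered = x - x.mean(axis=0)
    cov = np.cov(x, rowvar=False)
    cov_inv = np.linalg.pinv(np.atleast_2d(cov))  # pseudo-inverse tolerates singular covariance
    # Row-wise quadratic form: d2[i] = centered[i] @ cov_inv @ centered[i]
    return np.einsum("ij,jk,ik->i", centered, cov_inv, centered)
```

For example, 170 maternal and 30 paternal reads give a ratio of 0.85, which the assumed cutoff flags as skewed. Samples with the largest `mahalanobis_scores` are the expression outliers whose variants would be prioritized.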
Date Created

HIV evolution: biogeography and intra-individual dynamics

The entire history of HIV-1 is recorded in its roughly ten thousand bases, where information about its evolutionary traversal of the human population can be unlocked only through fine-scale sequence analysis. Measurable footprints of mutation and recombination have provided a wealth of knowledge, from multiple chimpanzee-to-human transmissions to patterns of resistance to neutralizing antibodies and drugs. Extracting maximum understanding from such diverse data can be accomplished only by analyzing the viral population from many angles. This body of work explores two primary aspects of HIV sequence evolution, recombination and point mutation, through cross-sectional (inter-individual) and longitudinal (intra-individual) investigations, respectively. Cross-sectional Analysis: The role of Haiti in the subtype B pandemic has been hotly debated for years; despite many studies, none had previously incorporated the well-known mechanism of retroviral recombination into its biological model. Without recombination detection, multiple analyses produced trees in which subtype B appears to have first entered Haiti and then jumped to the rest of the world. The results presented here contest this Haiti-first theory of the pandemic and instead suggest simultaneous entries of subtype B into Haiti and the rest of the world. Longitudinal Analysis: Potential N-linked glycosylation sites (PNGS) are the most evolutionarily dynamic component of one of the most evolutionarily dynamic proteins known. Although the number of mutations associated with increases or decreases in PNGS frequency over time is high, a set of relatively stable sites persists within and between longitudinally sampled individuals. Here, I identify the most conserved stable PNGS and suggest their potential roles in host-virus interplay.
In addition, I have identified, for the first time, what may be a gp120-based environmental preference for N-linked glycosylation sites.
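PNGS are conventionally identified by the N-X-S/T sequon, where X is any amino acid except proline. The sketch below scans a protein sequence for that standard pattern and assumes that definition. Some tools add further context rules, and those are not modeled. The function name is hypothetical.

```python
import re

# Sequon N-X-[S/T] with X != P. The zero-width lookahead lets overlapping
# sequons (e.g. "NNST") both be reported.
SEQUON = re.compile(r"(?=N[^P][ST])")

def find_pngs(protein):
    """Return 1-based positions of the asparagine in each potential N-linked glycosylation site.

    Gap characters ('-') in aligned sequences should be removed before scanning.
    """
    return [m.start() + 1 for m in SEQUON.finditer(protein.upper())]
```

For example, `find_pngs("ANSTNPSNNTT")` reports the sequons starting at positions 2, 8, and 9. The N-P-S at position 5 is excluded. Tracking these positions across longitudinal samples is one way to tell stable sites from transient ones.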
Date Created

Spatial and temporal patterns of population genetic diversity in the fynbos plant, Leucadendron salignum, in the Cape Floral Region of South Africa

The Cape Floral Region (CFR) in southwestern South Africa is one of the most floristically diverse regions in the world, with >9,000 plant species, 70% of which are endemic, in an area of only ~90,000 km². Many have suggested that the CFR's heterogeneous environment, with respect to landscape gradients, vegetation, rainfall, elevation, and soil fertility, is responsible for the origin and maintenance of this biodiversity. While studies have struggled to link species diversity with these features, no study has attempted to associate patterns of gene flow with environmental data to determine how CFR biodiversity evolves at different scales. Here, molecular population genetic data are presented for a widespread CFR plant, Leucadendron salignum, sampled at 51 locations. The dataset comprises 5 kb of chloroplast DNA (cpDNA) and 6 kb of unlinked nuclear DNA (nuDNA) sequence from 305 individuals. In the cpDNA dataset, significant genetic structure was found to vary on temporal and spatial scales, separating the Western and Eastern Capes (the latter of which appears to be recently derived from the former), with the highest diversity in the central region at the heart of the CFR. A second study applied a statistical model using vegetation and soil composition and found that fine-scale genetic divergence is better explained by this landscape resistance model than by a geographic distance model. Finally, a third analysis contrasted the cpDNA and nuDNA datasets and revealed very little geographic structure in the latter, suggesting that seed and pollen dispersal can produce different evolutionary histories of gene flow even at small CFR scales. Together, these three studies caution that different genomic markers need to be considered when modeling the geographic and temporal origin of CFR groups. From a broader perspective, the results are consistent with the hypothesis that landscape heterogeneity is one driving influence limiting gene flow across the CFR, which can lead to species diversity at fine scales.
Nonetheless, while this pattern may hold for the widespread L. salignum, extending this approach to other CFR species with varying ranges and dispersal mechanisms is now warranted to determine how universal these patterns of landscape genetic diversity are.
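The abstract compares genetic diversity across regions without naming the statistic used. A common choice for sequence data is nucleotide diversity (π), the average number of pairwise differences per site among sampled sequences; treating π as the relevant statistic is an assumption. The sketch below uses a hypothetical function name and accepts only clean, equal-length alignments.

```python
from itertools import combinations

def nucleotide_diversity(sequences):
    """Average number of pairwise differences per site (pi) for aligned sequences.

    Assumes equal-length sequences without gaps or missing data.
    """
    n = len(sequences)
    if n < 2:
        raise ValueError("need at least two sequences")
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("sequences must be aligned to the same length")
    # Sum of mismatches over every unordered pair of sequences.
    total_diffs = sum(
        sum(a != b for a, b in zip(s1, s2))
        for s1, s2 in combinations(sequences, 2)
    )
    n_pairs = n * (n - 1) // 2
    return total_diffs / (n_pairs * length)
```

Computing π separately for the sequences sampled in each region (for example, central versus peripheral locations) gives the kind of regional diversity comparison described above.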
Date Created

The effects of natural selection and random genetic drift in structured populations

Building mathematical models and examining the compatibility of their theoretical predictions with empirical data are important for our understanding of evolution. The rapidly increasing amount of genomic polymorphism data strongly motivates evolutionary biologists to find targets of positive selection. Although intensive mathematical and statistical studies characterizing signatures of positive selection have been conducted to identify such targets, relatively little is known about the effects of other evolutionary forces on these signatures. In this dissertation, I investigate the effects of various evolutionary factors, including purifying selection and population demography, on signatures of positive selection. Specifically, I examine their effects on two widely used approaches for detecting positive selection: one based on Wright's Fst and its analogues, and the other based on footprints of genetic hitchhiking. In Chapters 2 and 3, the effect of purifying selection on Fst is studied. The results show that the intensity of purifying selection greatly affects Fst by modulating allele frequencies across populations. The footprints of genetic hitchhiking in a geographically structured population are studied in Chapter 4. The results demonstrate that footprints of genetic hitchhiking are significantly influenced by geographic structure, which may help scientists infer the origin and spread of a beneficial allele. In Chapter 5, the stochastic dynamics of a hitchhiking allele are studied using the diffusion process of genetic hitchhiking conditioned on fixation of the beneficial allele. Explicit formulae for the conditioned two-locus diffusion process are derived, and stochastic aspects of genetic hitchhiking are investigated. The results in this dissertation show that it is essential to model the interaction of neutral and selective forces to correctly identify targets of positive selection.
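For a single biallelic locus, Wright's Fst can be written as the variance in allele frequency among subpopulations relative to its binomial maximum, Fst = Var(p) / (p̄(1 − p̄)). With equally weighted subpopulations this equals (H_T − H_S) / H_T in terms of expected heterozygosities. The sketch below implements that textbook form. It is not the estimator used in the dissertation, which the abstract does not specify, and it omits sample-size corrections such as Weir and Cockerham's.

```python
import numpy as np

def wright_fst(subpop_freqs):
    """Fst for one biallelic locus from allele frequencies in equally weighted subpopulations.

    Computes (H_T - H_S) / H_T, where H_T is the expected heterozygosity of the
    pooled population and H_S the mean expected heterozygosity within subpopulations.
    """
    p = np.asarray(subpop_freqs, dtype=float)
    p_bar = p.mean()
    h_t = 2.0 * p_bar * (1.0 - p_bar)
    if h_t == 0.0:
        raise ValueError("locus is monomorphic across subpopulations; Fst undefined")
    h_s = np.mean(2.0 * p * (1.0 - p))
    return (h_t - h_s) / h_t
```

For two subpopulations with allele frequencies 0.2 and 0.8, H_T = 0.5 and H_S = 0.32, so Fst = 0.36. Any force that shifts subpopulation frequencies, including purifying selection, moves this value directly.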
Date Created

Molecular Evolution of Type I Collagen (COL1a1) and Its Relationship to Human Skeletal Diseases

Skeletal diseases related to reduced bone strength, like osteoporosis, vary in frequency and severity among human populations, due in part to underlying genetic differentiation. With >600 disease-associated mutations (DAMs), COL1a1, which encodes the primary subunit of type I collagen, the main structural protein in bone, is the gene most commonly associated with this phenotypic variation. Although numerous studies have explored genotype-phenotype relationships in COL1a1, surprisingly, no study has taken an evolutionary approach to determine how changes in constraint over time can be modeled to help predict bone-related disease factors. Here, molecular population genetic and comparative species analyses were conducted to characterize the evolutionary history of COL1a1. First, nucleotide and protein sequences of COL1a1 from 14 taxa representing ~450 million years of vertebrate evolution were used to investigate constraint across gene regions. Historically highly conserved protein residues are significantly correlated with present-day disease severity, providing a highly accurate model for disease prediction. Interestingly, intron composition is also highly conserved, suggesting strong historical purifying selection. Second, a human population genetic analysis of 192 COL1a1 nucleotide sequences representing 10 ethnically and geographically diverse samples was conducted. This random sample of the population shows a surprisingly high number of amino acid polymorphisms (albeit rare in frequency), suggesting that not all protein variants today are highly deleterious. Further, an unusual haplotype structure was identified across populations. It is associated only with noncoding variation in the 5' region of COL1a1, where alterations in gene expression are most likely.
Finally, a population genetic analysis of 40 chimpanzee COL1a1 sequences shows no amino acid polymorphism. It does, however, reveal an unusual haplotype structure with significantly extended linkage disequilibrium (>30 kilobases), as well as a surprisingly common exon duplication that is generally highly deleterious in humans. Altogether, these analyses indicate a history of temporally and spatially varying purifying selection on both coding and noncoding COL1a1 regions that is also reflected in population differentiation. In contrast to clinical studies, this approach reveals potentially functional variation, which in future analyses could help explain the variation in bone strength observed not only within humans but also in other closely related primates.
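The abstract does not specify the exact conservation or linkage disequilibrium measures. This sketch uses two standard ones, with hypothetical function names. Per-column Shannon entropy of an amino acid alignment is one common way to score residue conservation, where lower entropy means stronger conservation; the dissertation's own measure may differ. For LD between two biallelic sites from phased haplotypes, the sketch uses the r² statistic, r² = D² / (p_A(1 − p_A) p_B(1 − p_B)) with D = p_AB − p_A p_B.

```python
import math
from collections import Counter

def column_entropy(column):
    """Shannon entropy (bits) of one alignment column; gaps ('-') are ignored.

    0.0 means every taxon shares the same residue (fully conserved).
    """
    counts = Counter(res for res in column.upper() if res != "-")
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def ld_r2(haplotypes, site_a, site_b):
    """r^2 between two biallelic sites from phased haplotype strings.

    The allele carried by the first haplotype at each site is treated as
    allele 'A' (at site_a) and allele 'B' (at site_b).
    """
    ref_a = haplotypes[0][site_a]
    ref_b = haplotypes[0][site_b]
    n = len(haplotypes)
    p_a = sum(h[site_a] == ref_a for h in haplotypes) / n
    p_b = sum(h[site_b] == ref_b for h in haplotypes) / n
    p_ab = sum(h[site_a] == ref_a and h[site_b] == ref_b for h in haplotypes) / n
    d = p_ab - p_a * p_b                      # linkage disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both sites must be polymorphic")
    return d * d / denom
```

A column that is identical across all taxa scores 0 bits, while four equally frequent residues score 2 bits. Computing `ld_r2` for pairs of sites at increasing physical distance is one way to see how far LD extends along the gene.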
Date Created