Isolation of Treponema pallidum from Wild Chimpanzees and Gorillas

Treponemal disease in primates is caused by the spirochaete bacterium Treponema pallidum. Three subspecies of T. pallidum are currently recognized: pallidum, pertenue, and endemicum. In humans, these are generally associated with the diseases syphilis, yaws, and bejel, respectively. Syphilis occurs worldwide and spreads through sexual contact, while yaws and bejel are geographically limited and spread by skin-to-skin contact. Despite their different clinical presentations, these subspecies are genetically very similar and cannot be distinguished serologically. Reports of symptoms resembling treponemal disease in non-human primates (NHPs) date to the 1960s, though few studies have isolated and studied T. pallidum from NHPs at the molecular level. Obtaining whole-genome sequences of T. pallidum from a variety of NHPs will help efforts to determine the evolutionary relationships of strains within and between species. Currently, no whole-genome sequences of T. pallidum have been obtained from chimpanzees or gorillas. In this thesis, I will determine whether T. pallidum is detectable in fecal samples from NHPs with visible signs of treponemal infection using a polymerase chain reaction (PCR) method.

Comparison of Two DNA Extraction Methods for Isolating Mycobacterium leprae DNA from FFPE Tissue Collected in the Pacific Islands Region

Mycobacterium leprae, the causative agent of Hansen’s disease (leprosy), has plagued humans and other animal species for millennia and remains a public health concern throughout the world today. Recent research into the expanded use of medical tissues preserved as formalin-fixed, paraffin-embedded (FFPE) samples has opened the door to the study of M. leprae DNA from preserved skin samples. However, problems persist with damage to the DNA, including fragmentation and cross-linking. This study evaluated two methods commonly used to recover host DNA from FFPE samples (a hot alkaline lysis protocol and the QIAGEN QIAamp FFPE DNA kit) for their efficacy in extracting pathogen DNA. Twenty FFPE skin samples, collected between 1995 and 2015 from human subjects in the Pacific Islands with M. leprae infection and exhibiting a range of bacillary loads, were analyzed to determine which extraction method most consistently yielded reliable, robust traces of M. leprae infection. This study further examined these samples to understand the phylogeny of leprosy in the region, where gaps in the evolutionary history of M. leprae persist. DNA recovery from paired samples was similar using either method. However, when the incubation time of the sample lysis step following paraffin removal was extended, both protocols were more likely to yield positive traces of M. leprae, with this improvement being especially evident in paucibacillary samples with low bacterial presence. The qPCR assay findings suggest that the hot alkaline procedure is most likely to yield positive identification of infection in these traditionally challenging samples.

Comparison of Single Stranded Versus Double Stranded DNA Libraries for Degraded DNA


In biology and medicine today, Next Generation Sequencing (NGS) is used to sequence entire genomes and has changed genomics research by providing a low-cost, streamlined approach to producing large amounts of genetic data. One of the main steps of NGS is library preparation, and these libraries can be double or single stranded. When DNA is degraded or damaged, it can be difficult to prepare and analyze double-stranded libraries from it. In this case, and when DNA input is low, single-stranded libraries can be prepared instead. However, most research comparing single- and double-stranded libraries for degraded DNA is limited to ancient DNA. Here we compare SRSLY single-stranded DNA libraries with Illumina double-stranded DNA libraries using modern degraded DNA samples from deceased unidentified individuals. Our results suggest that single-stranded libraries had a greater concentration of degraded DNA. However, further research using qPCR is needed before it can be stated definitively that single-stranded library preparation was more effective in capturing the modern degraded DNA.


Novel DNA Extraction Methods for Mollusks and the History and Significance of Bermuda Land Snails

Bermuda Land Snails make up a genus called Poecilozonites that is endemic to Bermuda and is extensively present in its fossil record. These snails were also integral to the creation of the theory of punctuated equilibrium. The DNA of mollusks is difficult to sequence because of a class of compounds called mucopolysaccharides that are present in high concentrations in mollusk tissue and are not removed by standard DNA extraction methods. They inhibit Polymerase Chain Reactions (PCRs) and interfere with Next Generation Sequencing methods. This paper will discuss DNA extraction methods designed to remove these inhibitory compounds, which were tested on another gastropod species (Pomacea canaliculata). This species was chosen because it is invasive and, while it is not a pulmonate, it is similar enough to Bermuda Land Snails to reliably test extraction methods. The methods tested included two commercially available kits, the Qiagen Blood and Tissue Kit and the Omega Biotek Mollusc Extraction Kit, and one Hexadecyltrimethylammonium Bromide (CTAB) extraction method that was modified for use on mollusk tissue. The Blood and Tissue kit produced some DNA, the mollusk kit produced almost none, and the CTAB extraction method produced the highest concentrations on average and may prove to be the most viable option for future extractions. PCRs attempted with the extracted DNA have all failed, though this is likely due to an issue with reagents. Further spectrographic analysis of the DNA from the test extractions has shown that they were successful at removing mucopolysaccharides. When the protocol is optimized, it will be used to extract DNA from the tissue of six individuals from each of the two extant species of Bermuda Land Snails. This DNA will be used in several experiments involving Next Generation Sequencing, with the goal of assembling a variety of genome data. These data will then be used to construct a reference genome for Bermuda Land Snails.
The genomes generated by this project will be used in population genetic analyses between individuals of the same species and between individuals of different species. These analyses will then be used to aid in conservation efforts for the species.


Evaluating variant calling best practices

Analyzing human DNA sequence data allows researchers to identify variants associated with disease, reconstruct the demographic histories of human populations, and further understand the structure and function of the genome. Identifying variants in whole genome sequences is a crucial bioinformatics step in sequence data processing and can be performed using multiple approaches. To investigate the consistency between different bioinformatics methods, we compared the accuracy and sensitivity of two genotyping strategies, joint variant calling and single-sample variant calling. Autosomal and sex chromosome variant call sets were produced by joint and single-sample variant calling for 10 female individuals. The accuracy of variant calls was assessed using SNP array genotype data collected from each individual. To compare the ability of joint and single-sample calling to capture low-frequency variants, folded site frequency spectra were constructed from the variant call sets. To investigate the potential for these different variant calling methods to impact downstream analyses, we estimated nucleotide diversity for the call sets produced using each approach. We found that while both methods were equally accurate when validated by SNP array sites, single-sample calling identified a greater number of singletons. However, estimates of nucleotide diversity were robust to these differences in the site frequency spectrum between call sets. Our results suggest that despite single-sample calling’s greater sensitivity for low-frequency variants, the differences between approaches have a minimal effect on downstream analyses. While joint calling may be a more efficient approach for genotyping many samples, in situations that preclude large sample sizes, our study suggests that single-sample calling is a suitable alternative.
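
The two summary statistics named in this abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the input format (one alternate-allele count per variant site) and the example numbers are assumptions made for illustration. With 10 diploid individuals, each site has 20 sampled chromosomes.

```python
from collections import Counter


def folded_sfs(alt_counts, n_chromosomes):
    """Folded site frequency spectrum.

    alt_counts: alternate-allele count at each variant site (hypothetical input).
    Returns a list whose entry i-1 is the number of sites with minor allele count i.
    """
    spectrum = Counter()
    for k in alt_counts:
        minor = min(k, n_chromosomes - k)  # fold: ignore which allele is ancestral
        if minor > 0:                      # skip sites that are not variable
            spectrum[minor] += 1
    return [spectrum.get(i, 0) for i in range(1, n_chromosomes // 2 + 1)]


def nucleotide_diversity(alt_counts, n_chromosomes, n_callable_sites):
    """Per-site nucleotide diversity (pi).

    Each site contributes 2k(n-k) / (n(n-1)), the chance that two distinct
    sampled chromosomes differ; the sum is divided by the number of callable sites.
    """
    n = n_chromosomes
    total = sum(2 * k * (n - k) / (n * (n - 1)) for k in alt_counts)
    return total / n_callable_sites


# Invented example: 5 variant sites among 1,000 callable sites, 20 chromosomes.
counts = [1, 19, 2, 10, 1]
print(folded_sfs(counts, 20))
print(nucleotide_diversity(counts, 20, 1000))
```

Singletons are the first entry of the folded spectrum. They contribute little to pi because each adds only 2(n-1)/(n(n-1)) = 2/n, which is consistent with the abstract's observation that extra singletons barely change diversity estimates.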

A Curation of the Callithrix penicillata Draft Genome

Callithrix penicillata, also known as the black-tufted marmoset, lives primarily in the Brazilian highlands and has been the subject of little research. For this project I performed a genome curation on the newly assembled genome of this species. The scaffolds obtained from the Dovetail Genomics reads were organized and labeled into chromosomes using the 2014 Callithrix jacchus genome as a reference. Then, using that same genome as a reference, 13 of the chromosomes were reverse complemented to be continuous with the 2014 Callithrix jacchus genome. The N50 of the assembly was calculated and found to be 124 Mb. Quality scores were computed for the final genome using referee and visualized with a bar plot, with 99% of sites scoring above 0. Heterozygosity was also calculated and found to be 0.3%. Finally, the final version of the genome was visually compared to the 2017 Callithrix jacchus genome and the GRCh38 human genome. This genome was submitted to the NCBI database to await further approval.
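
Two of the operations mentioned in this abstract, reverse complementing a sequence and computing N50, can be sketched in a few lines of Python. This is an illustrative sketch only: the function names and example values are invented, and it does not reproduce the tools used in the project.

```python
# Translation table for complementing DNA bases; N stays N.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence string."""
    return seq.translate(_COMPLEMENT)[::-1]


def n50(lengths):
    """Return the N50 of a list of scaffold lengths.

    N50 is the length L such that scaffolds of length >= L together
    contain at least half of the total assembled bases.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("no scaffold lengths given")


# Invented example values, not taken from the marmoset assembly.
print(reverse_complement("ATGCN"))
print(n50([80, 70, 50, 40, 30, 20]))
```

Sorting from longest to shortest and stopping at the first scaffold that pushes the running total past half the assembly is the standard way to read off N50.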

Using Ancient DNA Methods to Examine Dire Wolf Population History

Dire wolves have recently risen to fame as a result of the popular television program Game of Thrones, and thus many viewers know dire wolves as the sigil and loyal companions of the Stark house. Far fewer recognize dire wolves by their scientific name, Canis dirus, or understand the population history of this ‘fearsome wolf’ species that roamed the Americas until the megafaunal mass extinction event of the Late Pleistocene. Although numerous studies have examined the species using morphological and geographical methods, thus far their results have been either inconclusive or contradictory. Remaining questions include the relationships dire wolves share with other members of the Canis genus and the internal structure of their populations. Advancements in ancient DNA recovery methods may make it possible to study dire wolf specimens at the molecular level for the first time and may therefore prove useful in clarifying the answers to these questions. Eighteen dire wolf specimens were collected from across the United States and subjected to ancient DNA extraction, library preparation, amplification and purification, bait preparation and capture, and next-generation sequencing. There was an average of 76.9 unique reads and 5.73% coverage when mapped to the Canis familiaris reference genome in ultraconserved regions of the mitochondrial genome. The results indicate that endogenous ancient DNA was not successfully recovered and perhaps ancient DNA recovery methods have not advanced to the point of retrieving informative amounts of DNA from particularly old, thermally degraded specimens. Nevertheless, the ever-changing nature of ancient DNA research makes it vital to continually test the limitations of the field and suggests that ancient DNA recovery methods will prove useful in illuminating dire wolf population history at some point in the future.

Genetic diversity across the pseudoautosomal boundary varies across human populations

Unlike the autosomes, recombination on the sex chromosomes is limited to the pseudoautosomal regions (PARs) at each end of the chromosome. PAR1 spans approximately 2.7 Mb from the tip of the proximal arm of each sex chromosome, and a pseudoautosomal boundary between the PAR1 and non-PAR region is thought to have evolved from a Y-specific inversion that suppressed recombination across the boundary. In addition to the two PARs, there is also a human-specific X-transposed region (XTR) that was duplicated from the X to the Y chromosome. Genetic diversity is expected to be higher in recombining than in nonrecombining regions, particularly because recombination reduces the effects of linked selection, allowing neutral variation to accumulate. We previously showed that diversity decreases linearly across the previously defined pseudoautosomal boundary (rather than dropping suddenly at the boundary), suggesting that the pseudoautosomal boundary may not be as strict as previously thought. In this study, we analyzed data from 1271 genetic females to explore the extent to which the pseudoautosomal boundary varies among human populations (broadly, African, European, South Asian, East Asian, and the Americas). We found that, in all populations, genetic diversity was significantly higher in the PAR1 and XTR than in the non-PAR regions, and that diversity decreased linearly from the PAR1, reaching a non-PAR value only well past the pseudoautosomal boundary. However, we also found that the location at which diversity changes from reflecting the higher PAR1 diversity to the lower non-PAR diversity varied by as much as 500 kb among populations.
The lack of genetic evidence for a strict pseudoautosomal boundary and the variability in patterns of diversity across the pseudoautosomal boundary are consistent with two potential explanations: (1) the boundary itself may vary across populations, or (2) population-specific demographic histories may have shaped diversity across the pseudoautosomal boundary.

HBS1L-MYB loci involvement in Fetal Hemoglobin Expression

This project studies two single nucleotide polymorphisms (SNPs) within the HBS1L-MYB loci. Both SNPs are associated with heightened expression of fetal hemoglobin. DNA samples from NCAA athletes who have sickle cell trait were genotyped to find the allele frequency of each SNP. When all populations are compared using information provided from the Human Genome Project on Ensembl, the minor A allele has a frequency of 22% and the major G allele has a frequency of 78%. The minor allele frequency in the population data was 15% higher than the frequency obtained from the sampled data. This means that the samples, which are heterozygous for sickle cell, display a lower frequency for the mutation than the global population.
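
As a minimal illustration of how an allele frequency is derived from genotype calls at a biallelic SNP, the following Python sketch counts alleles from genotype tallies. The function name and the genotype counts are invented for the example and are not the study's data.

```python
def allele_frequencies(n_gg, n_ga, n_aa):
    """Allele frequencies at a biallelic G/A SNP from genotype counts.

    Each individual carries two alleles, so the A allele count is
    (heterozygotes + 2 * AA homozygotes) out of 2 * (number of individuals).
    """
    total_alleles = 2 * (n_gg + n_ga + n_aa)
    if total_alleles == 0:
        raise ValueError("no genotypes given")
    freq_a = (n_ga + 2 * n_aa) / total_alleles
    return {"G": 1 - freq_a, "A": freq_a}


# Invented counts for illustration only: 60 GG, 30 GA, 10 AA individuals.
print(allele_frequencies(60, 30, 10))
```

A sample frequency computed this way can then be compared with a reference population frequency such as the 22% quoted above.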

Sequence Diversity of Pan troglodytes Subspecies and the Impact of WFDC6 Selective Constraints in Reproductive Immunity

Recent efforts have attempted to describe the population structure of the common chimpanzee, focusing on four subspecies: Pan troglodytes verus, P. t. ellioti, P. t. troglodytes, and P. t. schweinfurthii. However, few studies have pursued the effects of natural selection in shaping their response to pathogens and reproduction. Whey acidic protein (WAP) four-disulfide core domain (WFDC) genes and neighboring semenogelin (SEMG) genes encode proteins with combined roles in immunity and fertility. They display a strikingly high rate of amino acid replacement (dN/dS), indicative of adaptive pressures during primate evolution. In human populations, three signals of selection at the WFDC locus have been described, possibly influencing the proteolytic profile and antimicrobial activities of the male reproductive tract. To evaluate the patterns of genomic variation and selection at the WFDC locus in chimpanzees, we sequenced 17 WFDC genes and 47 autosomal pseudogenes in 68 chimpanzees (15 P. t. troglodytes, 22 P. t. verus, and 31 P. t. ellioti). We found a clear differentiation of P. t. verus and estimated the divergence of the P. t. troglodytes and P. t. ellioti subspecies at 0.173 Myr; further, at the WFDC locus we identified a signature of strong selective constraints common to the three subspecies in WFDC6, a recent paralog of the epididymal protease inhibitor EPPIN. Overall, chimpanzees and humans do not display similar footprints of selection across the WFDC locus, possibly due to different selective pressures between the two species related to immune response and reproductive biology.