Pathways of Distinction Analysis of Liver Cancer Data: Genetic Differences Between Males and Females

Description
The Pathways of Distinction Analysis (PoDA) program calculates relationships between a disease state and a given group of genes contained within a pathway. It was used here to investigate liver cancer and to explore how genetic variability may contribute to the different rates at which the disease develops in males and females. The goal of the study was to identify germline variation that differs by sex in hepatocellular carcinoma. Using the program, multiple pathways and genes were found to differ significantly between males and females in their relationship to liver cancer. In animal studies, the genes identified by PoDA have been shown to affect liver cancer, often with different results for males and females. Although these genes are often the focus of animal models, they are absent from current genome-wide association study (GWAS) catalogs for humans. By working to bridge the results of animal and human studies, the results help to identify the causes of liver cancer and, more specifically, the reason the disease affects males at much higher rates. The differences in the pathways identified as significant for the two sexes indicate that germline variation may play sex-specific roles in the development of hepatocellular carcinoma. Additionally, these results reinforce the capacity of PoDA to identify genes that may be missed by more traditional GWAS methods. This study lays the groundwork for further investigations into the identified genes and pathways, and into how they behave differently in males and females.
Date Created
2021
Agent

Methods for Detecting Mutations in Non-model Organisms

Description
Next-generation sequencing is a powerful tool for detecting genetic variation. However, it is also error-prone, with error rates that are much larger than mutation rates. This can make mutation detection difficult; and while increasing sequencing depth can often help, sequence-specific errors and other non-random biases cannot be detected by increased depth. The problem of accurate genotyping is exacerbated when there is not a reference genome or other auxiliary information available.
I explore several methods for sensitively detecting mutations in non-model organisms using an example Eucalyptus melliodora individual. I use the structure of the tree to find bounds on its somatic mutation rate and evaluate several algorithms for variant calling. I find that conventional methods are suitable if the genome of a close relative can be adapted to the study organism. However, with structured data, a likelihood framework that is aware of this structure is more accurate. I use the techniques developed here to evaluate a reference-free variant calling algorithm.
I also use this data to evaluate a k-mer based base quality score recalibrator (KBBQ), a tool I developed to recalibrate base quality scores attached to sequencing data. Base quality scores can help detect errors in sequencing reads, but are often inaccurate. The most popular method for correcting this issue requires a known set of variant sites, which is unavailable in most cases. I simulate data and show that errors in this set of variant sites can cause calibration errors. I then show that KBBQ accurately recalibrates base quality scores while requiring no reference or other information and performs as well as other methods.
Finally, I use the Eucalyptus data to investigate the impact of quality score calibration on the quality of output variant calls and show that improved base quality score calibration increases the sensitivity and reduces the false positive rate of a variant calling algorithm.
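As background for the calibration discussion above, base quality scores are conventionally reported on the Phred scale, where a score Q corresponds to an error probability of 10^(-Q/10). The following is a minimal sketch of that conversion and of how an empirical quality could be computed for a group of bases. It is not KBBQ's algorithm; the function names and the add-one smoothing are assumptions made for illustration.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Convert a Phred-scaled quality score to an error probability."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Convert an error probability (0 < p <= 1) to a Phred-scaled score."""
    return -10 * math.log10(p)


def empirical_quality(num_errors: int, num_bases: int) -> float:
    """Phred score implied by an observed error count.

    Assumption for this sketch: add-one smoothing keeps the estimate
    finite when no errors are observed.
    """
    p = (num_errors + 1) / (num_bases + 2)
    return error_prob_to_phred(p)


if __name__ == "__main__":
    # A well-calibrated Q30 bin should show about one error per 1000 bases.
    print(phred_to_error_prob(30))
    # Compare a reported score with the observed error rate of its bin.
    print(empirical_quality(num_errors=9, num_bases=998))
```

A reported score is well calibrated when it is close to the empirical quality of the bases that carry it; a recalibrator adjusts reported scores toward that value.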
Date Created
2020
Agent

A Curation of the Callithrix penicillata Draft Genome

Description
Callithrix penicillata, also known as the black-tufted marmoset, lives primarily in the Brazilian highlands and has been the subject of little research. For this project I curated the newly assembled genome of this species. The scaffolds obtained from the Dovetail Genomics reads were organized and labeled into chromosomes using the 2014 Callithrix jacchus genome as a reference. Then, using that same genome as a reference, 13 of the chromosomes were reverse complemented to be continuous with the 2014 Callithrix jacchus genome. The N50 statistic of the assembly was calculated and found to be 124 Mb. Quality scores were computed for the final genome using Referee and visualized with a bar plot, with 99% of sites scoring above 0. Heterozygosity was also calculated and found to be 0.3%. Finally, the final version of the genome was visually compared to the 2017 Callithrix jacchus genome and the GRCh38 human genome. This genome was submitted to NCBI's database to await further approval.
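Two of the curation steps described above, reverse complementing a sequence and computing N50, are standard operations and can be sketched briefly. This is an illustrative sketch only, not the code used in the project; the function names are hypothetical, and N50 is taken here in its usual sense as the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembly.

```python
# Translation table for complementing DNA bases; N (unknown) maps to itself.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def n50(lengths):
    """Return the N50 of a collection of scaffold or contig lengths."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length at which the running sum reaches half is N50.
        if running >= half_total:
            return length
    raise ValueError("n50 requires at least one length")


if __name__ == "__main__":
    print(reverse_complement("AACGTN"))
    print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))
```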
Date Created
2019-12
Agent

Diversity and Distribution of the Desert Stink Beetles: Systematics of the Amphidorini LeConte, 1862 (Coleoptera: Tenebrionidae)

Description
Understanding the diversity, evolutionary relationships, and geographic distribution of species is foundational knowledge in biology. However, this knowledge is lacking for many diverse lineages of the tree of life. This is the case for the desert stink beetles in the tribe Amphidorini LeConte, 1862 (Coleoptera: Tenebrionidae), a lineage of arid-adapted flightless beetles found throughout western North America. Four interconnected studies that jointly increase our knowledge of this group are presented. First, the darkling beetle fauna of the Algodones sand dunes in southern California is examined as a case study to explore the scientific practice of checklist creation. An updated list of the species known from this region is presented, with a critical focus on material now made available through digitization and global aggregation. This part concludes with recommendations for future biodiversity checklist authors. Second, the psammophilic genus Trogloderus LeConte, 1879 is revised. Six new species are described, and the first multi-gene phylogeny for the genus is inferred. In addition, historical biogeographic reconstructions along with novel hypotheses of speciation patterns within the Intermountain Region are given. In particular, the Kaibab Plateau and Kaiparowits Formation are found to have promoted speciation on the Colorado Plateau. The Owens Valley and prehistoric Bouse Embayment are similarly hypothesized to have driven species diversification in southern California. Third, a novel phylogenomic analysis for the tribe Amphidorini is presented, based on 29 de novo partial transcriptomes. Three putative ortholog sets were discovered and analyzed to infer the relationships between species groups and genera. The existing classification of the tribe is found to be highly inadequate, though the earliest-diverging relationships within the tribe are still in question.
Finally, the new phylogenetic framework is used to provide a genus-level revision for the Amphidorini, which previously contained six valid genera and 253 valid species. This updated classification includes more than 100 taxonomic changes and results in the revised tribe consisting of 16 genera, with three being described as new to science.
Date Created
2018
Agent

A Review of the Human Vermiform Appendix and its Proposed Function

Description
Since its discovery in 1524, many people have characterized the vermiform appendix. Charles Darwin considered the human appendix to be a vestige and a useless structure. Others at the time opposed this hypothesis. However, Darwin's hypothesis remained the prevalent one until recently, when interest in the appendix was renewed because of advances in microscopy, knowledge of the immune system, and phylogenetics. In this review, I will argue that the vermiform appendix, although still not completely understood, has important functions. First, I will describe the anatomy of the appendix. I will discuss its comparative anatomy across different animals, including primates. I will address the effects of appendicitis and appendectomy. I will give background on vestigial structures and discuss whether the appendix is a vestige. Next, I will review the evolution of the appendix. Finally, I will argue that the appendix functions as an immune organ, including discussion of gut-associated lymphoid tissue (GALT), development of lymphoid follicles in GALT and their comparison within different organs, immunoglobulin A (IgA) function in the gut, biofilms as evidence that the appendix is a safe-house for beneficial bacteria, re-inoculation of the bowel, and protection against recurring infection. I will conclude with future studies that should be conducted to further our understanding of the vermiform appendix.
Date Created
2018-05
Agent

The Impact of Self-Incompatibility Systems on the Prevention of Biparental Inbreeding

Description
Inbreeding in hermaphroditic plants can occur through two different mechanisms: biparental inbreeding, when a plant mates with a related individual, or self-fertilization, when a plant mates with itself. To avoid inbreeding, many hermaphroditic plants have evolved self-incompatibility (SI) systems which prevent or limit self-fertilization. One particular SI system—homomorphic SI—can also reduce biparental inbreeding. Homomorphic SI is found in many angiosperm species, and it is often assumed that the additional benefit of reduced biparental inbreeding may be a factor in the success of this SI system. To test this assumption, we developed a spatially-explicit, individual-based simulation of plant populations that displayed three different types of homomorphic SI. We measured the total level of inbreeding avoidance by comparing each population to a self-compatible population (NSI), and we measured biparental inbreeding avoidance by comparing to a population of self-incompatible plants that were free to mate with any other individual (PSI).

Because biparental inbreeding is more common when offspring dispersal is limited, we examined the levels of biparental inbreeding over a range of dispersal distances. We also tested whether the introduction of inbreeding depression affected the level of biparental inbreeding avoidance. We found a statistically significant decrease in autozygosity in each of the homomorphic SI populations compared to the PSI population and, as expected, this decrease was more pronounced when seed and pollen dispersal was limited. However, levels of homozygosity and inbreeding depression were not reduced. At low dispersal, homomorphic SI populations also suffered reduced female fecundity and had smaller census population sizes. Overall, our simulations showed that the homomorphic SI systems had little impact on the amount of biparental inbreeding in the population, especially relative to the overall reduction in inbreeding compared to the NSI population. With further study, this observation may have important consequences for research into the origin and evolution of homomorphic self-incompatibility systems.

Date Created
2017-11-24
Agent

Regional Genetic Distance of Two Ephemeral Pool Crustaceans: Triops (Branchiopoda: Notostraca) and Streptocephalus (Branchiopoda: Anostraca)

Description
Triops (Branchiopoda: Notostraca) and Streptocephalus (Branchiopoda: Anostraca) are two crustaceans that cohabit ephemeral freshwater pools. Both lay desiccation-resistant eggs that disperse passively to new, hydrologically isolated environments. The extent of genetic distance among regions and populations is of perennial interest in animals that live in such isolated habitats. Populations in six natural ephemeral pool habitats located in two different regions of the Sonoran Desert and a transition area between the Sonoran and Chihuahuan Deserts were sampled. Sequences from GenBank were used as reference points in the determination of species as well as to further identify regional genetic distance within species. This study estimated the genetic distance within and between individuals from each region and population using a neutral marker, cytochrome oxidase I (COI). We concluded that, although the method of passive dispersal may differ between the two genera, the differences do not result in different patterns of genetic distance between regions and populations. Furthermore, only the putative species Triops longicaudatus "short" showed sufficiently distinct speciation. Although Triops longicaudatus "long" and Triops newberryi may be in the early stages of speciation, this study does not find enough support to conclude that they have separated.
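The abstract does not state which distance measure was applied to the COI sequences. As an illustration only, the sketch below uses the uncorrected p-distance (the proportion of compared sites that differ) and averages it within one group and between two groups of aligned sequences. The function names and the toy sequences are hypothetical and are not data from the study.

```python
from itertools import combinations, product


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap or ambiguous base are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "ACGT" and y in "ACGT":
            compared += 1
            differences += x != y
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


def mean_within(group):
    """Mean pairwise p-distance among sequences of one group."""
    pairs = list(combinations(group, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


def mean_between(group_a, group_b):
    """Mean p-distance over all pairs drawn from two different groups."""
    pairs = list(product(group_a, group_b))
    if not pairs:
        raise ValueError("both groups must be non-empty")
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Toy aligned fragments standing in for two sampled populations.
    pool_a = ["ACGTACGT", "ACGTACGA"]
    pool_b = ["ACCTACGT", "ACCTACGA"]
    print(mean_within(pool_a), mean_between(pool_a, pool_b))
```

A between-group mean that clearly exceeds the within-group means is the kind of pattern that would suggest separation between populations or putative species.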
Date Created
2016-05
Agent

Identifying Variation Within Substitution Rates in Mammary Gland Development Genes within Primate Genomes

Description
Mammary gland development in humans during puberty involves the enlargement of breast tissue, but this is not true in non-human primates. To identify potential causes of this difference, I examined variation in substitution rates across genes related to mammary development. Genes undergoing purifying selection show slower-than-average substitution rates, while genes undergoing positive selection show faster rates. These rate differences may be related to the difference between humans and other primates. Three genes were found to be accelerated: FOXF1, IGFBP5, and ATP2B2. However, only the last of these was found in humans, and it seems unlikely that it is related to the differences in mammary gland development at puberty between humans and non-human primates.
Date Created
2016-05
Agent

Strong Episodic Selection for Natural Competence for Transformation due to Host-Pathogen Dynamics

Description
Many bacteria actively import environmental DNA and incorporate it into their genomes. This behavior, referred to as transformation, has been described in many species from diverse taxonomic backgrounds. Transformation is expected to carry some selective advantages similar to those postulated for meiotic sex in eukaryotes. However, the accumulation of loss-of-function alleles at transformation loci and an increased mutational load from recombining with DNA from dead cells create additional costs to transformation. These costs have been shown to outweigh many of the benefits of recombination under a variety of likely parameters. We investigate an additional proposed benefit of sexual recombination, the Red Queen hypothesis, as it relates to bacterial transformation. Here we describe a computational model showing that host-pathogen coevolution may provide a large selective benefit to transformation and allow transforming cells to invade an environment dominated by otherwise equal non-transformers. Furthermore, we observe that host-pathogen dynamics cause the selection pressure on transformation to vary extensively in time, explaining the tight regulation and wide variety of rates observed in naturally competent bacteria. Host-pathogen dynamics may explain the evolution and maintenance of natural competence despite its associated costs.
Date Created
2016-05
Agent

An Analysis of the Benchmark Test lzbench for Open-Source Compressors

Description
With the rising data output and falling costs of Next Generation Sequencing technologies, research into data compression is crucial to maintaining storage efficiency and controlling costs. High throughput sequencers such as the HiSeqX Ten can produce up to 1.8 terabases of data per run, and such large storage demands are even more important to consider for institutions that rely on their own servers rather than large data centers (cloud storage) [1]. Compression algorithms aim to reduce the amount of space taken up by large genomic datasets by encoding the most frequently occurring symbols with the shortest bit codewords and by changing the order of the data to make it easier to encode. Depending on the probability distribution of the symbols in the dataset or the structure of the data, choosing the wrong algorithm could result in a compressed file larger than the original or a poorly compressed file that wastes time and space [2]. To test efficiency among compression algorithms for each file type, 37 open-source compression algorithms were used to compress six types of genomic datasets (FASTA, VCF, BCF, GFF, GTF, and SAM) and evaluated on compression speed, decompression speed, compression ratio, and file size using the benchmark test lzbench. Compressors that outperformed the popular bioinformatics compressor Gzip (zlib -6) were evaluated against one another by ratio and speed for each file type and across the geometric means of all file types. Compressors that exhibited fast compression and decompression speeds were also evaluated by transmission time through variable speed internet pipes in scenarios where the file was compressed only once or compressed multiple times.
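The measurements named above (compression ratio, compression and decompression speed, and transmission time) can be illustrated with a small sketch. This is not lzbench and does not reproduce the study's 37 compressors or its datasets. It is a stand-in that uses only Python's standard-library zlib (at level 6, matching the Gzip baseline mentioned above) and lzma on synthetic FASTA-like data; the helper names are hypothetical.

```python
import lzma
import random
import time
import zlib


def make_fasta_like(n_bases: int, seed: int = 0) -> bytes:
    """Build synthetic FASTA-like bytes (random bases, 60 per line)."""
    rng = random.Random(seed)
    seq = "".join(rng.choice("ACGT") for _ in range(n_bases))
    lines = [">synthetic_record"]
    lines += [seq[i:i + 60] for i in range(0, n_bases, 60)]
    return ("\n".join(lines) + "\n").encode("ascii")


def benchmark(name, compress, decompress, data: bytes) -> dict:
    """Time one compress/decompress round trip and report the ratio."""
    start = time.perf_counter()
    packed = compress(data)
    compress_s = time.perf_counter() - start
    start = time.perf_counter()
    restored = decompress(packed)
    decompress_s = time.perf_counter() - start
    if restored != data:
        raise ValueError(f"{name}: round trip did not restore the input")
    return {
        "name": name,
        "ratio": len(data) / len(packed),  # original size / compressed size
        "compressed_bytes": len(packed),
        "compress_s": compress_s,
        "decompress_s": decompress_s,
    }


def transfer_seconds(n_bytes: int, megabits_per_second: float) -> float:
    """Idealized time to send n_bytes over a link of the given speed."""
    return n_bytes * 8 / (megabits_per_second * 1_000_000)


if __name__ == "__main__":
    data = make_fasta_like(200_000)
    candidates = [
        ("zlib-6", lambda d: zlib.compress(d, 6), zlib.decompress),
        ("lzma", lzma.compress, lzma.decompress),
    ]
    for name, comp, decomp in candidates:
        result = benchmark(name, comp, decomp, data)
        # Total cost when a file is compressed once, sent, and unpacked.
        total = (result["compress_s"] + result["decompress_s"]
                 + transfer_seconds(result["compressed_bytes"], 100))
        print(result["name"], round(result["ratio"], 2), round(total, 4))
```

On a slow link the compressor with the best ratio tends to win because transfer time dominates, while on a fast link compression and decompression speed matter more, which is why the study considers both.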
Date Created
2017-05
Agent