Nest Composition and Architecture of the Curve-billed Thrasher in Central Arizona

Description
The nests of the Curve-billed Thrasher (Toxostoma curvirostre) were studied across the greater Phoenix area from 2020 to 2022 to assess whether nest composition is significantly related to the composition of the surrounding environment. Nests were collected and measured, and the vegetation within 100 m of each nest was surveyed for potential nest material types. In the lab, nests were separated by material type and tallied. The dense nest cores were subsampled: the first 100 pieces plucked from each core were sorted by type and weighed. Body tallies and their corresponding site tallies were analyzed with ordinary least squares (OLS) and binomial regression, whereas core material masses and their corresponding site tallies were analyzed only with OLS regression. Beta regression was also performed on the mass proportions of the core samples and their corresponding environmental tallies. OLS regression yielded a significant relationship between the tally of spiny body material and its site tallies at 25 and 100 m. Although it failed the assumption of normality, the tally of barrel cactus in the nest body yielded significant p-values in OLS and binomial regression, as well as in Spearman’s correlation test, supporting a strong correlation with the 100 m site tally. The tally of anthropogenic materials and the distance to the nearest man-made structure failed the test of normality but yielded significant p-values in binomial regression and Spearman’s correlation test. OLS regression relating the log anthropogenic tally to the log distance to the nearest structure also failed normality but likewise yielded a significant p-value. In the beta regression analyses, only the spiny core mass proportion showed a significant relationship, with the 100 m site tally.
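Spearman’s correlation test, used above for the barrel-cactus and anthropogenic tallies, is Pearson’s correlation applied to ranks. The minimal sketch below computes Spearman’s rho in pure Python, giving tied values their average rank. The tallies are hypothetical (not the study’s data) and the function names are illustrative; the sketch returns only rho, so a p-value would still need a permutation test or a t approximation.

```python
from math import sqrt

def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j correspond to ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical data: barrel-cactus pieces per nest body vs. barrel cacti within 100 m.
nest_tally = [0, 2, 5, 1, 8, 3]
site_tally = [1, 4, 9, 2, 12, 4]
rho = spearman_rho(nest_tally, site_tally)
```

For these hypothetical tallies rho is about 0.99. A variable with no variation (for example, all-zero tallies) makes the denominator zero, so real data would need a guard.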
Date Created
2022

A Spatial Proteome of Paramecium tetraurelia

Description
I studied the evolution and cell biology of Paramecium tetraurelia, a model ciliate with over 40,000 distinct protein-coding genes resulting from as many as three ancient whole-genome duplication events. I was interested in the functional diversification of these gene duplicates at the level of protein localization, but the tools commonly used to study localization are laborious. Instead, I applied a protein-correlation profiling approach: I generated a dozen subcellular fractions whose protein constituents differ according to the density of their resident organelles, and then assayed these fractions by quantitative mass spectrometry. Each protein’s abundance profile provided evidence for its subcellular localization, and I used both supervised and unsupervised classification algorithms to group proteins by the similarity of their profiles to several hundred manually curated “marker proteins.” After expanding the protein inventories of numerous organelles by as many as a thousand proteins, I identified many features that were not previously understood or appreciated, such as mosaic biochemical pathways, evidence for differential sorting mechanisms, and the unusual evolutionary patterns of the ciliate mitochondrial proteome. I also developed a simple bioinformatic tool that makes it easier to probe spatial proteomics datasets for proteins of interest. I demonstrate its applicability, both recapitulating known interactions and discovering new ones, using a handful of well-characterized proteins in the budding yeast Saccharomyces cerevisiae as well as interesting proteins in less well-studied model systems such as P. tetraurelia and the apicomplexan Toxoplasma gondii. Finally, I look for large-scale evidence of gene duplicates relocalizing to new cellular compartments in P. tetraurelia and S. cerevisiae, using this new dataset and a previously generated one, respectively.
I find thousands of pairs of duplicates that are differentially identified and show evidence of subcellular divergence; this divergence appears largely decoupled from large changes in protein sequence and is instead associated with indels in the N-terminal peptide. These findings support the use of high-throughput proteomic techniques to detect functional divergence of gene duplicates. Taken together, this work provides a deep characterization of one of the largest unicellular proteomes in nature.
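The classification step in protein-correlation profiling can be illustrated with a deliberately simple nearest-marker rule: correlate a protein’s abundance profile across fractions with the mean profile of each compartment’s marker proteins, and assign the protein to the best match if that correlation clears a threshold. This is a sketch of the general idea, not the supervised and unsupervised algorithms used in this work; the profiles, compartment names, six-fraction layout, and `min_r` threshold are all hypothetical.

```python
from math import sqrt
from statistics import mean

def pearson(x, y):
    """Pearson correlation of two equal-length abundance profiles."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

def centroid(profiles):
    """Fraction-by-fraction mean of several marker profiles."""
    return [mean(col) for col in zip(*profiles)]

def classify(profile, markers, min_r=0.8):
    """Return (compartment, r) for the best-correlated marker centroid,
    or ("unassigned", r) when even the best correlation is below min_r."""
    scores = {comp: pearson(profile, centroid(profs)) for comp, profs in markers.items()}
    best = max(scores, key=scores.get)
    return (best if scores[best] >= min_r else "unassigned"), scores[best]

# Hypothetical normalized abundances across six density fractions.
markers = {
    "mitochondrion": [[0.1, 0.6, 0.9, 0.4, 0.1, 0.0], [0.0, 0.5, 1.0, 0.5, 0.2, 0.1]],
    "cytosol": [[0.0, 0.0, 0.1, 0.2, 0.7, 1.0], [0.1, 0.0, 0.1, 0.3, 0.8, 0.9]],
}
label, r = classify([0.0, 0.1, 0.1, 0.2, 0.8, 0.95], markers)
```

Correlation rather than Euclidean distance is used here so that proteins with the same distribution shape but different overall abundance score alike; that is a design assumption of the sketch.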
Date Created
2022

Profiling of Indel Phases in Coding Regions

Description
Advances in sequencing technology have generated an enormous amount of data over the past decade. Equally advanced computational methods are needed to conduct comparative and functional genomic studies on these datasets, in particular tools that appropriately interpret indels within an evolutionary framework. The evolutionary history of indels is complex and often involves repetitive genomic regions, which makes identification, alignment, and annotation difficult. Although previous studies have found that indel lengths in both DNA and proteins obey a power law, probabilistic models of indel evolution have rarely been explored because of their computational complexity. In my research, I first explore the use of an expectation-maximization (EM) algorithm for maximum-likelihood training of a codon substitution model, and I demonstrate the training accuracy of EM on my substitution model. I then apply this algorithm to a published dataset of 90 species pairs and find a negative correlation between branch length and the nonsynonymous selection coefficient. Second, I develop a post-alignment fixation method that profiles each indel event into one of three phases according to its codon position. This is needed because current codon-aware models can only identify indels by placing gaps between codons, which leads to misaligned sequences. By comparing the proportions of the indel phases, I find that the mouse-rat species pair is under purifying selection. I also demonstrate the power of my sliding-window method by comparing post-aligned and original gap positions. Third, I create an indel-phase Moore machine that incorporates the indel rates of the three phases, length distributions, and codon substitution models. I then design a Gillespie simulation capable of generating true sequence alignments.
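Reading indel phases off a pairwise alignment can be sketched as follows, under an assumed definition: the phase is the number of reference nucleotides preceding the gap, modulo 3, so phase 0 means the gap falls between codons and phases 1 and 2 mean it falls inside a codon. The function name and the choice of the first sequence as the reference are illustrative assumptions, and the sliding-window repositioning of gaps is not reproduced; the sketch only classifies gaps in a fixed alignment.

```python
def indel_phases(ref, query):
    """Return (kind, start_column, length, phase) for each gap run in a pairwise alignment.

    ref and query are aligned strings of equal length, with '-' for gaps.
    A gap in ref is an insertion in query; a gap in query is a deletion.
    Phase = reference nucleotides before the gap, mod 3 (assumed definition).
    """
    if len(ref) != len(query):
        raise ValueError("aligned sequences must have equal length")
    events = []
    ref_pos = 0  # number of reference nucleotides seen so far
    col = 0
    while col < len(ref):
        if ref[col] == "-":
            kind, gapped = "insertion", ref
        elif query[col] == "-":
            kind, gapped = "deletion", query
        else:
            ref_pos += 1
            col += 1
            continue
        start, phase = col, ref_pos % 3
        while col < len(ref) and gapped[col] == "-":
            if ref[col] != "-":
                ref_pos += 1  # deletions still consume reference nucleotides
            col += 1
        events.append((kind, start, col - start, phase))
    return events

# Hypothetical alignment: a 3-nt deletion that starts after the first base of codon 2.
events = indel_phases("ATGAAACCCGGG", "ATGA---CCGGG")
```

Here `events` is `[("deletion", 4, 3, 1)]`: the gap removes a codon's worth of bases but is out of frame by one position.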
Next, I develop an importance sampling method within the expectation-maximization algorithm that can successfully train the indel-phase model and infer accurate parameter estimates from alignments. Finally, I extend the indel-phase analysis to the 90 species pairs across three alignment methods: the Mafft+sw method developed in Chapter 3, the coati-sampling method applied in Chapter 4, and the coati-max method. I also explore a nonlinear relationship between the dN/dS and Zn/(Zn+Zs) ratios across the 90 species pairs.
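A Gillespie (stochastic simulation) algorithm draws an exponential waiting time from the total event rate and then picks which event occurs in proportion to its rate. The sketch below applies that idea to insertions and deletions along a single branch; it is not the indel-phase Moore machine itself. The per-site rates, geometric length distribution, uniformly random inserted nucleotides, and function names are hypothetical simplifications, and codon substitutions are omitted.

```python
import random

def geometric_length(rng, p):
    """Indel length >= 1 drawn from a geometric distribution with mean 1/p."""
    n = 1
    while rng.random() > p:
        n += 1
    return n

def gillespie_indels(seq, ins_rate, del_rate, branch_length, p_len=0.5, seed=0):
    """Simulate insertions and deletions along one branch with the Gillespie algorithm.

    ins_rate applies to each of the len(seq) + 1 slots between and around sites;
    del_rate applies to each of the len(seq) sites (hypothetical parameterization).
    Returns (final_sequence, events), each event being
    (time, kind, position, length, phase), where phase = position % 3 in the
    sequence as it stood at that moment (a simplification).
    """
    rng = random.Random(seed)
    t, events = 0.0, []
    while True:
        ins_total = ins_rate * (len(seq) + 1)
        del_total = del_rate * len(seq)
        total = ins_total + del_total
        if total == 0:
            break
        t += rng.expovariate(total)  # exponential waiting time to the next event
        if t > branch_length:
            break
        length = geometric_length(rng, p_len)
        if rng.random() * total < ins_total:  # pick the event type in proportion to its rate
            kind, pos = "insertion", rng.randrange(len(seq) + 1)
            seq = seq[:pos] + "".join(rng.choice("ACGT") for _ in range(length)) + seq[pos:]
        else:
            kind, pos = "deletion", rng.randrange(len(seq))
            length = min(length, len(seq) - pos)  # a deletion cannot run past the end
            seq = seq[:pos] + seq[pos + length:]
        events.append((t, kind, pos, length, pos % 3))
    return seq, events

final_seq, history = gillespie_indels("ATG" * 100, 0.01, 0.01, 5.0, seed=1)
```

Because the total rate is recomputed after every event, the simulation stays exact as the sequence length changes, which is the main reason to prefer a Gillespie scheme over fixed time steps.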
Date Created
2022

A Survey of Oribatids of the North American Deserts

Description
This paper is a survey of the oribatid mites of the North American deserts and contains four chapters. Chapter 1 gives an overview of the biology of mites and oribatids, covering their phylogeny, body parts, food sources, habitats, and life cycle. In Chapter 2, I identify a group of 59 oribatid species with cosmopolitan or semi-cosmopolitan distributions and examine how the number of biogeographical regions in which a species has been detected relates to body length and to reproductive mode (sexual or parthenogenetic). I also present an illustrated guide (File S1) to 58 of these species for use in identifying cosmopolitan species in oribatid surveys. Chapter 3 describes the current state of knowledge of oribatid diversity in the southwestern US and northern Mexico. In total, I found records of 340 oribatid species from this region in the published literature and museum collections. However, some states, such as Arizona and Sonora, have few published records, and further studies are needed to characterize oribatid diversity in this region more fully. Finally, Chapter 4 describes preliminary efforts to culture oribatid mites sampled from oak woodland in the Santa Rita Mountains of southeastern Arizona. Although this work was interrupted by the COVID-19 crisis, I kept three oribatid species in captivity long enough for them to lay eggs and for some of these eggs to hatch.
Date Created
2020-12

Geographical Variation in Social Structure, Morphology, and Genetics of the New World Honey Ant Myrmecocystus mendax

Description
Persistent cooperation between unrelated conspecifics rarely occurs in mature eusocial insect societies. In this dissertation, I present evidence of non-kin cooperation in the Nearctic honey ant Myrmecocystus mendax. Using microsatellite markers, I show that mature colonies in the Sierra Ancha Mountains of central Arizona contain multiple unrelated matrilines, an observation consistent with primary polygyny. In contrast, similar analyses suggest that colonies in the Chiricahua Mountains of southeastern Arizona are primarily monogynous. These interpretations are consistent with field and laboratory observations: cooperative colony founding was frequently observed among groups of Sierra Ancha foundresses, whereas founding in the Chiricahua population was restricted to single foundresses. Furthermore, Sierra Ancha foundresses successfully established incipient laboratory colonies without queen culling after the first workers emerged. Multi-queen laboratory colonies from the Sierra Ancha also produced more workers and repletes than single-queen (haplometrotic) colonies, and when brood raiding was induced between colonies, queens in colonies with more workers had a higher probability of survival.

Microsatellite analyses of additional locations within the M. mendax range suggest that polygyny is also present in some other populations, especially in central-northern Arizona, albeit at lower frequencies than in the Sierra Anchas. In addition, analyses of multiple types of genetic data, including microsatellites, the mitochondrial barcoding region, and more than 2,000 nuclear ultraconserved elements, indicate that M. mendax populations in the southwestern U.S. and northwestern Mexico are geographically structured, with strong support for two or more divergent clades as well as isolation by distance within clades. This structure also correlates with variation in queen number and in hair length, a diagnostic taxonomic feature used to distinguish honey ant species.
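The abstract does not say which test was used for isolation by distance. One common choice is a Mantel test, which correlates pairwise genetic distances with pairwise geographic distances and judges significance by permuting population labels. The sketch below is a minimal one-sided version, assuming distance matrices as nested lists; the transect coordinates and genetic distances are hypothetical, and the function names are illustrative.

```python
import random
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

def upper_triangle(m):
    """Off-diagonal upper-triangle entries of a square matrix, row by row."""
    n = len(m)
    return [m[i][j] for i in range(n) for j in range(i + 1, n)]

def mantel(genetic, geographic, permutations=999, seed=0):
    """One-sided Mantel test: does genetic distance increase with geographic distance?"""
    rng = random.Random(seed)
    n = len(genetic)
    geo = upper_triangle(geographic)
    observed = pearson(upper_triangle(genetic), geo)
    order = list(range(n))
    hits = 1  # the observed arrangement counts as one permutation
    for _ in range(permutations):
        rng.shuffle(order)
        # Permute rows and columns of the genetic matrix together (relabel populations).
        permuted = [[genetic[order[i]][order[j]] for j in range(n)] for i in range(n)]
        if pearson(upper_triangle(permuted), geo) >= observed:
            hits += 1
    return observed, hits / (permutations + 1)

# Hypothetical sampling sites on a transect (km) with distance-proportional divergence.
sites = [0, 1, 3, 6, 10, 15]
geographic = [[abs(a - b) for b in sites] for a in sites]
genetic = [[2 * d for d in row] for row in geographic]
r, p = mantel(genetic, geographic)
```

Permuting whole rows and columns, rather than individual matrix entries, preserves the non-independence of pairwise distances that share a population, which is why the Mantel test is preferred over a plain correlation test here.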

Together, these findings suggest that regional ecological pressures (e.g., colony density, climate) may have acted on colony founding and social strategy to select for larger workforces and, along with genetic drift, have driven geographically isolated M. mendax populations to differentiate genetically and morphologically. Colony fusion in the laboratory and honey ant life-history traits that are influenced by colony size, including repletism, brood raiding, and tournaments, support this evolutionary scenario.
Date Created
2018

An Analysis of the Benchmark Test lzbench for Open-Source Compressors

Description
With the rising data output and falling costs of next-generation sequencing technologies, research into data compression is crucial for keeping storage efficient and affordable. High-throughput sequencers such as the HiSeq X Ten can produce up to 1.8 terabases of data per run, and such large storage demands are even more important to consider for institutions that rely on their own servers rather than large data centers (cloud storage) [1]. Compression algorithms reduce the space taken up by large genomic datasets by encoding the most frequently occurring symbols with the shortest codewords and by reordering the data to make it easier to encode. Depending on the probability distribution of the symbols in a dataset or on the structure of the data, choosing the wrong algorithm can produce a compressed file larger than the original, or a poorly compressed file that wastes time and space [2]. To compare the efficiency of compression algorithms for each file type, 37 open-source compression algorithms were used to compress six types of genomic datasets (FASTA, VCF, BCF, GFF, GTF, and SAM) and were evaluated on compression speed, decompression speed, compression ratio, and file size using the lzbench benchmark. Compressors that outperformed the popular bioinformatics compressor Gzip (zlib -6) were compared with one another by ratio and speed, both for each file type and across the geometric means of all file types. Compressors with fast compression and decompression speeds were also evaluated by transmission time through internet connections of varying speed, in scenarios where the file was compressed only once or compressed multiple times.
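The quantities compared in this study can be sketched with Python’s built-in `zlib` module, whose level 6 matches the Gzip (zlib -6) baseline. In this sketch, compression ratio is defined as original size divided by compressed size. The transmission model (compress, send the compressed bytes, decompress on arrival, with compression paid either once or before every send) is an assumed interpretation of the two scenarios, and all sizes and speeds are hypothetical.

```python
import zlib
from math import prod

def compression_ratio(data: bytes, level: int = 6) -> float:
    """Original size / zlib-compressed size (higher means better compression)."""
    return len(data) / len(zlib.compress(data, level))

def geometric_mean(values):
    """Geometric mean, e.g. of per-file-type ratios or speeds."""
    return prod(values) ** (1 / len(values))

def transfer_seconds(size_mb, ratio, comp_mb_s, decomp_mb_s, link_mb_s,
                     sends=1, compress_once=True):
    """Hypothetical end-to-end time to deliver a file `sends` times.

    Each delivery pushes size_mb / ratio MB through the link and decompresses on
    arrival; compression is paid once (compress_once=True) or before every send.
    Compression and decompression speeds are in MB/s of uncompressed data.
    """
    compress = size_mb / comp_mb_s
    per_send = (size_mb / ratio) / link_mb_s + size_mb / decomp_mb_s
    return (1 if compress_once else sends) * compress + sends * per_send

# A highly repetitive, FASTA-like byte string compresses very well.
ratio = compression_ratio(b"ACGT" * 10_000)
```

For example, a hypothetical 1,000 MB file with ratio 4, compression at 100 MB/s, decompression at 400 MB/s, and a 12.5 MB/s (100 Mbit/s) link gives `transfer_seconds(1000, 4, 100, 400, 12.5)` = 32.5 s: 10 s to compress, 20 s to send, and 2.5 s to decompress. The geometric mean is the natural summary across file types because ratios multiply; an arithmetic mean would let one highly compressible format dominate.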
Date Created
2017-05

Selection of the AMA-1 Gene in Plasmodium falciparum and Plasmodium vivax

Description
Plasmodium falciparum and Plasmodium vivax are two of the main causative agents of human malaria. Both species contain the protein Apical Membrane Antigen 1 (AMA-1), which is involved in host cell invasion. However, the high degree of polymorphism and antigenic diversity in this protein has prevented consistent single-vaccine success. Furthermore, the three main domains within AMA-1 (Domains I, II, and III) possess variable polymorphic features and levels of diversity. Overcoming this issue may require an understanding of the type of selection acting on AMA-1 in P. falciparum and P. vivax. Therefore, this investigation aimed to determine the type of selection acting on the whole AMA-1 coding sequence and on each domain in P. falciparum and P. vivax. Population structure was investigated on a global scale and among individual countries. AMA-1 sequences were obtained from the National Center for Biotechnology Information (NCBI). For P. falciparum, 649 complete and 382 partial sequences were obtained. For P. vivax, 395 sequences were obtained (370 of them partial). The AMA-1 gene in P. falciparum was found to have many nonsynonymous polymorphisms and disproportionately few synonymous polymorphisms. Domain I was the most diverse region, with consistently high nonsynonymous substitution across all countries. Large, positive, and significant Z-test scores indicated the presence of positive selection, while FST and NST values showed low genetic differentiation across populations. Data trends were relatively consistent between the global and country-based analyses. The only country to deviate was Venezuela, the only South American country analyzed. Network analyses did not show distinguishable groupings. For P. falciparum, it was concluded that positive diversifying selection acts on the AMA-1 gene, particularly in Domain I. In AMA-1 of P. vivax, nonsynonymous and synonymous polymorphisms were relatively equal across all analyses.
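A codon-based Z-test of selection compares the nonsynonymous (dN) and synonymous (dS) substitution rates through Z = (dN - dS) / SE(dN - dS); a large positive Z points to positive selection. The study does not state how its standard errors were obtained, so the sketch below accepts either a given SE or bootstrap replicates of dN - dS (a common choice). The function names and values are hypothetical.

```python
from statistics import NormalDist, stdev

def codon_z_test(dn, ds, se_diff):
    """Z = (dN - dS) / SE(dN - dS); one-sided p-value for H1: dN > dS."""
    z = (dn - ds) / se_diff
    return z, 1 - NormalDist().cdf(z)

def codon_z_test_bootstrap(dn, ds, boot_diffs):
    """Same test, with SE taken as the standard deviation of bootstrap dN - dS replicates."""
    return codon_z_test(dn, ds, stdev(boot_diffs))

# Hypothetical per-site rates for one AMA-1 domain.
z, p = codon_z_test(dn=0.030, ds=0.010, se_diff=0.008)
```

With these hypothetical numbers Z is about 2.5 and the one-sided p-value is about 0.006. Testing the opposite alternative (purifying selection, dN < dS) would use `NormalDist().cdf(z)` instead.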
FST and NST values were high, indicating that the countries were genetically distinct populations. Network analyses did not show distinguishable groupings; however, the data were limited by small sample sizes. From these data, it was concluded that AMA-1 in P. vivax is evolving neutrally, with selective pressures not strongly favoring either positive or purifying selection. In addition, different AMA-1 strains of P. vivax were genetically distinct, and this genetic identity correlated with geographic region. Therefore, AMA-1 strains in P. falciparum and P. vivax not only evolve differently and undergo different forms of selection but also require different vaccine development strategies. A combination of strain-specific vaccines and environmental-level preventive measures will likely be more effective than trying to achieve a single, comprehensive vaccine.
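FST summarizes how much genetic variation lies between populations rather than within them. The abstract does not name an estimator, so the sketch below uses the textbook heterozygosity-based form, F_ST = (H_T - H_S) / H_T, for a single locus with populations weighted equally. The allele frequencies are hypothetical, and NST, which also accounts for distances between haplotypes, is not covered.

```python
def heterozygosity(freqs):
    """Expected heterozygosity 1 - sum(p_i^2) at one locus."""
    return 1 - sum(p * p for p in freqs)

def fst(populations):
    """F_ST = (H_T - H_S) / H_T at one locus, weighting populations equally.

    populations: one list of allele frequencies per population, same allele order.
    """
    h_s = sum(heterozygosity(p) for p in populations) / len(populations)
    pooled = [sum(col) / len(populations) for col in zip(*populations)]
    h_t = heterozygosity(pooled)
    return (h_t - h_s) / h_t

# Hypothetical biallelic site in two countries.
value = fst([[0.9, 0.1], [0.5, 0.5]])
```

Two populations fixed for different alleles give F_ST = 1 and identical frequencies give 0; this example falls in between at about 0.19. If every population is fixed for the same allele, H_T is zero and the ratio is undefined, so real data would need that site skipped.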
Date Created
2015-05