Meta-Analysis for Multi-Cancer Early Detection Biomarker Discovery

Description
Cancer poses a significant worldwide burden, and ongoing efforts to improve patient outcomes depend substantially on cancer screening. Multi-cancer early detection tests measure a series of biomarkers to detect signals that may indicate carcinogenesis in its earliest stages, and they work in tandem with other diagnostic techniques to localize and verify tumor formation across multiple cancer types. Molecular biomarkers such as autoantibodies are promising candidates for early detection across multiple cancers. This study identifies autoantibodies that are aberrantly expressed across multiple cancer types and that may be used to discriminate between healthy individuals and those with cancer from a single serum sample. Multiple datasets from prior studies are integrated to examine 8,200 serum autoantibodies across 5 cancer types: lung adenocarcinoma, basal-like breast cancer, advanced colorectal cancer, ovarian cancer, and HER2+ breast cancer. The diagnostic utility of these autoantibodies is assessed for the combined cancer types by meta-receiver operating characteristic (ROC) curve analysis. A meta-analysis data processing pipeline processes each biomarker, and statistical analysis is performed across the ROC metrics of each meta-curve, including partial area under the curve and sensitivity at a 90% specificity threshold. Results identified 26 autoantibody biomarkers that are useful for multi-cancer detection and may be developed for future clinical applications in cancer screening.
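The two ROC metrics named in this abstract can be illustrated with a minimal sketch. It is not the study's pipeline: it scores a single biomarker on a single dataset rather than building a pooled meta-curve. The function names and the toy scores in the usage note are hypothetical, and it assumes that higher scores indicate cancer.

```python
def roc_points(scores_healthy, scores_cancer):
    """Return ROC points (false positive rate, true positive rate).

    Assumption: a sample is called positive when its score is >= threshold.
    Thresholds are swept from the highest observed score downward, so both
    rates are non-decreasing along the returned list.
    """
    thresholds = sorted(set(scores_healthy) | set(scores_cancer), reverse=True)
    points = [(0.0, 0.0)]
    for t in thresholds:
        fpr = sum(s >= t for s in scores_healthy) / len(scores_healthy)
        tpr = sum(s >= t for s in scores_cancer) / len(scores_cancer)
        points.append((fpr, tpr))
    return points


def sensitivity_at_specificity(points, specificity=0.90):
    """Highest sensitivity reachable while keeping specificity >= the target."""
    max_fpr = 1.0 - specificity
    # Small tolerance guards against floating-point error in 1.0 - specificity.
    return max(tpr for fpr, tpr in points if fpr <= max_fpr + 1e-12)


def partial_auc(points, max_fpr=0.10):
    """Trapezoidal area under the ROC curve for FPR in [0, max_fpr]."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 >= max_fpr:
            break
        if x1 > max_fpr:
            # Cut the final segment at max_fpr by linear interpolation.
            y1 = y0 + (y1 - y0) * (max_fpr - x0) / (x1 - x0)
            x1 = max_fpr
        area += (x1 - x0) * (y0 + y1) / 2
    return area
```

For example, with hypothetical healthy scores 1 through 10 and cancer scores 9.5, 11, 12, and 3.5, `sensitivity_at_specificity(roc_points(healthy, cancer))` gives 0.75 and `partial_auc(roc_points(healthy, cancer))` gives 0.05. The partial area is left unnormalized here; some tools rescale it, so values from different software may not be directly comparable.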
Date Created
2024
Agent

Unveiling Ancestral Echoes in Cancer Fusion Proteins through Structural Homology and Evolutionary Analysis

Description
Fusion genes, arising from chromosomal translocations through nonallelic homologous recombination (NAHR), are pivotal in oncogenesis, leading to the formation of fusion proteins that contribute to cancer’s aggressive nature. The atavism theory posits that cancer is a throwback to an ancient cellular state, with reactivated ancestral cellular mechanisms driving uncontrolled growth and other cancerous traits. By comparing the evolutionary ages of the structural homologs of fusion proteins with those of their parental gene pairs, this study aims to determine whether these fusion proteins recapitulate ancient protein structures, thereby supporting the atavism theory. Utilizing data from the COSMIC database, fusion genes were constructed according to their corresponding cDNA sequences from parent gene pairs, and the 3D structures of the resultant fusion proteins were predicted using AlphaFold. Subsequent VAST analysis identified structural homologies with ancient proteins. The ages of original and fusion proteins were inferred by mapping homologous groups from the Ensembl Compara database to identify common ancestors. The TimeTree database was then used to assign gene ages based on the divergence of the most distantly related species in these groups. Finally, comparing these ages identified ancestral resemblances. The findings of this project demonstrate homology between the structures of most fusion proteins and those of ancient proteins found in humans, yeast, and bacteria, suggesting the re-emergence of ancient protein structures in cancer cells due to recurrent translocations (permutation test, p=0.0201). Additionally, a large portion (68%) of the examined fusion genes comprises one gene predating the advent of multicellularity and another emerging concurrently with or after this evolutionary milestone (one-sample proportions test, X-squared=13.291, df=1, p=0.00027).
These results support the atavism theory, suggesting that such fusion events might bridge evolutionary gaps between unicellular and multicellular life forms. This could potentially explain the mechanisms behind cancer’s tendency to forsake multicellular characteristics, thereby enhancing malignancy. By illustrating how chromosomal translocations in cancer might be tapping into primordial protein architectures, this study not only provides evidence for the atavism theory but also opens new avenues for understanding cancer’s evolutionary underpinnings. This could lead to novel therapeutic strategies by exploiting the ancient vulnerabilities revealed through chromosomal translocations.
Date Created
2024
Agent

Effects of Lingual Frenectomies on Breastfeeding Dyads

Description
The purpose of this research was to determine the impact of a lingual frenectomy to correct partial ankyloglossia on breastfeeding function in the mother-infant dyad after completion of the procedure. Changes in breastfeeding were determined using FLIP (Flow, Latch, Injury, Post Feeding Behavior), a validated self-report questionnaire that classifies the severity of breastfeeding dysfunction associated with partial ankyloglossia. Through this questionnaire, we can diagnose at-risk dyads and determine treatment options. The analysis revealed that 75% of respondents saw significant improvements in the severity and/or frequency of symptoms after completion of the procedure.
Date Created
2023-05
Agent

Data Science Exploration

Description

I have challenged myself to learn Python. I did this because I wanted to improve myself and my mindset around coding. My view on coding has changed immensely. I was intimidated by the social stigmas around coding, but I have become more comfortable with it. There were times when I thought that I would never understand something, but it became familiar. Through constant exposure, such as completing modules in DataCamp and Kaggle, I better understood the basics and uses of different models. The concepts I had learned before became clearer by completing a project I was genuinely interested in. I could search for a solution or ask my thesis director if I had an error. I enjoyed working with my thesis professor and failing many times. I have learned that I do not have to be a master within the year but must remain consistent with my practice. I will continue to practice and learn more about coding now with more confidence.

Date Created
2023-05
Agent

ANALYSIS OF FIGHT OUTCOMES IN THE UFC AND THE EFFICACY OF PREDICTING FIGHT OUTCOMES ESPECIALLY IN RELATION TO SPORTS BETTING

Description
This study introduces models, developed from historical UFC data, that aim to predict the outcomes of fights between mixed martial arts fighters within the UFC. The paper will explore multivariate linear probability regression analysis, using variables provided in and derived from a large dataset, to predict the probability of a fighter winning a given fight. It will analyze several multivariate regression models, compare their accuracy against one another, and account for limitations within the models. Then, the model’s efficacy will be tested against recent UFC fights and adjusted to find a more accurate equation that maximizes profit in sports betting, by comparing the implied probabilities from betting odds with the model’s predicted probabilities.
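The comparison between implied and predicted probabilities can be sketched as follows. The abstract does not state the odds format, so American (moneyline) odds are an assumption here. The function names are hypothetical, and the sketch ignores the bookmaker's margin, which makes the two fighters' implied probabilities sum to more than 1.

```python
def implied_probability(american_odds):
    """Convert American (moneyline) odds to the implied win probability.

    Negative odds (favorite): stake |odds| to win 100.
    Positive odds (underdog): stake 100 to win odds.
    """
    if american_odds < 0:
        return -american_odds / (-american_odds + 100)
    return 100 / (american_odds + 100)


def expected_profit(model_prob, american_odds, stake=1.0):
    """Expected profit of one bet if the model's win probability were correct."""
    if american_odds < 0:
        payout = stake * 100 / -american_odds
    else:
        payout = stake * american_odds / 100
    # Win: gain the payout. Lose: forfeit the stake.
    return model_prob * payout - (1 - model_prob) * stake
```

A fighter listed at -150 has an implied probability of 0.6, and one listed at +200 has about 0.333. Under this sketch, a bet has positive expected profit only when the model's predicted probability exceeds the implied probability.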
Date Created
2022-12
Agent

Applications of Deep Neural Networks to Neurocognitive Poetics: A Quantitative Study of the Project Gutenberg English Poetry Corpus

Description
With the advent of sophisticated computer technology, we increasingly see the use of computational techniques in the study of problems from a variety of disciplines, including the humanities. In a field such as poetry, where classic works are subject to frequent re-analysis over the course of years, decades, or even centuries, there is a certain demand for fresh approaches to familiar tasks, and such breaks from convention may even be necessary for the advancement of the field. Existing quantitative studies of poetry have employed computational techniques in their analyses; however, there remains work to be done with regard to the deployment of deep neural networks on large corpora of poetry to classify portions of the works contained therein based on certain features. While applications of neural networks to social media sites, consumer reviews, and other web-originated data are common within computational linguistics and natural language processing, comparatively little work has been done on the computational analysis of poetry using the same techniques. In this work, I lay out the first steps for the study of poetry using neural networks. Using a convolutional neural network to classify author birth date, I was able not only to extract a non-trivial signal from the data, but also to identify the presence of clustering within by-author model accuracy. While definitive conclusions about the cause of this clustering were not reached, investigation of this clustering reveals immense heterogeneity in the traits of accurately classified authors. Further study may unpack this clustering and reveal key insights about how temporal information is encoded in poetry. The study of poetry using neural networks remains very open but exhibits potential to be an interesting and deep area of work.
Date Created
2019-05
Agent

Reproducibility and Repeatability Experiment with Nested Factors in Fingerprint Age Analysis

Description
Gage reproducibility and repeatability methods do not account for a mix of random and fixed effects, nested factors, and repeated measures. Using a case study in fingerprint analysis, we propose a new method using linear mixed effects models to determine the decomposition of the variation components in a measurement system. The fingerprint analysis tests whether the measuring system for ridge widths is reproducible and repeatable. Using the new model and traditional measurement systems analysis metrics, we found that the current process to measure ridge widths is not adequate. Further, we discovered that it is possible to use a linear mixed model to decompose the variance of a measurement system.
Date Created
2019-05
Agent

Three essays on correlated binary outcomes: detection and appropriate models

Description
Correlation is common in many types of data, including those collected through longitudinal studies or in a hierarchical structure. In the case of clustering, or repeated measurements, there is inherent correlation between observations within the same group, or between observations obtained on the same subject. Longitudinal studies also introduce association between the covariates and the outcomes across time. When multiple outcomes are of interest, association may exist between the various models. These correlations can lead to issues in model fitting and inference if not properly accounted for. This dissertation presents three papers discussing appropriate methods to properly consider different types of association. The first paper introduces an ANOVA based measure of intraclass correlation for three level hierarchical data with binary outcomes, and corresponding properties. This measure is useful for evaluating when the correlation due to clustering warrants a more complex model. This measure is used to investigate AIDS knowledge in a clustered study conducted in Bangladesh. The second paper develops the Partitioned generalized method of moments (Partitioned GMM) model for longitudinal studies. This model utilizes valid moment conditions to separately estimate the varying effects of each time-dependent covariate on the outcome over time using multiple coefficients. The model is fit to data from the National Longitudinal Study of Adolescent to Adult Health (Add Health) to investigate risk factors of childhood obesity. In the third paper, the Partitioned GMM model is extended to jointly estimate regression models for multiple outcomes of interest. Thus, this approach takes into account both the correlation between the multivariate outcomes, as well as the correlation due to time-dependency in longitudinal studies. 
The model utilizes an expanded weight matrix and objective function composed of valid moment conditions to simultaneously estimate optimal regression coefficients. This approach is applied to Add Health data to simultaneously study drivers of outcomes including smoking, social alcohol usage, and obesity in children.
Date Created
2018
Agent

Classification for Conservation: A Random Forest Model to Predict Threatened Marine Species

Description
As threats to Earth's biodiversity continue to evolve, an effective methodology to predict such threats is crucial to ensure the survival of living species. Organizations like the International Union for Conservation of Nature (IUCN) monitor the Earth's environmental networks to preserve the sanctity of terrestrial and marine life. The IUCN Red List of Threatened Species informs the conservation activities of governments as a world standard of species' risks of extinction. However, the IUCN's current methodology is, in some ways, inefficient given the immense volume of Earth's species and the laboriousness of its species' risk classification process. IUCN assessors can take years to classify a species' extinction risk, even as that species continues to decline. Therefore, to supplement the IUCN's classification process and thus bolster conservationist efforts for threatened species, a Random Forest model was constructed, trained on a group of fish species previously classified by the IUCN Red List. This Random Forest model both validates the IUCN Red List's classification method and offers a highly efficient, supplemental classification method for species' extinction risk. In addition, this Random Forest model is applicable to species with deficient data, which the IUCN Red List is otherwise unable to classify, thus engendering conservationist efforts for previously obscure species. Although this Random Forest model is built specifically for the trained fish species (Sparidae), the methodology can and should be extended to additional species.
Date Created
2018-05
Agent

Understanding the Role of the Repair Response during Localized Tissue Damage in D. melanogaster

Description
Proper developmental fidelity ensures uninterrupted progression towards sexual maturity and species longevity. However, early development, the time-frame spanning infancy through adolescence, is a fragile state since organisms have limited mobility and responsiveness towards their environment. Previous studies have shown that damage during development leads to a developmental delay that is proportional to the extent of damage accrued by the organism. In contrast, tissue damage sustained in older organisms does not delay development. In the fruit fly, Drosophila melanogaster, damage to wing precursor tissues is associated with developmental retardation if the damage is sustained in young larvae. No developmental delay is observed when damage is inflicted closer to pupariation time. Here we use microarray analysis to characterize the genomic response to injury in young and old Drosophila melanogaster larvae. We also begin to develop tools to examine in more detail the role that the neurotransmitter dopamine might play in mediating injury-induced developmental delays.
Date Created
2016-05
Agent