Circular RNA characterization and regulatory network prediction in human tissue

Description
Circular RNAs (circRNAs) are a class of endogenous, non-coding RNAs that are formed when exons back-splice to each other, and they represent a new area of transcriptomics research. Numerous RNA sequencing (RNAseq) studies since 2012 have revealed that circRNAs are pervasively expressed in eukaryotes, especially in the mammalian brain. While their functional role and impact remain to be clarified, circRNAs have been found to regulate microRNAs (miRNAs) as well as parental gene transcription and may thus have key roles in transcriptional regulation. Although circRNAs have continued to gain attention, our understanding of their expression in a cell-, tissue-, and brain region-specific context remains limited. Further, computational algorithms vary in which circRNAs they detect. This thesis aims to advance current knowledge of circRNA expression in a region-specific context, focusing on the human brain, and to address these computational challenges.

The overarching goal of my research unfolds over three aims: (i) evaluating circRNAs and their predicted impact on transcriptional regulatory networks in cell-specific RNAseq data; (ii) developing a novel solution for de novo detection of full-length circRNAs as well as in silico validation of selected circRNA junctions using assembly; and (iii) applying these assembly-based detection and validation workflows, together with existing tools, to systematically identify and characterize circRNAs in functionally distinct human brain regions. To this end, I have developed novel bioinformatics workflows that are applicable to non-polyA-selected RNAseq datasets and can be used to characterize circRNA expression across various sample types and diseases. Further, I establish a reference dataset of circRNA expression profiles and regulatory networks in a brain region-specific manner. This resource, along with existing databases such as circBase, will be invaluable in advancing circRNA research as well as improving our understanding of the role of circRNAs in transcriptional regulation and various neurological conditions.
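To make the notion of a back-splice junction concrete, the following is a minimal sketch of how a split read alignment could be classified as back-spliced. It is not the thesis workflow: the `Segment` record, its fields, and the function names are hypothetical, and the sketch assumes both segments of one read are reported in the orientation in which the read aligns to the forward reference strand. Under that assumption, a linearly spliced read maps to increasing genomic coordinates along the read, while a back-spliced read has its later portion mapping upstream of its earlier portion.

```python
from typing import NamedTuple, Optional, Tuple


class Segment(NamedTuple):
    """One aligned piece of a split read (hypothetical record, not a real tool's format)."""
    chrom: str
    read_start: int  # offset of this piece within the read, in aligned orientation
    ref_start: int   # 0-based genomic start
    ref_end: int     # genomic end, exclusive


def is_back_splice(first: Segment, second: Segment) -> bool:
    """Return True if `second` (later in the read) maps upstream of `first`."""
    if first.chrom != second.chrom:
        return False  # pieces on different chromosomes are not a back-splice
    if first.read_start >= second.read_start:
        return False  # caller must pass the pieces in read order
    # Linear splicing keeps read order and genomic order the same;
    # a back-splice reverses them.
    return second.ref_end <= first.ref_start


def circ_span(first: Segment, second: Segment) -> Optional[Tuple[str, int, int]]:
    """Genomic interval enclosed by the back-splice junction, or None."""
    if not is_back_splice(first, second):
        return None
    return (first.chrom, second.ref_start, first.ref_end)


# Illustrative coordinates only (not real data).
back = (Segment("chr1", 0, 5000, 5050), Segment("chr1", 50, 1000, 1050))
linear = (Segment("chr1", 0, 1000, 1050), Segment("chr1", 50, 5000, 5050))
print(circ_span(*back))
print(circ_span(*linear))
```

A real detector would additionally check splice-site motifs, mapping quality, and read support across the junction, which this sketch omits.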
Date Created
2018

Neoantigen Prediction Pipeline

Description
Cells become cancerous due to changes in their genetic makeup. In cancers, an altered amino acid due to a tumor mutation can result in proteins that are identified as "foreign" by the immune system. An MHC molecule will bind to these "foreign" peptide fragments, also called neoantigens. There are two classes of MHC molecules. While the MHC I complex is found in all cells with a nucleus, MHC II complexes are mostly found in antigen-presenting cells (APCs), such as macrophages, B cells, and dendritic cells. The MHC molecule then presents the neoantigen on the cell's surface. If an immune cell, such as a T cell, is able to bind to the neoantigen, it can then destroy the tumor cell. However, certain immune cells carry molecules that act as checkpoints and must be activated or inactivated to start an immune response. This ensures that healthy cells are not killed. Cancer cells can sometimes exploit these checkpoints to avoid being attacked. One immunotherapy that has had clinical success is checkpoint blockade, which blocks the activity of immune checkpoint proteins in order to release the "brakes" on the immune system and increase its ability to destroy cancer cells. Studies have found a correlation between mutational load and response to immunotherapy.

The goal of this project was to create a pipeline that identifies tumor neoantigens. This involved researching various software tools and implementing them to work together. The resulting neoantigen prediction pipeline works with TGen's genomics pipeline to help understand a patient's immune response. The pipeline first creates two protein FASTA files from the high-quality non-synonymous mutations, frameshifts, codon insertions, and codon deletions reported by vcfmerger. One of the protein FASTA files includes the mutations, while the other does not and so represents the wildtype protein. The pipeline then predicts the HLA genotypes for both classes of MHC molecules from DNA or RNA sequencing data in the form of FASTQ files. The protein FASTA files and each HLA type are fed into IEDB to obtain peptide-MHC binding predictions. Wildtype peptides and neoantigens with low binding affinities are then removed. Finally, RNA expression information from the dseq and sailfish files produced by TGen's genomics pipeline is added to the final text file.
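The mutant-versus-wildtype comparison and the affinity filter can be illustrated with a short sketch. This is not the pipeline's code: all function names are hypothetical, the example handles only a single missense substitution, the `predicted_ic50` dictionary stands in for binding predictions that the pipeline obtains from IEDB, and the 500 nM cutoff is an assumed threshold (a commonly used convention), not a value stated in the abstract.

```python
from typing import Dict, List


def apply_missense(protein: str, pos: int, alt: str) -> str:
    """Return the protein with the residue at 0-based `pos` replaced by `alt`."""
    return protein[:pos] + alt + protein[pos + 1:]


def peptide_windows(protein: str, pos: int, length: int) -> List[str]:
    """All peptides of the given length that cover 0-based position `pos`."""
    first_start = max(0, pos - length + 1)
    last_start = min(pos, len(protein) - length)
    return [protein[s:s + length] for s in range(first_start, last_start + 1)]


def candidate_neoantigens(
    wildtype: str,
    pos: int,
    alt: str,
    length: int,
    predicted_ic50: Dict[str, float],  # peptide -> predicted IC50 in nM (stand-in for IEDB output)
    max_ic50: float = 500.0,           # assumed cutoff; lower IC50 means stronger binding
) -> List[str]:
    """Mutant peptides absent from the wildtype that are predicted to bind."""
    mutant = apply_missense(wildtype, pos, alt)
    wildtype_peptides = set(peptide_windows(wildtype, pos, length))
    kept = []
    for peptide in peptide_windows(mutant, pos, length):
        if peptide in wildtype_peptides:
            continue  # identical to a wildtype peptide, so not a neoantigen
        # Peptides without a prediction are treated as non-binders.
        if predicted_ic50.get(peptide, float("inf")) <= max_ic50:
            kept.append(peptide)
    return kept


# Toy sequence and made-up affinities, for illustration only.
toy_wildtype = "MKTAYIAKQR"
toy_predictions = {"TAF": 120.0, "AFI": 900.0, "FIA": 450.0}
print(candidate_neoantigens(toy_wildtype, 4, "F", 3, toy_predictions))
```

Real MHC class I peptides are typically 8 to 11 residues long; the 3-residue windows here only keep the toy example readable.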
Date Created
2017-05

Bayesian Hierarchical Model for Testing Allele Specific Expression Towards the Alternative Allele

Description
Identifying associations between genotypes and gene expression levels using next-generation technology has enabled systematic interrogation of regulatory variation underlying complex phenotypes. Understanding the source of expression variation has important implications for disease susceptibility, phenotypic diversity, and adaptation (Main, 2009). Interest in the existence of allele-specific expression in autosomal genes evolved with the increased awareness of the important role that variation in non-coding DNA sequences can play in determining phenotypic diversity, and the essential role parent-of-origin expression has in early development (Knight, 2004). As new implications of high-throughput sequencing are conceived, it is becoming increasingly important to develop statistical methods tailored to large and formidably complex data sets in order to maximize the biological insights derived from next-generation sequencing experiments. Here, a Bayesian hierarchical probability model based on the beta-binomial distribution is proposed as a possible approach for quantifying allele-specific expression from whole genome (WGS) and whole transcriptome (RNA-seq) data. A pipeline for the analysis of WGS and RNA-seq data sets from ten samples was developed and implemented, and allele-specific expression (ASE) was quantified from both haplotypes, using individuals heterozygous at the tested variants, with the described methodology. The computational and statistical frameworks applied accurately quantified ASE, reproducing with high consistency the allele-specific genes already described in the literature. In conclusion, the described methodology provides a solid starting point for quantifying allele-specific expression across whole genomes.
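As a rough illustration of the beta-binomial distribution underlying the model, the sketch below computes the probability of observing a given number of alternative-allele reads at a heterozygous site. It is a simplified stand-in, not the hierarchical model proposed in the thesis: the function names, the fixed shape parameters `a` and `b`, and the use of an upper-tail probability as the measure of skew toward the alternative allele are all assumptions made for the example.

```python
from math import exp, lgamma


def log_beta(a: float, b: float) -> float:
    """Logarithm of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def beta_binomial_pmf(k: int, n: int, a: float, b: float) -> float:
    """P(K = k) for n reads when the alt-allele fraction follows Beta(a, b)."""
    log_choose = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_choose + log_beta(k + a, n - k + b) - log_beta(a, b))


def upper_tail(k: int, n: int, a: float, b: float) -> float:
    """P(K >= k): chance of at least k alt reads under the given distribution."""
    return sum(beta_binomial_pmf(i, n, a, b) for i in range(k, n + 1))


# Illustrative counts only: 18 of 20 reads carry the alternative allele.
# a = b = 50 is an assumed null centred on balanced expression (fraction 0.5)
# with some overdispersion relative to a plain binomial.
print(upper_tail(18, 20, 50.0, 50.0))
```

A small tail probability suggests more alternative-allele reads than balanced expression would explain. In a hierarchical treatment, the shape parameters would be estimated across sites or samples rather than fixed by hand as they are here.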
Date Created
2012-12