Bayesian Hierarchical Model for Testing Allele Specific Expression Towards the Alternative Allele

Description
Identifying associations between genotypes and gene expression levels using next-generation sequencing technology has enabled systematic interrogation of the regulatory variation underlying complex phenotypes. Understanding the sources of expression variation has important implications for disease susceptibility, phenotypic diversity, and adaptation (Main, 2009). Interest in the existence of allele-specific expression in autosomal genes grew with increasing awareness of the important role that variation in non-coding DNA sequences can play in determining phenotypic diversity, and of the essential role of parent-of-origin expression in early development (Knight, 2004). As new applications of high-throughput sequencing are conceived, it is becoming increasingly important to develop statistical methods tailored to large and formidably complex data sets in order to maximize the biological insight derived from next-generation sequencing experiments. Here, a Bayesian hierarchical probability model based on the beta-binomial distribution is proposed as a possible approach for quantifying allele-specific expression from whole-genome sequencing (WGS) and whole-transcriptome sequencing (RNA-seq) data. A pipeline for the analysis of WGS and RNA-seq data sets from ten samples was developed and implemented, and allele-specific expression (ASE) was quantified for both haplotypes with the described methodology, using individuals heterozygous at the tested variants. The combined computational and statistical framework accurately quantified ASE, achieving high reproducibility of allele-specific genes already described in the literature. In conclusion, the described methodology provides a solid starting point for quantifying allele-specific expression across whole genomes.
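To make the beta-binomial idea concrete, the sketch below is a minimal illustration and not the thesis's implementation. It assumes that at each heterozygous site the alternative-allele read count follows a binomial distribution whose success probability p has a Beta(alpha, beta) prior; marginalizing p gives the beta-binomial likelihood. In a full hierarchical model alpha and beta would be shared across sites and estimated from the data; here they are fixed integers (a uniform Beta(1, 1) by default) purely as an assumption. Following the title's framing of testing toward the alternative allele, it reports the one-sided posterior probability that p exceeds 0.5. The function names are hypothetical.

```python
from math import comb, lgamma


def _log_beta(a: float, b: float) -> float:
    """log B(a, b), computed via log-gamma for numerical stability."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def beta_binomial_logpmf(k: int, n: int, alpha: float, beta: float) -> float:
    """log P(K = k) when p ~ Beta(alpha, beta) and K | p ~ Binomial(n, p).

    K is the number of reads supporting the alternative allele out of n reads
    covering a heterozygous site; the Beta layer absorbs extra-binomial
    (over)dispersion between sites.
    """
    log_choose = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return log_choose + _log_beta(k + alpha, n - k + beta) - _log_beta(alpha, beta)


def posterior_prob_alt_bias(alt: int, ref: int, alpha: int = 1, beta: int = 1) -> float:
    """Posterior P(p > 0.5 | reads), where p is the alternative-allele fraction.

    By Beta-binomial conjugacy the posterior is Beta(alpha + alt, beta + ref).
    For integer shape parameters a, b the Beta tail has a closed form:
    P(p > 1/2) = P(Binomial(a + b - 1, 1/2) <= a - 1).
    """
    a, b = alpha + alt, beta + ref
    m = a + b - 1
    # Exact integer arithmetic; Python's int true division rounds correctly.
    return sum(comb(m, j) for j in range(a)) / 2 ** m


print(posterior_prob_alt_bias(3, 0))  # 0.9375
print(posterior_prob_alt_bias(5, 5))  # 0.5
```

Balanced counts leave the posterior centered on 0.5, while reads concentrated on the alternative allele push the posterior probability toward 1. A real analysis would also need to handle mapping bias toward the reference allele and multiple testing across sites.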
Date Created
2012-12
Agent

Methods in the assessment of genotype-phenotype correlations in rare childhood disease through orthogonal multi-omics, high-throughput sequencing approaches

Description
Rapid advancements in genomic technologies have increased our understanding of rare human disease. It is now feasible to generate multiple types of biological data, including genetic variation from the genome or exome, expression from the transcriptome, methylation patterns from the epigenome, protein complexity from the proteome, and metabolite information from the metabolome. "Omics" tools provide a comprehensive view of the biological mechanisms that impact disease traits and risk. Despite the available data types and the ability to collect them simultaneously from patients, researchers still rely on analyzing each one independently. Combining information from multiple types of biological data can reduce missing information, increase confidence in findings from a single data type, and provide a more complete view of genotype-phenotype correlations. Although rare disease genetics has been greatly improved by exome sequencing, a substantial portion of clinical patients remain undiagnosed. Multiple frameworks for the integrative analysis of genomic and transcriptomic data are presented, with a focus on identifying functional genetic variants in patients with undiagnosed, rare childhood conditions. A method for directly quantifying the X-inactivation ratio from genomic and transcriptomic data was developed, using allele-specific expression and segregation analysis to determine the magnitude and mode of inheritance of X inactivation. Applied to two families, this approach revealed non-random X inactivation in female patients, and the expression-based analysis of X inactivation correlated highly with the standard clinical assay. These findings improved understanding of the molecular mechanisms underlying X-linked disorders. In addition, multivariate outlier analysis of gene- and exon-level RNA-seq data using Mahalanobis distance, together with integration of the resulting distance scores with genomic data, identified genotype-phenotype correlations during variant prioritization in 25 families.
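The multivariate outlier analysis mentioned above scores each sample by how far its expression profile lies from the cohort, accounting for correlation between features. The following is a minimal, standard-library-only sketch under stated assumptions, not the thesis's pipeline: rows are samples, columns are already-normalized expression values (for example, the exons of one gene), and the mean and covariance are estimated from the whole cohort with the unbiased n - 1 denominator. The function names are hypothetical, and the covariance must be nonsingular, which requires more samples than features.

```python
from typing import List

Matrix = List[List[float]]


def _solve(a: Matrix, b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def mahalanobis_sq(samples: Matrix) -> List[float]:
    """Squared Mahalanobis distance D^2 = d^T S^-1 d of each sample (row)
    from the cohort mean, with S the sample covariance of the cohort."""
    n, p = len(samples), len(samples[0])
    mean = [sum(row[j] for row in samples) / n for j in range(p)]
    dev = [[row[j] - mean[j] for j in range(p)] for row in samples]
    cov = [[sum(d[i] * d[j] for d in dev) / (n - 1) for j in range(p)]
           for i in range(p)]
    # Solve S y = d instead of inverting S explicitly.
    return [sum(di * yi for di, yi in zip(d, _solve(cov, d))) for d in dev]


rows = [[1.0, 2.0], [1.2, 2.1], [0.9, 1.9], [1.1, 2.2], [1.0, 2.0], [3.0, 0.5]]
print(max(range(len(rows)), key=mahalanobis_sq(rows).__getitem__))  # 5
```

With cohort-wide estimates the scores always sum to (n - 1)p, which is a handy sanity check. Because an extreme sample also inflates the covariance and can partly mask itself, leave-one-out or robust covariance estimates are common alternatives.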
Mahalanobis distance scores revealed variants with large transcriptional impact in patients. In this dataset, frameshift variants were more likely to result in outlier expression signatures than other types of functional variants. Integration of outlier estimates with genetic variants corroborated previously identified, presumed causal variants and highlighted a new candidate in a previously undiagnosed case. Integrative genomic approaches in easily attainable tissue will facilitate the search for biomarkers that impact disease traits, uncover pharmacogenomic targets, provide novel insight into the molecular underpinnings of uncharacterized conditions, and help improve analytical approaches that use large datasets.
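The direct quantitation of the X-inactivation ratio from allele-specific expression and segregation analysis, described in this abstract, can be illustrated with a simplified sketch. It is not the thesis's method and rests on several assumptions: the proband is female, each site is a heterozygous X-linked SNV outside the pseudoautosomal regions, the father's single X allele is known (so the child's other allele must be maternal), and per-allele RNA-seq read counts are already available. Reads are pooled across all phased sites, and the result is the fraction of allele-informative reads coming from the maternally inherited X. The function names and input layout are hypothetical.

```python
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical record: (child's two alleles, father's X allele, read counts per allele).
Site = Tuple[Tuple[str, str], str, Dict[str, int]]


def phase_x_site(child_alleles: Tuple[str, str],
                 father_allele: str) -> Optional[Tuple[str, str]]:
    """Return (maternal, paternal) alleles for a heterozygous X-linked site in a
    female child, or None if the site is homozygous or inconsistent with the father."""
    a, b = child_alleles
    if a == b or father_allele not in (a, b):
        return None
    # A father transmits his only X, so the child's other allele is maternal.
    maternal = b if father_allele == a else a
    return maternal, father_allele


def xci_ratio(sites: Iterable[Site]) -> float:
    """Fraction of allele-informative RNA-seq reads carrying the maternal allele,
    pooled over all phased heterozygous sites."""
    maternal_reads = total_reads = 0
    for child_alleles, father_allele, counts in sites:
        phased = phase_x_site(child_alleles, father_allele)
        if phased is None:
            continue  # uninformative for parent-of-origin assignment
        maternal, paternal = phased
        maternal_reads += counts.get(maternal, 0)
        total_reads += counts.get(maternal, 0) + counts.get(paternal, 0)
    if total_reads == 0:
        raise ValueError("no phased, expressed heterozygous sites")
    return maternal_reads / total_reads


def xci_skew(ratio: float) -> float:
    """Share of expression from the more active X (about 0.5 = random, 1.0 = fully skewed)."""
    return max(ratio, 1.0 - ratio)


sites = [(("A", "G"), "G", {"A": 80, "G": 20}),
         (("C", "T"), "C", {"C": 10, "T": 90}),
         (("A", "A"), "A", {"A": 50})]  # homozygous, skipped
print(xci_ratio(sites))  # 0.85
```

Pooling reads weights highly expressed genes more heavily; a per-site median is a reasonable alternative. Genes that escape X inactivation express both alleles and would pull the ratio toward 0.5, so restricting to genes known to be subject to inactivation is advisable.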
Date Created
2015
Agent