Citation
Yi, Lynn Donglin (2019) Statistical Methods for Gene Differential Expression Analysis of RNA-Sequencing. Dissertation (Ph.D.), California Institute of Technology. doi:10.7907/0YE6-2217. https://resolver.caltech.edu/CaltechTHESIS:10102018-143313907
Abstract
RNA-Sequencing ("RNA-Seq") is performed to measure gene expression, often to determine which genes are differentially expressed across biological conditions. Statistical methods have been used to model RNA-Seq quantifications in order to detect differential expression, and these methods have traditionally been divided into gene-level methods and transcript-level methods. There has been little attempt to bridge this statistical divide, even though transcript expression and gene expression are inextricably linked biologically. In this thesis, we provide a case study of a comparative differential expression analysis, demonstrating that many differential expression events happen at the isoform level, and that an analysis using only summarized gene quantifications would fail to capture these events. Furthermore, we develop statistical methods that unify transcript-level and gene-level analysis. In bulk RNA-Seq, by using p-value aggregation methods, we are able to translate transcript-level results into gene-level results under a unified framework. For single-cell RNA-Seq, we propose using multiple logistic regression, leveraging the high dimensionality of the data to determine whether the transcript quantifications of a gene together constitute a linear discriminant for cell type. This method combines differential transcript expression analysis and differential gene expression analysis into a unified framework, which we call "gene differential expression." Lastly, we demonstrate that our methods can be applied to transcript compatibility counts instead of transcript quantifications in order to bypass ambiguous read assignment and improve accuracy. We show that transcript compatibility counts obtained via transcriptome pseudoalignment are comparable in quantification accuracy to quantifications from genome alignment methods.
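The abstract does not name the specific p-value aggregation method used to turn transcript-level results into gene-level results. As a minimal sketch of the general idea, assuming Fisher's method (swapped in here because it is one of the simplest aggregation schemes; it is not necessarily the thesis's choice), the per-transcript p-values belonging to each gene are combined into a single gene-level p-value. The function names and the transcript-to-gene mapping `tx2gene` are hypothetical, chosen for illustration only.

```python
import math
from typing import Dict, List


def chi2_sf_even_df(x: float, df: int) -> float:
    """P(X > x) for a chi-squared variable with an even number of degrees
    of freedom, via the closed form exp(-x/2) * sum_{j<df/2} (x/2)^j / j!."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term = 1.0   # (x/2)^0 / 0!
    total = term
    for j in range(1, df // 2):
        term *= half / j   # build (x/2)^j / j! incrementally
        total += term
    return math.exp(-half) * total


def fisher_combine(pvalues: List[float]) -> float:
    """Fisher's method (an assumption here, not the thesis's stated method):
    T = -2 * sum(ln p_i) follows chi-squared with 2k df under the null,
    provided the k p-values are independent."""
    if not pvalues:
        raise ValueError("need at least one p-value")
    if any(not (0.0 < p <= 1.0) for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")
    stat = -2.0 * sum(math.log(p) for p in pvalues)
    return min(1.0, chi2_sf_even_df(stat, 2 * len(pvalues)))


def gene_level_pvalues(transcript_p: Dict[str, float],
                       tx2gene: Dict[str, str]) -> Dict[str, float]:
    """Group transcript-level p-values by gene (hypothetical tx2gene map)
    and aggregate each group into one gene-level p-value."""
    by_gene: Dict[str, List[float]] = {}
    for tx, p in transcript_p.items():
        by_gene.setdefault(tx2gene[tx], []).append(p)
    return {gene: fisher_combine(ps) for gene, ps in by_gene.items()}


if __name__ == "__main__":
    # Hypothetical transcript IDs and p-values, for illustration only.
    tx_p = {"GENE_A-201": 0.01, "GENE_A-202": 0.60, "GENE_B-201": 0.30}
    tx2gene = {"GENE_A-201": "GENE_A", "GENE_A-202": "GENE_A",
               "GENE_B-201": "GENE_B"}
    print(gene_level_pvalues(tx_p, tx2gene))
```

A gene with a single transcript keeps its transcript p-value unchanged, while a gene with one strongly significant isoform can remain significant even if its other isoforms are not, which is the kind of isoform-level signal that summed gene counts can hide. Fisher's method assumes independent p-values; transcripts of the same gene may violate this, so a real analysis may call for a weighted or dependence-aware aggregation scheme.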
| Field | Value |
|---|---|
| Item Type | Thesis (Dissertation (Ph.D.)) |
| Subject Keywords | Bioinformatics, computational biology, biostatistics, genomics, gene expression, sequencing, RNA sequencing, differential expression, transcriptome, genome, transcript, pseudoalignment, equivalence class, transcript compatibility counts, kallisto |
| Degree Grantor | California Institute of Technology |
| Division | Biology and Biological Engineering |
| Major Option | Systems Biology |
| Thesis Availability | Public (worldwide access) |
| Defense Date | 15 April 2019 |
| Record Number | CaltechTHESIS:10102018-143313907 |
| Persistent URL | https://resolver.caltech.edu/CaltechTHESIS:10102018-143313907 |
| DOI | 10.7907/0YE6-2217 |
| Default Usage Policy | No commercial reproduction, distribution, display or performance rights in this work are provided. |
| ID Code | 11226 |
| Collection | CaltechTHESIS |
| Deposited By | Lynn Yi |
| Deposited On | 23 May 2019 20:12 |
| Last Modified | 04 Oct 2019 00:23 |
Thesis Files
- PDF (Thesis), Final Version, 8MB. See Usage Policy.
- PDF (Supplementary Material for Chapter III), Supplemental Material, 14MB. See Usage Policy.
- PDF (Supplementary Figures for Chapter IV), Supplemental Material, 6MB. See Usage Policy.