CaltechTHESIS
  A Caltech Library Service

Statistical Methods for Gene Differential Expression Analysis of RNA-Sequencing

Citation

Yi, Lynn Donglin (2019) Statistical Methods for Gene Differential Expression Analysis of RNA-Sequencing. Dissertation (Ph.D.), California Institute of Technology. doi:10.7907/0YE6-2217. https://resolver.caltech.edu/CaltechTHESIS:10102018-143313907

Abstract

RNA-Sequencing ("RNA-Seq") is performed to measure gene expression, often to ask which genes are differentially expressed across various biological conditions. Statistical methods have been used to model RNA-Seq quantifications in order to determine differential expression, and have traditionally been divided into gene-level methods and transcript-level methods. There have been few attempts to bridge this statistical divide, although transcript expression and gene expression are inextricably linked biologically. In this thesis, we provide a case study of a comparative differential expression analysis, demonstrating that many differential expression events happen at the isoform level, and that an analysis using only summarized gene quantifications would fail to capture these events. Furthermore, we develop statistical methods that unify transcript-level and gene-level analysis. In bulk RNA-Seq, by using p-value aggregation methods, we are able to translate transcript-level results into gene-level results under a unified framework. For single-cell RNA-Seq, we propose using multiple logistic regression, leveraging the high dimensionality of the data to determine whether the transcript quantifications pertaining to a gene constitute a linear discriminant for cell type. This method combines differential transcript expression analysis and differential gene expression analysis into a unified framework which we call "gene differential expression." Lastly, we demonstrate that our methods can be applied to transcript compatibility counts instead of transcript quantifications in order to bypass ambiguous read assignment and improve accuracy. We show that transcript compatibility counts obtained via transcriptome pseudoalignment are comparable in quantification accuracy to quantifications from genome alignment methods.
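The abstract does not state here which p-value aggregation method is used. As a minimal sketch, assuming Fisher's method (one standard aggregation technique, chosen only for illustration) and assuming the transcript-level p-values within a gene are independent, the Python below shows how transcript-level p-values could be rolled up into one p-value per gene. The function names `fisher_combine` and `gene_level_pvalues` and the transcript-to-gene mapping `tx2gene` are hypothetical.

```python
import math
from collections import defaultdict


def fisher_combine(pvalues):
    """Combine p-values with Fisher's method (illustrative choice, not
    necessarily the method used in the thesis).

    Assumes the p-values are independent. Under the null hypothesis,
    X = -2 * sum(ln p_i) follows a chi-squared distribution with 2k
    degrees of freedom. For even degrees of freedom the upper tail has
    a closed form:
        P(X > x) = exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!
    """
    k = len(pvalues)
    if k == 0:
        raise ValueError("need at least one p-value")
    if any(not (0.0 < p <= 1.0) for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")
    half_x = -sum(math.log(p) for p in pvalues)  # equals x / 2
    term = 1.0   # (x/2)^0 / 0!
    total = 1.0
    for j in range(1, k):
        term *= half_x / j  # builds (x/2)^j / j! incrementally
        total += term
    return min(1.0, math.exp(-half_x) * total)


def gene_level_pvalues(transcript_pvalues, tx2gene):
    """Group transcript p-values by gene and aggregate each group.

    transcript_pvalues: dict mapping transcript ID -> p-value
    tx2gene: dict mapping transcript ID -> gene ID (hypothetical input)
    """
    by_gene = defaultdict(list)
    for tx, p in transcript_pvalues.items():
        by_gene[tx2gene[tx]].append(p)
    return {gene: fisher_combine(ps) for gene, ps in by_gene.items()}
```

For example, with hypothetical transcripts `t1` (p = 0.01) and `t2` (p = 0.5) mapped to gene `g1`, the aggregated gene-level p-value is about 0.0315, so one strongly changed isoform can still flag its gene; a gene with a single transcript simply keeps that transcript's p-value. Fisher's method is used here because its null distribution has a simple closed form; transcripts of the same gene are not guaranteed to be independent, so this sketch should not be read as a calibrated implementation.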

Item Type: Thesis (Dissertation (Ph.D.))
Subject Keywords: Bioinformatics, computational biology, biostatistics, genomics, gene expression, sequencing, RNA sequencing, differential expression, transcriptome, genome, transcript, pseudoalignment, equivalence class, transcript compatibility counts, kallisto
Degree Grantor: California Institute of Technology
Division: Biology and Biological Engineering
Major Option: Systems Biology
Thesis Availability: Public (worldwide access)
Research Advisor(s):
  • Pachter, Lior S.
Thesis Committee:
  • Chan, David C. (chair)
  • Thomson, Matthew
  • Pachter, Lior S.
  • Chandrasekaran, Venkat
Defense Date: 15 April 2019
Funders (funding agency: grant number):
  • NIH: GM008042
  • NIH: GM07616
  • NIH: R012017-0569
  • Lee Ramo Fund: UNSPECIFIED
  • Walter and Sylvia Treadway Fund: UNSPECIFIED
Record Number: CaltechTHESIS:10102018-143313907
Persistent URL: https://resolver.caltech.edu/CaltechTHESIS:10102018-143313907
DOI: 10.7907/0YE6-2217
Related URLs (URL, type: description):
  • https://doi.org/10.1371/journal.pone.0175744 (DOI): Manuscript adapted for Chapter II
  • https://doi.org/10.1186/s13059-018-1419-z (DOI): Manuscript adapted for Chapter III
  • https://doi.org/10.1038/s41592-018-0303-9 (DOI): Manuscript adapted for Chapter IV
  • https://doi.org/10.1101/444620 (DOI): Manuscript adapted for Chapter V
ORCID:
  • Yi, Lynn Donglin: 0000-0003-4575-0158
Default Usage Policy: No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code: 11226
Collection: CaltechTHESIS
Deposited By: Lynn Yi
Deposited On: 23 May 2019 20:12
Last Modified: 04 Oct 2019 00:23

Thesis Files

PDF (Thesis) - Final Version (8MB)
PDF (Supplementary Material for Chapter III) - Supplemental Material (14MB)
PDF (Supplementary Figures for Chapter IV) - Supplemental Material (6MB)