
Complexity of Transcriptomic Data Analysis and Implications for Biological Discovery


Luebbert, Laura (2024) Complexity of Transcriptomic Data Analysis and Implications for Biological Discovery. Dissertation (Ph.D.), California Institute of Technology. doi:10.7907/xnw5-v914.


Over the past decade, advances in ‘omics’ technologies have ushered in a new era for the life sciences. Because omics technologies are high-throughput, this era is characterized by unique computational challenges related to data size and dimensionality, as well as to technical and biological noise. At the same time, it offers new opportunities: global, untargeted, and parallel measurement of large amounts of information often yields unexpected insights.

This thesis describes challenges inherent to the omics era of the life sciences, highlighting in particular the growing importance of combining expertise in biology and computer science. It describes the development of multiple software tools designed to address several of these challenges, which were immediately adopted and widely used in transcriptomics and proteomics research. Additionally, it contains three chapters focused on uncovering previously unquantifiable information, including the interpretation of sequencing data from organisms with low-quality reference genome assemblies and workflows for identifying novel viruses in the vast amounts of single-cell RNA sequencing data already generated in research, healthcare, and agriculture.

Item Type: Thesis (Dissertation (Ph.D.))
Subject Keywords: Biology; Computational Biology; Bioinformatics; Sequencing; RNA sequencing; Single-cell RNA sequencing; Virus; Viruses; Virus detection; kallisto; gget
Degree Grantor: California Institute of Technology
Division: Biology and Biological Engineering
Major Option: Biology
Thesis Availability: Public (worldwide access)
Research Advisor(s):
  • Pachter, Lior S.
Thesis Committee:
  • Van Valen, David A. (chair)
  • Murray, Richard M.
  • Bjorkman, Pamela J.
  • Pachter, Lior S.
Defense Date: 14 March 2024
Funding (agency and grant number):
  • Caltech Chen Graduate Innovator Grant: CHEN.SYS3.CGIAFY21
  • National Institutes of Health (NIH): U19MH114830
  • Caltech Biology and Bioengineering Division: unspecified
Record Number: CaltechTHESIS:05042024-011724418
Related URLs:
  • Adapted for chapter 3 (bioRxiv): Efficient and accurate detection of viral sequences at single-cell resolution reveals putative novel viruses perturbing host gene expression
  • Adapted for chapter 2 (Bioinformatics): Fast and scalable querying of eukaryotic linear motifs with gget elm
  • Adapted for chapter 2 (Bioinformatics): Efficient querying of genomic reference databases with gget
  • Adapted for chapter 4 (bioRxiv): Recovery of a learned behavior despite partial restoration of neuronal dynamics after chronic inactivation of inhibitory neurons
  • Adapted for chapter 5 (Nature Medicine): PSCA-CAR T cell therapy in metastatic castration-resistant prostate cancer: a phase 1 trial
ORCID: Luebbert, Laura 0000-0003-1379-2927
Default Usage Policy: No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code: 16368
Deposited By: Laura Luebbert
Deposited On: 14 May 2024 18:24
Last Modified: 17 Jun 2024 16:19

Thesis Files

PDF - Final Version
See Usage Policy.

