CaltechTHESIS
  A Caltech Library Service

Complexity of Transcriptomic Data Analysis and Implications for Biological Discovery

Citation

Luebbert, Laura (2024) Complexity of Transcriptomic Data Analysis and Implications for Biological Discovery. Dissertation (Ph.D.), California Institute of Technology. doi:10.7907/xnw5-v914. https://resolver.caltech.edu/CaltechTHESIS:05042024-011724418

Abstract

Over the past decade, the advancement of ‘omics’ technologies has ushered in a new era for the life sciences. Given the high-throughput nature of omics technologies, this era is characterized by unique computational challenges pertaining to data size and dimensionality, and technical and biological noise. Concurrently, it offers opportunities, as global, untargeted, and parallel measurement of large amounts of information often captures unexpected insights.

This thesis describes challenges inherent to the omics era of life sciences, particularly highlighting the increasing importance of merging expertise in biology and computer science. It describes the development of multiple software tools designed to address several of these challenges, which were immediately adopted and widely implemented in transcriptomics and proteomics research. Additionally, it contains three chapters focused on unraveling previously unquantifiable information, including the interpretation of sequencing data from organisms with low-quality reference genome assemblies and workflows for identifying novel viruses using single-cell RNA sequencing data already massively generated in research, healthcare, and agriculture.

Item Type: Thesis (Dissertation (Ph.D.))
Subject Keywords: Biology; Computational Biology; Bioinformatics; Sequencing; RNA sequencing; Single-cell RNA sequencing; Virus; Viruses; Virus detection; kallisto; gget
Degree Grantor: California Institute of Technology
Division: Biology and Biological Engineering
Major Option: Biology
Thesis Availability: Public (worldwide access)
Research Advisor(s):
  • Pachter, Lior S.
Thesis Committee:
  • Van Valen, David A. (chair)
  • Murray, Richard M.
  • Bjorkman, Pamela J.
  • Pachter, Lior S.
Defense Date: 14 March 2024
Funders:
  • Caltech Chen Graduate Innovator Grant (grant number: CHEN.SYS3.CGIAFY21)
  • National Institutes of Health (NIH) (grant number: U19MH114830)
  • Caltech Biology and Bioengineering Division (grant number: unspecified)
Record Number: CaltechTHESIS:05042024-011724418
Persistent URL: https://resolver.caltech.edu/CaltechTHESIS:05042024-011724418
DOI: 10.7907/xnw5-v914
Related URLs:
  • https://doi.org/10.1101/2023.12.11.571168 (DOI): Article adapted for chapter 3 - bioRxiv: Efficient and accurate detection of viral sequences at single-cell resolution reveals putative novel viruses perturbing host gene expression.
  • https://doi.org/10.1093/bioinformatics/btae095 (DOI): Article adapted for chapter 2 - Bioinformatics: Fast and scalable querying of eukaryotic linear motifs with gget elm.
  • https://doi.org/10.1093/bioinformatics/btac836 (DOI): Article adapted for chapter 2 - Bioinformatics: Efficient querying of genomic reference databases with gget.
  • https://doi.org/10.1101/2023.05.17.541057 (DOI): Article adapted for chapter 4 - bioRxiv: Recovery of a learned behavior despite partial restoration of neuronal dynamics after chronic inactivation of inhibitory neurons.
  • https://doi.org/10.1038/s41591-024-02979-8 (DOI): Article adapted for chapter 5 - Nature Medicine: PSCA-CAR T cell therapy in metastatic castration-resistant prostate cancer: a phase 1 trial.
ORCID:
  • Luebbert, Laura: 0000-0003-1379-2927
Default Usage Policy: No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code: 16368
Collection: CaltechTHESIS
Deposited By: Laura Luebbert
Deposited On: 14 May 2024 18:24
Last Modified: 17 Jun 2024 16:19

Thesis Files

PDF - Final Version (36MB). See Usage Policy.
