
Active Acquisition Methods for Single Cell Genomics

Citation

Chen, Xiaoqiao (2025) Active Acquisition Methods for Single Cell Genomics. Dissertation (Ph.D.), California Institute of Technology. doi:10.7907/nsn8-nd79. https://resolver.caltech.edu/CaltechTHESIS:07052024-170119371

Abstract

We introduce two computational methods, ActiveSVM and Active Cell Inference, that aim to reduce the cost and improve the efficiency of single-cell mRNA sequencing and spatial transcriptomics, respectively. ActiveSVM uses active learning to identify minimal yet highly informative gene sets for classifying cell types, identifying physiological states, and characterizing responses to genetic perturbations in single-cell datasets. By iteratively focusing on misclassified cells, ActiveSVM scales to datasets of over a million cells and achieves around 90% accuracy across a range of datasets, including cell atlas and disease characterization studies.
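The idea of selecting genes by iteratively focusing on misclassified cells can be illustrated with a minimal sketch. It is not the thesis implementation: it assumes a binary problem with labels in {-1, +1}, a small hand-written linear SVM, and a scoring rule chosen for illustration (the magnitude of the hinge-loss subgradient with respect to each unused gene's weight, computed only on currently misclassified cells). The names `fit_linear_svm` and `select_genes_actively` are hypothetical.

```python
import numpy as np

def fit_linear_svm(X, y, epochs=200, lr=0.1, reg=1e-2):
    """Binary linear SVM (labels in {-1, +1}) fit by subgradient descent on the hinge loss."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        viol = y * (X @ w + b) < 1            # cells inside the margin or misclassified
        grad_w = reg * w - (X[viol].T @ y[viol]) / n
        grad_b = -y[viol].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def select_genes_actively(X, y, n_genes):
    """Greedy gene selection driven by the currently misclassified cells (illustrative only)."""
    n_cells, n_total = X.shape
    chosen = np.zeros(n_total, dtype=bool)
    selected = []
    wrong = np.ones(n_cells, dtype=bool)      # before any gene is chosen, every cell counts as misclassified
    w, b = np.zeros(0), 0.0
    for _ in range(n_genes):
        if not wrong.any():
            break                             # all cells classified correctly: stop early
        # Assumed scoring rule: |sum_i y_i * x_ij| over misclassified cells
        score = np.abs(y[wrong] @ X[wrong])
        score[chosen] = -np.inf               # never pick the same gene twice
        j = int(np.argmax(score))
        chosen[j] = True
        selected.append(j)
        w, b = fit_linear_svm(X[:, selected], y)
        wrong = y * (X[:, selected] @ w + b) <= 0
    return selected, w, b
```

Because each round scores genes only on the cells the current classifier gets wrong, the per-round cost depends on the size of that subset and not on the full dataset, which is one plausible reading of why such a scheme can scale to large cell counts.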

Active Cell Inference complements this by using the ordered gene sets produced by ActiveSVM to streamline spatial genomics measurements. This end-to-end pipeline reduces measurement time and cost by up to 100-fold in scientific and clinical settings. It optimizes the gene probing process by identifying well-classified cells early, so that subsequent genes can be targeted according to how confidently each cell has been classified. The method further benefits from a temporal scaling calibration scheme that improves calibration accuracy throughout the iterative process.
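The early-stopping behavior described here, where cells that are already well classified need no further probing, can be sketched as follows. This is an illustrative stand-in and not the thesis pipeline: it assumes a nearest-centroid classifier with softmax confidences, a fixed confidence `threshold`, and a single `temperature` knob that rescales the confidences (which may or may not correspond to the calibration scheme used in the thesis). The function `probe_sequentially` and all of its parameters are hypothetical.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)      # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def probe_sequentially(X, centroids, gene_order, threshold=0.95, temperature=1.0):
    """Probe genes in order; stop probing a cell once its top class probability passes threshold.

    X:          (n_cells, n_genes) expression matrix
    centroids:  (n_classes, n_genes) per-class mean expression (assumed classifier)
    gene_order: gene indices, most informative first
    """
    n_cells = X.shape[0]
    labels = np.full(n_cells, -1)
    genes_used = np.zeros(n_cells, dtype=int)
    active = np.ones(n_cells, dtype=bool)     # cells still being probed
    for k in range(1, len(gene_order) + 1):
        genes = list(gene_order[:k])
        idx = np.flatnonzero(active)
        # Squared distance from each active cell to each centroid on the genes probed so far
        diff = X[idx][:, genes][:, None, :] - centroids[:, genes][None, :, :]
        probs = softmax(-(diff ** 2).sum(axis=2) / temperature)
        labels[idx] = probs.argmax(axis=1)
        genes_used[idx] = k
        confident = probs.max(axis=1) >= threshold
        active[idx[confident]] = False        # confident cells need no further genes
        if not active.any():
            break
    return labels, genes_used
```

In this sketch, `genes_used` records how many genes each cell required, so the average of that array against the full panel size gives a rough sense of the measurement savings under the stated assumptions.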

Both methods were tested on the Human Cell Atlas dataset using the CellxGene-Census tool, a test involving over 60 million cells. This integration produced gene sets for various human tissues, improving the efficiency and reliability of these genomic techniques. Together, ActiveSVM and Active Cell Inference represent advances in applying genomics to clinical diagnostics, therapeutic discovery, and genetic screens, and promise substantial reductions in the operational complexity and cost of next-generation sequencing technologies.

Item Type:Thesis (Dissertation (Ph.D.))
Subject Keywords:machine learning; active feature selection; active acquisition; single-cell genomics; spatial genomics
Degree Grantor:California Institute of Technology
Division:Engineering and Applied Science
Major Option:Computing and Mathematical Sciences
Thesis Availability:Public (worldwide access)
Research Advisor(s):
  • Thomson, Matthew
Thesis Committee:
  • Cai, Long (chair)
  • Thomson, Matthew
  • Yue, Yisong
  • Bouman, Katherine L.
Defense Date:27 July 2024
Funders:
  • Heritage Medical Research Institute (Grant Number: UNSPECIFIED)
Record Number:CaltechTHESIS:07052024-170119371
Persistent URL:https://resolver.caltech.edu/CaltechTHESIS:07052024-170119371
DOI:10.7907/nsn8-nd79
Related URLs:
  • https://doi.org/10.1038/s43588-022-00263-8 (URL Type: DOI; Description: Article adapted for Chapter 2)
ORCID:
  • Chen, Xiaoqiao: 0000-0003-4685-3466
Default Usage Policy:No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code:16533
Collection:CaltechTHESIS
Deposited By: Xiaoqiao Chen
Deposited On:15 Jul 2024 20:06
Last Modified:25 Jul 2024 16:29

Thesis Files

PDF - Final Version (43MB). See Usage Policy.