Citation
Chen, Xiaoqiao (2025) Active Acquisition Methods for Single Cell Genomics. Dissertation (Ph.D.), California Institute of Technology. doi:10.7907/nsn8-nd79. https://resolver.caltech.edu/CaltechTHESIS:07052024-170119371
Abstract
We introduce two novel computational methodologies, ActiveSVM and Active Cell Inference, aimed at reducing the costs and enhancing the efficiency of single-cell mRNA sequencing and spatial transcriptomics, respectively. ActiveSVM uses an active learning approach to identify minimal yet highly informative gene sets for classifying cell types, identifying physiological states, and characterizing responses to genetic perturbations in single-cell datasets. By iteratively focusing on misclassified cells, ActiveSVM scales efficiently to more than a million cells and achieves around 90% accuracy across diverse datasets, including cell-atlas and disease-characterization studies.
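The misclassification-driven loop described above can be illustrated with a minimal sketch. It is not the thesis's ActiveSVM implementation. It assumes NumPy, a one-vs-rest linear SVM trained by plain subgradient descent on the L2-regularized hinge loss, and a simple scoring rule: refit on the currently misclassified cells using all genes, then add the unselected gene with the largest weight norm across classes. The names `train_linear_svm`, `predict`, and `active_gene_selection`, and all default parameters, are hypothetical.

```python
import numpy as np


def train_linear_svm(X, y, n_classes, lam=1e-3, lr=0.1, epochs=300):
    """One-vs-rest linear SVM fitted by full-batch subgradient descent on the
    L2-regularized hinge loss. Returns weights W (classes x genes) and biases b."""
    n = X.shape[0]
    W = np.zeros((n_classes, X.shape[1]))
    b = np.zeros(n_classes)
    # Y[c, i] = +1 if cell i belongs to class c, else -1.
    Y = np.where(y[None, :] == np.arange(n_classes)[:, None], 1.0, -1.0)
    for _ in range(epochs):
        margins = Y * (W @ X.T + b[:, None])
        # Only cells inside the margin contribute to the hinge subgradient.
        viol = (margins < 1) * Y
        W -= lr * (lam * W - viol @ X / n)
        b -= lr * (-viol.sum(axis=1) / n)
    return W, b


def predict(W, b, X):
    """Assign each cell to the class with the highest one-vs-rest score."""
    return np.argmax(X @ W.T + b, axis=1)


def active_gene_selection(X, y, n_genes):
    """Greedy, misclassification-driven gene selection (simplified sketch)."""
    X = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-8)  # standardize each gene
    n_classes = int(y.max()) + 1
    selected = []
    for _ in range(n_genes):
        if selected:
            W, b = train_linear_svm(X[:, selected], y, n_classes)
            wrong = predict(W, b, X[:, selected]) != y
        else:
            wrong = np.ones(len(y), dtype=bool)  # no genes yet: every cell counts
        if not wrong.any():
            break  # the current gene set already classifies every cell
        # Refit on the misclassified cells only, using all genes, and take the
        # unselected gene with the largest weight norm across classes.
        W_all, _ = train_linear_svm(X[wrong], y[wrong], n_classes)
        scores = np.linalg.norm(W_all, axis=0)
        scores[selected] = -np.inf  # never re-select a gene
        selected.append(int(np.argmax(scores)))
    return selected
```

On synthetic data in which only two genes carry class signal, the loop should pick those two genes first: once one informative gene separates its class, the remaining errors can only be reduced by the other informative gene, so it dominates the refit on the misclassified cells.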
Active Cell Inference complements ActiveSVM by using the ordered gene sets that ActiveSVM produces to streamline spatial genomics measurements. This end-to-end pipeline reduces measurement time and cost by up to 100-fold in scientific and clinical settings. It optimizes gene probing by identifying well-classified cells early, so that additional genes are applied only to cells whose classification remains uncertain. A temporal scaling calibration scheme further improves the method by increasing calibration accuracy at every step of its iterative process.
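The early-stopping idea (stop probing a cell once its classification is confident enough) can be sketched as follows. This is an illustration, not the thesis pipeline: it assumes NumPy, a Gaussian naive Bayes classifier that can score any prefix of the ordered gene list, and a fixed posterior threshold as the certainty criterion. The names `fit_gaussian_nb`, `posterior`, and `active_cell_inference` and the 0.95 threshold are hypothetical, and the temporal scaling calibration is not modeled here.

```python
import numpy as np


def fit_gaussian_nb(X, y, n_classes, eps=1e-3):
    """Per-class mean and variance of every gene, plus class priors."""
    means = np.stack([X[y == c].mean(axis=0) for c in range(n_classes)])
    variances = np.stack([X[y == c].var(axis=0) for c in range(n_classes)]) + eps
    priors = np.bincount(y, minlength=n_classes) / len(y)
    return means, variances, priors


def posterior(x_measured, genes, means, variances, priors):
    """Class posterior using only the genes measured so far (naive Bayes)."""
    m, v = means[:, genes], variances[:, genes]
    log_lik = -0.5 * (((x_measured - m) ** 2) / v + np.log(2 * np.pi * v)).sum(axis=1)
    log_post = np.log(priors) + log_lik
    log_post -= log_post.max()  # subtract the max for numerical stability
    p = np.exp(log_post)
    return p / p.sum()


def active_cell_inference(X, gene_order, model, threshold=0.95):
    """Probe genes in order for each cell; stop once max posterior >= threshold.

    Returns predicted labels and the number of genes probed per cell."""
    if len(gene_order) == 0:
        raise ValueError("gene_order must contain at least one gene")
    preds, n_probed = [], []
    for x in X:
        for k in range(1, len(gene_order) + 1):
            genes = list(gene_order[:k])
            p = posterior(x[genes], genes, *model)
            if p.max() >= threshold:
                break  # confident enough: skip the remaining genes for this cell
        preds.append(int(np.argmax(p)))
        n_probed.append(k)
    return np.array(preds), np.array(n_probed)
```

The returned probe counts make the saving visible: cells resolved by the first genes in the order stop early, and only ambiguous cells consume more of the gene list.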
Both methods were tested on the Human Cell Atlas dataset of over 60 million cells, using the CellxGene-Census tool. This integration yielded precise gene sets for a range of human tissues, substantially improving the efficiency and reliability of these genomic techniques. Together, ActiveSVM and Active Cell Inference advance the application of genomics to clinical diagnostics, therapeutic discovery, and genetic screens, and promise substantial reductions in the operational complexity and cost of next-generation sequencing technologies.
Item Type: | Thesis (Dissertation (Ph.D.))
---|---
Subject Keywords: | machine learning; active feature selection; active acquisition; single-cell genomics; spatial genomics
Degree Grantor: | California Institute of Technology
Division: | Engineering and Applied Science
Major Option: | Computing and Mathematical Sciences
Thesis Availability: | Public (worldwide access)
Defense Date: | 27 July 2024
Record Number: | CaltechTHESIS:07052024-170119371
Persistent URL: | https://resolver.caltech.edu/CaltechTHESIS:07052024-170119371
DOI: | 10.7907/nsn8-nd79
Default Usage Policy: | No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code: | 16533
Collection: | CaltechTHESIS
Deposited By: | Xiaoqiao Chen
Deposited On: | 15 Jul 2024 20:06
Last Modified: | 25 Jul 2024 16:29
Thesis Files
PDF (Final Version), 43 MB. See Usage Policy.