CaltechTHESIS
  A Caltech Library Service

Graph Modeling for Genomics and Epidemiology

Citation

Eldjarn Hjoerleifsson, Kristjan (2023) Graph Modeling for Genomics and Epidemiology. Dissertation (Ph.D.), California Institute of Technology. doi:10.7907/s32c-a211. https://resolver.caltech.edu/CaltechTHESIS:02122023-103759689

Abstract

The last decades have seen great leaps made in the development of RNA sequencing technologies, yielding lower cost and greater throughput of experiments, to the point where the scale of the data produced on a daily basis is staggering. While computational hardware is also continuously improving, famously (or perhaps infamously) described by Gordon Moore (Moore, 1965), the rate at which data are produced eclipses advances on the hardware front. Over the last few years, many new methods have been proposed for bridging that ever-widening chasm, more than a few of which harness the latent graphical structure of genomic data to reduce the number of calculations required and pack the data tighter in memory. This body of work continues this development on three different, but related, fronts. Firstly, I present developments that greatly improve upon the efficiency of state-of-the-art methods for the quantification of RNA-seq reads, and describe a method that improves the accuracy of quantification without substantially increasing the computational overhead. Secondly, I introduce a procedure for the discovery of associations between novel gene isoforms and phenotypes, without prior knowledge of those isoforms. Lastly, I present the largest reconstruction of the transmission tree of a viral outbreak to date, modeled from viral genome sequences, contact tracing, and symptom data. I then use the reconstructed transmission tree to assess the efficacy of different vaccination strategies.

Item Type: Thesis (Dissertation (Ph.D.))
Subject Keywords: Computational Biology, Molecular Epidemiology, RNA-seq, RNA quantification, SARS-CoV-2
Degree Grantor: California Institute of Technology
Division: Engineering and Applied Science
Major Option: Computing and Mathematical Sciences
Thesis Availability: Public (worldwide access)
Research Advisor(s):
  • Pachter, Lior S.
Thesis Committee:
  • Wold, Barbara J. (chair)
  • Wierman, Adam C.
  • Melsted, Pall
  • Pachter, Lior S.
Defense Date: 16 December 2022
Non-Caltech Author Email: kristjan (AT) eldjarn.net
Record Number: CaltechTHESIS:02122023-103759689
Persistent URL: https://resolver.caltech.edu/CaltechTHESIS:02122023-103759689
DOI: 10.7907/s32c-a211
Related URLs:
  • https://doi.org/10.1101/2022.12.02.518832 (DOI): Article adapted for Chapter III
  • https://doi.org/10.1038/s41587-021-00870-2 (DOI): Article adapted for Chapters III and IV
  • https://doi.org/10.1101/2022.12.02.518787 (DOI): Article adapted for Chapter V
  • https://doi.org/10.1016/j.cmi.2022.02.012 (DOI): Article adapted for Chapter VI
ORCID:
  • Eldjarn Hjoerleifsson, Kristjan: 0000-0002-7851-1818
Default Usage Policy: No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code: 15105
Collection: CaltechTHESIS
Deposited By: Kristjan Eldjarn Hjoerleifsson
Deposited On: 17 Feb 2023 17:48
Last Modified: 23 May 2023 20:02

Thesis Files

PDF - Final Version (11MB)
See Usage Policy.
