A Theory of Genetic Analysis Using Transcriptomic Phenotypes


Angeles-Albores, David (2019) A Theory of Genetic Analysis Using Transcriptomic Phenotypes. Dissertation (Ph.D.), California Institute of Technology. doi:10.7907/JRNS-NS05.


This thesis develops the conceptual and computational framework required to use transcriptomes as effective phenotypes for genetic analysis. I demonstrate that there are powerful theoretical reasons why Batesonian epistasis should feature prominently in transcriptional phenotypes. I also show how to compute and interpret aggregate statistics for transcriptome-wide epistasis and transcriptome-wide dominance using whole-organism transcriptomic profiles of C. elegans mutants. Finally, I describe the WormBase Enrichment Suite, which I developed for enrichment analysis of genomic data.

RNA-seq has enormous potential as a tool because its protocols are fast, simple, and increasingly cheap. Despite this potential, transcriptomes have largely been used only in single-factor experiments. Even when many transcriptomes are collected, the main analytic approach is to apply clustering algorithms, which group correlated responses but cannot identify causal mechanisms.

I demonstrate that, given a complete genetic design in the form of a full two-factor matrix (wild type, both single mutants, and the double mutant), transcriptomes can establish genetic interactions between a pair of genes without the need for clustering algorithms. Surprisingly, when we performed epistasis analyses of hypoxia pathway mutants in C. elegans, we did not simply observe a generalized epistatic interaction between the mutants. Instead, the transcriptomes recapitulated the same Batesonian epistatic relationship that had been observed using classical phenotypes. In other words, the transcriptomic phenotype of one gene can be masked by that of a second gene, such that the double mutant has exactly the same transcriptomic phenotype as the single mutant of the epistatic gene. Motivated by this observation, we developed methods to recognize and interpret Batesonian epistasis at the transcriptomic level. These methods rely on a single aggregate coefficient that we named the transcriptome-wide epistasis coefficient.
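As a rough illustration of how one aggregate coefficient can summarize a two-factor matrix, the Python sketch below assumes per-transcript effects (for example, log-fold changes relative to wild type) for two single mutants, `beta_a` and `beta_b`, and for their double mutant, `beta_ab`. As a working assumption rather than the thesis's exact estimator, it takes the coefficient to be the least-squares slope of the double mutant's deviation from additivity, `beta_ab - (beta_a + beta_b)`, against the additive expectation `beta_a + beta_b`. Under this definition, purely additive effects give a slope of 0. A double mutant that exactly copies one of two single mutants with identical effects (masking within an unbranched pathway) gives -1/2. The function name is hypothetical.

```python
import numpy as np


def transcriptome_wide_epistasis(beta_a, beta_b, beta_ab):
    """Hypothetical sketch of an aggregate epistasis coefficient.

    Assumes beta_* are per-transcript effects (e.g. log-fold changes vs.
    wild type) for the same transcripts in each genotype. Returns the
    least-squares slope of the deviation from additivity against the
    additive expectation.
    """
    beta_a = np.asarray(beta_a, dtype=float)
    beta_b = np.asarray(beta_b, dtype=float)
    beta_ab = np.asarray(beta_ab, dtype=float)

    expected = beta_a + beta_b        # additive (non-interacting) prediction
    deviation = beta_ab - expected    # how far the double mutant departs from it
    slope, _intercept = np.polyfit(expected, deviation, 1)
    return slope


rng = np.random.default_rng(0)
beta_a = rng.normal(size=500)

# Masking toy: both single mutants share effects and the double mutant
# looks exactly like one of them.
s_masked = transcriptome_wide_epistasis(beta_a, beta_a, beta_a)

# Additive toy: an unrelated second gene whose effects simply add.
beta_b = rng.normal(size=500)
s_additive = transcriptome_wide_epistasis(beta_a, beta_b, beta_a + beta_b)

print(round(s_masked, 3), round(s_additive, 3))
```

With noise-free toy data the two slopes come out at -1/2 and 0. Real RNA-seq estimates carry measurement error, so in practice the slope would be fit to noisy points.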

The observation that Batesonian epistasis could be reproduced at the transcriptomic level was surprising. To explain how transcriptome-wide epistasis can arise, I studied a simplified model of transcriptional regulation using statistical mechanics. These studies show that epistasis analysis is equivalent to a perturbative analysis of the partition function of a promoter. Moreover, they reveal that a sufficient condition for Batesonian epistasis is that the two genes encode variables that are transformed and multiplied together to form a single effective compound variable. Finally, these studies demonstrate the connection between statistical (or generalized) epistasis and Batesonian epistasis and establish a physical basis for genetic logic.
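The sufficient condition described above can be illustrated numerically with a toy model. This is a sketch under stated assumptions, not the thesis's model. It assumes a two-state promoter with partition function Z = 1 + K·x, so the promoter is bound with probability K·x / Z. It also assumes a compound variable x = h·g(e), where h is the level of a hypothetical activator and g(e) = 1 / (1 + e / K_i) is an assumed transform of a hypothetical inhibitor level e. Because h multiplies the inhibitor term, removing the activator (h = 0) sets x to zero regardless of e. As a result, the double mutant reproduces the activator single mutant, which is Batesonian masking. All names, functional forms, and parameter values are illustrative.

```python
def promoter_occupancy(x, K=1.0):
    """Probability that a toy two-state promoter is bound.

    The unbound state has weight 1 and the bound state weight K * x,
    so the partition function is Z = 1 + K * x.
    """
    Z = 1.0 + K * x
    return K * x / Z


def compound_variable(h, e, K_i=1.0):
    """Hypothetical compound variable x = h * g(e).

    h: activator level (0 models a null allele of the activator).
    e: inhibitor level (0 models a null allele of the inhibitor).
    g(e) = 1 / (1 + e / K_i) is an assumed transform of the inhibitor.
    """
    return h * (1.0 / (1.0 + e / K_i))


# Illustrative wild-type levels in arbitrary units.
H_WT, E_WT = 1.0, 2.0

genotypes = {
    "wild type": (H_WT, E_WT),
    "inhibitor null": (H_WT, 0.0),
    "activator null": (0.0, E_WT),
    "double null": (0.0, 0.0),
}

occupancy = {
    name: promoter_occupancy(compound_variable(h, e))
    for name, (h, e) in genotypes.items()
}

for name, p in occupancy.items():
    print(f"{name:>15}: {p:.3f}")
```

In this toy, the activator-null and double-null rows coincide at zero occupancy, while removing the inhibitor raises occupancy above wild type.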

Genetic analyses of gene functional units can also be carried out using allelic series together with complementation (also known as dominance) tests. To enable analyses of allelic series using expression profiles, I developed a statistical coefficient known as transcriptome-wide dominance. A crucial aspect of allelic series analysis is the ability to enumerate the independent phenotypes associated with an arbitrary set of alleles. For this purpose, I developed the concept of phenotypic classes as a transcriptomic analogue of classical phenotypes. Briefly, a phenotypic class is the set of transcripts that are differentially expressed in a specific set of genotypes and in no others. Because each class corresponds to a non-empty subset of the mutant genotypes, an allelic series of two mutant alleles, which yields three mutant genotypes (the two homozygotes and their trans-heterozygote, each compared against the wild type), can produce at most 2^3 - 1 = 7 phenotypic classes. However, some of these classes may be artifacts of the substantial false-positive and false-negative rates associated with RNA-seq. I developed a simple algorithm to flag artifactual phenotypic classes, although such classes can often also be recognized by critically evaluating their biological implications. I applied these concepts to a small allelic series of the dpy-22 gene, which encodes a Mediator subunit in C. elegans, and identified 3–4 functional units along with their sequence requirements.
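The grouping step behind phenotypic classes can be sketched in a few lines of Python. The sketch takes hypothetical differential-expression calls (genotype label mapped to the set of transcripts called differentially expressed versus wild type). It then groups transcripts by the exact set of genotypes in which they are called. Genotype labels and transcript IDs are placeholders.

The second function is a labeled assumption, not the thesis's definition of transcriptome-wide dominance. It models the trans-heterozygote as a weighted average of the two homozygotes, `beta_trans = d * beta_a + (1 - d) * beta_b`, and estimates `d` by least squares. The sketch does not attempt the thesis's algorithm for flagging artifactual classes.

```python
from collections import defaultdict

import numpy as np


def phenotypic_classes(de_calls):
    """Group transcripts by the set of genotypes in which they are DE.

    de_calls maps a genotype label to the set of transcript IDs called
    differentially expressed (vs. wild type) in that genotype.
    Returns {frozenset_of_genotypes: set_of_transcripts}.
    """
    membership = defaultdict(set)
    for genotype, transcripts in de_calls.items():
        for t in transcripts:
            membership[t].add(genotype)

    classes = defaultdict(set)
    for t, genos in membership.items():
        classes[frozenset(genos)].add(t)
    return dict(classes)


def dominance_coefficient(beta_a, beta_b, beta_trans):
    """Hypothetical dominance sketch (an assumed model, not the thesis's).

    Models the trans-heterozygote as beta_trans = d * beta_a + (1 - d) * beta_b
    and returns the least-squares estimate of d. d near 1: allele a's profile
    dominates; d near 0: allele b's profile dominates.
    """
    beta_a = np.asarray(beta_a, dtype=float)
    beta_b = np.asarray(beta_b, dtype=float)
    beta_trans = np.asarray(beta_trans, dtype=float)
    diff = beta_a - beta_b
    # Rearranged model: (beta_trans - beta_b) = d * (beta_a - beta_b).
    return float(np.dot(beta_trans - beta_b, diff) / np.dot(diff, diff))


# Hypothetical DE calls for alleles "a" and "b" and their trans-heterozygote.
de_calls = {
    "a/a": {"t1", "t2", "t3"},
    "b/b": {"t2", "t3", "t4"},
    "a/b": {"t3", "t5"},
}
classes = phenotypic_classes(de_calls)
for genos, transcripts in sorted(classes.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(genos), sorted(transcripts))

# Trans-heterozygote effects lying exactly halfway between the homozygotes.
d = dominance_coefficient([2.0, -1.0, 0.5], [0.0, 0.0, 0.0], [1.0, -0.5, 0.25])
print(d)
```

With three genotypes, the number of distinct keys in `classes` can never exceed seven. Here five classes appear, and the halfway trans-heterozygote gives d = 0.5.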

Finally, I developed the WormBase Enrichment Suite, which implements a hypergeometric test on the C. elegans tissue, gene, and phenotype ontologies. The tool's importance derives mainly from its integration into WormBase, the central repository of C. elegans knowledge. Because of this integration, the annotations it tests against are continuously curated and improved, so its results stay as accurate as current annotations allow.
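The hypergeometric test named above can be sketched with the Python standard library. Assume a background of M annotated genes, n of which carry a given ontology term, and a query list of N genes, k of which carry that term. The one-sided enrichment p-value is the probability of drawing k or more annotated genes by chance:

P(X ≥ k) = Σ_{i=k}^{min(n, N)} C(n, i) · C(M - n, N - i) / C(M, N)

The function name and example counts are hypothetical. The sketch does not cover how the background set is chosen or how p-values are corrected across many terms.

```python
from math import comb


def hypergeometric_enrichment_p(k, M, n, N):
    """One-sided P(X >= k) for X ~ Hypergeometric(M, n, N).

    M: number of annotated genes in the background.
    n: background genes annotated with the term of interest.
    N: genes in the query list.
    k: query genes annotated with the term.
    """
    if not (0 <= n <= M and 0 <= N <= M):
        raise ValueError("require 0 <= n <= M and 0 <= N <= M")
    total = comb(M, N)
    upper = min(n, N)
    # Exact integer arithmetic; math.comb returns 0 when N - i > M - n.
    tail = sum(comb(n, i) * comb(M - n, N - i) for i in range(max(k, 0), upper + 1))
    return tail / total


# Hypothetical counts: 20 of 50 query genes carry a term that is found on
# 500 of 20,000 background genes.
p = hypergeometric_enrichment_p(k=20, M=20000, n=500, N=50)
print(f"p = {p:.3g}")
```

If SciPy is available, `scipy.stats.hypergeom.sf(k - 1, M, n, N)` computes the same upper-tail probability.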

Item Type: Thesis (Dissertation (Ph.D.))
Subject Keywords: Epistasis; Transcriptome; RNA-seq; C. elegans; Statistical Mechanics; Dominance; Complementation; Hypoxia; Mediator; Female-State; fog-2; hif-1; egl-9; dpy-22
Degree Grantor: California Institute of Technology
Division: Chemistry and Chemical Engineering
Major Option: Biochemistry and Molecular Biophysics
Thesis Availability: Public (worldwide access)
Research Advisor(s):
  • Sternberg, Paul W.
Thesis Committee:
  • Newman, Dianne K. (chair)
  • Meyerowitz, Elliot M.
  • Thomson, Matthew
  • Sternberg, Paul W.
Defense Date: 18 September 2018
Non-Caltech Author Email: davidaalbores (AT)
Record Number: CaltechTHESIS:10232018-150005837
Persistent URL:
Related URLs:
  • adapted for Ch. 3
  • adapted for Ch. 5
  • adapted for Ch. 4
  • adapted for Ch. 6
  • adapted for Ch. 7
  • WormBase Enrichment Suite
ORCID: Angeles-Albores, David 0000-0001-5497-8264
Default Usage Policy: No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code: 11243
Deposited By: David Angeles Albores
Deposited On: 01 Nov 2018 18:32
Last Modified: 06 Feb 2019 16:38

Thesis Files

PDF - Final Version
See Usage Policy.

