
Enhanced Algorithms for Analysis and Design of Nucleic Acid Reaction Pathways


Porubsky, Nicholas James (2020) Enhanced Algorithms for Analysis and Design of Nucleic Acid Reaction Pathways. Dissertation (Ph.D.), California Institute of Technology. doi:10.7907/M8CZ-MW98.


Nucleic acids provide a powerful platform for programming at the molecular level. This is possible because the free energy of nucleic acid structures is dominated by the local interactions of base pairing and base pair stacking. The nearest neighbor secondary structure model implied by these energetics has enabled the development of algorithms for calculating thermodynamic quantities of nucleic acid sequences. Molecular programmers and synthetic biologists continue to extend their reach to larger, more complicated nucleic acid complexes, reaction pathways, and systems. This necessitates a focus on new algorithm development and efficient implementations to enable analysis and design of such systems.

Concerning analysis of nucleic acids, we collect seemingly diverse algorithms under a unified three-component dynamic programming framework consisting of: 1) recursions that specify the dependencies between subproblems and incorporate the details of the structural ensemble and the free energy model; 2) evaluation algebras that define the mathematical form of each subproblem; and 3) operation orders that specify the computational trajectory through the dependency graph of subproblems. Changes to the set of recursions allow operation over the complex ensemble including coaxial and dangle stacking states, which affects all thermodynamic quantities. An updated operation order for structure sampling allows simultaneous generation of a set of structures sampled from the Boltzmann distribution, in time that empirically scales sublinearly with the number of samples, yielding a speedup of an order of magnitude or more over repeated single-structure sampling.
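
As a minimal sketch of how the three components separate, assuming a toy Nussinov-style model rather than the nearest-neighbor model (a flat -1 kcal/mol per pair, Watson-Crick and wobble pairs, hairpin loops of at least 3 nt), one set of recursions can be evaluated under different algebras: a sum-product algebra over Boltzmann weights gives the partition function, a min-plus algebra over energies gives the minimum free energy, and a sum-product algebra over ones counts structures. The names `Algebra`, `evaluate`, `PARTITION`, `MFE`, and `COUNT` are hypothetical and do not come from the thesis or any library.

```python
import math
from dataclasses import dataclass
from typing import Any, Callable

# Toy model (assumption): WC/wobble pairs, hairpin loops of >= 3 nt,
# and a flat -1 kcal/mol per pair instead of the nearest-neighbor model.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
MIN_LOOP = 3
PAIR_ENERGY = -1.0
KT = 0.6  # toy thermal energy in kcal/mol


@dataclass(frozen=True)
class Algebra:
    plus: Callable[[Any, Any], Any]   # combines alternative structures
    times: Callable[[Any, Any], Any]  # combines independent substructures
    one: Any                          # value of the empty structure
    pair: Callable[[float], Any]      # value contributed by one base pair


PARTITION = Algebra(lambda a, b: a + b, lambda a, b: a * b, 1.0,
                    lambda e: math.exp(-e / KT))
MFE = Algebra(min, lambda a, b: a + b, 0.0, lambda e: e)
COUNT = Algebra(lambda a, b: a + b, lambda a, b: a * b, 1, lambda e: 1)


def evaluate(seq: str, alg: Algebra) -> Any:
    """Evaluate one fixed set of recursions under any algebra."""
    n = len(seq)
    q = {}

    def get(i: int, j: int) -> Any:
        return alg.one if i > j else q[(i, j)]

    # Operation order: increasing span, so every dependency is already filled.
    for span in range(1, n + 1):
        for i in range(n - span + 1):
            j = i + span - 1
            val = get(i + 1, j)  # case: base i unpaired
            for k in range(i + MIN_LOOP + 1, j + 1):  # case: base i paired with k
                if (seq[i], seq[k]) in PAIRS:
                    term = alg.times(alg.pair(PAIR_ENERGY),
                                     alg.times(get(i + 1, k - 1), get(k + 1, j)))
                    val = alg.plus(val, term)
            q[(i, j)] = val
    return get(0, n - 1)
```

For `"GGAAACC"`, `COUNT` returns 6, `MFE` returns -2.0, and `PARTITION` returns 1 + 4w + w² with w = exp(1/KT): one empty structure, four single-pair structures, and one nested two-pair structure. Swapping the algebra changes only what each subproblem stores; the recursions and the loop nest (the operation order) are untouched.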

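The abstract does not detail the updated sampling operation order. As an illustrative assumption only, one way to generate many Boltzmann samples together is to route the whole batch through a single stochastic traceback: at each subproblem the batch is split among the alternatives in proportion to their partition function contributions, and each branch is then traced once for all samples assigned to it. The sketch uses a toy Nussinov-style model (not the nearest-neighbor model), and `partition_table` and `sample_many` are hypothetical names.

```python
import math
import random
from collections import Counter
from typing import Callable, List, Tuple

# Toy model (assumption): WC/wobble pairs, hairpins >= 3 nt, -1 kcal/mol per pair.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
MIN_LOOP = 3
PAIR_WEIGHT = math.exp(1.0 / 0.6)  # Boltzmann weight exp(-E/kT), E = -1, kT = 0.6

Structure = List[Tuple[int, int]]


def partition_table(seq: str) -> Callable[[int, int], float]:
    """Fill Q(i, j) by increasing span and return a lookup function."""
    n = len(seq)
    q = {}

    def get(i: int, j: int) -> float:
        return 1.0 if i > j else q[(i, j)]

    for span in range(1, n + 1):
        for i in range(n - span + 1):
            j = i + span - 1
            total = get(i + 1, j)
            for k in range(i + MIN_LOOP + 1, j + 1):
                if (seq[i], seq[k]) in PAIRS:
                    total += PAIR_WEIGHT * get(i + 1, k - 1) * get(k + 1, j)
            q[(i, j)] = total
    return get


def sample_many(seq: str, count: int, rng: random.Random) -> List[Structure]:
    """Draw `count` Boltzmann samples with one shared stochastic traceback."""
    get = partition_table(seq)

    def trace(i: int, j: int, c: int) -> List[Structure]:
        if c == 0:
            return []
        if i > j:
            return [[] for _ in range(c)]
        options, weights = [None], [get(i + 1, j)]  # None: base i unpaired
        for k in range(i + MIN_LOOP + 1, j + 1):
            if (seq[i], seq[k]) in PAIRS:
                options.append(k)
                weights.append(PAIR_WEIGHT * get(i + 1, k - 1) * get(k + 1, j))
        # Split the whole batch among alternatives; each branch is then
        # traced once for all m samples routed to it.
        split = Counter(rng.choices(range(len(options)), weights=weights, k=c))
        out: List[Structure] = []
        for idx, m in split.items():
            k = options[idx]
            if k is None:
                out.extend(trace(i + 1, j, m))
            else:
                inner, outer = trace(i + 1, k - 1, m), trace(k + 1, j, m)
                out.extend([(i, k)] + a + b for a, b in zip(inner, outer))
        rng.shuffle(out)  # random order, so zipping independent batches stays unbiased
        return out

    return trace(0, len(seq) - 1, count)
```

For example, `sample_many("GGAAACC", 1000, random.Random(1))` returns 1000 structures as lists of (i, j) pairs. The shuffle before returning matters: batches from the inside and outside of a pair are combined with `zip`, and pairing them in a value-dependent order would correlate the two halves. In this sketch `rng.choices` still costs O(c) per split; the savings come from sharing the traversal across samples, not from the draw itself.
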
For the problem of sequence design for reaction pathway engineering, we introduce an optimization algorithm to minimize the multistate test tube ensemble defect, which simultaneously designs for reactant, intermediate, and product states along the reaction pathway (positive design) and against crosstalk interactions (negative design). Each of these on-pathway or crosstalk states is represented as a target test tube ensemble containing arbitrary numbers of on-target complexes, each with a target secondary structure and target concentration, and arbitrary numbers of off-target complexes, each with vanishing target concentration. Our test tube specification formalism enables conversion of a reaction pathway specification into a set of target test tubes. Sequences are designed subject to a set of hard constraints allowing specification of properties such as sequence composition, sequence complementarity, prevention of unwanted sequence patterns, and inclusion of biological sequences. We then extend this algorithm with soft constraints, enhancing flexibility through new constraint types and reducing design cost by up to two orders of magnitude in the most highly constrained cases. These soft constraints enable multiobjective design, optimizing the multistate test tube ensemble defect together with heuristics that avoid kinetic traps and equalize reaction rates to further aid reaction pathway engineering.
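
A sketch of the design objective, assuming commonly used test tube ensemble defect definitions rather than the thesis's exact formulation: the complex ensemble defect n_j is the expected number of nucleotides paired differently from the target structure at equilibrium; a target tube's defect is C_h = Σ_j [n_j min(x_j, y_j) + |φ_j| max(y_j - x_j, 0)] over on-target complexes j, normalized by the on-target nucleotide concentration Σ_j |φ_j| y_j; and the multistate objective averages the normalized tube defects. Off-target complexes have zero target concentration, so they enter only by depleting the on-target concentrations x_j. `OnTarget`, `complex_defect`, `tube_defect`, and `multistate_defect` are hypothetical names.

```python
from typing import List, NamedTuple, Sequence


def complex_defect(P: Sequence[Sequence[float]], target: Sequence[int]) -> float:
    """Expected number of nucleotides paired differently from the target.

    P[i][j] for j < N is the probability that bases i and j pair;
    P[i][N] is the probability that base i is unpaired.
    target[i] is the partner of base i in the target structure, or -1.
    """
    n = len(target)
    correct = sum(P[i][t if t >= 0 else n] for i, t in enumerate(target))
    return n - correct


class OnTarget(NamedTuple):
    length: int          # nucleotides in the complex, |phi_j|
    defect: float        # complex ensemble defect n_j
    conc: float          # predicted equilibrium concentration x_j
    target_conc: float   # target concentration y_j


def tube_defect(complexes: List[OnTarget]) -> float:
    """Normalized test tube ensemble defect M_h (assumed definition)."""
    total = 0.0
    for c in complexes:
        # structural defect of the complex that does form ...
        total += c.defect * min(c.conc, c.target_conc)
        # ... plus a full defect for every nucleotide of missing complex
        total += c.length * max(c.target_conc - c.conc, 0.0)
    nucleotides = sum(c.length * c.target_conc for c in complexes)
    return total / nucleotides


def multistate_defect(tubes: List[List[OnTarget]]) -> float:
    """Average of the normalized defects over all target test tubes."""
    return sum(tube_defect(t) for t in tubes) / len(tubes)
```

For one tube containing a 4-nt complex with n = 0.4 that forms at 0.5 µM against a 1 µM target, the tube defect is (0.4·0.5 + 4·0.5)/4 = 0.55; averaging with a tube whose only complex forms perfectly at its target concentration gives 0.275.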

Item Type: Thesis (Dissertation (Ph.D.))
Subject Keywords: nucleic acids; DNA; RNA; test tube; reaction pathway; secondary structure; multistate test tube design; coaxial and dangle stacking
Degree Grantor: California Institute of Technology
Division: Chemistry and Chemical Engineering
Major Option: Chemical Engineering
Thesis Availability: Public (worldwide access)
Research Advisor(s):
  • Pierce, Niles A.
Thesis Committee:
  • Winfree, Erik (chair)
  • Wang, Zhen-Gang
  • Ismagilov, Rustem F.
  • Pierce, Niles A.
Defense Date: 16 September 2019
Non-Caltech Author Email: njpcaltech (AT)
Funding (agency: grant number):
  • National Science Foundation: Software Elements NSF-OAC-1835414
  • National Science Foundation: INSPIRE NSF-CHE-1643606
  • National Science Foundation: Molecular Programming Project NSF-CCF-1317694
  • National Institutes of Health: National Research Service Award T32 GM007616
  • Programmable Molecular Technology Center (PMTC) within the Beckman Institute at Caltech: UNSPECIFIED
  • AWS/IST Cloud Credit Program at Caltech: UNSPECIFIED
  • Microsoft Azure sponsorship: UNSPECIFIED
Record Number: CaltechTHESIS:10062019-213347699
Persistent URL:
Related URLs:
  • Description: adapted for ch. 3
ORCID: Porubsky, Nicholas James, 0000-0001-6330-2645
Default Usage Policy: No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code: 11808
Deposited By: Nicholas Porubsky
Deposited On: 25 Oct 2019 00:26
Last Modified: 08 Nov 2023 00:43

Thesis Files

PDF (Final thesis PDF) - Final Version

