A Caltech Library Service

Machine Learning Methods Inspired by Challenges in Total Synthesis


Maser, Michael Robert (2022) Machine Learning Methods Inspired by Challenges in Total Synthesis. Dissertation (Ph.D.), California Institute of Technology. doi:10.7907/ffjk-g059.


Synthetic organic chemists face a wealth of challenges in the efficient construction of functional molecules, particularly bioactive compounds. Predictive approaches can shorten research timelines, reduce resource costs, and allow chemists to devote their expertise where it is most valuable. Promising machine learning (ML) methods have evolved for uncovering patterns in chemical data that are beyond the grasp of expert humans, but a number of grand challenges in molecular ML remain. First, the learning of chemical structure representations rooted in physical first principles has yet to be robustly demonstrated. Second, the practical task of predicting successful "over-the-arrow" reaction conditions remains unsolved. Finally, the demonstration of such solutions in the context of complex synthesis has yet to be realized.

Herein, approaches to these grand challenges are developed and described. Inspiration is derived from the successful synthesis of the anticancer marine natural product ritterazine B. Reaction condition prediction is approached first, where a novel graph neural network architecture is developed under a multilabel classification framework. The resulting model is successfully demonstrated on datasets of four high-value reaction types in modern synthesis. Next, 3D-to-1D representation learning is approached by development of a volumetric neural architecture based on inception networks. Such voxel models are demonstrated for the prediction of expensive quantum mechanical properties from space-filled data alone.

The merging of these approaches for reaction condition optimization and their utility in complex settings are discussed, along with an outlook on future work, which is currently underway.

Item Type: Thesis (Dissertation (Ph.D.))
Subject Keywords: Machine Learning, Total Synthesis, Organic Chemistry
Degree Grantor: California Institute of Technology
Division: Chemistry and Chemical Engineering
Major Option: Chemistry
Thesis Availability: Restricted to Caltech community only
Research Advisor(s):
  • Reisman, Sarah E.
Thesis Committee:
  • Stoltz, Brian M. (chair)
  • Reisman, Sarah E.
  • Fu, Gregory C.
  • Yue, Yisong
  • Listgarten, Jennifer
Defense Date: 26 January 2022
Record Number: CaltechTHESIS:02022022-183557614
Persistent URL:
Related URLs:
  • adapted for Chapter 1
  • adapted for Chapter 2
  • adapted for Chapter 3
  • adapted for Chapter 3
  • adapted for Appendix 4
  • adapted for Chapter 4
ORCID:
  • Maser, Michael Robert: 0000-0001-7895-7804
Default Usage Policy: No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code: 14488
Deposited By: Michael Maser
Deposited On: 28 Mar 2022 16:43
Last Modified: 08 Nov 2023 00:42

Thesis Files

PDF - Final Version
Restricted to Caltech community only until 27 September 2022.
See Usage Policy.

