CaltechTHESIS
  A Caltech Library Service

Machine Learning Methods Inspired by Challenges in Total Synthesis

Citation

Maser, Michael Robert (2022) Machine Learning Methods Inspired by Challenges in Total Synthesis. Dissertation (Ph.D.), California Institute of Technology. doi:10.7907/ffjk-g059. https://resolver.caltech.edu/CaltechTHESIS:02022022-183557614

Abstract

Synthetic organic chemists face numerous challenges in the efficient construction of functional molecules, particularly bioactive compounds. Predictive approaches can shorten research timelines, reduce resource costs, and allow chemists to devote their expertise where it is most valuable. Promising machine learning (ML) methods have evolved for uncovering patterns in chemical data that are beyond the grasp of human experts, but a number of grand challenges in molecular ML remain. First, the learning of chemical structure representations rooted in physical first principles has yet to be robustly demonstrated. Second, the practical task of predicting successful "over-the-arrow" reaction conditions remains elusive. Finally, the demonstration of such solutions in the context of complex synthesis has yet to be realized.

Herein, approaches to these grand challenges are developed and described. Inspiration is derived from the successful synthesis of the anticancer marine natural product ritterazine B. Reaction condition prediction is approached first, where a novel graph neural network architecture is developed under a multilabel classification framework. The resulting model is successfully demonstrated on datasets of four high-value reaction types in modern synthesis. Next, 3D-to-1D representation learning is approached by development of a volumetric neural architecture based on inception networks. Such voxel models are demonstrated for the prediction of expensive quantum mechanical properties from space-filled data alone.

The merging of these approaches for reaction condition optimization, and their utility in complex settings, is discussed as a direction for future work, which is currently underway.

Item Type: Thesis (Dissertation (Ph.D.))
Subject Keywords: Machine Learning, Total Synthesis, Organic Chemistry
Degree Grantor: California Institute of Technology
Division: Chemistry and Chemical Engineering
Major Option: Chemistry
Thesis Availability: Restricted to Caltech community only
Research Advisor(s):
  • Reisman, Sarah E.
Thesis Committee:
  • Stoltz, Brian M. (chair)
  • Reisman, Sarah E.
  • Fu, Gregory C.
  • Yue, Yisong
  • Listgarten, Jennifer
Defense Date: 26 January 2022
Record Number: CaltechTHESIS:02022022-183557614
Persistent URL: https://resolver.caltech.edu/CaltechTHESIS:02022022-183557614
DOI: 10.7907/ffjk-g059
Related URLs:
URL | URL Type | Description
https://doi.org/10.1021/jacs.9b13818 | DOI | Article adapted for Chapter 1
https://doi.org/10.1021/jacs.1c01372 | DOI | Article adapted for Chapter 2
https://arxiv.org/abs/2007.04275 | arXiv | Article adapted for Chapter 3
https://doi.org/10.1021/acs.jcim.0c01234 | DOI | Article adapted for Chapter 3
https://doi.org/10.1021/jacs.1c09820 | DOI | Article adapted for Appendix 4
https://chemrxiv.org/engage/chemrxiv/article-details/60ea947a9ab06e2e274d6cd7 | arXiv | Article adapted for Chapter 4
ORCID:
Author | ORCID
Maser, Michael Robert | 0000-0001-7895-7804
Default Usage Policy: No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code: 14488
Collection: CaltechTHESIS
Deposited By: Michael Maser
Deposited On: 28 Mar 2022 16:43
Last Modified: 08 Nov 2023 00:42

Thesis Files

PDF - Final Version
Restricted to Caltech community only until 27 September 2022.
See Usage Policy.

69MB
