Citation
Maser, Michael Robert (2022) Machine Learning Methods Inspired by Challenges in Total Synthesis. Dissertation (Ph.D.), California Institute of Technology. doi:10.7907/ffjk-g059. https://resolver.caltech.edu/CaltechTHESIS:02022022-183557614
Abstract
Synthetic organic chemists face a host of challenges in the efficient construction of functional molecules, particularly bioactive compounds. Predictive approaches can shorten research timelines, reduce resource costs, and allow chemists to devote their expertise where it is most valuable. Promising machine learning (ML) methods have emerged for uncovering patterns in chemical data that are beyond the grasp of human experts, but a number of grand challenges in molecular ML remain. First, the learning of chemical structure representations rooted in physical first principles has yet to be robustly demonstrated. Second, the practical task of predicting successful "over-the-arrow" reaction conditions remains elusive. Finally, such solutions have yet to be demonstrated in the context of complex synthesis.
Herein, approaches to these grand challenges are developed and described, drawing inspiration from the successful synthesis of the anticancer marine natural product ritterazine B. Reaction condition prediction is approached first: a novel graph neural network architecture is developed under a multilabel classification framework, and the resulting model is demonstrated on datasets of four high-value reaction types in modern synthesis. Next, 3D-to-1D representation learning is approached by developing a volumetric neural architecture based on inception networks. These voxel models are demonstrated for the prediction of computationally expensive quantum mechanical properties from space-filled data alone.
The merging of these approaches for reaction condition optimization, and their utility in complex settings, are discussed and forecast as directions for future work, which is currently underway.
| Item Type: | Thesis (Dissertation (Ph.D.)) |
|---|---|
| Subject Keywords: | Machine Learning, Total Synthesis, Organic Chemistry |
| Degree Grantor: | California Institute of Technology |
| Division: | Chemistry and Chemical Engineering |
| Major Option: | Chemistry |
| Thesis Availability: | Restricted to Caltech community only |
| Research Advisor(s): | |
| Thesis Committee: | |
| Defense Date: | 26 January 2022 |
| Record Number: | CaltechTHESIS:02022022-183557614 |
| Persistent URL: | https://resolver.caltech.edu/CaltechTHESIS:02022022-183557614 |
| DOI: | 10.7907/ffjk-g059 |
| Related URLs: | |
| ORCID: | |
| Default Usage Policy: | No commercial reproduction, distribution, display or performance rights in this work are provided. |
| ID Code: | 14488 |
| Collection: | CaltechTHESIS |
| Deposited By: | Michael Maser |
| Deposited On: | 28 Mar 2022 16:43 |
| Last Modified: | 08 Nov 2023 00:42 |
Thesis Files
PDF (Final Version, 69MB): Restricted to Caltech community only until 27 September 2022. See Usage Policy.