Citation
Wu, Zachary (2021) Data-Driven Protein Engineering. Dissertation (Ph.D.), California Institute of Technology. doi:10.7907/nx3c-qb44. https://resolver.caltech.edu/CaltechTHESIS:01042021-220032574
Abstract
Directed evolution has enabled the adaptation of natural protein sequences for a wide variety of human applications. Given a starting point, that is, a sequence with measurable activity, directed evolution improves protein sequences by iteratively accumulating beneficial mutations. However, directed evolution requires substantial experimental effort, which remains the major bottleneck in efficient protein optimization. To address this, in Chapter 2 we describe a framework for incorporating machine learning into the directed evolution process to maximize the utility of the experimental data generated. In Chapter 3, we then show that this framework outperforms traditional directed evolution methods on an empirical fitness landscape. Directed evolution is also fundamentally limited by its need for a starting point, a sequence with measurable activity. To tackle this issue, in Chapter 4 we test the ability of nascent deep learning techniques to generate short, functional amino acid sequences. Encouraged by this success, we attempted to generate full-length enzymatic sequences for desired substrates, but these attempts were unsuccessful. We were, however, able to apply this deep learning approach to model other aspects of enzymatic protein sequences, as described in Chapter 5. Finally, because the field of data-driven protein sequence generation is enjoying a recent surge in interest, Chapter 1 provides an updated review of protein engineering with machine learning, focusing on recent work in deep generative modeling.
| Item Type: | Thesis (Dissertation (Ph.D.)) |
|---|---|
| Subject Keywords: | Protein engineering, machine learning, directed evolution |
| Degree Grantor: | California Institute of Technology |
| Division: | Chemistry and Chemical Engineering |
| Major Option: | Chemical Engineering |
| Thesis Availability: | Public (worldwide access) |
| Research Advisor(s): | |
| Thesis Committee: | |
| Defense Date: | 25 June 2020 |
| Non-Caltech Author Email: | zacharywu (AT) gmail.com |
| Funders: | |
| Record Number: | CaltechTHESIS:01042021-220032574 |
| Persistent URL: | https://resolver.caltech.edu/CaltechTHESIS:01042021-220032574 |
| DOI: | 10.7907/nx3c-qb44 |
| Related URLs: | |
| ORCID: | |
| Default Usage Policy: | No commercial reproduction, distribution, display or performance rights in this work are provided. |
| ID Code: | 14045 |
| Collection: | CaltechTHESIS |
| Deposited By: | Zachary Wu |
| Deposited On: | 13 Jan 2021 16:53 |
| Last Modified: | 08 Nov 2023 00:11 |
Thesis Files
PDF - Final Version (19MB). See Usage Policy.