CaltechTHESIS
  A Caltech Library Service

Data-Driven Protein Engineering

Citation

Wu, Zachary (2021) Data-Driven Protein Engineering. Dissertation (Ph.D.), California Institute of Technology. doi:10.7907/nx3c-qb44. https://resolver.caltech.edu/CaltechTHESIS:01042021-220032574

Abstract

Directed evolution has enabled the adaptation of natural protein sequences for a wide variety of human applications. Given a starting point (a sequence with measurable activity), directed evolution improves protein sequences by iteratively accumulating beneficial mutations. However, directed evolution requires substantial experimental effort, which remains the major bottleneck in efficient protein optimization. To address this, in Chapter 2 we describe a framework for incorporating machine learning into the directed evolution process to maximize the utility of the experimental data generated. In Chapter 3, we show that this framework outperforms traditional directed evolution methods on an empirical fitness landscape. Directed evolution is nonetheless fundamentally limited by its need for a starting point, that is, a sequence with measurable activity. To tackle this issue, in Chapter 4 we test the ability of nascent deep learning techniques to generate short, functional amino acid sequences. Encouraged by this success, we attempted to generate full-length enzyme sequences for desired substrates, but without success. However, in Chapter 5 we were able to apply this deep learning approach to model other aspects of enzyme sequences. Finally, because the field of data-driven protein sequence generation is enjoying a recent surge in interest, in Chapter 1 we provide an updated review of protein engineering with machine learning, focusing on recent work in deep generative modeling.

Item Type: Thesis (Dissertation (Ph.D.))
Subject Keywords: Protein engineering, machine learning, directed evolution
Degree Grantor: California Institute of Technology
Division: Chemistry and Chemical Engineering
Major Option: Chemical Engineering
Thesis Availability: Public (worldwide access)
Research Advisor(s):
  • Arnold, Frances Hamilton
Thesis Committee:
  • Tirrell, David A. (chair)
  • Wang, Zhen-Gang
  • Yue, Yisong
  • Arnold, Frances Hamilton
Defense Date: 25 June 2020
Non-Caltech Author Email: zacharywu (AT) gmail.com
Funders:
  • NSF (grant number 1937902)
  • NSFGRF (grant number 2017227007)
Record Number: CaltechTHESIS:01042021-220032574
Persistent URL: https://resolver.caltech.edu/CaltechTHESIS:01042021-220032574
DOI: 10.7907/nx3c-qb44
Related URLs:
  • https://doi.org/10.1073/pnas.1901979116 (DOI): Article adapted for Chapters 2 and 3.
  • https://doi.org/10.1021/acssynbio.0c00219 (DOI): Article adapted for Chapter 4.
  • https://doi.org/10.1038/s41592-019-0496-6 (DOI): Chapter 1 is the successor (update) to this article.
ORCID:
  • Wu, Zachary: 0000-0003-2429-9812
Default Usage Policy: No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code: 14045
Collection: CaltechTHESIS
Deposited By: Zachary Wu
Deposited On: 13 Jan 2021 16:53
Last Modified: 08 Nov 2023 00:11

Thesis Files

PDF - Final Version (19MB). See Usage Policy.
