
Advances in Computational Protein Design: Development of More Efficient Search Algorithms and their Application to the Full-Sequence Design of Larger Proteins


Hom, Geoffrey Kai Tong (2005) Advances in Computational Protein Design: Development of More Efficient Search Algorithms and their Application to the Full-Sequence Design of Larger Proteins. Dissertation (Ph.D.), California Institute of Technology. doi:10.7907/M4R9-YM51.


Protein design is the art of choosing an amino acid sequence that will fold into a desired structure. Computational protein design aims to quantify and automate this process. In computational protein design, various metrics may be used to calculate an energy score for a sequence with respect to a desired protein structure. An ongoing challenge is to find the lowest-energy sequences among the vast number of possible sequences. A variety of exact and approximate algorithms may be used in this search.

The work in this thesis focuses on the development and testing of four search algorithms. The first algorithm, HERO, is an exact algorithm, meaning that it will always find the lowest-energy sequence if it converges. We show that HERO is faster than other exact algorithms and converges on some previously intractable designs. The second algorithm, Vegas, is an approximate algorithm, meaning that it may not find the lowest-energy sequence. We show that, under certain conditions, Vegas finds the lowest-energy sequence in less time than HERO. The third algorithm, Monte Carlo, is an approximate algorithm that had been developed previously. We tested whether Monte Carlo was thorough enough to perform a challenging computational design: the full-sequence design of a protein. Monte Carlo did not find the lowest-energy sequence, although a similar sequence from Vegas folded into the desired structure. Several biophysical methods suggested that the Monte Carlo sequence should also fold into the desired structure. Nevertheless, the structure of the Monte Carlo sequence, as determined by X-ray crystallography, was markedly different from the predicted structure. We attribute this discrepancy to the presence of a high concentration of dioxane in the crystallization conditions. The fourth algorithm, FC_FASTER, is an approximate algorithm for designs of fixed amino acid composition. Such designs may accelerate improvements to the physical model. We show that FC_FASTER finds lower-energy sequences and is faster than our current fixed-composition algorithm.
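The distinction between exact and approximate search drawn in the abstract can be illustrated with a toy sketch. The code below is not HERO, Vegas, or the thesis's Monte Carlo implementation. It assumes a hypothetical four-letter alphabet, randomly generated single-position and pairwise energy tables, and a generic Metropolis acceptance rule. All function names and parameters are illustrative assumptions. Exhaustive enumeration stands in for an exact method (it always returns the lowest-energy sequence, but only for tiny problems), while the Monte Carlo routine may or may not reach that minimum.

```python
import itertools
import math
import random

# Assumption: a toy 4-letter alphabet. Real designs use 20 amino acids
# (and side-chain rotamers), which makes the search space far larger.
ALPHABET = "AVLK"


def make_energy_tables(n_positions, seed=0):
    """Build random single-position and pairwise energy tables (illustrative only)."""
    rng = random.Random(seed)
    single = [{a: rng.uniform(-1, 1) for a in ALPHABET} for _ in range(n_positions)]
    pair = {
        (i, j): {(a, b): rng.uniform(-1, 1) for a in ALPHABET for b in ALPHABET}
        for i in range(n_positions)
        for j in range(i + 1, n_positions)
    }
    return single, pair


def energy(seq, single, pair):
    """Score a sequence as the sum of single-position and pairwise terms."""
    total = sum(single[i][a] for i, a in enumerate(seq))
    total += sum(table[(seq[i], seq[j])] for (i, j), table in pair.items())
    return total


def exhaustive_search(n_positions, single, pair):
    """Exact by construction: scores every sequence and returns the minimum."""
    return min(
        (energy(s, single, pair), "".join(s))
        for s in itertools.product(ALPHABET, repeat=n_positions)
    )


def monte_carlo_search(n_positions, single, pair, steps=5000, temperature=0.5, seed=1):
    """Approximate: Metropolis sampling with single-position mutations."""
    rng = random.Random(seed)
    current = [rng.choice(ALPHABET) for _ in range(n_positions)]
    current_e = energy(current, single, pair)
    best_e, best = current_e, "".join(current)
    for _ in range(steps):
        pos = rng.randrange(n_positions)
        old = current[pos]
        current[pos] = rng.choice(ALPHABET)
        new_e = energy(current, single, pair)
        delta = new_e - current_e
        # Accept downhill moves always, uphill moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current_e = new_e
            if current_e < best_e:
                best_e, best = current_e, "".join(current)
        else:
            current[pos] = old  # reject the move and restore the previous residue
    return best_e, best


if __name__ == "__main__":
    single, pair = make_energy_tables(6)
    print("exhaustive:", exhaustive_search(6, single, pair))
    print("monte carlo:", monte_carlo_search(6, single, pair))
```

With six positions and four letters there are only 4^6 = 4096 sequences, so enumeration is feasible here. With 20 amino acids per position the count grows as 20^N, which is why exact methods need far more efficient strategies than brute force and why approximate methods are attractive for larger designs.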

Item Type: Thesis (Dissertation (Ph.D.))
Subject Keywords: fixed composition
Degree Grantor: California Institute of Technology
Major Option: Biochemistry and Molecular Biophysics
Thesis Availability: Public (worldwide access)
Research Advisor(s):
  • Mayo, Stephen L.
Thesis Committee:
  • Deshaies, Raymond Joseph (chair)
  • Rees, Douglas C.
  • Pierce, Niles A.
  • Mayo, Stephen L.
Defense Date: 13 May 2005
Non-Caltech Author Email: geoffhom (AT)
Record Number: CaltechETD:etd-05302005-223153
Persistent URL:
Default Usage Policy: No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code: 2303
Deposited By: Imported from ETD-db
Deposited On: 01 Jun 2005
Last Modified: 08 May 2020 21:51

Thesis Files

PDF - Final Version
See Usage Policy.

