A Caltech Library Service

Accurate and Transferable Molecular-Orbital-Based Machine Learning for Molecular Modeling


Cheng, Lixue (2022) Accurate and Transferable Molecular-Orbital-Based Machine Learning for Molecular Modeling. Dissertation (Ph.D.), California Institute of Technology. doi:10.7907/cjak-4x38.


Quantum simulation is a powerful tool that lets chemists understand chemical processes and uncover their nature, either accurately with expensive wavefunction theory or approximately with cheaper density functional theory (DFT). However, the cost-accuracy trade-offs of electronic structure methods limit the application of quantum simulation to large chemical and biological systems. In this thesis, an accurate, transferable, and physics-driven molecular modeling framework, molecular-orbital-based machine learning (MOB-ML), is introduced to provide wavefunction-quality molecular descriptions at no more than mean-field computational cost. Instead of directly predicting total molecular energies, MOB-ML predicts the post-Hartree-Fock correlation energy from molecular orbital information at the cost of a Hartree-Fock computation. Because they preserve all the physical constraints, molecular-orbital-based (MOB) features represent chemical space faithfully in both supervised clustering and unsupervised learning for chemical space exploration. The development of local regression with scalable exact Gaussian processes within clusters further allows MOB-ML to provide the most accurate approach in both the low-data and big-data regimes. As an exciting and general new tool for tackling various problems in chemistry, MOB-ML predicts total energies with high accuracy and serves as a universal density functional for organic molecules and non-covalent interactions in various chemical systems. With analytical nuclear gradients available, MOB-ML can also generate accurate potential energy surfaces (PESs) from few reference high-level electronic structure computations, enabling accurate and efficient diffusion Monte Carlo simulations for computational spectroscopy.
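In the published MOB-ML work, the correlation energy is written as a sum of contributions from pairs of occupied (localized) molecular orbitals, E_corr = Σ_ij ε_ij, with each pair energy ε_ij regressed from features built out of Hartree-Fock orbital information. The Python/NumPy sketch below illustrates, under stated assumptions, the "cluster, then fit local exact Gaussian processes" idea from the abstract together with the final sum over pairs. Plain k-means is swapped in for the thesis's supervised clustering, the kernel hyperparameters are fixed rather than optimized, and the features and pair energies are random synthetic stand-ins. All names (`kmeans`, `ExactGP`, `ClusteredLocalGP`) are hypothetical and do not reflect the thesis's actual implementation.

```python
import numpy as np

# Sketch only: k-means clustering, a fixed RBF kernel, synthetic data, and every
# name here are illustrative assumptions, not the method or code from the thesis.


def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential kernel matrix between the rows of a (n, d) and b (m, d)."""
    sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-0.5 * np.maximum(sq, 0.0) / length_scale**2)


def kmeans(x, k, n_iter=50, seed=0):
    """Plain k-means with farthest-point initialization (a stand-in clustering step)."""
    rng = np.random.default_rng(seed)
    centers = [x[rng.integers(len(x))]]
    for _ in range(1, k):
        # Next initial center: the point farthest from all centers chosen so far.
        d2 = np.min([np.sum((x - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(x[np.argmax(d2)])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        labels = np.argmin(((x[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = x[labels == c].mean(axis=0)
    labels = np.argmin(((x[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
    return centers, labels


class ExactGP:
    """Exact GP regression with a fixed RBF kernel (no hyperparameter optimization)."""

    def __init__(self, length_scale=1.0, noise=1e-6):
        self.length_scale = length_scale
        self.noise = noise

    def fit(self, x, y):
        self.x, self.y_mean = x, y.mean()
        k = rbf_kernel(x, x, self.length_scale) + self.noise * np.eye(len(x))
        chol = np.linalg.cholesky(k)  # K = L L^T
        # alpha = K^{-1} (y - mean) via two triangular solves.
        self.alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y - self.y_mean))
        return self

    def predict(self, x_new):
        return self.y_mean + rbf_kernel(x_new, self.x, self.length_scale) @ self.alpha


class ClusteredLocalGP:
    """Cluster the feature space, then fit one local exact GP per cluster."""

    def __init__(self, n_clusters=2, length_scale=1.0, noise=1e-6, seed=0):
        self.n_clusters, self.length_scale = n_clusters, length_scale
        self.noise, self.seed = noise, seed

    def fit(self, x, y):
        centers, labels = kmeans(x, self.n_clusters, seed=self.seed)
        self.centers, self.models = [], []
        for c in range(len(centers)):
            mask = labels == c
            if np.any(mask):  # skip empty clusters
                self.centers.append(centers[c])
                self.models.append(
                    ExactGP(self.length_scale, self.noise).fit(x[mask], y[mask]))
        self.centers = np.array(self.centers)
        return self

    def predict(self, x):
        # Route each input to its nearest cluster center, then use that cluster's GP.
        labels = np.argmin(((x[:, None, :] - self.centers[None]) ** 2).sum(-1), axis=1)
        out = np.empty(len(x))
        for c in np.unique(labels):
            mask = labels == c
            out[mask] = self.models[c].predict(x[mask])
        return out


# Usage on synthetic, hypothetical data (random stand-ins, not real MOB features):
rng = np.random.default_rng(1)
x_train = np.vstack([rng.normal(-3, 0.5, (40, 3)), rng.normal(3, 0.5, (40, 3))])
y_train = -0.01 * np.sum(np.sin(x_train), axis=1)  # stand-in pair energies eps_ij
model = ClusteredLocalGP(n_clusters=2, noise=1e-4).fit(x_train, y_train)
pair_features = rng.normal(3, 0.5, (6, 3))  # one hypothetical molecule's orbital pairs
e_corr = model.predict(pair_features).sum()  # E_corr as a sum of pair energies
print(f"predicted correlation energy (arbitrary units): {e_corr:.5f}")
```

Fitting one exact GP per cluster keeps each kernel matrix small, which is the kind of cost saving that motivates local regression. A more realistic model would also optimize the kernel hyperparameters and might use a trained classifier, rather than nearest-center routing, to assign new orbital pairs to clusters.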

Item Type:Thesis (Dissertation (Ph.D.))
Subject Keywords:Electronic structure, Machine Learning, Quantum Simulations
Degree Grantor:California Institute of Technology
Division:Chemistry and Chemical Engineering
Major Option:Chemistry
Thesis Availability:Public (worldwide access)
Research Advisor(s):
  • Miller, Thomas F.
Thesis Committee:
  • Chan, Garnet K. (chair)
  • Anandkumar, Anima
  • Goddard, William A., III
  • Miller, Thomas F.
  • Pierce, Niles A.
Defense Date:29 March 2022
Non-Caltech Author Email:sherrylixuecheng (AT)
Funding Agency | Grant Number
Army Research Office (ARO) | W911NF-12-2-0023
DeLogi Science and Technology Grant | UNSPECIFIED
Record Number:CaltechThesis:04012022-153013173
Persistent URL:
Related URLs:
  • adapted for Chapter 2.
  • adapted for Chapter 2.
  • adapted for Chapter 3.
ORCID:Cheng, Lixue 0000-0002-7329-0585
Default Usage Policy:No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code:14538
Deposited By: Lixue Cheng
Deposited On:29 Apr 2022 15:10
Last Modified:15 Jun 2022 19:25

Thesis Files

PDF (Complete Thesis) - Final Version
See Usage Policy.

