Machine Learning and Data Assimilation for Blending Incomplete Models and Noisy Data


Levine, Matthew Emanuel (2023) Machine Learning and Data Assimilation for Blending Incomplete Models and Noisy Data. Dissertation (Ph.D.), California Institute of Technology. doi:10.7907/b82h-ye78.


The prediction and inference of dynamical systems is of widespread interest across scientific and engineering disciplines. Data assimilation (DA) offers a well-established and successful paradigm for blending such models with noisy observational data. However, traditional DA-based inference often fails when available data are insufficiently informative. Chapter 2 copes with this challenge by introducing constraints into Ensemble Kalman Filtering, which is shown to improve forecasting of glucose dynamics in real patient-level clinical data. Chapter 3 addresses this identifiability challenge by instead developing a simplified, reduced-order stochastic model for glucose dynamics that is more easily identified from patient data. Despite these successes, the forecasting performance of these methods is fundamentally limited by the fidelity of the employed model, which is often not fully understood a priori.

Chapter 4 presents a general picture of how noisy, partially-observed time-series data can be used to learn flexible (e.g., neural network-based) corrections to a pre-specified mechanistic model. In Chapter 5, the proposed methodology is then validated in simulated settings for glucose-insulin models. Chapter 6 provides further perspective on learning flexible model corrections, comparing approaches that use i) gradient-based or gradient-free optimization, ii) temporal or time-averaged data, iii) different model parameterizations, iv) deterministic and stochastic corrections, and v) physical conservation laws to constrain inference.

Chapter 7 studies how these perspectives on machine learning and dynamical systems can help us understand the roles of biochemical networks. In particular, it considers protein dimerization networks through the lens of approximation theory and evaluates how the equilibria of these networks can be fine-tuned to perform a variety of biological computations.

Item Type: Thesis (Dissertation (Ph.D.))
Subject Keywords: Machine Learning, Data Assimilation, Dynamical Systems
Degree Grantor: California Institute of Technology
Division: Engineering and Applied Science
Major Option: Computing and Mathematical Sciences
Thesis Availability: Public (worldwide access)
Research Advisor(s):
  • Stuart, Andrew M.
Thesis Committee:
  • Yue, Yisong (chair)
  • Owhadi, Houman
  • Bouman, Katherine L.
  • Stuart, Andrew M.
Defense Date: 3 May 2023
Funders (Funding Agency: Grant Number):
  • NIH: R01 LM012734
  • NSF Graduate Research Fellowship: DGE-1745301
Record Number: CaltechTHESIS:06012023-213052258
Persistent URL:
Related URLs (Description):
  • excerpts included in Ch. 2
  • adapted for Ch. 2
  • adapted for Ch. 3
  • adapted for Ch. 5
  • adapted for Ch. 4
ORCID:
  • Levine, Matthew Emanuel: 0000-0002-5627-3169
Default Usage Policy: No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code: 15264
Deposited By: Matthew Levine
Deposited On: 02 Jun 2023 15:24
Last Modified: 09 Jun 2023 18:50

Thesis Files

PDF - Final Version
See Usage Policy.