Machine Learning and Data Assimilation for Blending Incomplete Models and Noisy Data

Citation

Levine, Matthew Emanuel (2023) Machine Learning and Data Assimilation for Blending Incomplete Models and Noisy Data. Dissertation (Ph.D.), California Institute of Technology. doi:10.7907/b82h-ye78. https://resolver.caltech.edu/CaltechTHESIS:06012023-213052258

Abstract

Prediction and inference of dynamical systems are of widespread interest across scientific and engineering disciplines. Data assimilation (DA) offers a well-established and successful paradigm for blending models of such systems with noisy observational data. However, traditional DA-based inference often fails when available data are insufficiently informative. Chapter 2 addresses this challenge by introducing constraints into Ensemble Kalman Filtering, which is shown to improve forecasting of glucose dynamics in real patient-level clinical data. Chapter 3 instead addresses this identifiability challenge by developing a simplified, reduced-order stochastic model for glucose dynamics that is more easily identified from patient data. Despite these successes, the forecasting performance of these methods is fundamentally limited by the fidelity of the employed model, which is often not fully understood a priori.

Chapter 4 presents a general picture of how noisy, partially observed time-series data can be used to learn flexible (e.g., neural network-based) corrections to a pre-specified mechanistic model. In Chapter 5, the proposed methodology is then validated in simulated settings for glucose-insulin models. Chapter 6 provides further perspective on learning flexible model corrections, comparing approaches that use i) gradient-based or gradient-free optimization, ii) temporal or time-averaged data, iii) different model parameterizations, iv) deterministic or stochastic corrections, and v) physical conservation laws to constrain inference.

Chapter 7 studies how these perspectives on machine learning and dynamical systems can help us understand the roles of biochemical networks. In particular, it considers protein dimerization networks through the lens of approximation theory and evaluates how the equilibria of these networks can be fine-tuned to perform a variety of biological computations.

Item Type:

Thesis (Dissertation (Ph.D.))

Subject Keywords:

Machine Learning, Data Assimilation, Dynamical Systems

Degree Grantor:

California Institute of Technology

Division:

Engineering and Applied Science

Major Option:

Computing and Mathematical Sciences

Thesis Availability:

Public (worldwide access)

Research Advisor(s):

Stuart, Andrew M.

Thesis Committee:

Yue, Yisong (chair)
Owhadi, Houman
Bouman, Katherine L.
Stuart, Andrew M.

Defense Date:

3 May 2023

Funders:

Funding Agency	Grant Number
NIH	R01 LM012734
NSF Graduate Research Fellowship	DGE-1745301

Record Number:

CaltechTHESIS:06012023-213052258

Persistent URL:

https://resolver.caltech.edu/CaltechTHESIS:06012023-213052258

DOI:

10.7907/b82h-ye78

Related URLs:

URL	URL Type	Description
https://arxiv.org/abs/2305.06513	arXiv	Article excerpts included in Ch. 2
https://doi.org/10.1088/1361-6420/ab1c09	DOI	Article adapted for Ch. 2
https://arxiv.org/abs/1910.14193	arXiv	Article adapted for Ch. 3
https://arxiv.org/abs/2304.14300	arXiv	Article adapted for Ch. 5
https://doi.org/10.1090/cams/10	DOI	Article adapted for Ch. 4

ORCID:

Author	ORCID
Levine, Matthew Emanuel	0000-0002-5627-3169

Default Usage Policy:

No commercial reproduction, distribution, display or performance rights in this work are provided.

ID Code:

15264

Collection:

CaltechTHESIS

Deposited By:

Matthew Levine

Deposited On:

02 Jun 2023 15:24

Last Modified:

09 Jun 2023 18:50

Thesis Files

PDF - Final Version
See Usage Policy.
18MB
