Statistical Foundations of Operator Learning


Nelsen, Nicholas Hao (2024) Statistical Foundations of Operator Learning. Dissertation (Ph.D.), California Institute of Technology. doi:10.7907/0246-7574.


This thesis studies operator learning from a statistical perspective. Operator learning uses observed data to estimate mappings between infinite-dimensional spaces. It is formulated conceptually at the continuum level, which leads to discretization-independent machine learning methods when implemented in practice. Although this framework shows promise for physical model acceleration and discovery, the mathematical theory of operator learning lags behind its empirical success. Motivated by scientific computing and inverse problems where the available data are often scarce, this thesis develops scalable algorithms for operator learning and theoretical insights into their data efficiency.

The thesis begins by introducing a convergent operator learning algorithm that is implementable on a computer with controlled complexity. The method is based on linear combinations of function-valued random features, enjoys efficient training via convex optimization, and accurately approximates nonlinear solution operators of parametric partial differential equations. A statistical analysis derives state-of-the-art error bounds for the method and establishes its robustness to errors stemming from noisy observations and model misspecification. Next, the thesis tackles fundamental statistical questions about how problem structure, data quality, and prior information influence learning accuracy. Specializing to a linear setting, a sharp Bayesian nonparametric analysis shows that continuum linear operators, such as the integration or differentiation of spatially varying functions, are provably learnable from noisy input-output pairs. The theory reveals that smoothing operators are easier to learn than unbounded ones and that training with rough or high-frequency input data improves sample complexity. When only specific linear functionals of the operator’s output are the primary quantities of interest, the final part of the thesis proves that the smoothness of the functionals determines whether learning directly from these finite-dimensional observations carries a statistical advantage over plug-in estimators based on learning the entire operator. To validate the findings beyond linear problems, the thesis develops practical deep operator learning architectures for nonlinear mappings that send functions to vectors, or vice versa, and shows their corresponding universal approximation properties. Altogether, this thesis advances the reliability and efficiency of operator learning for continuum problems in the physical and data sciences.
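
As a rough illustration of the first method described above (a linear combination of function-valued random features, trained by convex optimization), the following is a minimal sketch and not the thesis's implementation. Everything specific in it is an assumption made for illustration: the toy target operator (the periodic solution map of u - u'' = a^2), the feature map (tanh of a circular convolution with a random function), the grid, the sample sizes, and the ridge penalty `lam`. The coefficients solve a ridge regression problem whose normal equations use discretized L2 inner products between functions.

```python
import numpy as np

rng = np.random.default_rng(0)

n_grid, n_train, n_feat = 64, 50, 40          # hypothetical sizes
lam = 1e-6                                    # hypothetical ridge penalty
k = np.fft.rfftfreq(n_grid, d=1.0 / n_grid)   # wavenumbers 0, 1, ..., n_grid/2


def sample_functions(n):
    """Draw n smooth periodic functions on a uniform grid of [0, 1)."""
    c = rng.standard_normal((n, k.size)) + 1j * rng.standard_normal((n, k.size))
    return n_grid * np.fft.irfft(c * (1.0 + k) ** -2.0, n=n_grid, axis=-1)


def target_operator(a):
    """Toy nonlinear map a -> u solving u - u'' = a^2 with periodic boundary conditions."""
    symbol = 1.0 + (2.0 * np.pi * k) ** 2
    return np.fft.irfft(np.fft.rfft(a ** 2, axis=-1) / symbol, n=n_grid, axis=-1)


def features(a, thetas):
    """Function-valued features tanh(a * theta_j); output shape (n_samples, n_feat, n_grid)."""
    a_hat = np.fft.rfft(a, axis=-1)[:, None, :]
    t_hat = np.fft.rfft(thetas, axis=-1)[None, :, :]
    # Dividing by n_grid turns the circular sum into a quadrature of the convolution integral.
    conv = np.fft.irfft(a_hat * t_hat, n=n_grid, axis=-1) / n_grid
    return np.tanh(conv)


def fit(a_train, u_train, thetas, lam):
    """Minimize mean_n ||u_n - sum_j alpha_j phi_j(a_n)||_{L2}^2 + lam * ||alpha||^2."""
    phi = features(a_train, thetas)
    h = 1.0 / n_grid                          # quadrature weight for the L2 inner product
    gram = h * np.einsum("njx,nkx->jk", phi, phi) / len(a_train)
    rhs = h * np.einsum("njx,nx->j", phi, u_train) / len(a_train)
    return np.linalg.solve(gram + lam * np.eye(len(thetas)), rhs)


def predict(a, thetas, alpha):
    """Evaluate the trained model: a linear combination of the random features."""
    return np.einsum("j,njx->nx", alpha, features(a, thetas))


def mean_sq_l2(v):
    """Average squared L2 norm over a batch of grid functions."""
    return np.mean(np.sum(v ** 2, axis=-1) / n_grid)


thetas = sample_functions(n_feat)             # random feature parameters, drawn once and frozen
a_train = sample_functions(n_train)
u_train = target_operator(a_train)
alpha = fit(a_train, u_train, thetas, lam)

train_err = mean_sq_l2(u_train - predict(a_train, thetas, alpha))
base_err = mean_sq_l2(u_train)                # error of the zero predictor

a_test = sample_functions(20)
u_test = target_operator(a_test)
test_rel = mean_sq_l2(u_test - predict(a_test, thetas, alpha)) / mean_sq_l2(u_test)
print(f"train error {train_err:.3e}, zero-predictor error {base_err:.3e}, "
      f"relative test error {test_rel:.3e}")
```

Only the coefficient vector `alpha` is trained, so the fit reduces to one linear solve of size `n_feat`. Because the feature map is defined through a quadrature-weighted convolution, the same functions can in principle be evaluated on a finer grid without changing the model, which is the sense in which such a sketch mirrors the continuum viewpoint of the abstract.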

Subject Keywords: scientific machine learning; learning theory; numerical analysis; inverse problems; functional data analysis; Bayesian nonparametric statistics; sample complexity; random features; ridge regression; parameter-to-observable maps
Awards: The W.P. Carey and Co. Prize in Applied Mathematics, 2024. Centennial Prize for the Best Thesis in Mechanical and Civil Engineering, 2024. SIAM Review SIGEST Award (Ch. 2), 2024. NeurIPS Spotlight (Ch. 3), 2023.
