A Caltech Library Service

Exploiting Structure for Scalable and Robust Deep Learning


Zheng, Stephan Tao (2018) Exploiting Structure for Scalable and Robust Deep Learning. Dissertation (Ph.D.), California Institute of Technology. doi:10.7907/4S2Y-CY80.


Deep learning has seen great success training deep neural networks for complex prediction problems, such as large-scale image recognition, short-term time-series forecasting, and learning behavioral models for games with simple dynamics. However, neural networks have a number of weaknesses: 1) they are not sample-efficient and 2) they are often not robust against (adversarial) input perturbations. Hence, it is challenging to train neural networks for problems with exponential complexity, such as multi-agent games, complex long-term spatiotemporal dynamics, or noisy high-resolution image data.

This thesis contributes methods to improve the sample efficiency, expressive power, and robustness of neural networks, by exploiting various forms of low-dimensional structure, such as spatiotemporal hierarchy and multi-agent coordination. We show the effectiveness of this approach in multiple learning paradigms: in both the supervised learning (e.g., imitation learning) and reinforcement learning settings.

First, we introduce hierarchical neural networks that model both short-term actions and long-term goals from data, and can learn human-level behavioral models for spatiotemporal multi-agent games, such as basketball, using imitation learning.

Second, in reinforcement learning, we show that behavioral policies with a hierarchical latent structure can efficiently learn forms of multi-agent coordination, which enables a form of structured exploration for faster learning.

Third, we showcase tensor-train recurrent neural networks that can model high-order multiplicative structure in dynamical systems (e.g., Lorenz dynamics). We show that this model class gives state-of-the-art long-term forecasting performance over very long time horizons on both simulated data and real-world traffic and climate data.
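
For reference, the Lorenz system named above is commonly written as shown here. This is the standard textbook form rather than notation taken from the thesis, and the parameter names \sigma, \rho, and \beta are the conventional ones. The product terms xz and xy are examples of multiplicative coupling between state variables.

```latex
\begin{aligned}
\frac{dx}{dt} &= \sigma (y - x), \\
\frac{dy}{dt} &= x(\rho - z) - y, \\
\frac{dz}{dt} &= xy - \beta z.
\end{aligned}
```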

Finally, we demonstrate two methods for neural network robustness: 1) stability training, a form of stochastic data augmentation to make neural networks more robust, and 2) neural fingerprinting, a method that detects adversarial examples by validating the network’s behavior in the neighborhood of any given input.
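
As a rough illustration of the idea behind stability training, the sketch below adds a penalty that discourages the model output from changing when the input is perturbed by random noise. The abstract does not state the objective, so everything here is an assumption made for illustration: the Gaussian noise model, the KL-divergence penalty, the hypothetical `toy_model` linear classifier, and the parameter names `alpha` and `sigma`.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def toy_model(x, weights):
    """Hypothetical linear classifier: one weight row per class."""
    logits = [sum(w * v for w, v in zip(row, x)) for row in weights]
    return softmax(logits)


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def stability_loss(x, label, weights, alpha=0.1, sigma=0.05, rng=None):
    """Task loss on the clean input plus a weighted stability penalty.

    Assumed form: cross-entropy(clean) + alpha * KL(clean || noisy),
    where the noisy input is x plus i.i.d. Gaussian noise of scale sigma.
    Returns (total, task, stability).
    """
    if rng is None:
        rng = random.Random(0)
    x_noisy = [v + rng.gauss(0.0, sigma) for v in x]
    p_clean = toy_model(x, weights)
    p_noisy = toy_model(x_noisy, weights)
    task = -math.log(p_clean[label])          # cross-entropy on the clean input
    stability = kl_divergence(p_clean, p_noisy)
    return task + alpha * stability, task, stability


if __name__ == "__main__":
    weights = [[1.0, -0.5], [-0.3, 0.8]]      # illustrative values only
    total, task, stability = stability_loss([0.2, 0.7], 1, weights)
    print(total, task, stability)
```

With `sigma=0.0` the perturbed input equals the clean one, so the penalty vanishes and the total reduces to the ordinary task loss. Larger `alpha` weights output stability more heavily relative to accuracy on clean inputs.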

In sum, this thesis takes a step to enable machine learning for the next scale of problem complexity, such as rich spatiotemporal multi-agent games and large-scale robust predictions.

Item Type:Thesis (Dissertation (Ph.D.))
Subject Keywords:Machine Learning, Deep Learning, Imitation Learning, Reinforcement Learning, Robust Machine Learning, Algorithms, Big Data, Tensor Learning, Spatiotemporal Data, Multi-agent Systems, Hierarchical Models, Long-term Planning, Structured Exploration
Degree Grantor:California Institute of Technology
Division:Physics, Mathematics and Astronomy
Major Option:Physics
Thesis Availability:Public (worldwide access)
Research Advisor(s):
  • Yue, Yisong
Thesis Committee:
  • Perona, Pietro (chair)
  • Porter, Frank C.
  • Sutskever, Ilya
  • Yue, Yisong
Defense Date:23 April 2018
Non-Caltech Author Email:st.t.zheng (AT)
Funders:
  • The Powell Foundation (Grant Number: UNSPECIFIED)
  • Northrop Grumman Corporation (Grant Number: UNSPECIFIED)
Record Number:CaltechThesis:05252018-092016207
Persistent URL:
Related URLs:
  • http://www.stephanzheng.com (Author, Personal Website)
  • Three entries adapted for Ch 2 (URLs not preserved)
  • One entry adapted for Ch 3 (URL not preserved)
  • Three entries adapted for Ch 5 (URLs not preserved)
ORCID:
  • Zheng, Stephan Tao: 0000-0002-7271-1616
Default Usage Policy:No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code:10936
Deposited By: Stephan Tao Zheng
Deposited On:25 May 2018 18:57
Last Modified:04 Oct 2019 00:21

Thesis Files

PDF - Final Version
See Usage Policy.