A Caltech Library Service

Regret-Optimal Control


Goel, Gautam (2022) Regret-Optimal Control. Dissertation (Ph.D.), California Institute of Technology. doi:10.7907/8t6d-4t53.


Optimal controllers are usually designed to minimize cost under the assumption that the disturbance they encounter is drawn from some specific class. For example, in H₂ control the disturbance is assumed to be generated by a stochastic process and the controller is designed to minimize its expected cost, while in H control the disturbance is assumed to be generated adversarially and the controller is designed to minimize its worst-case cost. This approach suffers from an obvious drawback: a controller which encounters a disturbance which falls outside of the class it was designed to handle may perform poorly. This observation naturally motivates the design of adaptive controllers which dynamically adjust their control strategy as they causally observe the disturbance instead of blindly following a prescribed strategy.

Inspired by online learning, we propose data-dependent regret as a criterion for controller design. In the regret-optimal control paradigm, causal controllers are designed to minimize regret against a hypothetical optimal noncausal controller, which selects the cost-minimizing sequence of control actions given noncausal access to the disturbance sequence. Controllers with low regret retain a performance guarantee irrespective of how the disturbance is generated; it is this universality which makes our approach an attractive alternative to traditional H₂ and H control. The regret of the causal controller is bounded by some measure of the complexity of the disturbance sequence; we consider several different complexity measures, including the energy of the disturbance sequence, which measures the size of the disturbance, and the pathlength of the disturbance, which measures its variation over time. We also consider the alternative metric of competitive ratio, which is the worst-case ratio between the cost incurred by the causal controller and the cost incurred by the optimal noncausal controller. This metric can also be viewed as a special case of data-dependent regret, where the complexity measure is simply the offline optimal cost. For each of these complexity measures, we derive a corresponding control algorithm with optimal data-dependent regret. The key technique we introduce is an operator-theoretic reduction from regret-optimal control to H control; each of the regret-optimal controllers we obtain can be interpreted as an H controller in a synthetic system of larger dimension. We also extend regret-optimal control to the more challenging measurement-feedback setting, where the online controller must choose control actions without directly observing the disturbance sequence, using only noisy linear measurements of the state.

We show that the competitive controller can be arbitrarily well-approximated by the class of disturbance-action-controller (DAC) policies. The convexity of this class of policies makes it amenable to online optimization via a reduction to online convex optimization with memory, and this class has hence attracted much recent attention in online learning. Using our approximation result, we show how to obtain algorithms which achieve the "best-of-both-worlds": sublinear policy regret against DAC policies and approximate competitive ratio. These performance guarantees can even be extended to the "adaptive control" setting, where the controller does not know the system dynamics ahead of time and must perform online system identification.

We present numerical experiments in a linear dynamical system which demonstrate how the performance of regret-optimal controllers varies as a function of the complexity of the disturbance. We extend regret-optimal control to nonlinear dynamical systems using model-predictive control (MPC) and present experiments which suggest that regret-optimal control is a promising approach to adapting to model error in nonlinear control.

Item Type:Thesis (Dissertation (Ph.D.))
Subject Keywords:online learning, adaptive control, regret minimization, robust control
Degree Grantor:California Institute of Technology
Division:Engineering and Applied Science
Major Option:Computing and Mathematical Sciences
Awards:Bhansali Family Doctoral Prize in Computer Science, 2022.
Thesis Availability:Public (worldwide access)
Research Advisor(s):
  • Hassibi, Babak
Thesis Committee:
  • Yue, Yisong (chair)
  • Wierman, Adam C.
  • Hazan, Elad
  • Hassibi, Babak
Defense Date:26 May 2022
Record Number:CaltechTHESIS:06062022-021936716
Persistent URL:
Related URLs:
URLURL TypeDescription for Chapter 3 for Chapter 2 for Chapter 3 for Chapter 3
Goel, Gautam000-0002-7054-7218
Default Usage Policy:No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code:14947
Deposited By: Gautam Goel
Deposited On:06 Jun 2022 22:35
Last Modified:04 Aug 2022 19:06

Thesis Files

[img] PDF - Final Version
See Usage Policy.


Repository Staff Only: item control page