Regret-Optimal Control

Citation

Goel, Gautam (2022) Regret-Optimal Control. Dissertation (Ph.D.), California Institute of Technology. doi:10.7907/8t6d-4t53. https://resolver.caltech.edu/CaltechTHESIS:06062022-021936716

Abstract

Optimal controllers are usually designed to minimize cost under the assumption that the disturbance they encounter is drawn from some specific class. For example, in H₂ control the disturbance is assumed to be generated by a stochastic process and the controller is designed to minimize its expected cost, while in H_∞ control the disturbance is assumed to be generated adversarially and the controller is designed to minimize its worst-case cost. This approach suffers from an obvious drawback: a controller which encounters a disturbance which falls outside of the class it was designed to handle may perform poorly. This observation naturally motivates the design of adaptive controllers which dynamically adjust their control strategy as they causally observe the disturbance instead of blindly following a prescribed strategy.

Inspired by online learning, we propose data-dependent regret as a criterion for controller design. In the regret-optimal control paradigm, causal controllers are designed to minimize regret against a hypothetical optimal noncausal controller, which selects the cost-minimizing sequence of control actions given noncausal access to the disturbance sequence. Controllers with low regret retain a performance guarantee irrespective of how the disturbance is generated; it is this universality which makes our approach an attractive alternative to traditional H₂ and H_∞ control. The regret of the causal controller is bounded by some measure of the complexity of the disturbance sequence; we consider several different complexity measures, including the energy of the disturbance sequence, which measures the size of the disturbance, and the pathlength of the disturbance, which measures its variation over time. We also consider the alternative metric of competitive ratio, which is the worst-case ratio between the cost incurred by the causal controller and the cost incurred by the optimal noncausal controller. This metric can also be viewed as a special case of data-dependent regret, where the complexity measure is simply the offline optimal cost. For each of these complexity measures, we derive a corresponding control algorithm with optimal data-dependent regret. The key technique we introduce is an operator-theoretic reduction from regret-optimal control to H_∞ control; each of the regret-optimal controllers we obtain can be interpreted as an H_∞ controller in a synthetic system of larger dimension. We also extend regret-optimal control to the more challenging measurement-feedback setting, where the online controller must choose control actions without directly observing the disturbance sequence, using only noisy linear measurements of the state.

We show that the competitive controller can be arbitrarily well-approximated by the class of disturbance-action-controller (DAC) policies. The convexity of this class of policies makes it amenable to online optimization via a reduction to online convex optimization with memory, and this class has hence attracted much recent attention in online learning. Using our approximation result, we show how to obtain algorithms which achieve the "best-of-both-worlds": sublinear policy regret against DAC policies and approximate competitive ratio. These performance guarantees can even be extended to the "adaptive control" setting, where the controller does not know the system dynamics ahead of time and must perform online system identification.

We present numerical experiments in a linear dynamical system which demonstrate how the performance of regret-optimal controllers varies as a function of the complexity of the disturbance. We extend regret-optimal control to nonlinear dynamical systems using model-predictive control (MPC) and present experiments which suggest that regret-optimal control is a promising approach to adapting to model error in nonlinear control.

Item Type:

Thesis (Dissertation (Ph.D.))

Subject Keywords:

online learning, adaptive control, regret minimization, robust control

Degree Grantor:

California Institute of Technology

Division:

Engineering and Applied Science

Major Option:

Computing and Mathematical Sciences

Awards:

Bhansali Family Doctoral Prize in Computer Science, 2022.

Thesis Availability:

Public (worldwide access)

Research Advisor(s):

Hassibi, Babak

Thesis Committee:

Yue, Yisong (chair)
Wierman, Adam C.
Hazan, Elad
Hassibi, Babak

Defense Date:

26 May 2022

Record Number:

CaltechTHESIS:06062022-021936716

Persistent URL:

https://resolver.caltech.edu/CaltechTHESIS:06062022-021936716

DOI:

10.7907/8t6d-4t53

Related URLs:

URL	URL Type	Description
https://arxiv.org/abs/2106.12097	arXiv	Adapted for Chapter 3
https://arxiv.org/abs/2002.02574	arXiv	Adapted for Chapter 2
https://proceedings.mlr.press/v168/goel22a.html	Publisher	Adapted for Chapter 3
https://arxiv.org/abs/2107.13657	arXiv	Adpated for Chapter 3

ORCID:

Author	ORCID
Goel, Gautam	000-0002-7054-7218

Default Usage Policy:

No commercial reproduction, distribution, display or performance rights in this work are provided.

ID Code:

14947

Collection:

CaltechTHESIS

Deposited By:

Gautam Goel

Deposited On:

06 Jun 2022 22:35

Last Modified:

04 Aug 2022 19:06

Thesis Files

PDF - Final Version
See Usage Policy.
2MB

Repository Staff Only: item control page