Guaranteed Policy Performance in Reinforcement Learning

Citation

Voloshin, Cameron (2024) Guaranteed Policy Performance in Reinforcement Learning. Dissertation (Ph.D.), California Institute of Technology. doi:10.7907/n2fg-e554. https://resolver.caltech.edu/CaltechTHESIS:06062024-061120491

Abstract

Decision-making is ubiquitous in everyday life. Increasingly, researchers are seeking answers on how to optimally solve sequential decision-making tasks. Thanks to the recent availability of computation, advances in deep learning, and the release of open-source code, it has become easy to train a computational agent to make decisions in many domains. Nevertheless, in realistic scenarios where the consequences of failure are severe, running a trained computational agent in the wild poses substantial risk.

The goal of this thesis is to develop and advance techniques that guarantee a learned agent does what we expect it to do. The thesis tackles two central questions:

1) Given an agent, how can we predict whether it will perform desirably?

2) Can we structure the learning process to guarantee desirable post-learning performance?

On the former question, this thesis proposes multiple algorithms to evaluate such agents, identifies factors that strongly influence the success of agent evaluation, and open-sources benchmarks for further development in the space.

On the latter question, this thesis formulates desirable agent behavior as a constrained optimization problem, with the type of constraint depending on the structure afforded to the practitioner. Constraining the search space during the learning process ensures that post-learning behaviors will, by definition, perform as desired.

Item Type:

Thesis (Dissertation (Ph.D.))

Subject Keywords:

Reinforcement Learning, Policy Learning, Off-policy Evaluation

Degree Grantor:

California Institute of Technology

Division:

Engineering and Applied Science

Major Option:

Computing and Mathematical Sciences

Thesis Availability:

Public (worldwide access)

Research Advisor(s):

Yue, Yisong

Thesis Committee:

Wierman, Adam C. (chair)
Yue, Yisong
Bouman, Katherine L.
Chaudhuri, Swarat

Defense Date:

13 June 2023

Non-Caltech Author Email:

clvoloshin (AT) gmail.com

Record Number:

CaltechTHESIS:06062024-061120491

Persistent URL:

https://resolver.caltech.edu/CaltechTHESIS:06062024-061120491

DOI:

10.7907/n2fg-e554

Related URLs:

URL	URL Type	Description
https://datasets-benchmarks-proceedings.neurips.cc/paper/2021/hash/a5e00132373a7031000fd987a3c9f87b-Abstract-round1.html	Publisher	Article adapted for Chapter 3
https://proceedings.mlr.press/v130/voloshin21a.html	Publisher	Article adapted for Chapter 4
https://proceedings.mlr.press/v97/le19a.html	Publisher	Article adapted for Chapters 5 and 6
https://proceedings.neurips.cc/paper_files/paper/2022/hash/70b8505ac79e3e131756f793cd80eb8d-Abstract-Conference.html	Publisher	Article adapted for Chapter 7
https://proceedings.mlr.press/v202/voloshin23a.html	Publisher	Article adapted for Chapter 8

ORCID:

Author	ORCID
Voloshin, Cameron	0009-0007-7725-6660

Default Usage Policy:

No commercial reproduction, distribution, display or performance rights in this work are provided.

ID Code:

16508

Collection:

CaltechTHESIS

Deposited By:

Cameron Voloshin

Deposited On:

06 Jun 2024 23:04

Last Modified:

14 Jun 2024 21:18

Thesis Files

PDF - Final Version
See Usage Policy.
18MB