Online Learning from Human Feedback with Applications to Exoskeleton Gait Optimization

Citation

Novoseller, Ellen Rachel (2021) Online Learning from Human Feedback with Applications to Exoskeleton Gait Optimization. Dissertation (Ph.D.), California Institute of Technology. doi:10.7907/gvtx-1586. https://resolver.caltech.edu/CaltechTHESIS:12092020-162149429

Abstract

Systems that intelligently interact with humans could improve people's lives in numerous ways and in numerous settings, such as households, hospitals, and workplaces. Yet, developing algorithms that reliably and efficiently personalize their interactions with people in real-world environments remains challenging. In particular, one major difficulty lies in adapting to human-in-the-loop feedback, in which an algorithm makes sequential decisions while receiving online feedback from humans; throughout this interaction, the algorithm seeks to optimize its decision-making quality, as measured by the utility of its performance to the human users. Such algorithms must balance between exploration and exploitation: on one hand, the algorithm must select uncertain strategies to fully explore the environment and the interacting human's preferences, while on the other hand, it must exploit the empirically-best-performing strategies to maximize its cumulative performance.

Learning from human feedback can be difficult, as people are often unreliable in specifying numerical scores. In contrast, humans can often more accurately provide various types of qualitative feedback, for instance pairwise preferences. Yet, sample efficiency is a significant concern in human-in-the-loop settings, as qualitative feedback is less informative than absolute metrics, and algorithms can typically pose only limited queries to human users. Thus, there is a need to create theoretically-grounded online learning algorithms that efficiently, reliably, and robustly optimize their interactions with humans while learning from online qualitative feedback.

This dissertation makes several contributions to algorithm design for human-in-the-loop learning. Firstly, this work develops the Dueling Posterior Sampling (DPS) algorithmic framework, a model-based, Bayesian approach for online learning in the settings of preference-based reinforcement learning and generalized linear dueling bandits. DPS is developed together with a theoretical regret analysis framework, and yields competitive empirical performance in a range of simulations. Additionally, this thesis presents the CoSpar and LineCoSpar algorithms for sample-efficient, mixed-initiative learning from pairwise preferences and coactive feedback. CoSpar and LineCoSpar are both deployed in human subject experiments with a lower-body exoskeleton to identify optimal, user-preferred exoskeleton walking gaits. This work presents the first demonstration of preference-based learning for optimizing dynamic crutchless exoskeleton walking for user comfort, and makes progress toward customizing exoskeletons and other assistive devices for individual users.
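As a rough illustration of posterior sampling with pairwise preference feedback, the sketch below runs a toy dueling bandit. It is not the DPS algorithm from the dissertation: it swaps in independent Beta posteriors over each action's win rate, and the logistic preference model, the hidden `utilities`, and all function names are assumptions made for this example only.

```python
import math
import random


def simulate_preference(utilities, i, j, rng):
    """Return True if action i is preferred to action j.

    Assumption: preferences follow a logistic link on the utility difference.
    """
    p_i_wins = 1.0 / (1.0 + math.exp(-(utilities[i] - utilities[j])))
    return rng.random() < p_i_wins


def sample_best_action(wins, losses, rng):
    """Draw one Beta posterior sample per action and return the argmax index."""
    samples = [rng.betavariate(w + 1, l + 1) for w, l in zip(wins, losses)]
    return max(range(len(samples)), key=samples.__getitem__)


def run_dueling_sampler(utilities, num_rounds, seed=0):
    """Toy loop: sample two actions from the posterior, duel them, update counts."""
    rng = random.Random(seed)
    n = len(utilities)
    wins = [0] * n
    losses = [0] * n
    for _ in range(num_rounds):
        # Two independent posterior draws provide the exploration:
        # uncertain actions are sometimes sampled as "best".
        first = sample_best_action(wins, losses, rng)
        second = sample_best_action(wins, losses, rng)
        if first == second:
            continue  # comparing an action with itself carries no information
        if simulate_preference(utilities, first, second, rng):
            wins[first] += 1
            losses[second] += 1
        else:
            wins[second] += 1
            losses[first] += 1
    return wins, losses


if __name__ == "__main__":
    # Hypothetical hidden utilities for three candidate actions.
    wins, losses = run_dueling_sampler([0.0, 0.5, 2.0], num_rounds=500)
    print("wins:", wins)
    print("losses:", losses)
```

Sampling from the posterior rather than always taking the current best estimate is what balances exploration and exploitation here: as evidence accumulates, the draws concentrate on the actions that win most often.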

Item Type:

Thesis (Dissertation (Ph.D.))

Subject Keywords:

Human-in-the-loop learning; online learning; bandits; reinforcement learning; exoskeleton

Degree Grantor:

California Institute of Technology

Division:

Engineering and Applied Science

Major Option:

Control and Dynamical Systems

Awards:

Thomas A. Tisch Prize for Graduate Teaching in Computing and Mathematical Sciences, 2018.

Thesis Availability:

Public (worldwide access)

Research Advisor(s):

Burdick, Joel W. (advisor)
Yue, Yisong (co-advisor)

Thesis Committee:

Ames, Aaron D. (chair)
Burdick, Joel Wakeman
Yue, Yisong
Murray, Richard M.
Sadigh, Dorsa

Defense Date:

30 November 2020

Funders:

Funding Agency	Grant Number
NSF Graduate Research Fellowship	UNSPECIFIED
Amazon Graduate Fellowship	UNSPECIFIED
NIH	EB007615

Record Number:

CaltechTHESIS:12092020-162149429

Persistent URL:

https://resolver.caltech.edu/CaltechTHESIS:12092020-162149429

DOI:

10.7907/gvtx-1586

Related URLs:

URL	URL Type	Description
https://arxiv.org/abs/1908.01289	arXiv	Parts of article adapted for Chapters 3-4.
https://doi.org/10.1109/ICRA40945.2020.9196661	DOI	Parts of article adapted for Chapter 5.
https://arxiv.org/abs/2003.06495	arXiv	Parts of article adapted for Chapter 5.
https://github.com/ernovoseller/DuelingPosteriorSampling	Related Item	Code corresponding to parts of Chapter 4 (Dueling Posterior Sampling algorithm).
https://github.com/ernovoseller/CoSpar	Related Item	Code corresponding to parts of Chapter 5 (CoSpar algorithm).
https://github.com/myracheng/linecospar	Related Item	Code corresponding to parts of Chapter 5 (LineCoSpar algorithm).

ORCID:

Author	ORCID
Novoseller, Ellen Rachel	0000-0001-5263-0598

Default Usage Policy:

No commercial reproduction, distribution, display or performance rights in this work are provided.

ID Code:

14021

Collection:

CaltechTHESIS

Deposited By:

Ellen Novoseller

Deposited On:

18 Dec 2020 17:42

Last Modified:

03 Nov 2021 20:21

Thesis Files

PDF - Final Version
Creative Commons Attribution Non-commercial.
12MB
