Online Learning from Human Feedback with Applications to Exoskeleton Gait Optimization


Novoseller, Ellen Rachel (2021) Online Learning from Human Feedback with Applications to Exoskeleton Gait Optimization. Dissertation (Ph.D.), California Institute of Technology. doi:10.7907/gvtx-1586.


Systems that intelligently interact with humans could improve people's lives in numerous ways and in numerous settings, such as households, hospitals, and workplaces. Yet, developing algorithms that reliably and efficiently personalize their interactions with people in real-world environments remains challenging. In particular, one major difficulty lies in adapting to human-in-the-loop feedback, in which an algorithm makes sequential decisions while receiving online feedback from humans; throughout this interaction, the algorithm seeks to optimize its decision-making quality, as measured by the utility of its performance to the human users. Such algorithms must balance exploration and exploitation: on the one hand, the algorithm must select uncertain strategies to fully explore the environment and the interacting human's preferences, while on the other hand, it must exploit the empirically-best-performing strategies to maximize its cumulative performance.

Learning from human feedback can be difficult, as people are often unreliable in specifying numerical scores. In contrast, humans can often more accurately provide various types of qualitative feedback, for instance pairwise preferences. Yet, sample efficiency is a significant concern in human-in-the-loop settings, as qualitative feedback is less informative than absolute metrics, and algorithms can typically pose only limited queries to human users. Thus, there is a need to create theoretically-grounded online learning algorithms that efficiently, reliably, and robustly optimize their interactions with humans while learning from online qualitative feedback.

This dissertation makes several contributions to algorithm design for human-in-the-loop learning. Firstly, this work develops the Dueling Posterior Sampling (DPS) algorithmic framework, a model-based, Bayesian approach for online learning in the settings of preference-based reinforcement learning and generalized linear dueling bandits. DPS is developed together with a theoretical regret analysis framework, and yields competitive empirical performance in a range of simulations. Additionally, this thesis presents the CoSpar and LineCoSpar algorithms for sample-efficient, mixed-initiative learning from pairwise preferences and coactive feedback. CoSpar and LineCoSpar are both deployed in human subject experiments with a lower-body exoskeleton to identify optimal, user-preferred exoskeleton walking gaits. This work presents the first demonstration of preference-based learning for optimizing dynamic crutchless exoskeleton walking for user comfort, and makes progress toward customizing exoskeletons and other assistive devices for individual users.

Item Type: Thesis (Dissertation (Ph.D.))
Subject Keywords: Human-in-the-loop learning; online learning; bandits; reinforcement learning; exoskeleton
Degree Grantor: California Institute of Technology
Division: Engineering and Applied Science
Major Option: Control and Dynamical Systems
Awards: Thomas A. Tisch Prize for Graduate Teaching in Computing and Mathematical Sciences, 2018.
Thesis Availability: Public (worldwide access)
Research Advisor(s):
  • Burdick, Joel W. (advisor)
  • Yue, Yisong (co-advisor)
Thesis Committee:
  • Ames, Aaron D. (chair)
  • Burdick, Joel Wakeman
  • Yue, Yisong
  • Murray, Richard M.
  • Sadigh, Dorsa
Defense Date: 30 November 2020
Funding:
  • NSF Graduate Research Fellowship (grant number: UNSPECIFIED)
  • Amazon Graduate Fellowship (grant number: UNSPECIFIED)
Record Number: CaltechTHESIS:12092020-162149429
Persistent URL:
Related URLs:
  • … of article adapted for Chapters 3-4.
  • … of article adapted for Chapter 5.
  • … of article adapted for Chapter 5.
  • Item: Code corresponding to parts of Chapter 4 (Dueling Posterior Sampling algorithm).
  • Item: Code corresponding to parts of Chapter 5 (CoSpar algorithm).
  • Item: Code corresponding to parts of Chapter 5 (LineCoSpar algorithm).
ORCID:
  • Novoseller, Ellen Rachel: 0000-0001-5263-0598
Default Usage Policy: No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code: 14021
Deposited By: Ellen Novoseller
Deposited On: 18 Dec 2020 17:42
Last Modified: 03 Nov 2021 20:21

Thesis Files

PDF - Final Version
Creative Commons Attribution Non-commercial.

