Value-Based Decision Making and Learning as Algorithms Computed by the Nervous System


Colas, Jaron Taylor (2018) Value-Based Decision Making and Learning as Algorithms Computed by the Nervous System. Dissertation (Ph.D.), California Institute of Technology. doi:10.7907/Z90R9MK8.


How do we do what we do? Casting light on this essential question, the blossoming perspective of computational cognitive neuroscience gives rise to the present exposition of the nervous system and its phenomena of value-based decision making and learning. As justified herein by not only theory but also simulation against empirical data, human decision making and learning are framed mathematically in the explicit terms of two fundamental classes of algorithms--namely, sequential sampling and reinforcement learning. These counterparts are complementary in their coverage of the dynamics of unified neural, mental, and behavioral processes at different temporal scales. Novel variants of models based on such algorithms are introduced here to account for findings from experiments including measurements of both behavior and the brain in human participants.

In principle, formal dynamical models of decision making hold the potential to represent fundamental computations underpinning value-based (i.e., preferential) decisions in addition to perceptual decisions. Sequential-sampling models such as the race model and the drift-diffusion model that are grounded in simplicity, analytical tractability, and optimality remain popular, but some of their more recent counterparts have instead been designed with the aim of greater feasibility as architectures to be implemented by actual neural systems. In Chapter 2, connectionist models are proposed at an intermediate level of analysis that bridges mental phenomena and underlying neurophysiological mechanisms. Several such models drawing elements from the established race, drift-diffusion, feedforward-inhibition, divisive-normalization, and competing-accumulator models were tested with respect to fitting empirical data from human participants making choices between foods on the basis of hedonic value rather than a traditional perceptual attribute. Even when considering performance at emulating behavior alone, more neurally plausible models were set apart from more normative race or drift-diffusion models both quantitatively and qualitatively despite remaining parsimonious. To best capture the paradigm, a novel six-parameter computational model was formulated with features including hierarchical levels of competition via mutual inhibition as well as a static approximation of attentional modulation, which promotes "winner-take-all" processing. Moreover, a meta-analysis encompassing several related experiments validated the robustness of model-predicted trends in humans' value-based choices and concomitant reaction times. These findings have yet further implications for analysis of neurophysiological data in accordance with computational modeling, which are also discussed in this new light.
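For readers unfamiliar with sequential sampling, the following is a minimal sketch of a textbook drift-diffusion process applied to a choice between two valued options. It is not the six-parameter model proposed in Chapter 2. The function name `simulate_ddm` and the parameters `drift_scale`, `noise_sd`, `threshold`, and `dt` are illustrative assumptions rather than quantities taken from the dissertation.

```python
import random


def simulate_ddm(value_a, value_b, drift_scale=0.1, noise_sd=1.0,
                 threshold=10.0, dt=1.0, max_steps=10000, rng=None):
    """Simulate one trial of a basic drift-diffusion process.

    Evidence starts at 0 and drifts at a rate proportional to the value
    difference (an assumed linear mapping). Reaching +threshold selects
    option "A"; reaching -threshold selects option "B".
    Returns (choice, n_steps); choice is None if no bound is reached.
    """
    rng = rng or random.Random()
    drift = drift_scale * (value_a - value_b)
    evidence = 0.0
    for step in range(1, max_steps + 1):
        # Deterministic drift plus Gaussian noise scaled by sqrt(dt).
        evidence += drift * dt + noise_sd * (dt ** 0.5) * rng.gauss(0.0, 1.0)
        if evidence >= threshold:
            return "A", step
        if evidence <= -threshold:
            return "B", step
    return None, max_steps


if __name__ == "__main__":
    rng = random.Random(0)
    trials = [simulate_ddm(3.0, 1.0, rng=rng) for _ in range(200)]
    n_a = sum(1 for choice, _ in trials if choice == "A")
    print(f"chose A on {n_a} of {len(trials)} simulated trials")
```

In this sketch a larger value difference yields a larger drift, so the higher-valued option tends to be chosen more often and more quickly. That is the qualitative relationship between choices and reaction times that sequential-sampling models are meant to capture.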

Decision making in any brain is imperfect and costly in terms of time and energy. Operating under such constraints, an organism could be in a position to improve performance if an opportunity arose to exploit informative patterns in the environment being searched. Such an improvement of performance could entail both faster and more accurate (i.e., reward-maximizing) decisions. Chapter 3 investigated the extent to which human participants could learn to take advantage of immediate patterns in the spatial arrangement of serially presented foods such that a region of space would consistently be associated with greater subjective value. Eye movements leading up to choices demonstrated rapidly induced biases in the selective allocation of visual fixation and attention that were accompanied by both faster and more accurate choices of desired goods as implicit learning occurred. However, for the control condition with its spatially balanced reward environment, these subjects exhibited preexisting lateralized biases for eye and hand movements (i.e., leftward and rightward, respectively) that could act in opposition not only to each other but also to the orienting biases elicited by the experimental manipulation, producing an asymmetry between the left and right hemifields with respect to performance. Potentially owing at least in part to learned cultural conventions (e.g., reading from left to right), the findings herein particularly revealed an intrinsic leftward bias underlying initial saccades in the midst of more immediate feedback-directed processes for which spatial biases can be learned flexibly to optimize oculomotor and manual control in value-based decision making. The present study thus replicates general findings of learned attentional biases in a novel context with inherently rewarding stimuli and goes on to further elucidate the interactions between endogenous and exogenous biases.

Prediction-error signals consistent with formal models of "reinforcement learning" (RL) have repeatedly been found within dopaminergic nuclei of the midbrain and dopaminoceptive areas of the striatum. However, the precise form of the RL algorithms implemented in the human brain is not yet well determined. For Chapter 4, we created a novel paradigm optimized to dissociate the subtypes of reward-prediction errors that function as the key computational signatures of two distinct classes of RL models--namely, "actor/critic" models and action-value-learning models (e.g., the Q-learning model). The state-value-prediction error (SVPE), which is independent of actions, is a hallmark of the actor/critic architecture, whereas the action-value-prediction error (AVPE) is the distinguishing feature of action-value-learning algorithms. To test for the presence of these prediction-error signals in the brain, we scanned human participants with a high-resolution functional magnetic-resonance imaging (fMRI) protocol optimized to enable measurement of neural activity in the dopaminergic midbrain as well as the striatal areas to which it projects. In keeping with the actor/critic model, the SVPE signal was detected in the substantia nigra. The SVPE was also clearly present in both the ventral striatum and the dorsal striatum. However, alongside these purely state-value-based computations we also found evidence for AVPE signals throughout the striatum. These high-resolution fMRI findings suggest that model-free aspects of reward learning in humans can be explained algorithmically with RL in terms of an actor/critic mechanism operating in parallel with a system for more direct action-value learning.
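The distinction between the two prediction errors can be illustrated with their standard temporal-difference forms. This is a minimal sketch under textbook assumptions and not the specific model fitted in Chapter 4. The function names, the discount factor `gamma`, and the numerical values are hypothetical.

```python
def state_value_prediction_error(reward, v_current, v_next, gamma=1.0):
    """SVPE (critic of an actor/critic model): compares the outcome with
    the value of the current state, regardless of which action was taken."""
    return reward + gamma * v_next - v_current


def action_value_prediction_error(reward, q_current, q_next_values, gamma=1.0):
    """AVPE (Q-learning form): compares the outcome with the value of the
    specific action taken. q_next_values lists the action values available
    in the next state; an empty list marks a terminal state."""
    best_next = max(q_next_values) if q_next_values else 0.0
    return reward + gamma * best_next - q_current


if __name__ == "__main__":
    # One state with two actions; the trial ends after the outcome.
    v_state = 0.5                               # critic's value for the state
    q_values = {"left": 0.75, "right": 0.25}    # action values for the state

    reward = 1.0  # outcome received after choosing "right"
    svpe = state_value_prediction_error(reward, v_state, v_next=0.0)
    avpe = action_value_prediction_error(reward, q_values["right"], [])

    # SVPE = 1.0 + 0.0 - 0.5  = 0.5  (same whichever action was chosen)
    # AVPE = 1.0 + 0.0 - 0.25 = 0.75 (depends on the chosen action's value)
    print(svpe, avpe)
```

Because the SVPE depends only on the state while the AVPE depends on the chosen action, the two signals diverge whenever the available actions have unequal learned values. A paradigm seeking to dissociate them would need to exploit that divergence.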

Item Type: Thesis (Dissertation (Ph.D.))
Subject Keywords: computational cognitive neuroscience; decision neuroscience; value; reward; decision making; learning; attention; sequential sampling; reinforcement learning; reaction time; eye tracking; functional neuroimaging; dopaminergic midbrain; striatum; human brain
Degree Grantor: California Institute of Technology
Division: Biology and Biological Engineering
Major Option: Computation and Neural Systems
Thesis Availability: Public (worldwide access)
Research Advisor(s):
  • O'Doherty, John P.
Thesis Committee:
  • O'Doherty, John P. (chair)
  • Rangel, Antonio
  • Adolphs, Ralph
  • Shimojo, Shinsuke
Defense Date: 7 November 2017
Funding:
  • NSF Graduate Research Fellowship Program (Grant Number: UNSPECIFIED)
  • Rose Hills Foundation (Grant Number: UNSPECIFIED)
  • Gordon and Betty Moore Foundation (Grant Number: UNSPECIFIED)
Record Number: CaltechThesis:11162017-001051515
Persistent URL:
Related URLs (URL, URL Type, Description):
  • adapted for ch. 2
  • adapted for ch. 3
  • adapted for ch. 4
Author ORCID: Colas, Jaron Taylor (0000-0003-1872-7614)
Default Usage Policy: No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code: 10557
Deposited By: Jaron Colas
Deposited On: 30 Nov 2017 21:46
Last Modified: 04 Oct 2019 00:18