CaltechTHESIS
  A Caltech Library Service

Value-Based Decision Making and Learning as Algorithms Computed by the Nervous System

Citation

Colas, Jaron Taylor (2018) Value-Based Decision Making and Learning as Algorithms Computed by the Nervous System. Dissertation (Ph.D.), California Institute of Technology. doi:10.7907/Z90R9MK8. http://resolver.caltech.edu/CaltechThesis:11162017-001051515

Abstract

How do we do what we do? Casting light on this essential question, the blossoming perspective of computational cognitive neuroscience gives rise to the present exposition of the nervous system and its phenomena of value-based decision making and learning. As justified herein not only by theory but also by simulation against empirical data, human decision making and learning are framed mathematically in the explicit terms of two fundamental classes of algorithms--namely, sequential sampling and reinforcement learning. These counterparts are complementary in their coverage of the dynamics of unified neural, mental, and behavioral processes at different temporal scales. Novel variants of models based on such algorithms are introduced here to account for findings from experiments including measurements of both behavior and the brain in human participants.

In principle, formal dynamical models of decision making hold the potential to represent fundamental computations underpinning value-based (i.e., preferential) decisions in addition to perceptual decisions. Sequential-sampling models such as the race model and the drift-diffusion model remain popular for their simplicity, analytical tractability, and optimality, but some of their more recent counterparts have instead been designed with an eye toward greater feasibility as architectures that actual neural systems could implement. In Chapter 2, connectionist models are proposed at an intermediate level of analysis that bridges mental phenomena and underlying neurophysiological mechanisms. Several such models drawing elements from the established race, drift-diffusion, feedforward-inhibition, divisive-normalization, and competing-accumulator models were tested with respect to fitting empirical data from human participants making choices between foods on the basis of hedonic value rather than a traditional perceptual attribute. Even when considering performance at emulating behavior alone, more neurally plausible models were set apart from more normative race or drift-diffusion models both quantitatively and qualitatively despite remaining parsimonious. To best capture the paradigm, a novel six-parameter computational model was formulated with features including hierarchical levels of competition via mutual inhibition as well as a static approximation of attentional modulation, which promotes "winner-take-all" processing. Moreover, a meta-analysis encompassing several related experiments validated the robustness of model-predicted trends in humans' value-based choices and concomitant reaction times. These findings have further implications for analysis of neurophysiological data in accordance with computational modeling, which is also discussed in this new light.
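To illustrate the sequential-sampling idea in its simplest form, the sketch below simulates a basic two-alternative drift-diffusion process. This is a generic textbook-style model, not the six-parameter connectionist model of Chapter 2; the parameter values and names here are illustrative assumptions only.

```python
import random

def simulate_ddm(drift, threshold, noise=1.0, dt=0.001,
                 non_decision=0.3, rng=None):
    """Simulate one trial of a basic drift-diffusion model.

    Evidence accumulates from 0 toward +threshold (choose option "A")
    or -threshold (choose option "B"). Returns (choice, reaction time),
    where the reaction time includes a fixed non-decision component.
    """
    rng = rng or random.Random()
    x, t = 0.0, 0.0
    while abs(x) < threshold:
        # Euler-Maruyama step: deterministic drift plus Gaussian noise
        x += drift * dt + noise * (dt ** 0.5) * rng.gauss(0.0, 1.0)
        t += dt
    choice = "A" if x > 0 else "B"
    return choice, t + non_decision

# With a positive drift rate, option "A" should be chosen on most trials.
rng = random.Random(0)
trials = [simulate_ddm(drift=1.5, threshold=1.0, rng=rng) for _ in range(200)]
accuracy = sum(choice == "A" for choice, _ in trials) / len(trials)
```

A value-based variant would simply set the drift rate proportional to the difference in subjective (e.g., hedonic) value between the two options, rather than to a perceptual signal.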

Decision making in any brain is imperfect and costly in terms of time and energy. Operating under such constraints, an organism could improve performance if an opportunity arose to exploit informative patterns in the environment being searched. Such an improvement could entail both faster and more accurate (i.e., reward-maximizing) decisions. Chapter 3 investigated the extent to which human participants could learn to take advantage of immediate patterns in the spatial arrangement of serially presented foods such that a region of space would consistently be associated with greater subjective value. Eye movements leading up to choices demonstrated rapidly induced biases in the selective allocation of visual fixation and attention that were accompanied by both faster and more accurate choices of desired goods as implicit learning occurred. However, in the control condition, with its spatially balanced reward environment, participants exhibited preexisting lateralized biases for eye and hand movements (i.e., leftward and rightward, respectively). These biases could act in opposition not only to each other but also to the orienting biases elicited by the experimental manipulation, producing an asymmetry in performance between the left and right hemifields. In particular, the findings revealed an intrinsic leftward bias underlying initial saccades, potentially owing at least in part to learned cultural conventions (e.g., reading from left to right), operating amid more immediate feedback-directed processes through which spatial biases can be learned flexibly to optimize oculomotor and manual control in value-based decision making. The present study thus replicates general findings of learned attentional biases in a novel context with inherently rewarding stimuli and goes on to further elucidate the interactions between endogenous and exogenous biases.

Prediction-error signals consistent with formal models of "reinforcement learning" (RL) have repeatedly been found within dopaminergic nuclei of the midbrain and dopaminoceptive areas of the striatum. However, the precise form of the RL algorithms implemented in the human brain is not yet well determined. For Chapter 4, we created a novel paradigm optimized to dissociate the subtypes of reward-prediction errors that function as the key computational signatures of two distinct classes of RL models--namely, "actor/critic" models and action-value-learning models (e.g., the Q-learning model). The state-value-prediction error (SVPE), which is independent of actions, is a hallmark of the actor/critic architecture, whereas the action-value-prediction error (AVPE) is the distinguishing feature of action-value-learning algorithms. To test for the presence of these prediction-error signals in the brain, we scanned human participants with a high-resolution functional magnetic-resonance imaging (fMRI) protocol optimized to enable measurement of neural activity in the dopaminergic midbrain as well as the striatal areas to which it projects. In keeping with the actor/critic model, the SVPE signal was detected in the substantia nigra. The SVPE was also clearly present in both the ventral striatum and the dorsal striatum. However, alongside these purely state-value-based computations we also found evidence for AVPE signals throughout the striatum. These high-resolution fMRI findings suggest that model-free aspects of reward learning in humans can be explained algorithmically with RL in terms of an actor/critic mechanism operating in parallel with a system for more direct action-value learning.
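The distinction between the two prediction-error signals can be made concrete with standard tabular update rules. The sketch below contrasts a critic's state-value-prediction error (SVPE), which ignores the chosen action, with a Q-learning action-value-prediction error (AVPE), which depends on it. This is a generic illustration of the two algorithm classes, not the thesis's experimental paradigm; state and action names are hypothetical.

```python
def critic_update(V, s, r, s_next, alpha=0.1, gamma=0.9):
    """Actor/critic: the SVPE compares reward plus the discounted value
    of the next state against the current state's value, independently
    of which action produced the transition."""
    svpe = r + gamma * V.get(s_next, 0.0) - V.get(s, 0.0)
    V[s] = V.get(s, 0.0) + alpha * svpe
    return svpe

def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """Q-learning: the AVPE is tied to the specific action taken,
    bootstrapping from the best available action value in the next state."""
    best_next = max(Q.get((s_next, a2), 0.0) for a2 in actions)
    avpe = r + gamma * best_next - Q.get((s, a), 0.0)
    Q[(s, a)] = Q.get((s, a), 0.0) + alpha * avpe
    return avpe

# One rewarded transition from an (illustrative) cue state.
V, Q = {}, {}
svpe = critic_update(V, "cue", r=1.0, s_next="end")
avpe = q_update(Q, "cue", "press", r=1.0, s_next="end",
                actions=["press", "wait"])
```

In a full actor/critic agent, the SVPE would also train a separate actor (a policy); the point here is only that the two error signals diverge whenever action values and state values differ, which is what makes them dissociable in neural data.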

Item Type:Thesis (Dissertation (Ph.D.))
Subject Keywords:computational cognitive neuroscience; decision neuroscience; value; reward; decision making; learning; attention; sequential sampling; reinforcement learning; reaction time; eye tracking; functional neuroimaging; dopaminergic midbrain; striatum; human brain
Degree Grantor:California Institute of Technology
Division:Biology and Biological Engineering
Major Option:Computation and Neural Systems
Thesis Availability:Public (worldwide access)
Research Advisor(s):
  • O'Doherty, John P.
Thesis Committee:
  • O'Doherty, John P. (chair)
  • Rangel, Antonio
  • Adolphs, Ralph
  • Shimojo, Shinsuke
Defense Date:7 November 2017
Funders:
  • NSF Graduate Research Fellowship Program (grant: UNSPECIFIED)
  • Rose Hills Foundation (grant: UNSPECIFIED)
  • Gordon and Betty Moore Foundation (grant: UNSPECIFIED)
  • NIH (grant: R01DA033077)
  • NIH (grant: R01DA040011)
Record Number:CaltechThesis:11162017-001051515
Persistent URL:http://resolver.caltech.edu/CaltechThesis:11162017-001051515
DOI:10.7907/Z90R9MK8
Related URLs:
  • https://doi.org/10.1371/journal.pone.0186822 (DOI): Article adapted for Chapter 2
  • https://doi.org/10.3389/fpsyg.2017.02000 (DOI): Article adapted for Chapter 3
  • https://doi.org/10.1371/journal.pcbi.1005810 (DOI): Article adapted for Chapter 4
ORCID:
  • Colas, Jaron Taylor: 0000-0003-1872-7614
Default Usage Policy:No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code:10557
Collection:CaltechTHESIS
Deposited By: Jaron Colas
Deposited On:30 Nov 2017 21:46
Last Modified:02 Jul 2018 19:00

Thesis Files

PDF - Final Version (8 MB). See Usage Policy.
