Peters, Robert Jacob (2005) Visual attention and object categorization: from psychophysics to computational models. Dissertation (Ph.D.), California Institute of Technology. http://resolver.caltech.edu/CaltechETD:etd-06062004-213811
This thesis is arranged in two main parts. Each part relies on an approach that uses the methods of psychophysics and computational modeling to bring abstract or high-level theories of vision closer to a concrete neurobiological foundation.
The first part addresses the topic of visual object categorization. Previous studies using high-level models of categorization have left unresolved issues of neurobiological relevance, including how features are extracted from the image and the role played by memory capacity in categorization performance. We compared the ability of a comprehensive set of models to match the categorization performance of human observers while explicitly accounting for the models' numbers of free parameters. The most successful models did not require a large memory capacity, suggesting that a sparse, abstracted representation of category properties may underlie categorization performance. This type of representation, which differs from classical prototype abstraction, could also be extracted directly from two-dimensional images via a biologically plausible early vision model, rather than relying on experimenter-imposed features.
The second part addresses visual attention in its bottom-up, stimulus-driven form. Previous research showed that a model of bottom-up visual attention can account in part for the spatial positions of locations fixated by humans while free-viewing complex natural and artificial scenes. We used a similar framework to quantify how the predictive ability of such a model may be enhanced by new model components based on several specific mechanisms within the functional architecture of the visual system. These components included richer interactions among orientation-tuned units, both at short range (for clutter reduction) and at long range (for contour facilitation). Subjects free-viewed naturalistic and artificial images while their eye movements were recorded. The resulting fixation locations were compared with the models' predicted salience maps. We found that each new model component was important in attaining a strong quantitative correspondence between model and behavior. Finally, we compared the model predictions with the spatial locations obtained from a task that relied on mouse clicking rather than eye tracking. As these models become more accurate in predicting behaviorally relevant salient locations, they become useful to a range of applications in computer vision and human-machine interface design.
|Item Type:||Thesis (Dissertation (Ph.D.))|
|Subject Keywords:||computational models; eye tracking; multidimensional scaling; saliency; visual attention; visual object categorization|
|Degree Grantor:||California Institute of Technology|
|Major Option:||Computation and Neural Systems|
|Thesis Availability:||Public (worldwide access)|
|Defense Date:||4 June 2004|
|Author Email:||rjpeters (AT) klab.caltech.edu|
|Default Usage Policy:||No commercial reproduction, distribution, display or performance rights in this work are provided.|
|Deposited By:||Imported from ETD-db|
|Deposited On:||07 Jun 2004|
|Last Modified:||26 Dec 2012 02:51|