Hybrid Human-Machine Vision Systems: Image Annotation using Crowds, Experts and Machines


Welinder, Nils Peter Egon (2012) Hybrid Human-Machine Vision Systems: Image Annotation using Crowds, Experts and Machines. Dissertation (Ph.D.), California Institute of Technology. doi:10.7907/N6A2-F347.


The amount of digital image and video data keeps increasing at an ever-faster rate. While "big data" holds the promise of leading science to new discoveries, raw image data in itself is not of much use. To analyze the data statistically, it must be quantified and annotated. We argue that entirely automated methods are not accurate enough to annotate data in the short term. Crowdsourcing is an alternative that provides higher accuracy, but is too expensive to scale to millions of images. Instead, the solution is hybrid human-machine vision systems, where the work of both humans and machines is balanced to be as cost-effective and accurate as possible. With this goal in mind, we begin by categorizing different types of image annotations, and describe how nonexpert annotators can be trained to carry out challenging image annotation tasks. Having identified which types of annotations are appropriate for most tasks, including binary, confidence, pairwise and continuous annotations, we present models for crowdsourcing annotations from hundreds of expert and nonexpert annotators (humans). By trading off the bias and expertise of multiple annotators, we show that it is possible to achieve high-quality annotations with very few labels. We show that the number of labels can be further reduced by actively choosing the best annotators to carry out most of the work. Finally, we study the problem of estimating the performance of automated classifiers (machines) used to annotate large datasets where few ground truth labels are available. Using a semisupervised model for classifier confidence scores, we show that it is possible to accurately estimate classifier performance with very few labels.
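The abstract describes combining binary labels from many annotators while weighting each annotator by how reliable they are. As a rough illustration only, here is a sketch of a standard and much simpler model than the ones in the thesis (which account for both annotator bias and expertise): a "one-coin" Dawid-Skene style EM in which each annotator has a single accuracy parameter. The function name `one_coin_em`, the `(image_id, annotator_id, label)` input format, and the clipping constant `eps` are hypothetical choices made for this example.

```python
import math
from collections import defaultdict


def _sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def one_coin_em(labels, n_iter=50, eps=1e-3):
    """Hypothetical sketch: labels is a list of (image_id, annotator_id, label), label in {0, 1}.

    Returns (posterior P(z_i = 1) per image, estimated accuracy per annotator).
    """
    by_image = defaultdict(list)
    by_annotator = defaultdict(list)
    for i, j, l in labels:
        by_image[i].append((j, l))
        by_annotator[j].append((i, l))

    # Initialize posteriors with the soft majority vote; this also breaks the
    # label-switching symmetry, assuming most annotators beat chance.
    q = {i: sum(l for _, l in obs) / len(obs) for i, obs in by_image.items()}

    acc = {}
    for _ in range(n_iter):
        # M-step: class prior and per-annotator accuracy (clipped away from 0 and 1).
        prior = min(max(sum(q.values()) / len(q), eps), 1 - eps)
        for j, obs in by_annotator.items():
            agree = sum(q[i] if l == 1 else 1 - q[i] for i, l in obs)
            acc[j] = min(max(agree / len(obs), eps), 1 - eps)
        # E-step: posterior of each true label from summed log-odds evidence.
        for i, obs in by_image.items():
            log_odds = math.log(prior / (1 - prior))
            for j, l in obs:
                w = math.log(acc[j] / (1 - acc[j]))
                log_odds += w if l == 1 else -w
            q[i] = _sigmoid(log_odds)
    return q, acc


# Usage: annotators "a" and "b" are always right, "c" is always wrong.
truth = {0: 1, 1: 1, 2: 0, 3: 0}
labels = [(i, "a", z) for i, z in truth.items()]
labels += [(i, "b", z) for i, z in truth.items()]
labels += [(i, "c", 1 - z) for i, z in truth.items()]
posterior, accuracy = one_coin_em(labels)
print(posterior, accuracy)
```

In this toy run, EM learns that annotator "c" is systematically wrong and flips its votes into evidence, which plain majority voting cannot do. The richer models in the thesis go further by separating bias from expertise, which a single accuracy number per annotator cannot capture.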

Item Type: Thesis (Dissertation (Ph.D.))
Subject Keywords: Image Annotation, Computer Vision, Crowdsourcing
Degree Grantor: California Institute of Technology
Division: Engineering and Applied Science
Major Option: Computation and Neural Systems
Thesis Availability: Public (worldwide access)
Research Advisor(s):
  • Perona, Pietro
Thesis Committee:
  • Perona, Pietro (chair)
  • Shimojo, Shinsuke
  • Krause, Andreas
  • Belongie, Serge J.
  • Beck, James L.
Defense Date: 17 May 2012
Record Number: CaltechTHESIS:05302012-110814322
Persistent URL:
Default Usage Policy: No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code: 7095
Deposited By: Nils Peter Egon Welinder
Deposited On: 31 May 2012 16:10
Last Modified: 08 Nov 2023 00:44
