
Towards a Visipedia: Combining Computer Vision and Communities of Experts


Van Horn, Grant Richard (2019) Towards a Visipedia: Combining Computer Vision and Communities of Experts. Dissertation (Ph.D.), California Institute of Technology. doi:10.7907/20DQ-Y220.


Motivated by the idea of a Visipedia, where users can search and explore by image, this thesis presents tools and techniques for empowering expert communities through computer vision. The collective aim of this work is to provide a scalable foundation upon which an application like Visipedia can be built. We conduct experiments using two highly motivated communities, the birding community and the naturalist community, and report results and lessons on how to build the necessary components of a Visipedia. First, we conduct experiments analyzing the behavior of state-of-the-art computer vision classifiers on long-tailed datasets. We find poor feature sharing between classes, potentially limiting the applicability of these models and emphasizing the importance of intelligently directing data collection resources. Second, we devise online crowdsourcing algorithms to make dataset collection for binary labels, multiclass labels, keypoints, and multi-instance bounding boxes faster, cheaper, and more accurate. These methods jointly estimate labels and worker skills while training computer vision models for these tasks. Experiments show that, compared to traditional data collection techniques, we can achieve significant cost savings and produce a more accurate dataset. Third, we present two fine-grained datasets, detail how they were constructed, and analyze the test accuracy of state-of-the-art methods. These datasets are then used to create applications that help users identify species in their photographs: Merlin, an app that assists users in identifying bird species, and iNaturalist, an app that assists users in identifying a broad variety of species. Finally, we present work aimed at reducing the computational burden of large-scale classification, with the goal of creating an application that allows users to classify tens of thousands of species in real time on their mobile devices.
As a whole, the lessons learned and the techniques presented in this thesis bring us closer to the realization of a Visipedia.

Item Type: Thesis (Dissertation (Ph.D.))
Subject Keywords: Computer vision, machine learning, fine-grained classification, datasets
Degree Grantor: California Institute of Technology
Division: Engineering and Applied Science
Major Option: Computer Science
Thesis Availability: Public (worldwide access)
Research Advisor(s):
  • Perona, Pietro
Thesis Committee:
  • Yue, Yisong (chair)
  • Wierman, Adam C.
  • Belongie, Serge J.
  • Perona, Pietro
Defense Date: 7 September 2018
Record Number: CaltechTHESIS:05082019-103122440
Persistent URL:
Related URLs:
  • Adapted for Ch. 2
  • Adapted for Ch. 3
  • Adapted for Ch. 4
  • Adapted for Ch. 5
  • Adapted for Ch. 6
ORCID: Van Horn, Grant Richard, 0000-0003-2953-9651
Default Usage Policy: No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code: 11502
Deposited By: Grant Van Horn
Deposited On: 10 Jun 2019 22:27
Last Modified: 08 Nov 2023 00:44
