Citation
Van Horn, Grant Richard (2019) Towards a Visipedia: Combining Computer Vision and Communities of Experts. Dissertation (Ph.D.), California Institute of Technology. doi:10.7907/20DQ-Y220. https://resolver.caltech.edu/CaltechTHESIS:05082019-103122440
Abstract
Motivated by the idea of a Visipedia, where users can search and explore by image, this thesis presents tools and techniques for empowering expert communities through computer vision. The collective aim of this work is to provide a scalable foundation upon which an application like Visipedia can be built. We conduct experiments using two highly motivated communities, the birding community and the naturalist community, and report results and lessons on how to build the necessary components of a Visipedia. First, we conduct experiments analyzing the behavior of state-of-the-art computer vision classifiers on long-tailed datasets. We find poor feature sharing between classes, potentially limiting the applicability of these models and emphasizing the importance of being able to intelligently direct data collection resources. Second, we devise online crowdsourcing algorithms to make dataset collection for binary labels, multiclass labels, keypoints, and multi-instance bounding boxes faster, cheaper, and more accurate. These methods jointly estimate labels and worker skills and train computer vision models for these tasks. Experiments show that, compared to traditional data collection techniques, these methods achieve significant cost savings and produce more accurate datasets. Third, we present two fine-grained datasets, detail how they were constructed, and analyze the test accuracy of state-of-the-art methods. These datasets are then used to create applications that help users identify species in their photographs: Merlin, an app assisting users in identifying bird species, and iNaturalist, an app that assists users in identifying a broad variety of species. Finally, we present work aimed at reducing the computational burden of large-scale classification with the goal of creating an application that allows users to classify tens of thousands of species in real time on their mobile device.
As a whole, the lessons learned and the techniques presented in this thesis bring us closer to the realization of a Visipedia.
Item Type: | Thesis (Dissertation (Ph.D.))
---|---
Subject Keywords: | Computer vision, machine learning, fine-grained classification, datasets
Degree Grantor: | California Institute of Technology
Division: | Engineering and Applied Science
Major Option: | Computer Science
Thesis Availability: | Public (worldwide access)
Research Advisor(s): |
Thesis Committee: |
Defense Date: | 7 September 2018
Record Number: | CaltechTHESIS:05082019-103122440
Persistent URL: | https://resolver.caltech.edu/CaltechTHESIS:05082019-103122440
DOI: | 10.7907/20DQ-Y220
Related URLs: |
ORCID: |
Default Usage Policy: | No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code: | 11502
Collection: | CaltechTHESIS
Deposited By: | Grant Van Horn
Deposited On: | 10 Jun 2019 22:27
Last Modified: | 08 Nov 2023 00:44
Thesis Files
PDF - Final Version (20MB). See Usage Policy.