Citation
Pratap, Amrit (2008) Adaptive Learning Algorithms and Data Cloning. Dissertation (Ph.D.), California Institute of Technology. doi:10.7907/GV3D-AB69. https://resolver.caltech.edu/CaltechETD:etd-05292008-231048
Abstract
This thesis is in the field of machine learning: the use of data to automatically learn a hypothesis to predict the future behavior of a system. It summarizes three of my research projects.
We first investigate the role of margins in the phenomenal success of boosting algorithms. AdaBoost (Adaptive Boosting) is an algorithm for generating an ensemble of hypotheses for classification. The superior out-of-sample performance of AdaBoost has been attributed to its ability to produce a classifier that classifies points with a large margin of confidence. This led to the development of many new algorithms that focus on optimizing the margin of confidence. However, it was observed that directly optimizing the margins leads to poor performance. This apparent contradiction has been the topic of a long-unresolved debate in the machine-learning community. We introduce new algorithms that are expressly designed to test the margin hypothesis, and we provide concrete evidence which refutes the margin argument.
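To make the margin quantity concrete, here is a minimal sketch, assuming scikit-learn and a synthetic toy dataset (neither is from the thesis), that trains a standard AdaBoost ensemble and computes the classical normalized margin y * sum_t(alpha_t h_t(x)) / sum_t(alpha_t) for each training point. It illustrates the quantity under discussion, not the thesis's new algorithms.

```python
# Illustrative sketch only: standard AdaBoost margins on a toy dataset.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

X, y = make_classification(n_samples=200, n_features=10, random_state=0)
y_pm = 2 * y - 1  # map labels from {0, 1} to {-1, +1}

clf = AdaBoostClassifier(n_estimators=100, random_state=0).fit(X, y)

# Classical AdaBoost margin: y * sum_t(alpha_t * h_t(x)) / sum_t(alpha_t),
# computed from the discrete votes of the trained base hypotheses.
alphas = clf.estimator_weights_[: len(clf.estimators_)]
votes = np.array([2 * h.predict(X) - 1 for h in clf.estimators_])  # shape (T, n), entries in {-1, +1}
margins = y_pm * (alphas @ votes) / alphas.sum()

print("minimum margin:", margins.min())
print("fraction of points with margin above 0.1:", float((margins > 0.1).mean()))
```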
We then propose a novel algorithm for adaptive sampling under a monotonicity constraint. The typical learning problem takes examples of the target function as input and produces a hypothesis that approximates the target as output. We consider a generalization of this paradigm that takes different types of information as input and produces only specific properties of the target as output. This is a common setup that occurs in many real-life settings where samples are expensive to obtain. We show experimentally that our algorithm achieves better performance than existing methods such as the Staircase procedure and PEST.
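As an illustration of the kind of baseline mentioned above, the following is a minimal sketch of a basic 1-up/1-down staircase run on a simulated monotonic response curve. The response function, step size, and trial count are assumptions made for illustration; this is not the thesis's adaptive-sampling algorithm.

```python
# Illustrative sketch of a simple staircase baseline on a simulated monotonic response.
import numpy as np

rng = np.random.default_rng(0)

def respond(stimulus, threshold=0.4, slope=10.0):
    """Simulated binary response: P(success) increases monotonically with the stimulus."""
    p = 1.0 / (1.0 + np.exp(-slope * (stimulus - threshold)))
    return rng.random() < p

def staircase(n_trials=60, start=0.8, step=0.05):
    """1-up/1-down staircase: lower the stimulus after a success, raise it after a failure.
    The run oscillates around the level with a 50% success rate."""
    level, levels = start, []
    for _ in range(n_trials):
        levels.append(level)
        level += -step if respond(level) else step
        level = float(np.clip(level, 0.0, 1.0))
    # A crude threshold estimate: average the levels visited late in the run.
    return float(np.mean(levels[-20:]))

print("estimated 50% threshold:", staircase())
```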
One of the major pitfalls in machine-learning research is selection bias. It is usually introduced unconsciously through the choices made during the learning process, and it often leads to over-optimistic estimates of performance. In the third project, we introduce a new methodology for systematically reducing selection bias. Experiments show that using cloned datasets for model selection can lead to better performance and reduce the selection bias.
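The pitfall itself can be shown with a short simulation (the model count, validation-set size, and true accuracy are assumed values; this sketch illustrates only the bias, not the data-cloning remedy): when many equally good models are ranked by validation accuracy, the winner's validation score looks better than its true out-of-sample accuracy.

```python
# Illustrative sketch of selection bias from reusing one validation set to pick a model.
import numpy as np

rng = np.random.default_rng(0)
n_models, n_val, true_acc = 50, 100, 0.70

# Every "model" has the same true accuracy; validation scores differ only by chance.
val_scores = rng.binomial(n_val, true_acc, size=n_models) / n_val
best = int(np.argmax(val_scores))

print("validation accuracy of the selected model:", val_scores[best])  # optimistically high
print("its true (out-of-sample) accuracy:        ", true_acc)          # what fresh data would show
```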
| Item Type: | Thesis (Dissertation (Ph.D.)) |
|---|---|
| Subject Keywords: | AdaBoost; AlphaBoost; data cloning; DLPBoost; ensemble learning; margin theory; monotonic estimation; selection bias |
| Degree Grantor: | California Institute of Technology |
| Division: | Engineering and Applied Science |
| Major Option: | Computer Science |
| Thesis Availability: | Public (worldwide access) |
| Research Advisor(s): | |
| Thesis Committee: | |
| Defense Date: | 11 February 2008 |
| Record Number: | CaltechETD:etd-05292008-231048 |
| Persistent URL: | https://resolver.caltech.edu/CaltechETD:etd-05292008-231048 |
| DOI: | 10.7907/GV3D-AB69 |
| Default Usage Policy: | No commercial reproduction, distribution, display or performance rights in this work are provided. |
| ID Code: | 2267 |
| Collection: | CaltechTHESIS |
| Deposited By: | Imported from ETD-db |
| Deposited On: | 30 Jul 2008 |
| Last Modified: | 05 Jan 2021 23:05 |
Thesis Files
PDF - Final Version (see Usage Policy), 1MB