Citation
Pratap, Amrit (2008) Adaptive Learning Algorithms and Data Cloning. Dissertation (Ph.D.), California Institute of Technology. doi:10.7907/GV3D-AB69. https://resolver.caltech.edu/CaltechETD:etd-05292008-231048
Abstract
This thesis is in the field of machine learning: the use of data to automatically learn a hypothesis to predict the future behavior of a system. It summarizes three of my research projects.
We first investigate the role of margins in the phenomenal success of boosting algorithms. AdaBoost (Adaptive Boosting) is an algorithm for generating an ensemble of hypotheses for classification. The superior out-of-sample performance of AdaBoost has been attributed to its ability to produce a classifier that classifies points with a large margin of confidence. This led to the development of many new algorithms that focus on optimizing the margin of confidence. However, it was observed that directly optimizing the margins leads to poor performance. This apparent contradiction has been the topic of a long-unresolved debate in the machine-learning community. We introduce new algorithms that are expressly designed to test the margin hypothesis, and we provide concrete evidence which refutes the margin argument.
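To make the margin quantity concrete, here is a minimal sketch, assuming scikit-learn and a synthetic toy dataset (neither is from the thesis), that trains a standard AdaBoost ensemble and computes the classical normalized margin y * sum_t(alpha_t h_t(x)) / sum_t(alpha_t) for each training point. It illustrates the quantity under discussion, not the thesis's new algorithms.

```python
# Illustrative sketch only: standard AdaBoost margins on a toy dataset.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

X, y = make_classification(n_samples=200, n_features=10, random_state=0)
y_pm = 2 * y - 1  # map labels from {0, 1} to {-1, +1}

clf = AdaBoostClassifier(n_estimators=100, random_state=0).fit(X, y)

# Classical AdaBoost margin: y * sum_t(alpha_t * h_t(x)) / sum_t(alpha_t),
# computed from the discrete votes of the trained base hypotheses.
alphas = clf.estimator_weights_[: len(clf.estimators_)]
votes = np.array([2 * h.predict(X) - 1 for h in clf.estimators_])  # shape (T, n), entries in {-1, +1}
margins = y_pm * (alphas @ votes) / alphas.sum()

print("minimum margin:", margins.min())
print("fraction of points with margin above 0.1:", float((margins > 0.1).mean()))
```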
We then propose a novel algorithm for adaptive sampling under a monotonicity constraint. The typical learning problem takes examples of the target function as input and produces a hypothesis that approximates the target as output. We consider a generalization of this paradigm that takes different types of information as input and produces only specific properties of the target as output. This is a common setup that occurs in many real-life settings where samples are expensive to obtain. We show experimentally that our algorithm achieves better performance than existing methods such as the Staircase procedure and PEST.
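As an illustration of the kind of baseline mentioned above, the following is a minimal sketch of a basic 1-up/1-down staircase run on a simulated monotonic response curve. The response function, step size, and trial count are assumptions made for illustration; this is not the thesis's adaptive-sampling algorithm.

```python
# Illustrative sketch of a simple staircase baseline on a simulated monotonic response.
import numpy as np

rng = np.random.default_rng(0)

def respond(stimulus, threshold=0.4, slope=10.0):
    """Simulated binary response: P(success) increases monotonically with the stimulus."""
    p = 1.0 / (1.0 + np.exp(-slope * (stimulus - threshold)))
    return rng.random() < p

def staircase(n_trials=60, start=0.8, step=0.05):
    """1-up/1-down staircase: lower the stimulus after a success, raise it after a failure.
    The run oscillates around the level with a 50% success rate."""
    level, levels = start, []
    for _ in range(n_trials):
        levels.append(level)
        level += -step if respond(level) else step
        level = float(np.clip(level, 0.0, 1.0))
    # A crude threshold estimate: average the levels visited late in the run.
    return float(np.mean(levels[-20:]))

print("estimated 50% threshold:", staircase())
```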
One of the major pitfalls in machine-learning research is selection bias. It is usually introduced unconsciously through the choices made during the learning process, and it often leads to over-optimistic estimates of performance. In the third project, we introduce a new methodology for systematically reducing selection bias. Experiments show that using cloned datasets for model selection can lead to better performance and reduce the selection bias.
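The pitfall itself can be shown with a short simulation (the model count, validation-set size, and true accuracy are assumed values; this sketch illustrates only the bias, not the data-cloning remedy): when many equally good models are ranked by validation accuracy, the winner's validation score looks better than its true out-of-sample accuracy.

```python
# Illustrative sketch of selection bias from reusing one validation set to pick a model.
import numpy as np

rng = np.random.default_rng(0)
n_models, n_val, true_acc = 50, 100, 0.70

# Every "model" has the same true accuracy; validation scores differ only by chance.
val_scores = rng.binomial(n_val, true_acc, size=n_models) / n_val
best = int(np.argmax(val_scores))

print("validation accuracy of the selected model:", val_scores[best])  # optimistically high
print("its true (out-of-sample) accuracy:        ", true_acc)          # what fresh data would show
```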
| Item Type: | Thesis (Dissertation (Ph.D.)) |
|---|---|
| Subject Keywords: | AdaBoost; AlphaBoost; data cloning; DLPBoost; ensemble learning; margin theory; monotonic estimation; selection bias |
| Degree Grantor: | California Institute of Technology |
| Division: | Engineering and Applied Science |
| Major Option: | Computer Science |
| Thesis Availability: | Public (worldwide access) |
| Research Advisor(s): | |
| Thesis Committee: | |
| Defense Date: | 11 February 2008 |
| Record Number: | CaltechETD:etd-05292008-231048 |
| Persistent URL: | https://resolver.caltech.edu/CaltechETD:etd-05292008-231048 |
| DOI: | 10.7907/GV3D-AB69 |
| Default Usage Policy: | No commercial reproduction, distribution, display or performance rights in this work are provided. |
| ID Code: | 2267 |
| Collection: | CaltechTHESIS |
| Deposited By: | Imported from ETD-db |
| Deposited On: | 30 Jul 2008 |
| Last Modified: | 05 Jan 2021 23:05 |
Thesis Files
PDF - Final Version (see Usage Policy), 1MB