Citation
Nicholson, Alexander Marshall (2002) Generalization Error Estimates and Training Data Valuation. Dissertation (Ph.D.), California Institute of Technology. doi:10.7907/1H16-VX81. https://resolver.caltech.edu/CaltechETD:etd-09062005-083717
Abstract
This thesis addresses several problems related to generalization in machine learning systems. We introduce a theoretical framework for studying learning and generalization. Within this framework, a closed form is derived for the expected generalization error that estimates the out-of-sample performance in terms of the in-sample performance. We consider the problem of overfitting and show that, using a simple exhaustive learning algorithm, overfitting does not occur. These results do not assume a particular form of the target function, input distribution or learning model, and hold even with noisy data sets. We apply our analysis to practical learning systems, illustrate how it may be used to estimate out-of-sample errors in practice, and demonstrate that the resulting estimates improve upon errors estimated with a validation set for real-world problems. Based on this study of generalization, we develop a technique for quantitative valuation of training data. We demonstrate that this valuation may be used to select training sets that improve generalization performance. With a reasonable prior over target functions, it further allows us to estimate the level of noise in a data set and provides for detection and correction of noise in individual examples. Finally, this data valuation can be used to classify new examples, yielding a new learning algorithm that is shown to be relatively robust to noise.
Item Type: | Thesis (Dissertation (Ph.D.)) |
---|---|
Subject Keywords: | data valuation |
Degree Grantor: | California Institute of Technology |
Division: | Engineering and Applied Science |
Major Option: | Computer Science |
Thesis Availability: | Public (worldwide access) |
Research Advisor(s): | |
Thesis Committee: | |
Defense Date: | 16 May 2002 |
Non-Caltech Author Email: | zander (AT) fantastivision.com |
Record Number: | CaltechETD:etd-09062005-083717 |
Persistent URL: | https://resolver.caltech.edu/CaltechETD:etd-09062005-083717 |
DOI: | 10.7907/1H16-VX81 |
Default Usage Policy: | No commercial reproduction, distribution, display or performance rights in this work are provided. |
ID Code: | 3347 |
Collection: | CaltechTHESIS |
Deposited By: | Imported from ETD-db |
Deposited On: | 06 Sep 2005 |
Last Modified: | 18 Aug 2022 23:53 |
Thesis Files
PDF (Nicholson_a_2002.pdf), Final Version, 3MB. See Usage Policy.