Citation
Lin, Hsuan-Tien (2008) From Ordinal Ranking to Binary Classification. Dissertation (Ph.D.), California Institute of Technology. doi:10.7907/7B0F-E145. https://resolver.caltech.edu/CaltechETD:etd-05302008-143505
Abstract
We study the ordinal ranking problem in machine learning. The problem can be viewed as a classification problem with additional ordinal information or as a regression problem without actual numerical information. From the classification perspective, we formalize the concept of ordinal information by a cost-sensitive setup, and propose some novel cost-sensitive classification algorithms. The algorithms are derived from a systematic cost-transformation technique, which carries a strong theoretical guarantee. Experimental results show that the novel algorithms perform well both in a general cost-sensitive setup and in the specific ordinal ranking setup.
From the regression perspective, we propose the threshold ensemble model for ordinal ranking, which allows the machines to estimate a real-valued score (like regression) before quantizing it to an ordinal rank. We study the generalization ability of threshold ensembles and derive novel large-margin bounds on its expected test performance. In addition, we improve an existing algorithm and propose a novel algorithm for constructing large-margin threshold ensembles. Our proposed algorithms are efficient in training and achieve decent out-of-sample performance when compared with the state-of-the-art algorithm on benchmark data sets.
We then study how ordinal ranking can be reduced to weighted binary classification. The reduction framework is simpler than the cost-sensitive classification approach and includes the threshold ensemble model as a special case. The framework allows us to derive strong theoretical results that tightly connect ordinal ranking with binary classification. We demonstrate the algorithmic and theoretical use of the reduction framework by extending SVM and AdaBoost, two of the most popular binary classification algorithms, to the area of ordinal ranking. Coupling SVM with the reduction framework results in a novel and faster algorithm for ordinal ranking with superior performance on real-world data sets, as well as a new bound on the expected test performance for generalized linear ordinal rankers. Coupling AdaBoost with the reduction framework leads to a novel algorithm that boosts the training accuracy of any cost-sensitive ordinal ranking algorithms theoretically, and in turn improves their test performance empirically.
From the studies above, the key to improve ordinal ranking is to improve binary classification. In the final part of the thesis, we include two projects that aim at understanding binary classification better in the context of ensemble learning. First, we discuss how AdaBoost is restricted to combining only a finite number of hypotheses and remove the restriction by formulating a framework of infinite ensemble learning based on SVM. The framework can output an infinite ensemble through embedding infinitely many hypotheses into an~SVM kernel. Using the framework, we show that binary classification (and hence ordinal ranking) can be improved by going from a finite ensemble to an infinite one. Second, we discuss how AdaBoost carries the property of being resistant to overfitting. Then, we propose the SeedBoost algorithm, which uses the property as a machinery to prevent other learning algorithms from overfitting. Empirical results demonstrate that SeedBoost can indeed improve an overfitting algorithm on some data sets.
Item Type: | Thesis (Dissertation (Ph.D.)) | ||||
---|---|---|---|---|---|
Subject Keywords: | AdaBoost; binary classification; cost transformation; cost-sensitive classification; ensemble learning; large-margin bound; machine learning; ORBoost; ordinal classification; ordinal ranking; ordinal regression; RankBoost; reduction; SVM; threshold ensemble model | ||||
Degree Grantor: | California Institute of Technology | ||||
Division: | Engineering and Applied Science | ||||
Major Option: | Computer Science | ||||
Thesis Availability: | Public (worldwide access) | ||||
Research Advisor(s): |
| ||||
Thesis Committee: |
| ||||
Defense Date: | 12 May 2008 | ||||
Non-Caltech Author Email: | htlin (AT) csie.ntu.edu.tw | ||||
Record Number: | CaltechETD:etd-05302008-143505 | ||||
Persistent URL: | https://resolver.caltech.edu/CaltechETD:etd-05302008-143505 | ||||
DOI: | 10.7907/7B0F-E145 | ||||
ORCID: |
| ||||
Default Usage Policy: | No commercial reproduction, distribution, display or performance rights in this work are provided. | ||||
ID Code: | 2321 | ||||
Collection: | CaltechTHESIS | ||||
Deposited By: | Imported from ETD-db | ||||
Deposited On: | 04 Jun 2008 | ||||
Last Modified: | 07 May 2020 22:53 |
Thesis Files
|
PDF (phdthesis.pdf)
- Final Version
See Usage Policy. 847kB |
Repository Staff Only: item control page