CaltechTHESIS
A Caltech Library Service

# Building probabilistic models from databases

## Citation

Miller, John W. (1993) Building probabilistic models from databases. Dissertation (Ph.D.), California Institute of Technology. doi:10.7907/mm98-d904. https://resolver.caltech.edu/CaltechETD:etd-08302007-155345

## Abstract

The problem of creating a probabilistic model using a database is analyzed in this thesis. Two independent results in probabilistic modeling are presented. The first result is a method for creating a model which produces accurate probability estimates. The model is a Gibbs probability distribution representation of the database. This model is created by a new transformation relating the joint probabilities of attributes in the database to Gibbs potentials. The theory of this transformation is presented together with a specific algorithm for efficiently collecting and using the Gibbs potentials. A hash table scheme is used to collect the important potentials without iterative error minimization or repeated searches through a database. The Gibbs modeling scheme allows flexible control of the tradeoffs involving modeling error and sampling error as well as the tradeoffs involved in using the resources of computation time and memory. The performance of the probabilistic modeling algorithm is tested and analyzed. Used as a classifier with a variety of datasets, the Gibbs modeling algorithm was found to equal or surpass the classification results of other models such as neural networks trained with backwards error propagation and the nearest neighbor classification algorithm.
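The abstract does not spell out the hash table scheme or the transformation to Gibbs potentials, so the following is only a minimal sketch of the general idea of collecting statistics in a single pass: every combination of up to `max_order` attribute values seen in a record is counted in a hash table (a Python `Counter`), and joint probabilities are then read off as relative frequencies without searching the database again. The function names, the `max_order` parameter, and the toy records are hypothetical, and the step from joint probabilities to Gibbs potentials is deliberately left out because the source does not describe it.

```python
from collections import Counter
from itertools import combinations


def count_patterns(records, max_order=2):
    """Make one pass over the records and count, in a hash table, every
    combination of up to max_order (attribute_index, value) pairs.

    Illustrative assumption only: this is not the thesis's algorithm.
    """
    counts = Counter()
    n = 0
    for rec in records:
        n += 1
        items = list(enumerate(rec))  # [(attribute_index, value), ...]
        for k in range(1, max_order + 1):
            for combo in combinations(items, k):
                counts[combo] += 1
    return counts, n


def joint_probability(counts, n, pattern):
    """Relative-frequency estimate for a pattern given as {attribute_index: value}."""
    key = tuple(sorted(pattern.items()))  # same ordering as the stored keys
    return counts[key] / n


# Toy database with two attributes per record (hypothetical data).
records = [("a", "x"), ("a", "y"), ("b", "x"), ("a", "x")]
counts, n = count_patterns(records, max_order=2)
p_a = joint_probability(counts, n, {0: "a"})
p_ax = joint_probability(counts, n, {0: "a", 1: "x"})
```

Raising `max_order` stores more combinations and captures higher-order structure at the cost of memory and of sparser counts, which is one simple way to picture the tradeoff between modeling error and sampling error that the abstract mentions.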

The second independent result is the analysis of systems that use error minimization to estimate probabilities. Minimization to a probability has been a known property of the squared error and cross entropy objective functions. Here the necessary and sufficient conditions for minimization to a probability are developed. It is found that the squared error and cross entropy functions are two of the simplest functions from a family of objective functions which minimize to a probability. If the system is incapable of producing the outputs consistent with the probability estimates, it is shown that the minimum error is achieved when the system produces outputs closest to the correct probability estimate outputs. The measure of closeness is described here in terms of the objective function.
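The known property mentioned above can be checked numerically. The sketch below assumes a single binary event that occurs with true probability `p` and a system that outputs a constant estimate `q`; it evaluates the expected squared error and the expected cross entropy on a grid of `q` values and reports where each is smallest. The function names and the grid search are illustrative choices, not taken from the thesis, and the example covers only these two objective functions rather than the general family the thesis characterizes.

```python
import math


def expected_squared_error(q, p):
    """Expected squared error of output q when the target is 1 with probability p."""
    return p * (1.0 - q) ** 2 + (1.0 - p) * q ** 2


def expected_cross_entropy(q, p):
    """Expected cross entropy of output q when the target is 1 with probability p."""
    return -(p * math.log(q) + (1.0 - p) * math.log(1.0 - q))


def argmin_on_grid(loss, p, steps=999):
    """Return the grid point in (0, 1) that minimizes loss(q, p)."""
    grid = [(i + 1) / (steps + 1) for i in range(steps)]
    return min(grid, key=lambda q: loss(q, p))


p_true = 0.3
q_squared = argmin_on_grid(expected_squared_error, p_true)
q_entropy = argmin_on_grid(expected_cross_entropy, p_true)
```

Both minimizers land on the grid point equal to `p_true`. Analytically, setting the derivative of either expected loss with respect to `q` to zero gives `q = p`.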

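- Item Type: Thesis (Dissertation (Ph.D.))
- Degree Grantor: California Institute of Technology
- Division: Engineering and Applied Science
- Major Option: Electrical Engineering
- Thesis Availability: Public (worldwide access)
- Research Advisor(s): Goodman, Rodney M.
- Thesis Committee: Goodman, Rodney M. (chair); Franklin, Joel N.
- Defense Date: 26 August 1992
- Record Number: CaltechETD:etd-08302007-155345
- Persistent URL: https://resolver.caltech.edu/CaltechETD:etd-08302007-155345
- DOI: 10.7907/mm98-d904
- Default Usage Policy: No commercial reproduction, distribution, display or performance rights in this work are provided.
- ID Code: 3289
- Collection: CaltechTHESIS
- Deposited By: Imported from ETD-db
- Deposited On: 31 Aug 2007
- Last Modified: 16 Apr 2021 23:19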

## Thesis Files

PDF (Miller_jw_1993.pdf) - Final Version, 5MB. See Usage Policy.
