A Caltech Library Service

Building probalistic models from databases


Miller, John W. (1993) Building probalistic models from databases. Dissertation (Ph.D.), California Institute of Technology. doi:10.7907/mm98-d904.


The problem of creating a probabilistic model using a database is analyzed in this thesis. Two independent results in probabilistic modeling are presented. The first result is a method for creating a model which produces accurate probability estimates. The model is a Gibbs probability distribution representation of the database. This model is created by a new transformation relating the joint probabilities of attributes in the database to Gibbs potentials. The theory of this transformation is presented together with a specific algorithm for efficiently collecting and using the Gibbs potentials. A hash table scheme is used to collect the important potentials without iterative error minimization or repeated searches through a database. The Gibbs modeling scheme allows flexible control of the tradeoffs involving modeling error and sampling error as well as the tradeoffs involved in using the resources of computation time and memory. The performance of the probabilistic modeling algorithm is tested and analyzed. Used as a classifier with a variety of datasets, the Gibbs modeling algorithm was found to equal or surpass the classification results of other models such as neural networks trained with backwards error propagation and the nearest neighbor classification algorithm.

The second independent result is the analysis of systems that use error minimization to estimate probabilities. Minimization to a probability has been a known property of the squared error and cross entropy objective functions. Here the necessary and sufficient conditions for minimization to a probability are developed. It is found that the squared error and cross entropy functions are two of the simplist functions from a family of objective functions which minimize to a probability. If the system is incapable of producing the outputs consistent with the probability estimates, it is shown the minimum error is achieved when the system produces outputs closest to the correct probability estimate outputs. The measure of closeness is described here in terms of the objective function.

Item Type:Thesis (Dissertation (Ph.D.))
Degree Grantor:California Institute of Technology
Division:Engineering and Applied Science
Major Option:Electrical Engineering
Thesis Availability:Public (worldwide access)
Research Advisor(s):
  • Goodman, Rodney M.
Thesis Committee:
  • Goodman, Rodney M. (chair)
  • Franklin, Joel N.
Defense Date:26 August 1992
Record Number:CaltechETD:etd-08302007-155345
Persistent URL:
Default Usage Policy:No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code:3289
Deposited By: Imported from ETD-db
Deposited On:31 Aug 2007
Last Modified:16 Apr 2021 23:19

Thesis Files

PDF (Miller_jw_1993.pdf) - Final Version
See Usage Policy.


Repository Staff Only: item control page