Rose, Kenneth (1991) Deterministic annealing, clustering, and optimization. Dissertation (Ph.D.), California Institute of Technology. http://resolver.caltech.edu/CaltechETD:etd-07122007-085228
This work introduces deterministic annealing (DA) as an approach to clustering and related optimization problems. It is strongly motivated by analogies to statistical physics, but is formally derived within information theory and probability theory. The approach escapes the local optima that plague traditional techniques, without the extremely slow schedules typically required by stochastic methods, and the clustering solutions obtained by DA are independent of the choice of initial configuration.
A probabilistic framework is constructed, which is based on the principle of maximum entropy. The association probabilities at a given average distortion are Gibbs distributions parametrized by the corresponding Lagrange multiplier [beta], which is inversely proportional to the temperature in the physical analogy. By computing marginal probabilities within the framework, an effective cost is obtained, which is minimized to find the most probable set of cluster representatives at a given temperature. This effective cost is the free energy in statistical mechanics, which is indeed optimized at isothermal, stochastic equilibrium.
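The Gibbs associations and free energy described above can be sketched numerically. The following is a minimal illustration, not the thesis's own code, assuming squared-Euclidean distortion; the function names `associations` and `free_energy` are my own labels:

```python
import numpy as np

def associations(X, Y, beta):
    """Gibbs association probabilities p(y|x) proportional to exp(-beta * d(x, y))."""
    # Squared-Euclidean distortion between each data point and each representative.
    d = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    # Subtract the row-wise minimum before exponentiating, for numerical stability.
    p = np.exp(-beta * (d - d.min(axis=1, keepdims=True)))
    return p / p.sum(axis=1, keepdims=True)

def free_energy(X, Y, beta):
    """F = -(1/beta) * sum_x log sum_y exp(-beta * d(x, y)), the effective cost."""
    d = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    m = d.min(axis=1, keepdims=True)
    return (m[:, 0] - np.log(np.exp(-beta * (d - m)).sum(axis=1)) / beta).sum()
```

As beta grows, the associations harden toward nearest-representative assignment and the free energy approaches the ordinary distortion cost.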
Within the probabilistic framework, annealing is introduced by controlling the Lagrange multiplier [beta]. This annealing is interpreted as gradually reducing the "fuzziness" of the associations. Phase transitions, which are in fact cluster splits, are identified in the process; a sequence of phase transitions produces a hierarchy of fuzzy-clustering solutions. Critical values of [beta] are computed exactly for the first phase transition and approximately for subsequent ones.
Specific algorithms are derivable from the general approach to address different aspects of clustering across a variety of application fields. Here, algorithms are derived and simulation results presented for the three major classes, namely hard clustering, fuzzy clustering, and hierarchical clustering. The experimental results indicate that DA is substantially superior to traditional techniques.
The last part of the work extends the approach to a larger family of optimization problems that can be reformulated as constrained clustering. A probabilistic framework for constrained clustering is derived based on the principle of maximum entropy. It is shown that, for the purpose of annealing, the constraint can be applied directly to the free energy. Three examples of constrained clustering are discussed. First, mass-constrained clustering is formulated and improves the clustering procedure by making it independent of the number of representatives and of their multiplicity within clusters. Second, the travelling salesman problem (TSP) is reformulated as constrained clustering, yielding the elastic net approach; a second Lagrange multiplier is identified and used to obtain a more powerful annealing method. Finally, self-organization of neural networks is shown to be closely related to TSP, and a similar annealing method is suggested. A fuzzy solution is sought to obtain the optimal net for a given training data set.
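In the mass-constrained formulation, each representative carries a mass p(y), the associations become proportional to p(y)exp(-beta d), and the masses are re-estimated as average associations. A minimal sketch of one fixed-point iteration, under the same assumed squared-Euclidean distortion (the function name and the small regularizer are my own):

```python
import numpy as np

def mass_constrained_step(X, Y, mass, beta):
    """One fixed-point iteration of mass-constrained DA clustering:
    mass-weighted Gibbs associations, then updated masses and representatives."""
    d = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    w = mass[None, :] * np.exp(-beta * (d - d.min(axis=1, keepdims=True)))
    p = w / w.sum(axis=1, keepdims=True)   # association probabilities p(y|x)
    mass = p.mean(axis=0)                  # new mass p(y): average association
    Y = (p.T @ X) / (p.sum(axis=0)[:, None] + 1e-12)
    return Y, mass
```

Because the masses, not the count of coinciding representatives, carry the cluster weights, the procedure no longer depends on how many representatives are placed at each cluster, matching the independence property noted above.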
|Item Type:||Thesis (Dissertation (Ph.D.))|
|Degree Grantor:||California Institute of Technology|
|Division:||Engineering and Applied Science|
|Major Option:||Electrical Engineering|
|Thesis Availability:||Public (worldwide access)|
|Defense Date:||6 December 1990|
|Default Usage Policy:||No commercial reproduction, distribution, display or performance rights in this work are provided.|
|Deposited By:||Imported from ETD-db|
|Deposited On:||31 Jul 2007|
|Last Modified:||26 Dec 2012 02:55|