Citation
Swaminathan, Kumar (1986) Analysis and demonstration of the quantile vocoder. Dissertation (Ph.D.), California Institute of Technology. http://resolver.caltech.edu/CaltechETD:etd07192006155419
Abstract
A new scheme for speech compression is proposed, implemented and evaluated in this thesis. In this new scheme, the spectral envelope of the power spectral density of a speech frame is encoded using quantiles or order statistics. The perceptually important features of the spectral envelope are its peaks, which correspond to the formant frequencies. The shape of the spectral envelope near the formants can be encoded by a careful choice of the quantiles and quantile orders. Algorithms to choose such a set of quantiles and quantile orders are described. It turns out that this can be done using very few quantiles. Data compression is achieved chiefly this way.
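To make the idea of spectral quantiles concrete, here is a minimal Python sketch that treats a normalized power spectrum as a probability density over frequency and finds the frequency at which the cumulative power reaches each quantile order. The function name, the binned representation of the spectrum, the linear interpolation within a bin, and the sample values are all assumptions made for illustration. The thesis's own algorithms for choosing quantiles and quantile orders near the formants are not reproduced here.

```python
def spectral_quantiles(edges, power, orders):
    """Return the frequency at which cumulative normalized power reaches each order.

    edges  : N+1 increasing bin edges in Hz (hypothetical representation)
    power  : N non-negative bin powers
    orders : quantile orders in [0, 1]
    """
    total = sum(power)
    # Cumulative fraction of total power at each bin edge.
    cum = [0.0]
    for p in power:
        cum.append(cum[-1] + p / total)

    quantiles = []
    for order in orders:
        # Find the first bin whose upper edge reaches the requested order.
        k = 0
        while k < len(power) - 1 and cum[k + 1] < order:
            k += 1
        span = cum[k + 1] - cum[k]
        # Interpolate linearly inside the bin (power assumed flat within a bin).
        frac = (order - cum[k]) / span if span > 0 else 0.0
        quantiles.append(edges[k] + frac * (edges[k + 1] - edges[k]))
    return quantiles


# Illustrative spectrum with most of its power between 1 and 3 kHz.
edges = [0.0, 1000.0, 2000.0, 3000.0, 4000.0]
power = [1.0, 3.0, 3.0, 1.0]
print(spectral_quantiles(edges, power, [0.25, 0.5, 0.75]))
```

In this toy spectrum the three quantiles cluster inside the high-power band, which is the property that lets a small number of quantiles describe the envelope near its peaks.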
The quantile decoding algorithm estimates the spectral envelope from the quantiles and quantile orders. The first step is to set up a flat spectral density approximation. In this approximation, the spectral envelope is assumed to be constant over every interquantile range. This constant value is simply the average power (i.e., the ratio of the difference in quantile orders to the difference in quantiles) in that interquantile range. It is shown that the flat spectral density approximation is the maximum entropy solution to the decoding problem. The flat spectral density approximation is then smoothed by fitting an all-pole or autoregressive model. Algorithms to determine the parameters of the autoregressive model are described. These algorithms involve the solution of a system of linear equations, which has a "Toeplitz plus Hankel" structure, followed by a standard spectral factorization. The algorithms can easily be extended to pole-zero models as well.
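As a rough illustration of the flat spectral density step described above, the following Python sketch computes the constant level in each interquantile range as the ratio of the difference in quantile orders to the difference in quantiles. The function names, the inclusion of the band edges as quantiles of order 0 and 1, and the sample values are assumptions made for illustration and are not taken from the thesis. The all-pole smoothing step is not shown.

```python
from bisect import bisect_right


def flat_spectral_density(quantiles, orders):
    """Return one constant level per interquantile range.

    quantiles : increasing frequencies in Hz, including the band edges (assumed)
    orders    : increasing cumulative power fractions, from 0 to 1
    """
    levels = []
    for i in range(len(quantiles) - 1):
        d_order = orders[i + 1] - orders[i]
        d_freq = quantiles[i + 1] - quantiles[i]
        # Average power density in this interquantile range.
        levels.append(d_order / d_freq)
    return levels


def envelope_at(freq, quantiles, levels):
    """Evaluate the piecewise-constant envelope at a single frequency."""
    idx = bisect_right(quantiles, freq) - 1
    # Clamp so that the band edges map to the first and last ranges.
    idx = max(0, min(idx, len(levels) - 1))
    return levels[idx]


# Illustrative values: half of the power lies between 1 and 2 kHz.
quantiles = [0.0, 1000.0, 2000.0, 4000.0]
orders = [0.0, 0.25, 0.75, 1.0]
levels = flat_spectral_density(quantiles, orders)
print(levels)
print(envelope_at(1500.0, quantiles, levels))
```

By construction, each level times the width of its range recovers the power fraction in that range, so the approximation integrates to the total normalized power.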
The information about the spectral fine structure is sent through the parameters of the excitation model. A multipulse excitation model in cascade with a pitch predictor model has been chosen for this purpose. The theory of the multipulse model is reviewed, and algorithms to estimate the parameters of the multipulse model as well as the pitch predictor model are presented.
Quantization and encoding schemes for the various transmission parameters are described. For high and medium bit rate applications, the parameters that need to be transmitted every frame are the quantiles, the quantile orders, the locations and amplitudes of the excitation pulses, the parameters of the pitch predictor model and a gain term. For low bit rate applications, the quantile orders are fixed and so need not be transmitted. The quantization schemes for the quantile orders and for the gain term are shown to be optimal in the sense of minimizing the maximum spectral deviation due to quantization.
The quantile vocoder has been implemented in software at 4.8, 9.6, 16 and 24 Kbits/s. In order to test the vocoder, a speech database of ten sentences spoken by one male and one female speaker has been used. The so-called segmental signal-to-noise ratio has been used as an objective performance measure to evaluate the vocoder at all bit rates. A subjective method for assessing the quality of the vocoder at various bit rates is also proposed and carried out. The results of the non-real-time quantile vocoder simulations at 4.8, 9.6, 16 and 24 Kbits/s have been recorded and will be played at the end of the talk. According to informal listening tests, the quantile vocoder does indeed seem equivalent to or better than other vocoders at the same bit rates.
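For readers unfamiliar with the objective measure mentioned above, here is a minimal Python sketch of segmental signal-to-noise ratio in its commonly used form: the signal-to-noise ratio in decibels is computed frame by frame and then averaged. The abstract does not give the exact definition used in the thesis, so the frame length, the handling of silent or perfectly reconstructed frames, and the function name are assumptions.

```python
import math


def segmental_snr(reference, decoded, frame_len):
    """Average per-frame SNR in dB between a reference and a decoded signal."""
    frame_snrs = []
    # Walk over non-overlapping frames; a trailing partial frame is ignored.
    for start in range(0, len(reference) - frame_len + 1, frame_len):
        ref = reference[start:start + frame_len]
        dec = decoded[start:start + frame_len]
        signal_energy = sum(x * x for x in ref)
        noise_energy = sum((x - y) ** 2 for x, y in zip(ref, dec))
        # Assumption: skip frames where the ratio is undefined or infinite.
        if signal_energy == 0.0 or noise_energy == 0.0:
            continue
        frame_snrs.append(10.0 * math.log10(signal_energy / noise_energy))
    if not frame_snrs:
        raise ValueError("no frames with finite SNR")
    return sum(frame_snrs) / len(frame_snrs)


# Illustrative signals: the decoded samples are half the reference amplitude.
reference = [1.0] * 4 + [2.0] * 4
decoded = [0.5] * 4 + [1.0] * 4
print(segmental_snr(reference, decoded, frame_len=4))
```

Averaging in the decibel domain gives quiet frames the same weight as loud ones, which is the usual reason for preferring this measure over a single SNR computed across the whole utterance.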
Item Type:  Thesis (Dissertation (Ph.D.)) 

Degree Grantor:  California Institute of Technology 
Major Option:  Electrical Engineering 
Thesis Availability:  Public (worldwide access) 
Thesis Committee: 

Defense Date:  29 July 1985 
Record Number:  CaltechETD:etd07192006155419 
Persistent URL:  http://resolver.caltech.edu/CaltechETD:etd07192006155419 
Default Usage Policy:  No commercial reproduction, distribution, display or performance rights in this work are provided. 
ID Code:  2929 
Collection:  CaltechTHESIS 
Deposited By:  Imported from ETDdb 
Deposited On:  19 Jul 2006 
Last Modified:  26 Dec 2012 02:55 
Thesis Files

PDF (Swaminathan_k_1986.pdf)
 Final Version
See Usage Policy. 9Mb 