Swaminathan, Kumar (1986) Analysis and demonstration of the quantile vocoder. Dissertation (Ph.D.), California Institute of Technology. http://resolver.caltech.edu/CaltechETD:etd-07192006-155419
A new scheme for speech compression is proposed, implemented and evaluated in this thesis. In this scheme, the spectral envelope of the power spectral density of a speech frame is encoded using quantiles, or order statistics. The perceptually important features of the spectral envelope are its peaks, which correspond to the formant frequencies. The shape of the spectral envelope near the formants can be encoded by a careful choice of the quantiles and quantile orders. Algorithms to choose such a set of quantiles and quantile orders are described. It turns out that this can be done using very few quantiles, and this is the chief source of data compression.
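The abstract does not spell out how the quantiles of a frame are computed. The following is a minimal sketch, assuming that the quantile of order p is the frequency below which a fraction p of the frame's total power lies, and that the power spectral density is sampled on equal-width bins and treated as constant within each bin. The function name `spectral_quantiles` is hypothetical. The two calls show the property the encoding relies on: a flat spectrum spreads the quantiles evenly, while a peaked spectrum pulls them toward the peak.

```python
def spectral_quantiles(psd, bandwidth, orders):
    """Return the frequencies below which the given fractions of total power lie.

    Hypothetical helper: psd holds nonnegative power values on equal-width
    bins covering [0, bandwidth]; the density is treated as constant in each bin.
    """
    width = bandwidth / len(psd)
    total = sum(psd) * width
    quantiles = []
    for p in orders:
        target = p * total          # power that must lie below the quantile
        acc = 0.0                   # power accumulated in earlier bins
        for k, s in enumerate(psd):
            mass = s * width
            if mass > 0 and acc + mass >= target:
                # Linear interpolation is exact for a piecewise-constant density.
                quantiles.append(k * width + (target - acc) / s)
                break
            acc += mass
        else:
            quantiles.append(bandwidth)  # guards against round-off when p == 1
    return quantiles


# A flat spectrum spreads the quantiles evenly over the band ...
print(spectral_quantiles([1, 1, 1, 1], 4000.0, [0.25, 0.5, 0.75]))
# ... while a peaked spectrum pulls them toward the peak.
print(spectral_quantiles([0.1, 4.0, 0.5, 0.1], 4000.0, [0.25, 0.5, 0.75]))
```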
The quantile decoding algorithm estimates the spectral envelope from the quantiles and quantile orders. The first step is to set up a flat spectral density approximation, in which the spectral envelope is assumed to be constant within each interquantile range. This constant value is simply the average power spectral density in that range, i.e., the ratio of the difference in quantile orders to the difference in quantiles. It is shown that the flat spectral density approximation is the maximum entropy solution to the decoding problem. The flat approximation is then smoothed by fitting an all-pole (autoregressive) model. Algorithms to determine the parameters of the autoregressive model are described; they involve the solution of a system of linear equations with a "Toeplitz plus Hankel" structure, followed by a standard spectral factorization. The algorithms extend easily to pole-zero models as well.
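The flat spectral density approximation can be written down directly: on the range between quantiles q_{i-1} and q_i the density is (p_{i-1} to p_i mass) divided by the range width. The sketch below builds it and then fits an all-pole model to it. Note that it does not reproduce the thesis's Toeplitz-plus-Hankel system and spectral factorization; it substitutes the familiar autocorrelation method instead, computing the exact autocorrelation of the piecewise-constant density and running the Levinson-Durbin recursion on it. The 8 kHz sampling rate, the example quantiles and orders, and all function names are hypothetical.

```python
import math


def flat_spectral_density(quantiles, orders):
    """Density on each interquantile range: (p_i - p_{i-1}) / (q_i - q_{i-1}).

    quantiles run from 0 to the band edge; orders run from 0 to 1.
    """
    return [
        (orders[i] - orders[i - 1]) / (quantiles[i] - quantiles[i - 1])
        for i in range(1, len(quantiles))
    ]


def flat_autocorrelation(quantiles, orders, fs, max_lag):
    """Autocorrelation r[0..max_lag] of the piecewise-constant density.

    The density is one-sided on [0, fs/2] and integrates to 1, so
    r[k] = integral of S(f) * cos(2*pi*f*k/fs) df, computed exactly per range.
    """
    dens = flat_spectral_density(quantiles, orders)
    r = []
    for k in range(max_lag + 1):
        total = 0.0
        for i, d in enumerate(dens):
            a, b = quantiles[i], quantiles[i + 1]
            if k == 0:
                total += d * (b - a)
            else:
                w = 2.0 * math.pi * k / fs
                total += d * (math.sin(w * b) - math.sin(w * a)) / w
        r.append(total)
    return r


def levinson_durbin(r, order):
    """Autocorrelation-method all-pole fit: returns A(z) coefficients and error."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for m in range(1, order + 1):
        acc = r[m] + sum(a[j] * r[m - j] for j in range(1, m))
        k = -acc / err                      # reflection coefficient
        prev = a[:]
        for j in range(1, m):
            a[j] = prev[j] + k * prev[m - j]
        a[m] = k
        err *= 1.0 - k * k                  # prediction error shrinks each step
    return a, err


# Hypothetical frame: the quantiles concentrate power between 500 and 900 Hz.
q = [0.0, 500.0, 900.0, 4000.0]
p = [0.0, 0.1, 0.8, 1.0]
r = flat_autocorrelation(q, p, fs=8000.0, max_lag=4)
coeffs, residual_power = levinson_durbin(r, order=4)
print(coeffs, residual_power)
```

With A(z) = 1 + a_1 z^{-1} + ... + a_4 z^{-4}, the smoothed envelope is proportional to residual_power / |A(e^{jω})|^2. The autocorrelation method was chosen here only because it is short and always yields a stable model; it is not the fitting criterion described in the thesis.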
The information about the spectral fine structure is conveyed by the parameters of the excitation model. A multi-pulse excitation model in cascade with a pitch predictor has been chosen for this purpose. The theory of the multi-pulse model is reviewed, and algorithms to estimate the parameters of both the multi-pulse model and the pitch predictor are presented.
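The abstract names the excitation models but not the estimation algorithms used in the thesis. As a rough illustration, the sketch below uses a widely used greedy sequential search for the pulses: each new pulse is placed where it gives the largest reduction in squared error, given the impulse response of the synthesis filter. It pairs this with a one-tap least-squares pitch predictor. Both functions are hypothetical and simplified (no perceptual weighting, no joint re-optimization of amplitudes), so they may differ from the thesis's algorithms.

```python
def multipulse_search(target, h, num_pulses):
    """Greedy multi-pulse search: place pulses one at a time.

    target: frame to be matched; h: impulse response of the synthesis filter.
    A pulse at position pos contributes amp * h[m - pos] for m >= pos
    (truncated at the frame end). Returns [(position, amplitude), ...] and
    the remaining error signal.
    """
    n = len(target)
    residual = list(target)
    pulses = []
    for _ in range(num_pulses):
        best = None
        for pos in range(n):
            stop = min(n, pos + len(h))
            c = sum(residual[m] * h[m - pos] for m in range(pos, stop))
            phi = sum(h[m - pos] ** 2 for m in range(pos, stop))
            if phi == 0.0:
                continue
            score = c * c / phi     # error reduction if a pulse is placed here
            if best is None or score > best[0]:
                best = (score, pos, c / phi)
        if best is None:
            break
        _, pos, amp = best
        pulses.append((pos, amp))
        for m in range(pos, min(n, pos + len(h))):
            residual[m] -= amp * h[m - pos]
    return pulses, residual


def one_tap_pitch(x, min_lag, max_lag):
    """Pick the lag T and gain beta of x[n] ~ beta * x[n - T] by least squares."""
    best_score, best_lag, best_beta = -1.0, min_lag, 0.0
    for lag in range(min_lag, max_lag + 1):
        num = sum(x[i] * x[i - lag] for i in range(lag, len(x)))
        den = sum(x[i - lag] ** 2 for i in range(lag, len(x)))
        if den > 0.0 and num * num / den > best_score:
            best_score, best_lag, best_beta = num * num / den, lag, num / den
    return best_lag, best_beta


# Synthesize a frame from two known pulses, then recover them.
h = [1.0, 0.5, 0.25]
target = [0.0] * 12
for pos, amp in [(2, 3.0), (7, -2.0)]:
    for j, hj in enumerate(h):
        target[pos + j] += amp * hj
pulses, err = multipulse_search(target, h, num_pulses=2)
print(pulses)  # [(2, 3.0), (7, -2.0)]

# A pulse train with period 5 yields lag 5 and unit gain.
print(one_tap_pitch(([1.0] + [0.0] * 4) * 4, 2, 12))  # (5, 1.0)
```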
Quantization and encoding schemes for the various transmitted parameters are described. For high and medium bit rate applications, the parameters that must be transmitted every frame are the quantiles, the quantile orders, the locations and amplitudes of the excitation pulses, the parameters of the pitch predictor and a gain term. For low bit rate applications, the quantile orders are fixed and so need not be transmitted. The quantization schemes for the quantile orders and for the gain term are shown to be optimal in the sense of minimizing the maximum spectral deviation due to quantization.
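The abstract states the minimax optimality result but not the quantizer construction. One standard way to see why such a criterion leads to a simple design: a gain term scales the whole envelope, so the spectral deviation in dB equals the gain's quantization error in dB. With L reconstruction levels covering a range of R dB, no quantizer can guarantee a worst-case error below R/(2L), and uniform steps in dB with midpoint reconstruction achieve exactly that bound. The sketch below assumes an amplitude gain, a 0 to 60 dB range and 5 bits; the function name is hypothetical, and this is not claimed to be the thesis's scheme.

```python
import math


def quantize_gain_db(gain, lo_db, hi_db, bits):
    """Uniform quantizer on the dB scale with midpoint reconstruction.

    Returns (index, reconstructed gain, step in dB). For gains inside
    [lo_db, hi_db], the reconstruction is off by at most step / 2 dB.
    """
    levels = 2 ** bits
    step = (hi_db - lo_db) / levels
    g_db = 20.0 * math.log10(gain)          # gain treated as an amplitude factor
    idx = math.floor((g_db - lo_db) / step)
    idx = min(max(idx, 0), levels - 1)      # clamp out-of-range gains
    recon_db = lo_db + (idx + 0.5) * step
    return idx, 10.0 ** (recon_db / 20.0), step


# Worst-case spectral deviation over a sweep of gains from 0 to 60 dB.
worst = 0.0
for tenth_db in range(0, 601):
    g = 10.0 ** (tenth_db / 200.0)          # tenth_db / 10 dB as an amplitude
    _, g_hat, step = quantize_gain_db(g, 0.0, 60.0, bits=5)
    worst = max(worst, abs(20.0 * math.log10(g_hat / g)))
print(worst, step / 2)
```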
The quantile vocoder has been implemented in software at 4.8, 9.6, 16 and 24 kbit/s. To test the vocoder, a speech database of ten sentences spoken by one male and one female speaker has been used. The segmental signal-to-noise ratio has been used as an objective performance measure to evaluate the vocoder at all bit rates. A subjective method for assessing the quality of the vocoder at various bit rates is also proposed and carried out. The results of the non-real-time quantile vocoder simulations at 4.8, 9.6, 16 and 24 kbit/s have been recorded and will be played at the end of the talk. According to informal listening tests, the quantile vocoder does indeed seem equivalent to or better than other vocoders at the same bit rates.
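Segmental SNR averages per-frame SNRs rather than computing one SNR over the whole utterance, so quiet frames count as much as loud ones. The sketch below is one common formulation. The 160-sample frame (20 ms at an assumed 8 kHz rate), the clamping of each frame to the range -10 to 35 dB, and the skipping of all-zero frames are conventions assumed here, not details taken from the thesis.

```python
import math


def segmental_snr(clean, decoded, frame_len=160, floor_db=-10.0, ceil_db=35.0):
    """Mean of per-frame SNRs in dB, each clamped to [floor_db, ceil_db].

    frame_len=160 is 20 ms at an assumed 8 kHz sampling rate; the clamping
    limits and the skipping of all-zero frames are common conventions, not
    values taken from the thesis.
    """
    if len(clean) != len(decoded):
        raise ValueError("signals must have the same length")
    snrs = []
    for start in range(0, len(clean) - frame_len + 1, frame_len):
        s = clean[start:start + frame_len]
        d = decoded[start:start + frame_len]
        signal = sum(x * x for x in s)
        noise = sum((x - y) ** 2 for x, y in zip(s, d))
        if signal == 0.0:
            continue                       # skip digital silence
        snr = ceil_db if noise == 0.0 else 10.0 * math.log10(signal / noise)
        snrs.append(min(max(snr, floor_db), ceil_db))
    return sum(snrs) / len(snrs) if snrs else float("nan")


# A decoder that scales every sample by 0.9 leaves 1% of the energy as
# noise, so every frame (and hence the average) sits at 20 dB.
clean = [math.sin(2 * math.pi * 440 * n / 8000) for n in range(800)]
decoded = [0.9 * x for x in clean]
print(segmental_snr(clean, decoded))
```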
Item Type: Thesis (Dissertation (Ph.D.))
Degree Grantor: California Institute of Technology
Major Option: Electrical Engineering
Thesis Availability: Public (worldwide access)
Defense Date: 29 July 1985
Default Usage Policy: No commercial reproduction, distribution, display or performance rights in this work are provided.
Deposited By: Imported from ETD-db
Deposited On: 19 Jul 2006
Last Modified: 26 Dec 2012 02:55