Tuesday, 25 April 2017

Paper Review: Comparison of Parametric Representations for Monosyllabic Word Recognition in Continuously Spoken Sentences

Title: Comparison of Parametric Representations for Monosyllabic Word Recognition in Continuously Spoken Sentences

Authors: Steven B. Davis, Paul Mermelstein


Review:

Several parametric representations of the acoustic signal were compared with regard to word recognition performance in a syllable-oriented continuous speech recognition system. The vocabulary included many phonetically similar monosyllabic words. Therefore, the emphasis was on the ability to retain phonetically significant acoustic information in the face of syntactic and duration variations.
For each parameter set (based on a mel-frequency cepstrum, a linear frequency cepstrum, a linear prediction cepstrum, a linear prediction spectrum, or a set of reflection coefficients), word templates were generated using an efficient dynamic warping method, and test data were time-registered with the templates. A set of ten mel-frequency cepstrum coefficients computed every 6.4 ms gave the best performance: 96.5 percent and 95.0 percent recognition for the two speakers. The superior performance of the mel-frequency cepstrum coefficients may be attributed to the fact that they better represent the perceptually relevant aspects of the short-term speech spectrum. The results are limited by the restrictions on the speech data examined. In particular, consonant clusters, multisyllabic words and unstressed monosyllabic words were not studied in the paper. The principal conclusion of the study is that perceptually based word templates are effective in capturing the acoustic information required to recognize these words in continuous speech.
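The two ingredients described above, mel-frequency cepstrum coefficients and template matching by dynamic warping, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation: it assumes the log energies of a mel-spaced filterbank (20 filters here) are already computed for each frame, takes the cepstrum as the cosine transform of those log energies that is commonly used for MFCCs, and substitutes a plain textbook dynamic time warping (DTW) with Euclidean frame distance for the paper's efficient warping method. The function names and the toy templates are hypothetical.

```python
import math

def mel_cepstrum(log_energies, num_coeffs=10):
    """Cosine transform of K log filterbank energies X_k (k = 1..K):
    c_i = sum_k X_k * cos(i * (k - 1/2) * pi / K), for i = 1..num_coeffs."""
    K = len(log_energies)
    return [
        sum(x * math.cos(i * (k - 0.5) * math.pi / K)
            for k, x in enumerate(log_energies, start=1))
        for i in range(1, num_coeffs + 1)
    ]

def frame_distance(a, b):
    """Euclidean distance between two cepstral frame vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dtw_distance(template, test):
    """Plain DTW: cumulative frame cost with steps (1,0), (0,1) and (1,1)."""
    n, m = len(template), len(test)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = frame_distance(template[i - 1], test[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

def recognize(test, templates):
    """Return the word whose template is closest to the test utterance."""
    return min(templates, key=lambda word: dtw_distance(templates[word], test))

# Toy one-dimensional "frames" (hypothetical data) to show time registration:
templates = {"bat": [[0.0], [1.0]], "pat": [[5.0], [6.0]]}
print(recognize([[0.0], [0.0], [1.0]], templates))
```

Because DTW lets one template frame absorb several test frames, a slower utterance of the same word still scores a small distance, which is the time-registration step the review describes.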

Patent Review: Error correction in speech recognition

Application number: US08825534
Publication number: US6064959A
Date of patent: May 16, 2000
Inventors: Jonathan Hood Young, David Wilsberg Parmenter, Robert Roth, Joev Dubach, Gregory J. Gadbois, Stijn Van Even

Aim:
Incorrect text associated with recognition errors in computer-implemented speech recognition is corrected by performing speech recognition on an utterance to produce a recognition result for the utterance. When a correction command is identified in the recognition result, corrected text is produced from a portion of the recognition result.

Review:
The invention relates to correcting recognition errors in speech recognition. An utterance includes a variable number of frames and corresponds, for example, to a period of speech followed by a pause of at least a predetermined duration. When the correction command indicates that the corrected speech is a pronunciation or spelling of the word to be corrected, the corrected text may be produced using confused pronunciation/spelling matching to identify the text corresponding to that pronunciation or spelling. A confused pronunciation dictionary or a traditional pronunciation dictionary may be searched for confused pronunciation matches, and a phonetic/spelling tree may be used to search for confused pronunciation matches. Instead of confused pronunciation and confused spelling searches, other types of searches, including simple pronunciation or spelling searches, may also be performed. For spelling, speech recognition is performed on an utterance to produce recognition results, and a spelling command is identified in those results. The utterance is then processed, and the spelling is produced by searching a dictionary using the results of that processing step.
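The idea of a confused spelling search can be illustrated with a weighted edit distance in which letters that are easy to mis-recognize are cheaper to substitute. This is a hypothetical Python sketch, not the patent's algorithm: the confusable letter groups (the classic "E-set" b, c, d, e, g, p, t, v, z, plus m/n and f/s), the 0.5 substitution cost and all function names are assumptions, and a real recognizer would score acoustic confusability rather than use fixed groups.

```python
# Hypothetical confusable-letter groups; not taken from the patent.
CONFUSABLE_GROUPS = [set("bcdegptvz"), set("mn"), set("fs")]

def sub_cost(a, b):
    """Substitution cost: 0 if equal, 0.5 if acoustically confusable, else 1."""
    if a == b:
        return 0.0
    if any(a in g and b in g for g in CONFUSABLE_GROUPS):
        return 0.5
    return 1.0

def confusable_distance(heard, word):
    """Edit distance with unit insert/delete and confusability-weighted substitution."""
    n, m = len(heard), len(word)
    D = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = float(i)
    for j in range(1, m + 1):
        D[0][j] = float(j)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = min(D[i - 1][j] + 1.0,          # deletion
                          D[i][j - 1] + 1.0,          # insertion
                          D[i - 1][j - 1] + sub_cost(heard[i - 1], word[j - 1]))
    return D[n][m]

def correct_spelling(heard, dictionary, max_results=3):
    """Rank dictionary words by confusable distance to the recognized letters."""
    ranked = sorted(dictionary, key=lambda w: (confusable_distance(heard, w), w))
    return ranked[:max_results]

print(correct_spelling("dat", ["bat", "cat", "mat", "dog"]))
```

Here a spelled "d-a-t" ranks "bat" and "cat" ahead of "mat", because b, c and d all fall in the hypothetical E-set group.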

Basic operations using DSP Processor

In this experiment, we performed basic operations such as addition, subtraction and multiplication on the C2000 DSP processor. Code Composer Studio was used as the development platform. The C implementations of DSP algorithms developed earlier were modified to run on-chip as embedded C.

FIR Filter design using frequency sampling method

In this experiment, we designed an FIR filter using the frequency sampling method. Unlike the window method, this technique can be used for any given magnitude response.
The input specifications given were:
1) Pass band attenuation
2) Stop band attenuation
3) Pass band frequency
4) Stop band frequency
5) Sampling frequency 
The desired frequency response $H_d(\omega)$ is first calculated from the input specifications. It is then sampled at $N$ equally spaced frequencies $\omega_k = 2\pi k/N$, which gives the $N$-point DFT of the filter:
$$H(k) = H_d(\omega)\big|_{\omega = 2\pi k/N}, \quad k = 0, 1, \ldots, N-1.$$
The filter coefficients are calculated from these samples with the IDFT:
$$h(n) = \frac{1}{N}\sum_{k=0}^{N-1} H(k)\, e^{j 2\pi k n/N}, \quad n = 0, 1, \ldots, N-1.$$
The continuous frequency response of the resulting $N$-point filter is then an interpolation of the sampled frequency response. The approximation error is therefore exactly zero at the sampling frequencies and finite at the frequencies between them. The smoother the frequency response being approximated, the smaller the interpolation error between the sample points.
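The sampling and IDFT steps above can be sketched in pure Python (the lab itself used Scilab). This is a minimal sketch under assumptions: the filter length $N$ is odd (a type-I linear-phase filter), the desired magnitude samples are supplied directly rather than derived from the attenuation specifications, and the DFT is computed naively in $O(N^2)$. The function names and the 15-tap low-pass example are hypothetical.

```python
import cmath
import math

def frequency_sampling_fir(amplitudes):
    """Type-I linear-phase FIR filter by frequency sampling.

    amplitudes[k] is the desired |H| at w_k = 2*pi*k/N for k = 0..M-1,
    where N = 2*M - 1 is the (odd) filter length.
    """
    M = len(amplitudes)
    N = 2 * M - 1
    alpha = (N - 1) / 2  # group delay in samples
    # Build all N DFT samples H(k): linear phase plus conjugate symmetry,
    # so that the impulse response comes out real and symmetric.
    H = [0j] * N
    for k in range(M):
        H[k] = amplitudes[k] * cmath.exp(-2j * math.pi * k * alpha / N)
    for k in range(1, M):
        H[N - k] = H[k].conjugate()
    # IDFT: h(n) = (1/N) * sum_k H(k) * e^{j 2 pi k n / N}
    h = []
    for n in range(N):
        acc = sum(H[k] * cmath.exp(2j * math.pi * k * n / N) for k in range(N))
        h.append((acc / N).real)  # imaginary part is only rounding noise
    return h

def freq_response(h, w):
    """Evaluate H(e^{jw}) = sum_n h(n) e^{-jwn} at one frequency w (rad/sample)."""
    return sum(hn * cmath.exp(-1j * w * n) for n, hn in enumerate(h))

# Hypothetical 15-tap low-pass: passband samples k = 0..3, stopband k = 4..7.
amps = [1, 1, 1, 1, 0, 0, 0, 0]
h = frequency_sampling_fir(amps)
N = len(h)
for k, a in enumerate(amps):
    print(k, a, round(abs(freq_response(h, 2 * math.pi * k / N)), 6))
```

The printed magnitudes match the requested samples at every $\omega_k$, which is the zero-error-at-the-samples property described above; between the samples the response is only an interpolation.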

FIR Filter design using window function

In this experiment, we designed a linear-phase FIR filter using a window function. As the name suggests, an FIR filter has an impulse response of finite duration. The overall design procedure is similar to that for IIR filters.
The input specifications given were:
1) Pass band attenuation
2) Stop band attenuation
3) Pass band frequency
4) Stop band frequency 
5) Sampling frequency 
Common window functions include the Hamming, Bartlett, Hanning and Blackman windows. We used the Hanning window and wrote the code for it.
The main difference from IIR design is that much of the calculation is done in the time domain rather than in the transform domain.
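The time-domain part of the window method can be sketched in pure Python: compute the ideal low-pass impulse response delayed by $(N-1)/2$ samples, then multiply it sample by sample by the Hanning window. This is a sketch under assumptions: the length $N = 31$ and cutoff $\omega_c = 0.4\pi$ are chosen directly here instead of being derived from the attenuation specifications, the Hanning window is taken as $w(n) = 0.5 - 0.5\cos(2\pi n/(N-1))$ (other tools may use a slightly different convention), and the function names are hypothetical.

```python
import cmath
import math

def hanning(N):
    """Hanning window of length N: w(n) = 0.5 - 0.5*cos(2*pi*n/(N-1))."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]

def ideal_lowpass(N, wc):
    """Ideal low-pass impulse response, delayed by alpha = (N-1)/2 samples."""
    alpha = (N - 1) / 2
    h = []
    for n in range(N):
        m = n - alpha
        # The centre sample is the limit wc/pi of sin(wc*m)/(pi*m).
        h.append(wc / math.pi if m == 0 else math.sin(wc * m) / (math.pi * m))
    return h

def fir_window_lowpass(N, wc):
    """Truncate the ideal response to N samples and taper it with the window."""
    return [hd * w for hd, w in zip(ideal_lowpass(N, wc), hanning(N))]

def freq_response(h, w):
    """Evaluate H(e^{jw}) = sum_n h(n) e^{-jwn} at one frequency w (rad/sample)."""
    return sum(hn * cmath.exp(-1j * w * n) for n, hn in enumerate(h))

h = fir_window_lowpass(31, 0.4 * math.pi)
print("gain at DC:", abs(freq_response(h, 0.0)))
print("gain at pi:", abs(freq_response(h, math.pi)))
```

The symmetric coefficients give exactly linear phase, and the tapering trades a wider transition band for much smaller stopband ripple than plain truncation would give.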




Chebyshev IIR Filter Design

In this experiment, we designed digital Chebyshev low-pass and high-pass filters from analog prototypes using the bilinear transform method.
Input specifications were given as follows:
1) Passband attenuation
2) Stopband attenuation
3) Passband frequency
4) Stopband frequency
5) Sampling frequency
We noted the passband and stopband attenuation values from the graphs obtained in Scilab. We observed that the magnitude spectrum exhibits ripple in the passband and is monotonic in the stopband, which is the behaviour of a Chebyshev Type I filter. A Chebyshev filter requires fewer hardware components for realisation because it meets a given specification with a lower order than a Butterworth filter.
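The order calculation behind this design can be sketched in Python using the standard Chebyshev Type I formulas, with the band edges prewarped for the bilinear transform. The specification values (1 dB passband attenuation, 20 dB stopband attenuation, edges at $0.2\pi$ and $0.4\pi$ rad/sample, $T = 1$) are hypothetical, not the values used in the lab, and the function names are illustrative.

```python
import math

def prewarp(omega, T=1.0):
    """Bilinear-transform prewarping: digital frequency (rad/sample) -> analog (rad/s)."""
    return (2.0 / T) * math.tan(omega / 2.0)

def chebyshev_order(ap_db, stop_db, wp, ws, T=1.0):
    """Minimum Chebyshev Type I low-pass order for the given specification."""
    Wp, Ws = prewarp(wp, T), prewarp(ws, T)
    ratio = math.sqrt((10 ** (0.1 * stop_db) - 1) / (10 ** (0.1 * ap_db) - 1))
    return math.ceil(math.acosh(ratio) / math.acosh(Ws / Wp))

def chebyshev_attenuation_db(W, Wp, N, ap_db):
    """Attenuation of the analog prototype: 10*log10(1 + eps^2 * T_N(W/Wp)^2)."""
    eps2 = 10 ** (0.1 * ap_db) - 1
    x = W / Wp
    # Chebyshev polynomial T_N: oscillates for |x| <= 1 (passband ripple),
    # grows monotonically for x > 1 (stopband).
    TN = math.cos(N * math.acos(x)) if abs(x) <= 1 else math.cosh(N * math.acosh(x))
    return 10 * math.log10(1 + eps2 * TN ** 2)

ap, stop, wp, ws = 1.0, 20.0, 0.2 * math.pi, 0.4 * math.pi
N = chebyshev_order(ap, stop, wp, ws)
Wp, Ws = prewarp(wp), prewarp(ws)
print("order:", N)
print("attenuation at passband edge (dB):", chebyshev_attenuation_db(Wp, Wp, N, ap))
print("attenuation at stopband edge (dB):", chebyshev_attenuation_db(Ws, Wp, N, ap))
```

For the same hypothetical specification, the Butterworth order formula gives 4, while this sketch gives 3, which is where the saving in components comes from.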

Butterworth filter design

In this experiment, we designed digital Butterworth low-pass and high-pass filters.
The input specifications given were:
1) Passband attenuation
2) Stopband attenuation
3) Passband frequency
4) Stopband frequency
5) Sampling frequency
We noted the passband and stopband attenuation values from the graphs obtained in Scilab. The poles of the analog LPF lie in the left half of the s-plane, so the analog filter is stable. The poles of the digital LPF lie inside the unit circle, so the digital LPF is also stable; this is expected, because the bilinear transform maps the left half of the s-plane to the interior of the unit circle. The magnitude spectrum is monotonic in both the passband and the stopband.
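The stability argument can be checked numerically with a short Python sketch: compute the analog Butterworth poles, map each one through the bilinear transform $z = (2/T + s)/(2/T - s)$, and confirm that the analog poles have negative real parts and the digital poles lie inside the unit circle. The specification values (1 dB, 20 dB, edges at $0.2\pi$ and $0.4\pi$ rad/sample, $T = 1$) are hypothetical, and the cutoff is chosen so the passband edge is met exactly; the lab's actual numbers came from Scilab.

```python
import cmath
import math

def prewarp(omega, T=1.0):
    """Bilinear-transform prewarping: digital frequency (rad/sample) -> analog (rad/s)."""
    return (2.0 / T) * math.tan(omega / 2.0)

def butterworth_order(ap_db, stop_db, wp, ws, T=1.0):
    """Minimum Butterworth low-pass order for the given specification."""
    Wp, Ws = prewarp(wp, T), prewarp(ws, T)
    num = (10 ** (0.1 * stop_db) - 1) / (10 ** (0.1 * ap_db) - 1)
    return math.ceil(math.log10(num) / (2 * math.log10(Ws / Wp)))

def analog_poles(N, Wc):
    """Left-half-plane poles s_k = Wc * exp(j*pi*(2k + N - 1)/(2N)), k = 1..N."""
    return [Wc * cmath.exp(1j * math.pi * (2 * k + N - 1) / (2 * N))
            for k in range(1, N + 1)]

def bilinear(s, T=1.0):
    """Map an s-plane point to the z-plane: z = (2/T + s) / (2/T - s)."""
    return (2.0 / T + s) / (2.0 / T - s)

ap, stop, wp, ws = 1.0, 20.0, 0.2 * math.pi, 0.4 * math.pi
N = butterworth_order(ap, stop, wp, ws)
# Cutoff chosen so the attenuation at the passband edge is exactly ap dB.
Wc = prewarp(wp) / (10 ** (0.1 * ap) - 1) ** (1 / (2 * N))
s_poles = analog_poles(N, Wc)
z_poles = [bilinear(s) for s in s_poles]
print("order:", N)
print("analog poles in left half-plane:", all(s.real < 0 for s in s_poles))
print("digital poles inside unit circle:", all(abs(z) < 1 for z in z_poles))
```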
