A neural fuzzy training approach for improving speech recognition

✍ Scribed by Yasuhiro Komori; Shigeki Sagayama; Alexander H. Waibel


Publisher: John Wiley and Sons
Year: 1993
Language: English
File size: 951 KB
Volume: 24
Category: Article
ISSN: 0882-1666


✦ Synopsis


Abstract

This paper proposes a new training method for phoneme identification neural networks, called “neural fuzzy training.” In the proposed method, nondeterministic (fuzzy) class information is assigned to the training signal, in contrast to the conventional method, where deterministic class information is assigned.

The study aims to realize a robust neural network, thereby improving the cumulative recognition rate of phoneme identification and avoiding overtraining. The proposed neural fuzzy training is realized by backpropagation. In conventional training, deterministic phoneme class information is assigned to the training signal of the neural network as the value 1 or 0. In the proposed training, by contrast, fuzzy class information is assigned to the training signal for each training sample as a likelihood value between 0 and 1.
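The contrast between deterministic and fuzzy training signals can be illustrated with the error terms that backpropagation computes at the output layer. The following is a minimal sketch, not the authors' implementation: it assumes sigmoid output units and a squared-error loss (the abstract specifies neither), and the activations and target values are hypothetical.

```python
import math


def sigmoid(x):
    """Logistic activation, assumed here for the output units."""
    return 1.0 / (1.0 + math.exp(-x))


def output_deltas(outputs, targets):
    """Output-layer error terms for sigmoid units under squared error.

    delta_k = (o_k - t_k) * o_k * (1 - o_k); backpropagation passes these
    terms down to the earlier layers.
    """
    return [(o - t) * o * (1.0 - o) for o, t in zip(outputs, targets)]


# Hypothetical network outputs for one training sample and three classes.
outputs = [sigmoid(z) for z in (2.0, 0.5, -1.0)]

# Conventional training signal: 1 for the labeled class, 0 elsewhere.
hard_targets = [1.0, 0.0, 0.0]
# Fuzzy training signal: likelihoods in [0, 1] (values invented for illustration).
fuzzy_targets = [1.0, 0.6, 0.1]

hard_deltas = output_deltas(outputs, hard_targets)
fuzzy_deltas = output_deltas(outputs, fuzzy_targets)
```

In this sketch the labeled class receives the same error term under both signals, while a class that is acoustically close to the sample (the second one here) is pushed toward 0 much less strongly under the fuzzy signal.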

In the proposed training method, the likelihood is calculated by a monotonically decreasing function (such as exp(−α · d²)) of the distance d between the training sample and the closest sample belonging to each phoneme class. This has the drawback of a large computational cost, since the training signal is determined by calculating the distances to all training samples. To solve this problem, representative samples are defined for each phoneme class, and the likelihood for each phoneme class is determined from the distance between the representative sample and the training sample.
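The likelihood calculation described above can be sketched as follows. This is an illustration under stated assumptions rather than the paper's procedure: Euclidean distance on small hypothetical feature vectors, α = 1, and the class centroid standing in for the representative sample (the abstract does not say how representatives are chosen).

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fuzzy_targets(sample, class_samples, alpha=1.0):
    """Likelihood exp(-alpha * d^2) per class, with d the distance from
    `sample` to the closest of that class's samples."""
    return {
        label: math.exp(-alpha * min(sq_dist(sample, s) for s in samples))
        for label, samples in class_samples.items()
    }


def centroid(samples):
    """Mean vector; an assumed choice of representative sample."""
    n = len(samples)
    return tuple(sum(dim) / n for dim in zip(*samples))


# Hypothetical two-dimensional training samples for two phoneme classes.
train = {
    "b": [(0.0, 0.0), (0.2, 0.0)],
    "d": [(1.0, 0.0), (1.2, 0.2)],
}
sample = (0.0, 0.0)  # a training sample labeled "b"

# Exact variant: compares against every training sample in every class.
exact = fuzzy_targets(sample, train)

# Simplified variant: one representative per class, so one distance per class.
representatives = {label: [centroid(s)] for label, s in train.items()}
approx = fuzzy_targets(sample, representatives)
```

The exact variant costs one distance per training sample, whereas the simplified variant costs one distance per class. With the centroid assumption used here, the target for the sample's own class is close to 1 but no longer exactly 1; the abstract does not say how the original method treats that case.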

This simplification of the likelihood calculation considerably reduces the computational cost of determining the training signal. To demonstrate the usefulness of neural fuzzy training, experiments are conducted on /b, d, g, m, n, N/ identification, 18-consonant identification, and phrase recognition using TDNN‐LR. The ATR database is used in the experiments. In the phoneme identification experiments, speech samples extracted using hand labels are used. The TDNN is trained using speech samples uttered in word style, and the evaluation is performed using speech samples uttered in phrase style and in sentence style.

In the phrase recognition experiment using TDNN‐LR, the TDNN is trained using hand-labeled speech samples uttered in word style, and the evaluation is performed using speech samples uttered in phrase style. In both kinds of experiment, an improvement from fuzzy training can be observed. In particular, in the phrase recognition experiment using TDNN‐LR, the top recognition rate improves from 71.2 percent to 80.9 percent, and the recognition rate within the top five candidates improves from 92.8 percent to 96.0 percent. Furthermore, neural fuzzy training also proved to be a high‐speed training method.


📜 SIMILAR VOLUMES


A neuro-fuzzy approach to speech recogni
✍ Mu-Chun Su; Ching-Tang Hsieh; Chieh-Ching Chin 📂 Article 📅 1998 🏛 Elsevier Science 🌐 English ⚖ 622 KB

Several successful approaches to speech recognition have been proposed. Most of them involve time alignment which requires substantial computation and considerable memory storage. In this paper, we present a neuro-fuzzy approach to speech recognition without time alignment. This approach is a powerf

An improved maximum model distance appro
✍ Q.H He; S Kwong; K.F Man; K.S Tang 📂 Article 📅 2000 🏛 Elsevier Science 🌐 English ⚖ 143 KB

This paper proposes an improved maximum model distance (IMMD) approach for HMM-based speech recognition systems based on our previous work [S. Kwong, Q.H. He, K.F. Man, K.S. Tang. A maximum model distance approach for HMM-based speech recognition, Pattern Recognition 31 (3) (1998) 219–229]. It define

Improvement of noisy speech recognition
✍ Wei-Wen Hung; Hsiao-Chuan Wang 📂 Article 📅 1998 🏛 Elsevier Science 🌐 English ⚖ 670 KB

Modelling the state duration of hidden Markov models (HMMs) can effectively improve the accuracy in decoding the state sequence of an utterance and result in an improvement of speech recognition accuracy. However, when a speech signal is contaminated by ambient noise, the decoded state sequence may