Automatic recognition of 200 words
Scribed by V.M. Velichko; N.G. Zagoruyko
- Book ID
- 104139518
- Publisher
- Elsevier Science
- Year
- 1970
- Weight
- 517 KB
- Volume
- 2
- Category
- Article
- ISSN
- 0020-7373
Synopsis
Experiments on the automatic recognition of 203 Russian words are described. The experimental vocabulary includes the terms of the language ALGOL-60 together with others. The logarithmic characteristics of the acoustic signal in five bands are extracted as features. The measure of similarity between the words of the standard and control sequences is calculated by maximizing a definite functional using dynamic programming. The average reliability of recognition for one speaker, obtained in experiments using 5000 words, is 0.95. The computational time for recognition is 2-4 sec.
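The dynamic-programming comparison mentioned in the synopsis can be illustrated with a minimal sketch. The synopsis does not state the form of the functional, so everything specific here is an assumption made for illustration: the local similarity of two feature vectors is taken as the negative Euclidean distance, the alignment is allowed to advance in either sequence or in both at once, and the names `frame_similarity`, `dp_similarity` and `recognise` are hypothetical.

```python
import math


def frame_similarity(a, b):
    """Local similarity of two feature vectors.

    Assumption: negative Euclidean distance; the source does not give the
    actual functional.
    """
    return -math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dp_similarity(standard, control):
    """Maximum summed frame similarity over monotonic alignments."""
    n, m = len(standard), len(control)
    if n == 0 or m == 0:
        raise ValueError("both sequences must be non-empty")
    neg_inf = float("-inf")
    # best[i][j]: best score of an alignment ending at the pair (i, j)
    best = [[neg_inf] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            local = frame_similarity(standard[i], control[j])
            if i == 0 and j == 0:
                prev = 0.0
            else:
                prev = max(
                    best[i - 1][j] if i > 0 else neg_inf,
                    best[i][j - 1] if j > 0 else neg_inf,
                    best[i - 1][j - 1] if i > 0 and j > 0 else neg_inf,
                )
            best[i][j] = prev + local
    return best[n - 1][m - 1]


def recognise(control, standards):
    """Return the label of the standard word most similar to the control word."""
    return max(standards, key=lambda label: dp_similarity(standards[label], control))
```

Under these assumptions a control word that is merely a time-stretched copy of a standard word scores the same as an exact copy, which is the property that makes such an alignment useful when the speaking rate varies.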
The problem of automatic recognition of a limited set of verbal commands is of interest to speech researchers inasmuch as it offers a practical solution to problems such as the vocal control of automatic equipment, vocal input of data and programs to computers and so on (Vysotskiy et al., 1968). This paper describes experiments on the automatic recognition of 203 Russian words (Table 3). These include all the terms of the programming language ALGOL-60 together with some terms of the input language and elementary function names. The majority of the letter names of the Russian and Latin alphabets have been replaced by proper nouns (as is done in communication systems), for example "mu" is replaced by "mi" and "psi" by "psi-shtrikh".
The following feature-extraction system is used to describe the speech signal in the experimental system. The acoustic signal is passed through a bank of five second-order filters (RLC resonant circuits). The centre frequencies of the filters were chosen equal to 225, 450, 900, 1800 and 7200 Hz (Kurilov & Gavrilko, 1969). The filter quality factor is equal to 2.45. The word is broken up into segments of fixed duration, each of 1.4 msec. The segment duration is chosen to exceed the maximum pitch period. In every segment the signal power is calculated in the common band, E0, and in the
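A rough digital sketch of the band analysis described in this paragraph is given here for illustration. It is not the authors' analogue circuitry. Each RLC resonant circuit is approximated by a two-pole digital resonator, and the sampling rate (20 kHz), the gain normalisation, the small constant `eps` and the names `resonator` and `log_band_powers` are all assumptions. The quality factor is left as a parameter; the default of 2.45 assumes that this is the intended reading of the printed value, and the segment duration is likewise a parameter set to the figure given in the text.

```python
import math

# Centre frequencies of the five filters, as listed in the text.
CENTRE_FREQS_HZ = (225.0, 450.0, 900.0, 1800.0, 7200.0)


def resonator(signal, f0, q, fs):
    """Two-pole digital resonator, a discrete stand-in for an RLC band-pass."""
    bandwidth = f0 / q                       # bandwidth in Hz, from Q = f0 / B
    r = math.exp(-math.pi * bandwidth / fs)  # pole radius
    a1 = 2.0 * r * math.cos(2.0 * math.pi * f0 / fs)
    a2 = -r * r
    gain = 1.0 - r                           # rough normalisation (assumption)
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = gain * x + a1 * y1 + a2 * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def log_band_powers(signal, fs=20000.0, q=2.45, segment_sec=0.0014, eps=1e-12):
    """Return one row of five log band powers per fixed-length segment."""
    seg_len = max(1, int(round(segment_sec * fs)))
    n_segments = len(signal) // seg_len
    bands = [resonator(signal, f0, q, fs) for f0 in CENTRE_FREQS_HZ]
    features = []
    for s in range(n_segments):
        lo, hi = s * seg_len, (s + 1) * seg_len
        row = []
        for band in bands:
            power = sum(v * v for v in band[lo:hi]) / seg_len
            row.append(math.log(power + eps))  # eps keeps silence finite
        features.append(row)
    return features
```

Applied to a word, `log_band_powers` yields a sequence of five-component vectors, one per segment. The power in the common band mentioned in the text is not computed in this sketch.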