Selection method and system of recognition unit for Uygur language voice recognition

A speech recognition technology for the Uyghur language, applied in speech recognition, speech analysis, instruments, etc. It addresses the problems that the recognizer cannot correctly recognize out-of-set words, that too many out-of-set words impair the predictive ability of the language model, and that speech recognition performance degrades as a result. The effect is an improved speech recognition rate.

Active Publication Date: 2013-04-24
INST OF ACOUSTICS CHINESE ACAD OF SCI +1

AI Technical Summary

Problems solved by technology

The recognizer cannot correctly recognize out-of-set words. Too many of them seriously impair the predictive ability of the language model, which directly degrades speech recognition performance.




Embodiment Construction

[0026] The technical solutions of the present invention will be described in further detail below with reference to the accompanying drawings and embodiments.

[0027] To address the problem of too many out-of-set words in word-based Uyghur speech recognition systems, the embodiment of the present invention implements a morphological analyzer based on a Finite State Transducer (FST), built according to the combination rules of Uyghur word stems and additional components (affixes), together with a tail-cutting algorithm. These decompose a word w into a stem o followed by successive additional components k, with the stem and the additional components separated from each other by spaces. For example, the word ishchilirimizgha (to our workers) breaks down to ishchi-lir-imiz-gha. The symbol "-" is added before the initial letter of each additional component as a mark to distinguish it from the stem. We denote the stem and additional components as u, and the sequ...
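The tail-cutting fallback described above can be illustrated with a minimal Python sketch. The excerpt does not give the algorithm's details, so the greedy longest-match strategy, the tiny suffix list `SUFFIXES`, the `min_stem_len` guard, and the function name `tail_cut` are all assumptions made for illustration, not the patent's actual procedure.

```python
# Hypothetical, tiny inventory of additional components, for illustration only.
# A real system would use a full list derived from Uyghur morphology.
SUFFIXES = ("lir", "imiz", "gha")


def tail_cut(word, suffixes=SUFFIXES, min_stem_len=2):
    """Greedily strip known suffixes from the end of `word` (assumed strategy).

    Returns the stem followed by the stripped components, each marked with a
    leading "-", in their original left-to-right order.
    """
    parts = []
    stem = word
    while True:
        # Try longer suffixes first so the longest match wins.
        for suf in sorted(suffixes, key=len, reverse=True):
            if stem.endswith(suf) and len(stem) - len(suf) >= min_stem_len:
                parts.append("-" + suf)
                stem = stem[: -len(suf)]
                break
        else:
            # No suffix matched: what remains is treated as the stem.
            break
    # Suffixes were collected right to left, so reverse them.
    return [stem] + parts[::-1]


if __name__ == "__main__":
    print(" ".join(tail_cut("ishchilirimizgha")))
```

Joining the result with spaces gives the space-separated unit sequence, with "-" marking each additional component. A word that ends in none of the listed suffixes is returned whole as a single stem.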



Abstract

The invention relates to a method and system for selecting recognition units for Uyghur speech recognition. The method includes: collecting or preparing a text corpus corresponding to the speech to be recognized; extracting the distinct words from the text corpus; feeding each distinct word into a morphological analyzer, taking the analyzer's split if the analysis succeeds and otherwise splitting the word with a tail-cutting algorithm, so that the stem and additional components of each word are obtained from the split; and mapping the words in the text corpus to their stems and additional components, then selecting the high-frequency stems and additional components as dictionary units. Because Uyghur words are split into stems and additional components according to the morphological rules of the language, and these are used as the recognition units, the problem of too many out-of-set words in the recognition system is solved and the recognition rate of the system is improved.
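The selection steps in the abstract can be sketched in Python as follows. This is only an illustration under stated assumptions: `select_units`, `toy_analyze`, `toy_fallback`, and the lookup table `TOY_ANALYSES` are hypothetical names, the toy analyzer stands in for the FST-based morphological analyzer, and the toy fallback simply keeps the word whole where the patent would apply its tail-cutting algorithm.

```python
from collections import Counter

# Hypothetical stand-in for the morphological analyzer: a lookup table that
# returns None when "analysis" fails.
TOY_ANALYSES = {"ishchilirimizgha": ["ishchi", "-lir", "-imiz", "-gha"]}


def toy_analyze(word):
    return TOY_ANALYSES.get(word)


def toy_fallback(word):
    # Placeholder for the tail-cutting split: keep the word as one unit.
    return [word]


def select_units(sentences, analyze, fallback, top_k):
    """Map corpus words to stem/affix units and keep the most frequent ones."""
    # Step 1: collect the distinct words of the corpus.
    vocab = {w for s in sentences for w in s.split()}
    # Step 2: split each distinct word once; use the fallback if analysis fails.
    splits = {}
    for w in vocab:
        result = analyze(w)
        splits[w] = result if result is not None else fallback(w)
    # Step 3: re-express the corpus as units and count unit frequencies.
    counts = Counter(u for s in sentences for w in s.split() for u in splits[w])
    # Step 4: the top_k most frequent units form the dictionary.
    units = [u for u, _ in counts.most_common(top_k)]
    return units, splits


if __name__ == "__main__":
    corpus = ["ishchilirimizgha ishchi", "ishchi"]
    units, splits = select_units(corpus, toy_analyze, toy_fallback, top_k=1)
    print(units)
```

Passing the analyzer and fallback as arguments keeps the frequency-based selection independent of how words are actually split. The fixed `top_k` cutoff is one possible reading of "high-frequency"; the excerpt does not say how the threshold is chosen.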

Description

Technical field

[0001] The invention relates to the field of speech recognition, and in particular to a method and system for selecting recognition units for Uyghur speech recognition.

Background

[0002] The goal of speech recognition is to automatically map acoustic signals to word sequences. Figure 1 is a block diagram of an existing statistical speech recognition system. In Figure 1, x_1 ... x_T is the acoustic feature sequence from time 1 to time T, and w_1 ... w_N is the recognition unit sequence. The recognizer uses the information provided by the acoustic model and the language model to determine, according to Bayesian decision theory, the best recognition unit sequence [w_1 ... w_N]_opt, such that

[0003] [w_1 ... w_N]_opt ...
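For orientation, the Bayesian decision rule used in statistical speech recognition is conventionally written as below. This is the standard textbook form, given here as an assumption: the patent's own equation [0003] is truncated in this excerpt and may differ in notation or detail.

```latex
[w_1 \ldots w_N]_{\mathrm{opt}}
  = \arg\max_{w_1 \ldots w_N} P(w_1 \ldots w_N \mid x_1 \ldots x_T)
  = \arg\max_{w_1 \ldots w_N} P(x_1 \ldots x_T \mid w_1 \ldots w_N)\, P(w_1 \ldots w_N)
```

In this form, P(x_1 ... x_T | w_1 ... w_N) is supplied by the acoustic model and P(w_1 ... w_N) by the language model. The second equality follows from Bayes' rule, because P(x_1 ... x_T) does not depend on the recognition unit sequence.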


Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G10L15/28
Inventor: 潘接林, 李鑫, 颜永红
Owner: INST OF ACOUSTICS CHINESE ACAD OF SCI