
Training method for embedded automatic sound identification system

A technology relating to automatic speech recognition and its training method, applied in speech recognition, speech analysis, voice input/output, etc. It addresses the problems that similar parts in the pronunciations of different words are not distinguished, that the key distinguishing parts receive insufficient attention, and that the recognition rate declines as a result.

Inactive Publication Date: 2005-03-02
SHANGHAI JIAO TONG UNIV
  • Summary
  • Abstract
  • Description
  • Claims
  • Application Information

AI Technical Summary

Problems solved by technology

This training method has certain limitations: because each reference template is generated only from the word's own training utterances, similar parts in the pronunciations of different words are not distinguished. During recognition, the key parts that differentiate one word's pronunciation from the others therefore receive too little attention, making a high recognition rate difficult to achieve.
The recognition rate drops sharply in particular when the vocabulary contains words with easily confused pronunciations.

Method used


Image

  • Training method for embedded automatic sound identification system

Examples


Embodiment

[0016] 1. Improved MSVQ template

[0017] Assuming the utterance is T frames long, the speech signal is represented by a feature vector sequence X = {x_1, x_2, …, x_T}. The segmentation method uses a minimum-distortion criterion so that the most closely related frames are aggregated into one segment. The total number of segments N_s is tied to the number of syllables contained in the word: each Chinese syllable usually consists of 3 to 4 phonemes (here each syllable is divided into 3 segments, one segment per phoneme). For segment l with boundaries t_l and t_{l+1}, first define the intra-segment distortion D_l as:

[0018] D_l = Σ_{t = t_l}^{t_{l+1} − 1} …
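The minimum-distortion segmentation described above can be sketched with dynamic programming over boundary placements. This is a hedged illustration, not the patent's implementation: the frame distortion measure (squared Euclidean distance of each frame to its segment centroid) and all function names are assumptions made for this sketch.

```python
import numpy as np

def segment_cost(X, a, b):
    """Intra-segment distortion D_l for frames X[a:b]: total squared
    distance of the frames to their mean (the segment centroid)."""
    seg = X[a:b]
    centroid = seg.mean(axis=0)
    return float(((seg - centroid) ** 2).sum())

def min_distortion_segmentation(X, n_segments):
    """Split the T x d frame sequence X into n_segments contiguous
    segments minimizing total intra-segment distortion, via DP.

    Returns the boundary list [t_1, ..., t_{Ns+1}] (t_1 = 0,
    t_{Ns+1} = T) and the total distortion."""
    T = len(X)
    INF = float("inf")
    # cost[l][t]: best distortion covering frames X[:t] with l segments
    cost = [[INF] * (T + 1) for _ in range(n_segments + 1)]
    back = [[0] * (T + 1) for _ in range(n_segments + 1)]
    cost[0][0] = 0.0
    for l in range(1, n_segments + 1):
        for t in range(l, T + 1):
            for s in range(l - 1, t):  # candidate start of segment l
                c = cost[l - 1][s] + segment_cost(X, s, t)
                if c < cost[l][t]:
                    cost[l][t] = c
                    back[l][t] = s
    # Trace back the optimal boundaries t_1 .. t_{Ns+1}.
    bounds = [T]
    t = T
    for l in range(n_segments, 0, -1):
        t = back[l][t]
        bounds.append(t)
    return list(reversed(bounds)), cost[n_segments][T]
```

On a sequence whose frames fall into clearly separated groups, the recovered boundaries coincide with the group edges, which is the "aggregate the most related frames into one segment" behaviour the text describes.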



Abstract

The invention is a training method for an embedded automatic speech recognition system in the field of intelligent information processing. (1) Improved multistage vector quantization (MSVQ) template training: a dynamic time warping (DTW) based method divides the training utterances of each class into several speech segments along the time axis, aggregating the most closely related frames into one segment; the total number of segments per template is set according to the number of syllables in the command word to be recognized, exploiting the temporal and statistical characteristics of the speech segments and the structure of Chinese syllables. (2) Discriminative training by generalized probabilistic descent (GPD): combined with the MSVQ templates, the discriminative training algorithm is embedded in a DTW-based recognizer; with the distance between a training utterance and a reference template defined as the discriminant function, the reference template set is discriminatively trained on the training set. Repeated rounds of discriminative training strengthen the ability to distinguish between templates, yielding better-optimized speech templates.
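The discriminative training in step (2) can be sketched as a generalized-probabilistic-descent-style update. This is an illustrative simplification, not the patent's algorithm: here each template and utterance is a single fixed-length vector and the discriminant is a plain squared Euclidean distance, whereas the patent computes DTW distances between utterances and MSVQ templates; the sigmoid loss, learning rate, and function names are assumptions of this sketch.

```python
import numpy as np

def sigmoid(z):
    """Smooth, differentiable surrogate for the 0/1 recognition loss."""
    return 1.0 / (1.0 + np.exp(-z))

def gpd_step(x, label, templates, lr=0.1, gamma=1.0):
    """One GPD-style discriminative update (illustrative).

    The misclassification measure compares the distance to the
    correct template with the distance to its nearest rival; the
    gradient of the smoothed loss pulls the correct template toward
    the sample and pushes the rival away."""
    dists = np.array([np.sum((x - c) ** 2) for c in templates])
    others = [j for j in range(len(templates)) if j != label]
    rival = min(others, key=lambda j: dists[j])
    d = dists[label] - dists[rival]       # misclassification measure
    loss = sigmoid(gamma * d)             # smoothed loss in (0, 1)
    g = gamma * loss * (1.0 - loss)       # d(loss)/d(d)
    # Gradient of ||x - c||^2 w.r.t. the template c is -2 (x - c):
    templates[label] += lr * g * 2.0 * (x - templates[label])  # attract
    templates[rival] -= lr * g * 2.0 * (x - templates[rival])  # repel
    return loss
```

Repeated passes over the training set widen the margin between confusable templates, which is the effect the abstract attributes to the "multiple repeated" discriminative training rounds.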

Description

technical field

[0001] The invention relates to a training method for a speech recognition system in the technical field of intelligent information processing, in particular to a training method for an embedded automatic speech recognition system.

background technique

[0002] The speech model (or template) used in a speech recognition system must reasonably reflect the acoustic characteristics of speech; how effectively it describes the probability distribution of the speech feature space determines recognition performance. To suit miniaturized and portable applications, most embedded automatic speech recognition systems are realized on dedicated hardware such as MCUs, DSPs, and special-purpose speech recognition chips. Because system resources are limited and recognition must be real-time and reliable, the storage space occupied by each recognition unit's template must be as small as possible, and the quality of...

Claims


Application Information

IPC(8): G06F3/16; G10L15/02; G10L15/06
Inventor: 朱杰 (Zhu Jie), 蔡铁 (Cai Tie)
Owner SHANGHAI JIAO TONG UNIV