
Embedded Chinese-English mixed voice recognition method and system for non-specific people

A speaker-independent (non-specific person), mixed-language speech technology, applied in speech recognition, speech analysis, instruments, etc. It addresses the problem that mixed Chinese-English speech recognition cannot be realized, and achieves a low computational load, a high recognition rate, and small storage requirements.

Inactive Publication Date: 2009-12-16
北京森博克智能科技有限公司
0 Cites, 21 Cited by

AI Technical Summary

Problems solved by technology

Existing single-language Chinese speech recognition technology cannot recognize mixed Chinese and English speech, yet mixed Chinese-English recognition is the expected direction of development for Chinese speech recognition technology.



Examples


Detailed Description of the Embodiments

[0053] The present invention will be further described below in conjunction with the accompanying drawings.

[0054] Figure 1 is a schematic diagram of the framework of the present invention, and Figure 3 is a schematic flow chart of the system. As shown in Figures 1 and 3, the system consists mainly of four parts: S1, acoustic model training; S2, word tree generation; S3, front-end processing; and S4, recognition and decoding. The system flow is as follows:

[0055] The S1 acoustic model training process is as follows:

[0056] 1. S1-1, feature extraction. Using a frame length of 25 milliseconds and a frame shift of 10 milliseconds, 12-dimensional MFCC features are extracted and a 1-dimensional energy feature is appended, giving 13-dimensional static features. First-order and second-order difference features are taken as the dynamic features, yielding a 39-dimensional acoustic feature vector sequen...
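The framing and dynamic-feature steps of S1-1 can be sketched as follows. This is a minimal illustration, not the patent's implementation. The 16 kHz sample rate, the two-frame regression window used for the differences, and the function names `frame_signal`, `delta`, and `add_dynamic_features` are all assumptions; the MFCC and energy computation itself is omitted, and the 13-dimensional static features are taken as given.

```python
import numpy as np

def frame_signal(signal, sample_rate=16000, frame_ms=25, shift_ms=10):
    """Split a 1-D signal into overlapping frames (25 ms long, 10 ms apart).

    The 16 kHz default sample rate is an assumption, not stated in the source.
    """
    signal = np.asarray(signal)
    frame_len = sample_rate * frame_ms // 1000
    shift = sample_rate * shift_ms // 1000
    if len(signal) < frame_len:
        return np.empty((0, frame_len))
    n_frames = 1 + (len(signal) - frame_len) // shift
    # Row t holds samples [t*shift, t*shift + frame_len).
    idx = np.arange(frame_len)[None, :] + shift * np.arange(n_frames)[:, None]
    return signal[idx]

def delta(features, window=2):
    """Regression-style difference features over +/- `window` frames.

    The window size is an assumption; edge frames are handled by
    repeating the first and last frame.
    """
    n_frames = features.shape[0]
    padded = np.pad(features, ((window, window), (0, 0)), mode="edge")
    denom = 2 * sum(k * k for k in range(1, window + 1))
    out = np.zeros_like(features, dtype=float)
    for k in range(1, window + 1):
        ahead = padded[window + k : window + k + n_frames]
        behind = padded[window - k : window - k + n_frames]
        out += k * (ahead - behind)
    return out / denom

def add_dynamic_features(static):
    """Stack static, first-order, and second-order difference features."""
    d1 = delta(static)
    d2 = delta(d1)
    return np.hstack([static, d1, d2])
```

With these assumed settings, each frame holds 400 samples and consecutive frames start 160 samples apart; passing a `(T, 13)` static feature matrix to `add_dynamic_features` returns a `(T, 39)` matrix.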


Abstract

The invention relates to a speaker-independent (non-specific person) speech recognition method and system that supports mixed Chinese-English speech and targets embedded applications. The invention uses acoustic models trained on a large amount of speech data, together with an acoustic modeling unit set compatible with both Chinese and English pronunciation, to realize speaker-independent mixed Chinese-English speech recognition. Multiple background models are used: Gaussian mixture model (GMM) parameters are obtained by mean-adaptation training from the background models, and the differences between the GMM means and the background-model means are then vector-quantized, which compresses the model parameters. In the recognition stage, fast Gaussian selection, acoustic score pre-calculation, and a simplified GMM are used, so that the recognition computation and the model storage space are greatly reduced and the method and system can be applied to a variety of embedded application systems.
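The mean-offset compression mentioned in the abstract can be illustrated with a small sketch. It is not the patent's procedure: it assumes plain k-means as the vector quantizer (the source does not name one), assumes each adapted mean is paired with exactly one background mean of the same dimension, and uses the hypothetical function names `train_codebook`, `compress_means`, and `reconstruct_means`.

```python
import numpy as np

def train_codebook(vectors, codebook_size, n_iter=20, seed=0):
    """Plain k-means codebook training (an assumed choice of quantizer)."""
    rng = np.random.default_rng(seed)
    start = rng.choice(len(vectors), codebook_size, replace=False)
    codebook = vectors[start].astype(float)
    for _ in range(n_iter):
        # Squared distance from every vector to every codeword.
        dists = ((vectors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
        assign = dists.argmin(axis=1)
        for c in range(codebook_size):
            members = vectors[assign == c]
            if len(members):
                codebook[c] = members.mean(axis=0)
    # Final assignment against the updated codebook.
    dists = ((vectors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return codebook, dists.argmin(axis=1)

def compress_means(adapted_means, background_means, codebook_size):
    """Quantize the offsets (adapted mean minus background mean).

    Only the small codebook and one index per Gaussian need to be stored,
    instead of a full mean vector per Gaussian.
    """
    offsets = adapted_means - background_means
    return train_codebook(offsets, codebook_size)

def reconstruct_means(background_means, codebook, indices):
    """Approximate each adapted mean as background mean plus its codeword."""
    return background_means + codebook[indices]
```

The design point is that offsets from a shared background model tend to cluster more tightly than the raw means, so a small codebook of offsets can stand in for many full-precision mean vectors.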

Description

Technical field

[0001] The invention relates to the technical field of automatic speech recognition, specifically to a speaker-independent (non-specific person) speech recognition method and system that supports mixed Chinese and English speech and targets embedded application environments with limited computing and storage resources.

Background

[0002] Speech is the most natural and convenient way for human beings to communicate and obtain information. Intelligent voice interaction technology mainly includes speech recognition, speech synthesis, and speech evaluation. Intelligent voice interaction is expected to be the next breakthrough in human-computer interaction after the graphical user interface (GUI).

[0003] Speech recognition technology allows machines to understand human speech by automatically converting speech signals into text and related information. It is a very important and critical part of intelligent ...


Application Information

IPC(8): G10L15/26, G10L15/14
Inventor: 王辉
Owner: 北京森博克智能科技有限公司