Embedded Chinese-English mixed voice recognition method and system for non-specific people

Speaker-independent (non-specific-person) mixed-voice technology, applied in voice recognition, voice analysis, instruments, and related fields. It addresses the problem that mixed Chinese-English speech cannot be recognized, and achieves a low computational load, a high recognition rate, and a small storage footprint.

Inactive Publication Date: 2009-12-16

北京森博克智能科技有限公司

Cites: 0 | Cited by: 21

AI Technical Summary
Problems solved by technology

Existing single-language Chinese speech recognition technology cannot recognize mixed Chinese and English speech, yet mixed Chinese-English recognition is the direction in which Chinese speech recognition technology is developing.

Embodiment Construction

[0053] The present invention will be further described below in conjunction with the accompanying drawings.

[0054] Figure 1 is a schematic diagram of the framework of the present invention, and Figure 3 is a schematic flow chart of the system. As shown in Figures 1 and 3, the system consists mainly of four parts: S1, acoustic model training; S2, word tree generation; S3, front-end processing; and S4, recognition and decoding. The system flow is as follows:

[0055] The S1-acoustic model training part of the process is as follows:

[0056] 1. S1-1, feature extraction. With a frame length of 25 milliseconds and a frame shift of 10 milliseconds, 12-dimensional MFCC features are extracted and a 1-dimensional energy feature is appended, giving 13-dimensional static features. First-order and second-order difference features are then added as dynamic features, yielding a 39-dimensional acoustic feature vector sequen...
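The feature layout in step S1-1 can be illustrated with a minimal NumPy sketch. Only the 25 ms frame length, 10 ms frame shift, and the 12 + 1 static dimensions expanded to 39 come from the text. The 16 kHz sampling rate, Hamming window, 512-point FFT, 24 mel filters, and the two-frame regression window for the differences are assumptions chosen for illustration, and all function names are hypothetical.

```python
import numpy as np

def frame_signal(x, sr, frame_ms=25, shift_ms=10):
    """Cut a 1-D signal into overlapping frames (25 ms long, 10 ms apart)."""
    flen = int(sr * frame_ms / 1000)
    fshift = int(sr * shift_ms / 1000)
    n_frames = 1 + (len(x) - flen) // fshift
    idx = np.arange(flen)[None, :] + fshift * np.arange(n_frames)[:, None]
    return x[idx]

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular mel filters (assumed layout, not taken from the patent)."""
    hz2mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel2hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_pts = np.linspace(hz2mel(0.0), hz2mel(sr / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel2hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        lo, mid, hi = bins[i], bins[i + 1], bins[i + 2]
        for k in range(lo, mid):
            fb[i, k] = (k - lo) / (mid - lo)   # rising edge
        for k in range(mid, hi):
            fb[i, k] = (hi - k) / (hi - mid)   # falling edge
    return fb

def deltas(feat, width=2):
    """Regression-based difference features over +/- `width` frames."""
    n_frames = len(feat)
    padded = np.pad(feat, ((width, width), (0, 0)), mode="edge")
    denom = 2 * sum(n * n for n in range(1, width + 1))
    out = np.zeros_like(feat)
    for n in range(1, width + 1):
        ahead = padded[width + n: width + n + n_frames]
        behind = padded[width - n: width - n + n_frames]
        out += n * (ahead - behind)
    return out / denom

def extract_features(x, sr=16000, n_mfcc=12, n_filters=24, n_fft=512):
    """Return a (frames, 39) matrix: 13 static + 13 first-order + 13 second-order."""
    frames = frame_signal(x, sr)
    windowed = frames * np.hamming(frames.shape[1])
    power = np.abs(np.fft.rfft(windowed, n_fft)) ** 2
    log_mel = np.log(power @ mel_filterbank(n_filters, n_fft, sr).T + 1e-10)
    # DCT-II basis, keeping coefficients 1..12 (coefficient 0 is dropped).
    k = np.arange(1, n_mfcc + 1)[:, None]
    n = np.arange(n_filters)[None, :]
    dct_basis = np.cos(np.pi * k * (2 * n + 1) / (2 * n_filters))
    mfcc = log_mel @ dct_basis.T
    energy = np.log(np.sum(frames ** 2, axis=1) + 1e-10)[:, None]
    static = np.hstack([mfcc, energy])          # 12 MFCC + 1 energy = 13
    d1 = deltas(static)                         # first-order differences
    d2 = deltas(d1)                             # second-order differences
    return np.hstack([static, d1, d2])          # 39 dimensions per frame
```

Under these assumptions, one second of 16 kHz audio yields 98 frames, each with a 39-dimensional vector.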

Abstract

The invention relates to a voice recognition method and system for non-specific people (speaker-independent) that supports mixed Chinese-English speech and targets embedded applications. It uses an acoustic model trained on a large volume of voice data, together with an acoustic modeling unit set compatible with both Chinese and English pronunciation, to achieve Chinese-English mixed voice recognition for non-specific people. Several background models are used: the Gaussian mixture model (GMM) parameters are obtained by mean-adaptive training from the background models, and the differences between the GMM means and the background-model means are then vector-quantized, which compresses the model parameters. In the recognition stage, fast Gaussian selection, acoustic score pre-calculation, and a simplified GMM are used, so that the recognition computation and the model storage space are greatly reduced, making the method and system applicable to a wide range of embedded application systems.
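The compression idea in the abstract, vector-quantizing the difference between each GMM mean and its background-model mean, can be sketched as follows. This is an illustration under assumptions, not the patent's procedure: the codebook is trained with plain k-means, whole mean vectors are quantized rather than sub-vectors, and the function names, codebook size, and one-to-one pairing of model means with background means are all hypothetical.

```python
import numpy as np

def kmeans(data, k, n_iter=20, seed=0):
    """Plain Lloyd k-means; returns (codebook, index of nearest codeword)."""
    rng = np.random.default_rng(seed)
    codebook = data[rng.choice(len(data), size=k, replace=False)].copy()
    for _ in range(n_iter):
        dist = ((data[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
        idx = dist.argmin(axis=1)
        for j in range(k):
            members = data[idx == j]
            if len(members):                  # leave empty clusters unchanged
                codebook[j] = members.mean(axis=0)
    dist = ((data[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return codebook, dist.argmin(axis=1)

def compress_means(model_means, background_means, k):
    """Quantize (model mean - background mean); store a codebook plus indices."""
    diffs = model_means - background_means
    codebook, idx = kmeans(diffs, k)
    return codebook, idx

def reconstruct_means(background_means, codebook, idx):
    """Approximate each model mean as background mean + quantized offset."""
    return background_means + codebook[idx]

# Toy usage: 256 adapted Gaussians of dimension 39, compressed to 16 codewords.
rng = np.random.default_rng(1)
background = rng.standard_normal((256, 39))
adapted = background + 0.1 * rng.standard_normal((256, 39))
codebook, idx = compress_means(adapted, background, k=16)
approx = reconstruct_means(background, codebook, idx)
```

In this sketch each model keeps only one small index per Gaussian plus a shared codebook, instead of a full mean vector per Gaussian. That is the kind of storage saving the abstract describes, although the actual codebook sizes and quantization layout are not given in the text.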

Description

Technical field

[0001] The invention relates to the technical field of automatic speech recognition. It is a speech recognition method and system for non-specific people that supports mixed Chinese and English speech and targets embedded application environments with limited computing and storage resources.

Background

[0002] Speech is the most natural and convenient way for human beings to communicate and obtain information. Intelligent voice interaction technology mainly includes speech recognition, speech synthesis, and voice evaluation. Intelligent voice interaction will be a breakthrough change in human-computer interaction, following the graphical user interface (GUI).

[0003] Speech recognition technology allows machines to understand human speech by automatically converting voice signals into text and related information. It is a very important and critical part of intelligent ...

Application Information

IPC(8): G10L15/26, G10L15/14

Inventor: 王辉

Owner: 北京森博克智能科技有限公司