Speech recognition optimization decoding method integrating guide probability

A speech recognition decoding method, applicable to speech recognition, speech analysis, instruments, etc. It addresses two problems: existing decoders do not use the position of the speech frame to be recognized in the acoustic feature space, and they do not enhance the search in selected local regions of that space.

Status: Inactive
Publication Date: 2013-03-20

INST OF AUTOMATION CHINESE ACAD OF SCI

Problems solved by technology

[0008] The purpose of the present invention is to remedy two shortcomings of existing speech recognition decoding technology: it does not use the position of the speech frame to be recognized in the acoustic feature space, and it does not enhance the search in certain local regions of that space.

Embodiment Construction

[0027] In order to make the object, technical solution and advantages of the present invention clearer, the present invention will be further described in detail below in conjunction with specific embodiments and with reference to the accompanying drawings.

[0028] To bring the position of the speech frame in the acoustic feature space into the decoding process, the present invention collects statistics on the response relationship between phonemes and each Gaussian component of a Universal Background Model (UBM). By establishing the correspondence between each Gaussian component in the UBM and the phonemes, it obtains the position of the speech frame in the acoustic feature space, expressed as the probability that the speech frame belongs to different parts of that space; this yields the guidance probability. When decoding, use the position information of the spee...
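As a rough illustration of how a frame can be located in the acoustic feature space, the sketch below computes the posterior of each UBM Gaussian component for one feature vector and picks the largest. It is not taken from the patent text: the diagonal-covariance form, the definition of the "main" Gaussian as the component with the highest posterior, and all names (`component_posteriors`, `main_gaussian`, the toy two-component model) are assumptions made for the example.

```python
import math

def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )

def component_posteriors(x, weights, means, variances):
    """Posterior probability of each mixture component given frame x."""
    log_joint = [
        math.log(w) + log_gaussian_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(log_joint)
    unnorm = [math.exp(lj - top) for lj in log_joint]
    total = sum(unnorm)
    return [u / total for u in unnorm]

def main_gaussian(x, weights, means, variances):
    """Index of the component with the highest posterior (assumed meaning
    of the 'main Gaussian component' of a frame)."""
    post = component_posteriors(x, weights, means, variances)
    return max(range(len(post)), key=post.__getitem__)

# Hypothetical toy UBM: two components in a 2-D feature space.
ubm_weights = [0.5, 0.5]
ubm_means = [[0.0, 0.0], [3.0, 3.0]]
ubm_vars = [[1.0, 1.0], [1.0, 1.0]]

frame = [2.8, 3.1]
print(main_gaussian(frame, ubm_weights, ubm_means, ubm_vars))
```

A real UBM would have hundreds or thousands of components trained on speech features; the two-component model here only makes the selection step visible.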

Abstract

The invention discloses a speech recognition decoding method that integrates a guide probability. Traditional speech recognition systems make insufficient use of the position of a speech frame in the acoustic feature space; to overcome this, the invention provides a guide probability model that describes the probability that the speech frame belongs to different parts of the acoustic feature space and uses it to guide the decoding process. The method comprises the following steps: training a universal background model that describes the whole acoustic feature space; computing the main Gaussian component of each speech frame in the universal background model; using the acoustic model of the recognition system to forcibly segment the training corpus and obtain the phoneme of each speech frame; counting the response frequency between phonemes and main Gaussian components; normalizing the response frequency to obtain the guide probability; and fusing the guide probability into the total score computation of a speech recognition path, so that the decoder is guided to enhance or weaken the path.
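The counting, normalizing and fusing steps listed in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented procedure: the abstract does not say in which direction the response frequency is normalized, so the example normalizes over phonemes for each main Gaussian; the log-linear fusion, the `guide_weight` and `floor` parameters, and every function name are hypothetical.

```python
import math
from collections import Counter, defaultdict

def train_guide_probability(aligned_frames):
    """Estimate guide probabilities from (phoneme, main_gaussian) pairs.

    The pairs are assumed to come from forced segmentation of the training
    corpus (phoneme) and from the UBM (main Gaussian index).
    """
    counts = defaultdict(Counter)
    for phoneme, gaussian in aligned_frames:
        counts[gaussian][phoneme] += 1  # response frequency
    guide = {}
    for gaussian, phone_counts in counts.items():
        total = sum(phone_counts.values())
        # Assumption: normalize over phonemes for each main Gaussian.
        guide[gaussian] = {p: c / total for p, c in phone_counts.items()}
    return guide

def fused_score(acoustic_logp, lm_logp, guide, gaussian, phoneme,
                guide_weight=0.5, floor=1e-4):
    """Add a weighted log guide probability to a path's log score.

    guide_weight and floor are hypothetical tuning parameters; the floor
    keeps unseen (Gaussian, phoneme) pairs from producing log(0).
    """
    p = guide.get(gaussian, {}).get(phoneme, 0.0)
    return acoustic_logp + lm_logp + guide_weight * math.log(max(p, floor))

# Hypothetical training pairs: phoneme label and main Gaussian index per frame.
aligned = [("a", 0), ("a", 0), ("a", 0), ("b", 0), ("b", 1)]
guide = train_guide_probability(aligned)

# A path hypothesizing "a" for a frame whose main Gaussian is 0 is penalized
# less than one hypothesizing "b", so the decoder favors the former.
score_a = fused_score(-10.0, -2.0, guide, gaussian=0, phoneme="a")
score_b = fused_score(-10.0, -2.0, guide, gaussian=0, phoneme="b")
print(score_a > score_b)
```

Working in the log domain keeps the guide term on the same scale as typical acoustic and language model scores, which is why the sketch adds a weighted logarithm instead of multiplying probabilities.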

Description

Technical field

[0001] The invention relates to the field of speech recognition, in particular to acoustic modeling and decoding for speech recognition.

Background

[0002] At present, speech recognition systems generally use Hidden Markov Models as the basic model for acoustic modeling and decoding. To account for the influence of neighboring sounds on the pronunciation of a speech unit, triphone models are often used to improve the recognition rate. Once context is considered, however, the number of models and the number of parameters increase dramatically. Taking a Chinese large-vocabulary continuous speech recognition system as an example, the basic phoneme set contains only 191 initials and tonal finals, while the corresponding triphone models total more than 200,000. Even after parameter sharing at the model layer, state layer and Gaussian component layer, the scale of parameters is still huge. This will not only bring about insufficient parame...

Application Information

IPC(8): G10L15/00, G10L15/06, G10L15/08

Inventors: 刘文举, 杨占磊

Owner: INST OF AUTOMATION CHINESE ACAD OF SCI
