Speech decoding method based on confusion network

A confusion network speech decoding technology, applied in the field of speech decoding based on confusion networks, that reduces the workload, shrinks the network, and improves the decoding rate.

Active Publication Date: 2006-05-17

INST OF ACOUSTICS CHINESE ACAD OF SCI +1

0 Cites, 16 Cited by

AI Technical Summary

Problems solved by technology

[0008] The purpose of the present invention is to overcome the deficiencies of the prior art: in the late stage of multi-pass decoding, without using additional information (that is, without more sophisticated and complex acoustic models and language models), to reduce the decoding error rate and increase the decoding rate through confusion network clustering, thereby providing a speech decoding method based on a confusion network.

Embodiment Construction

[0035] The present invention will be further described below in conjunction with the accompanying drawings and preferred embodiments.

[0036] As shown in Figure 3, the speech decoding method based on a confusion network provided by the present invention comprises the following steps:

[0037] Step 101: Extract feature vector sequences from the input speech signal.

[0038] Step 102: Perform a first decoding pass over the speech features with the Viterbi-Beam search algorithm, output N-Best sentences or a word lattice, and at the same time obtain the acoustic-layer probability score and the language-layer probability score of each word in the N-Best sentences or word lattice.

[0039] Step 103: If the intermediate result output in Step 102 is a set of N-Best sentences, it is compressed into a directed network structure with a merging algorithm. The flow of this merging algorithm is shown in Figure 4; it is prior art, so it is not described in detail here. If the ...
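As a rough illustration of what compressing N-Best sentences into a directed network can look like, the sketch below merges hypotheses that share a common prefix. This is an assumption made for illustration only: the source does not describe the merging algorithm of Figure 4, and the names `Node`, `merge_nbest`, and `count_arcs` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One state in the directed network; outgoing arcs are keyed by word."""
    arcs: dict = field(default_factory=dict)  # word -> Node
    is_final: bool = False


def merge_nbest(sentences):
    """Merge N-Best word sequences so that shared prefixes share arcs.

    Illustrative prefix merge only, not the algorithm of Figure 4.
    """
    root = Node()
    for words in sentences:
        node = root
        for word in words:
            # Reuse the existing arc for this word, or create a new one.
            node = node.arcs.setdefault(word, Node())
        node.is_final = True
    return root


def count_arcs(node):
    """Count the arcs reachable from a node."""
    return sum(1 + count_arcs(child) for child in node.arcs.values())


nbest = [
    ["I", "like", "speech"],
    ["I", "like", "peach"],
    ["eye", "like", "speech"],
]
network = merge_nbest(nbest)
print(count_arcs(network), "arcs for", sum(len(s) for s in nbest), "words")
```

Here the first two hypotheses share the arcs for "I" and "like", so the network stores fewer arcs than the N-Best list has words.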

Abstract

A speech decoding method based on a confusion network comprises: performing a depth-first, frame-synchronous Viterbi-Beam search on the speech features and outputting N-Best sentences; generating a confusion network by clustering the N-Best sentences in two stages according to time and a phoneme-similarity algorithm; and searching the confusion network for the optimal result, using maximum posterior probability as the criterion.
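The final search step can be pictured with a minimal sketch, assuming the confusion network has already been built and is represented as a list of slots, each mapping candidate words to posterior probabilities. The representation, the `<eps>` empty-word label, the function name, and the numbers are all assumptions for illustration; the two-stage clustering itself is not shown.

```python
EPS = "<eps>"  # hypothetical label meaning "no word in this slot"


def decode_confusion_network(slots):
    """Pick the highest-posterior word in each slot, skipping empty-word winners."""
    best = []
    for slot in slots:
        word = max(slot, key=slot.get)  # candidate with maximum posterior
        if word != EPS:
            best.append(word)
    return best


# Made-up posteriors, for illustration only.
slots = [
    {"I": 0.7, "eye": 0.3},
    {"like": 1.0},
    {"a": 0.2, EPS: 0.8},
    {"speech": 0.6, "peach": 0.4},
]
result = decode_confusion_network(slots)
print(" ".join(result))
```

Because each slot is decided independently, the search is linear in the number of slots, which is one reason a confusion network is cheaper to search than the full lattice.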

Description

Technical field

[0001] The invention belongs to the field of speech recognition, and in particular relates to a speech decoding method based on a confusion network.

Background art

[0002] The decoding process, also known as the recognition process, is an important part of a speech recognition system. Its function is, given the acoustic model and language model, to automatically search a search space for the word string that best matches the input acoustic feature vector sequence, and thus finally convert the speech signal into text.

[0003] Figure 1 is a structural diagram of a known speech recognition system. As shown in the figure, the feature extraction module processes the input speech signal in frames, usually with a frame length of 20 ms and a frame shift of 10 ms; commonly used features include MFCC, LPC, and PLP features. After feature extraction, the speech signal is transformed into a sequence of feature v...
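The framing mentioned in [0003] (20 ms frames with a 10 ms shift) can be sketched as follows. The 8 kHz sample rate, the function name `frame_signal`, and the dummy input are assumptions for illustration; computing MFCC, LPC, or PLP features from each frame is not shown.

```python
def frame_signal(samples, sample_rate, frame_ms=20, shift_ms=10):
    """Split a sample sequence into overlapping frames of fixed length."""
    frame_len = sample_rate * frame_ms // 1000  # samples per frame
    shift = sample_rate * shift_ms // 1000      # samples between frame starts
    return [
        samples[start:start + frame_len]
        for start in range(0, len(samples) - frame_len + 1, shift)
    ]


# 100 ms of dummy audio at an assumed 8 kHz sample rate.
samples = list(range(800))
frames = frame_signal(samples, sample_rate=8000)
print(len(frames), "frames of", len(frames[0]), "samples")
```

With a 10 ms shift and a 20 ms frame, consecutive frames overlap by half, and any trailing samples that do not fill a whole frame are dropped in this sketch.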

Application Information

IPC(8): G10L15/26, G10L15/02, G10L15/08

Inventors: 吕萍, 颜永红, 潘接林, 韩疆

Owner: INST OF ACOUSTICS CHINESE ACAD OF SCI
