End-to-end speech recognition model and modeling method based on multi-level identification

A speech recognition model and modeling method, applied in speech recognition, speech analysis, instruments, etc. It addresses non-pronunciation, the uneven distribution of samples, and the inability to account for co-pronunciation (co-articulation) in speech, with the effect of improving recognition accuracy.

Pending Publication Date: 2021-07-23

UNIV OF SCI & TECH OF CHINA

AI Technical Summary

Problems solved by technology

One shortcoming of existing approaches is that they require a large amount of training data and suffer from an uneven distribution of samples.

However, building the text units at the character level does not account for the influence between adjacent units in the output text sequence, and therefore cannot handle phenomena such as co-pronunciation (co-articulation) and non-pronunciation in speech.

Embodiment Construction

[0036] Selecting a single one of the multi-level text sequences for end-to-end speech recognition modeling is neither the only option nor necessarily the optimal one. The multiple text sequences selected for end-to-end speech recognition modeling are referred to as multi-granularity target sequences. The present invention holds that using multiple text sequences jointly for end-to-end speech recognition modeling achieves better results, and therefore proposes a multi-granularity sequence alignment method (Multi-Granularity Sequence Alignment, MGSA).
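
To make the idea of multi-granularity target sequences concrete, the following is a minimal sketch, not the patent's implementation, that derives a coarse-grained word sequence and a fine-grained character sequence from one English transcript, together with the index of the word each character belongs to. The function name `build_multi_granularity_targets` is hypothetical, and taking word boundaries from whitespace is an assumption for illustration; the patent's Chinese setting would pair other units, such as pinyin and characters.

```python
from typing import List, Tuple


def build_multi_granularity_targets(transcript: str) -> Tuple[List[str], List[str], List[int]]:
    """Return (word_seq, char_seq, char_to_word) for one transcript.

    word_seq: coarse-grained target sequence (words).
    char_seq: fine-grained target sequence (characters, no boundary token).
    char_to_word[i]: index in word_seq of the word that char_seq[i] belongs to.
    """
    # Assumption: word boundaries are taken from whitespace.
    word_seq = transcript.split()
    char_seq: List[str] = []
    char_to_word: List[int] = []
    for w_idx, word in enumerate(word_seq):
        for ch in word:
            char_seq.append(ch)
            char_to_word.append(w_idx)
    return word_seq, char_seq, char_to_word


words, chars, align = build_multi_granularity_targets("speech is fun")
print(words)  # ['speech', 'is', 'fun']
print(align)  # [0, 0, 0, 0, 0, 0, 1, 1, 2, 2, 2]
```

The alignment list records which coarse unit each fine unit falls under, which is the kind of cross-level correspondence a multi-granularity method can exploit.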

[0037] An end-to-end ASR system can be divided into two stages: model training (the training stage) and decoding inference (the inference stage). The MGSA method proposed in this patent uses multi-level identification information to optimize the ASR system in each of these two stages. First, in the model structure, the end-to-end ASR decoder module generates the multi-level text sequences one after another, and ...
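
One possible reading of a decoder that generates multi-level text sequences one after another is a single autoregressive target containing both levels. The sketch below assumes that ordering (fine-grained sequence, then a separator, then the coarse-grained sequence); the special tokens `<sos>`, `<sep>`, `<eos>` and the function `make_decoder_pair` are hypothetical, and the paragraph above is truncated before it specifies the actual layout.

```python
from typing import List, Tuple

# Hypothetical special tokens; the patent does not name them.
SOS, SEP, EOS = "<sos>", "<sep>", "<eos>"


def make_decoder_pair(fine: List[str], coarse: List[str]) -> Tuple[List[str], List[str]]:
    """Build (decoder_input, decoder_target) for teacher-forced training.

    Assumed layout: fine-grained units, a separator, then coarse-grained units,
    all emitted by the decoder in one autoregressive pass.
    """
    full = [SOS] + fine + [SEP] + coarse + [EOS]
    # Teacher forcing: the input at step t is the target token from step t-1.
    return full[:-1], full[1:]


dec_in, dec_out = make_decoder_pair(list("hi"), ["hi"])
print(dec_in)   # ['<sos>', 'h', 'i', '<sep>', 'hi']
print(dec_out)  # ['h', 'i', '<sep>', 'hi', '<eos>']
```

Under this assumption, a single cross-entropy loss over `dec_out` would train the decoder on both granularities at once.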

Abstract

The invention provides an end-to-end speech recognition modeling method based on multi-level identification. The method comprises decoding inference that employs a post-inference algorithm, which works as follows: the model corresponding to the fine-grained text sequence generates a posterior probability output sequence; because this output sequence corresponds uniquely to a coarse-grained sub-sequence, the model computes the log-likelihood of that coarse-grained sub-sequence, and this log-likelihood serves as a cross-check on the existing predicted output sequence; finally, the existing decoding paths are pruned according to the likelihood scores computed in the two preceding steps, which keeps the search paths within the beam width.
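
The pruning step described in the abstract can be sketched as a beam rescoring pass. This is a minimal illustration under stated assumptions: fine-grained units are English characters including a space token, so each hypothesis maps to exactly one word sequence; `coarse_logprob` stands in for the model's coarse-grained log-likelihood computation; and the two scores are combined by a weighted sum with a hypothetical `weight`, since the abstract does not say how they are combined. All function names and toy scores are illustrative.

```python
import math
from typing import Callable, Dict, List, Tuple

Hyp = Tuple[List[str], float]  # (character sequence, score)


def fine_to_coarse(chars: List[str]) -> Tuple[str, ...]:
    """Map a character sequence to its unique word sequence.

    Assumes " " is a fine-grained token marking word boundaries.
    """
    return tuple("".join(chars).split())


def prune_with_coarse_check(
    hyps: List[Hyp],
    coarse_logprob: Callable[[Tuple[str, ...]], float],
    beam_width: int,
    weight: float = 0.5,  # hypothetical interpolation weight
) -> List[Hyp]:
    """Add a weighted coarse-grained log-likelihood to each fine-grained
    log-probability, then keep only the best beam_width hypotheses."""
    rescored: List[Hyp] = []
    for chars, fine_lp in hyps:
        coarse_lp = coarse_logprob(fine_to_coarse(chars))
        rescored.append((chars, fine_lp + weight * coarse_lp))
    rescored.sort(key=lambda h: h[1], reverse=True)
    return rescored[:beam_width]


# Toy scores standing in for the model's word-level log-likelihood.
TOY_WORD_LOGPROBS: Dict[Tuple[str, ...], float] = {
    ("i", "see"): math.log(0.6),
    ("i", "sea"): math.log(0.1),
}


def toy_coarse_logprob(words: Tuple[str, ...]) -> float:
    # Unseen word sequences get a small floor probability.
    return TOY_WORD_LOGPROBS.get(words, math.log(1e-6))


hyps: List[Hyp] = [
    (list("i sea"), -1.0),
    (list("eye see"), -1.1),
    (list("i see"), -1.2),
]
best = prune_with_coarse_check(hyps, toy_coarse_logprob, beam_width=2)
print(["".join(chars) for chars, _ in best])  # ['i see', 'i sea']
```

In the toy example the coarse-grained check promotes "i see" above "i sea", even though the fine-grained score alone ranks "i sea" first, and the beam is cut back to two paths.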

Description

Technical field

[0001] The invention relates to the technical field of speech recognition, in particular to an end-to-end speech recognition model and modeling method based on multi-level identification.

Background

[0002] End-to-End (E2E) Automatic Speech Recognition (ASR) based on the encoder-decoder framework directly models the sequence mapping between the input audio sequence and the output text. Its concise framework and its independence from linguistic background knowledge have made this structure increasingly popular in academia and industry.

[0003] In end-to-end ASR, an input speech sequence can be mapped to text sequences at different levels, so the mapping between speech sequences and text sequences is one-to-many. In Chinese ASR, the text sequence can be composed of pinyin or of Chinese characters; in English and Chinese, the text sequence can be composed of words or of characters.

[0004] In general, modeling...

Application Information

IPC (8): G10L15/06; G10L15/16; G10L15/22; G10L15/26

CPC: G10L15/063; G10L15/16; G10L15/22; G10L15/26

Inventors: 唐健 (Tang Jian), 胡宇晨 (Hu Yuchen), 戴礼荣 (Dai Lirong)

Owner: UNIV OF SCI & TECH OF CHINA
