
End-to-end speech recognition model based on multi-level identification and modeling method

A speech recognition model and modeling method, applied in speech recognition, speech analysis, instruments, and related fields. It addresses problems such as the uneven distribution of samples and the inability of existing text units to account for coarticulation and silent (unpronounced) units in speech, with the effect of improving the recognition accuracy rate.

Pending Publication Date: 2021-07-23
UNIV OF SCI & TECH OF CHINA
0 Cites, 1 Cited by

AI Technical Summary

Problems solved by technology

Its shortcomings are the large amount of training data required and the uneven distribution of samples.
However, character-level text units are constructed without considering the influence between adjacent units in the output text sequence, so they cannot account for phenomena such as coarticulation and silent (unpronounced) units in speech.


Embodiment Construction

[0036] Picking a single one of the multi-level text sequences for end-to-end speech recognition modeling is not the only choice, and it is not necessarily the optimal one. The multiple text sequences selected for end-to-end speech recognition modeling are referred to as multi-granularity target sequences. The present invention holds that modeling several text sequences jointly can achieve better results, and proposes a Multi-Granularity Sequence Alignment (MGSA) method.

[0037] An end-to-end ASR system can be divided into two parts: the model training stage and the decoding inference stage. The MGSA method proposed in this patent uses multi-level identification information to optimize the ASR system in each of these two stages. First, in the model structure, the end-to-end ASR decoder module sequentially generates multi-level text sequences, and ...


Abstract

The invention provides an end-to-end speech recognition modeling method based on multi-level identification. The method includes a decoding inference step that employs a post-inference algorithm. In the post-inference algorithm, the model corresponding to the fine-grained text sequence generates a posterior probability output sequence; this output sequence corresponds uniquely to a coarse-grained sub-sequence, and the model computes a log-likelihood value for that coarse-grained sub-sequence; the log-likelihood value serves as cross-validation of the existing predicted output sequence. According to the likelihood scores obtained in these two steps, the existing decoding paths are pruned so that the search is kept within the beam width.
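
The pruning step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the function names `to_coarse` and `prune_beam`, the toy word-level scorer, and the interpolation weight `weight` are all hypothetical, and the excerpt does not say how the two scores are actually combined. Fine-grained hypotheses are character sequences with "_" as an assumed word-boundary symbol, so each one maps to exactly one coarse-grained (word) sequence.

```python
from typing import Callable, Sequence

def to_coarse(chars: Sequence[str]) -> tuple[str, ...]:
    """Map a character sequence (with "_" word boundaries) to its unique word sequence."""
    return tuple(w for w in "".join(chars).split("_") if w)

def prune_beam(
    hyps: Sequence[tuple[tuple[str, ...], float]],
    coarse_logp: Callable[[tuple[str, ...]], float],
    beam_width: int,
    weight: float = 0.5,  # assumed interpolation weight, not from the patent
) -> list[tuple[float, tuple[str, ...]]]:
    """Rescore fine-grained hypotheses with a coarse-grained log-likelihood, keep the best."""
    scored = []
    for chars, fine_lp in hyps:
        # Cross-check the fine-grained path against its coarse-grained counterpart.
        total = fine_lp + weight * coarse_logp(to_coarse(chars))
        scored.append((total, chars))
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:beam_width]

# Toy word-level scorer (hypothetical values standing in for a coarse-grained model).
WORD_LOGP = {"the": -1.0, "cat": -2.0, "cut": -4.0, "teh": -9.0}

def toy_coarse_logp(words: tuple[str, ...]) -> float:
    return sum(WORD_LOGP.get(w, -10.0) for w in words)

hyps = [
    (tuple("the_cat"), -3.0),
    (tuple("the_cut"), -2.8),
    (tuple("teh_cat"), -2.9),
]
kept = prune_beam(hyps, toy_coarse_logp, beam_width=2)
for score, chars in kept:
    print("".join(chars), score)
```

In this toy run the hypothesis with the weakest fine-grained score ("the_cat") ends up ranked first once the coarse-grained score is added, and the misspelled path is dropped from the beam.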

Description

Technical field

[0001] The invention relates to the technical field of speech recognition, in particular to an end-to-end speech recognition model and modeling method based on multi-level identification.

Background technique

[0002] End-to-End (E2E) Automatic Speech Recognition (ASR) based on the encoding-decoding framework directly models the sequence mapping relationship between the input audio sequence and the output text. The advantages of a concise framework and no need for linguistic background knowledge make this structure gradually sought after by academia and industry.

[0003] In end-to-end ASR, input speech sequences can be mapped to text sequences at different levels. The mapping relationship between speech sequences and text sequences is one-to-many. In Chinese ASR, the text sequence can be composed of pinyin and Chinese characters; the English and Chinese text sequence can be composed of words (word) and characters (character).

[0004] In general, modeling...
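
The one-to-many mapping in paragraph [0003] can be illustrated with a small sketch that derives two text sequences, at word and character granularity, from a single transcript. The function name `multi_granularity_targets` and the "_" word-boundary symbol are assumptions made for this example; the excerpt does not specify how the patent encodes its target sequences.

```python
def multi_granularity_targets(transcript: str) -> dict[str, list[str]]:
    """Return word-level and character-level views of one transcript."""
    words = transcript.split()
    chars: list[str] = []
    for i, word in enumerate(words):
        if i > 0:
            # Assumed boundary symbol, so the character sequence maps back
            # to exactly one word sequence.
            chars.append("_")
        chars.extend(word)
    return {"word": words, "char": chars}

targets = multi_granularity_targets("speech recognition")
print(targets["word"])
print(targets["char"])
```

The same utterance therefore has a short coarse-grained target (two words) and a longer fine-grained target (the characters plus one boundary symbol).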


Application Information

IPC(8): G10L15/06, G10L15/16, G10L15/22, G10L15/26
CPC: G10L15/063, G10L15/16, G10L15/22, G10L15/26
Inventor: 唐健, 胡宇晨, 戴礼荣
Owner: UNIV OF SCI & TECH OF CHINA