End-to-end speech recognition model based on multi-level identification and modeling method

A speech recognition model and modeling method, applied in speech recognition, speech analysis, instruments, etc., that addresses the problems of non-pronunciation, uneven sample distribution, and the inability to account for coarticulation in speech, thereby improving recognition accuracy.
CN113160803A · Pending · Publication Date: 2021-07-23 · UNIV OF SCI & TECH OF CHINA

Patent Information

Authority / Receiving Office
CN · China
Current Assignee / Owner
UNIV OF SCI & TECH OF CHINA
Publication Date
2021-07-23


Abstract

The invention provides an end-to-end speech recognition modeling method based on multi-level identification. In the method, decoding inference employs a post-inference algorithm. In this algorithm, the model corresponding to the fine-grained text sequence generates a posterior probability output sequence. This output sequence uniquely corresponds to a coarse-grained sub-sequence, for which the coarse-grained model computes a log-likelihood value that serves as cross-validation of the existing predicted output sequence. According to the likelihood scores obtained in these two steps, the existing decoding paths are pruned so that the search stays within the beam width.
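The pruning step described in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration, not the patent's implementation: the function `prune_beam`, the toy character-to-pinyin table, the lookup-table coarse model, and in particular the weighted sum used to combine the two scores (the abstract only states that both scores are used to prune paths).

```python
from typing import Callable, Dict, List, Sequence, Tuple

# A hypothesis is (fine-grained token sequence, fine-grained log-probability).
Hypothesis = Tuple[Tuple[str, ...], float]


def prune_beam(
    hyps: List[Hypothesis],
    fine_to_coarse: Dict[str, str],
    coarse_loglik: Callable[[Sequence[str]], float],
    beam_width: int,
    coarse_weight: float = 0.5,  # assumed interpolation weight, not from the source
) -> List[Hypothesis]:
    """Rescore each hypothesis with a coarse-grained log-likelihood and keep the best."""
    rescored = []
    for tokens, fine_logprob in hyps:
        # Each fine-grained token maps to exactly one coarse-grained token,
        # so the coarse sub-sequence is uniquely determined.
        coarse_tokens = [fine_to_coarse[t] for t in tokens]
        # Assumed combination rule: a weighted sum of the two log scores.
        combined = fine_logprob + coarse_weight * coarse_loglik(coarse_tokens)
        rescored.append((tokens, combined))
    # Keep only the top `beam_width` paths by combined score.
    rescored.sort(key=lambda item: item[1], reverse=True)
    return rescored[:beam_width]


# Toy data (hypothetical): Chinese characters as fine tokens, toneless pinyin as coarse tokens.
CHAR_TO_PINYIN = {"你": "ni", "好": "hao", "号": "hao", "米": "mi"}
COARSE_TABLE = {("ni", "hao"): -0.2, ("mi", "hao"): -3.0}


def toy_coarse_loglik(coarse_tokens: Sequence[str]) -> float:
    # Stand-in for the coarse-grained model: a lookup with a low default score.
    return COARSE_TABLE.get(tuple(coarse_tokens), -10.0)


hypotheses = [(("你", "好"), -1.0), (("米", "好"), -0.9), (("你", "号"), -1.5)]
kept = prune_beam(hypotheses, CHAR_TO_PINYIN, toy_coarse_loglik, beam_width=2)
for tokens, score in kept:
    print("".join(tokens), round(score, 3))
```

In this toy run, the hypothesis with the best fine-grained score ("米好") is dropped because its pinyin sequence scores poorly under the coarse-grained stand-in, which is the cross-validation effect the abstract describes.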

Description

Technical field

[0001] The invention relates to the technical field of speech recognition, in particular to an end-to-end speech recognition model and modeling method based on multi-level identification.

Background technique

[0002] End-to-End (E2E) Automatic Speech Recognition (ASR) based on the encoder-decoder framework directly models the sequence mapping between the input audio sequence and the output text. Its concise framework and the fact that it requires no linguistic background knowledge have made this structure increasingly popular in academia and industry.

[0003] In end-to-end ASR, input speech sequences can be mapped to text sequences at different levels, so the mapping between a speech sequence and its text sequences is one-to-many. In Chinese ASR, the text sequence can be composed of pinyin or of Chinese characters; in English, the text sequence can be composed of words or of characters.
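As a small illustration of this one-to-many relationship, the sketch below builds target sequences at two levels for the same transcript. The helper names, the tiny character-to-pinyin table, and the use of toneless pinyin are assumptions for illustration only. A real system would use a full pronunciation lexicon.

```python
from typing import Dict, List

# Hypothetical toy lexicon: each character maps to a single toneless pinyin syllable.
CHAR_TO_PINYIN: Dict[str, str] = {"语": "yu", "音": "yin", "识": "shi", "别": "bie"}


def chinese_levels(transcript: str) -> Dict[str, List[str]]:
    """Return character-level and pinyin-level targets for one Chinese transcript."""
    chars = list(transcript)
    return {
        "character": chars,
        "pinyin": [CHAR_TO_PINYIN[c] for c in chars],
    }


def english_levels(transcript: str) -> Dict[str, List[str]]:
    """Return word-level and character-level targets for one English transcript."""
    return {
        "word": transcript.split(),
        "character": list(transcript),  # spaces are kept as tokens here
    }


zh = chinese_levels("语音识别")
en = english_levels("speech model")
print(zh["character"], zh["pinyin"])
print(en["word"], len(en["character"]))
```

The same utterance therefore yields several valid label sequences, one per level, which is what allows a model to be trained or cross-checked at more than one granularity.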

[0004] In general, modeling...
