End-to-end speech recognition model and modeling method based on multi-level identification
Patent Information
- Authority / Receiving Office: CN · China
- Current Assignee / Owner: UNIV OF SCI & TECH OF CHINA
- Publication Date: 2021-07-23
Figures: Figure 1, Figure 2, Figure 3
Description
Technical Field
[0001] The invention relates to the technical field of speech recognition, and in particular to an end-to-end speech recognition model and modeling method based on multi-level identification.
Background Art
[0002] End-to-end (E2E) automatic speech recognition (ASR) based on the encoder-decoder framework directly models the sequence mapping between the input audio sequence and the output text. Because the framework is concise and requires no linguistic background knowledge, this approach has become increasingly popular in both academia and industry.
[0003] In end-to-end ASR, an input speech sequence can be mapped to text sequences at different levels, so the mapping between speech sequences and text sequences is one-to-many: the same utterance has several valid target sequences, one per level. In Chinese ASR, the text sequence can be composed of pinyin or of Chinese characters; in English ASR, the text sequence can be composed of words or of characters.
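The one-to-many relationship described in [0003] can be illustrated with a minimal sketch that derives two target sequences at different levels from a single transcript. The function names, the `_` word-boundary marker, and the two-entry pinyin lexicon `TOY_PINYIN` are assumptions made for illustration only; they do not come from the patent, and a real system would use a full pronunciation lexicon.

```python
from typing import Dict, List

# Hypothetical toy lexicon (illustration only): Chinese character -> pinyin with tone digit.
TOY_PINYIN: Dict[str, str] = {"你": "ni3", "好": "hao3"}


def english_levels(transcript: str) -> Dict[str, List[str]]:
    """Return word-level and character-level targets for one English transcript."""
    words = transcript.lower().split()
    # Join words with "_" so word boundaries survive in the character sequence.
    chars = list("_".join(words))
    return {"word": words, "character": chars}


def chinese_levels(transcript: str) -> Dict[str, List[str]]:
    """Return pinyin-level and character-level targets for one Chinese transcript."""
    hanzi = list(transcript)
    # Look up each character in the toy lexicon; unknown characters raise KeyError.
    pinyin = [TOY_PINYIN[ch] for ch in hanzi]
    return {"pinyin": pinyin, "character": hanzi}


if __name__ == "__main__":
    print(english_levels("Hello world"))
    print(chinese_levels("你好"))
```

Each dictionary holds two different target sequences for the same utterance, which is the sense in which one speech sequence corresponds to several text sequences.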
[0004] In general, modeling...