End-to-end speech recognition system based on deep learning

A speech recognition technology based on deep learning, applied in speech recognition, speech analysis, instruments, etc. It addresses problems such as the inability to use historical information to assist the current task and long training time, and achieves strong semantic information mining ability, faster training, and reduced training time.

Pending Publication Date: 2020-04-24
天津中科智能识别有限公司

AI Technical Summary

Problems solved by technology

[0005] Both of the above solutions have shortcomings. For the former, the model cannot use the context information of each frame; that is, it cannot use historical information to assist the current task.
In addition, the model assumes a Gaussian distribution between the frame and the state. Although this simplifies the model, it imposes significant limitations.
For the latter, although the model can achieve better convergence, the recurrent structure of the RNN means that adding more RNN units makes training longer and difficult to parallelize.



Embodiment Construction

[0022] The present invention will be described in further detail below in conjunction with the accompanying drawings and specific embodiments. It should be understood that the specific embodiments described here are only used to explain the present invention, not to limit the present invention.

[0023] As shown in Figure 1, the end-to-end speech recognition system based on deep learning of the present invention includes:

[0024] The acoustic model comprises, in sequence, a VGG-Net layer, a first fully connected layer, a bidirectional RNN layer, a second fully connected layer, a Softmax layer, and a CTC layer. It extracts the two-dimensional FBank features of the audio and passes them through these layers in order to obtain the normalized probability distribution of each time step; and then the entropy v...
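The output stage described in [0024] can be illustrated with a minimal sketch. It assumes the network has already produced raw scores per time step, normalizes them with a softmax, computes the entropy of each time step's distribution, and collapses the frame-wise best labels using the standard CTC rule (merge repeats, drop blanks). The blank symbol, the pinyin labels, the score values, and the use of greedy decoding are all hypothetical choices for illustration; the visible text does not specify how the entropy value is used to form the candidate pinyin sequence, so the sketch only computes it.

```python
import math

BLANK = "_"  # hypothetical CTC blank symbol, not named in the source


def softmax(logits):
    """Normalize raw scores into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy (in nats) of one time step's distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def ctc_greedy_decode(frames, labels):
    """Pick the best label per frame, merge repeats, then drop blanks."""
    best = [labels[max(range(len(p)), key=p.__getitem__)] for p in frames]
    out, prev = [], None
    for sym in best:
        if sym != prev and sym != BLANK:
            out.append(sym)
        prev = sym
    return out


# Toy label set and scores (assumed values, five time steps).
labels = [BLANK, "ni3", "hao3"]
logits = [
    [0.1, 4.0, 0.2],
    [0.1, 3.5, 0.3],
    [5.0, 0.1, 0.1],
    [0.2, 0.1, 4.2],
    [0.4, 0.3, 0.3],  # near-uniform scores: an uncertain time step
]

frames = [softmax(row) for row in logits]
entropies = [entropy(p) for p in frames]
decoded = ctc_greedy_decode(frames, labels)
print(decoded)
print([round(h, 3) for h in entropies])
```

In this toy input the last time step has nearly uniform scores, so its entropy is the largest of the five; a high entropy marks a time step where the acoustic model is uncertain.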


Abstract

The invention discloses an end-to-end speech recognition system based on deep learning. The system includes: an acoustic model, which sequentially comprises a VGG-Net layer, a first fully connected layer, a bidirectional RNN layer, a second fully connected layer, a Softmax layer and a CTC layer, and is used for extracting two-dimensional FBank features of the audio, processing them through the network to obtain the probability distribution of each time step, and outputting a candidate pinyin sequence according to the entropy of the per-time-step probability distributions; and a language model, which is connected to the acoustic model and comprises a Transformer encoder and an n-gram model, wherein the Transformer encoder outputs a Chinese character sequence of the same length as the input candidate pinyin sequence, and the n-gram model processes the output Chinese character sequence and selects a target Chinese character text for output. In this way, a final recognition result that best conforms to the current context and human expression habits can be obtained.
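The final selection step performed by the n-gram model can be sketched as follows. This is only an illustration under stated assumptions: a character-level bigram model with add-one smoothing, trained on a three-sentence toy corpus, scores several homophone candidates for the pinyin sequence "ni3 hao3" and keeps the most probable one. The corpus, the candidate list, the smoothing scheme, and the function names are hypothetical; the abstract does not give the n-gram order or how candidate texts are produced.

```python
import math
from collections import Counter


def train_bigram(corpus):
    """Count character bigrams and their left contexts over a toy corpus."""
    contexts, bigrams = Counter(), Counter()
    for sent in corpus:
        chars = ["<s>"] + list(sent) + ["</s>"]
        contexts.update(chars[:-1])
        bigrams.update(zip(chars, chars[1:]))
    vocab = {c for sent in corpus for c in sent} | {"</s>"}
    return contexts, bigrams, len(vocab)


def log_prob(sentence, contexts, bigrams, vocab_size):
    """Add-one smoothed bigram log-probability of a character sequence."""
    chars = ["<s>"] + list(sentence) + ["</s>"]
    total = 0.0
    for a, b in zip(chars, chars[1:]):
        total += math.log((bigrams[(a, b)] + 1) / (contexts[a] + vocab_size))
    return total


def pick_best(candidates, model):
    """Return the candidate text with the highest bigram score."""
    return max(candidates, key=lambda s: log_prob(s, *model))


# Toy training text and homophone candidates (assumed for illustration).
corpus = ["你好", "你好吗", "我很好"]
model = train_bigram(corpus)
candidates = ["你好", "拟好", "你郝"]  # all read "ni3 hao3"
best = pick_best(candidates, model)
print(best)
```

All three candidates have the same length and the same pronunciation, so only the language model score separates them; the bigram counts favor the character sequence that actually occurs in the training text.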

Description

Technical field

[0001] The invention relates to the technical field of speech recognition, in particular to an end-to-end speech recognition system based on deep learning.

Background

[0002] Speech recognition converts speech into the corresponding text and generally includes two basic modules: an acoustic module and a language module. For the input speech signal, the acoustic module is responsible for extracting the features of the signal and calculating the probability of speech mapping to a syllable (or another smallest unit), while the language module uses a language model to convert the smallest units into complete natural language that a human or computer can understand.

[0003] There are currently two types of speech recognition methods: probabilistic model methods and deep learning methods. For the former, the most typical is the speech recognition model (HMM-GMM) based on the hidden Markov model (HMM) and the Gaussian mixture model (GMM)....
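To make the HMM-GMM baseline in [0003] concrete, the sketch below scores a single frame against two states whose emission densities are Gaussian mixtures. It is a simplified illustration, not the cited system: the feature is a one-dimensional scalar, and the mixture weights, means, and variances are made-up values. Note that each frame is scored on its own, without reference to neighboring frames.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def gmm_likelihood(x, components):
    """Emission likelihood of x under a mixture of (weight, mean, var) terms."""
    return sum(w * gaussian_pdf(x, m, v) for w, m, v in components)


# Hypothetical per-state mixtures; the weights in each state sum to 1.
state_a = [(0.6, 0.0, 1.0), (0.4, 3.0, 0.5)]
state_b = [(1.0, 5.0, 1.0)]

frame = 0.2  # a single scalar feature value (assumed)
like_a = gmm_likelihood(frame, state_a)
like_b = gmm_likelihood(frame, state_b)
print(like_a > like_b)
```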


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G10L15/00, G10L15/02, G10L15/06, G10L15/18, G10L15/26, G10L25/69
CPC: G10L15/02, G10L15/063, G10L15/005, G10L15/1815, G10L15/26, G10L25/69
Inventor: 曹琉, 张大朋, 孙哲南, 张森
Owner: 天津中科智能识别有限公司