End-to-end speech recognition system based on deep learning

A speech recognition technology based on deep learning, applied in speech recognition, speech analysis, instruments, etc. It addresses the inability to use historical information to assist the current task and the long training time of existing approaches, and it achieves strong semantic information mining ability, faster training, and reduced training time.

Pending Publication Date: 2020-04-24

天津中科智能识别有限公司

3 Cites, 3 Cited by

AI Technical Summary

Problems solved by technology

[0005] Both of the above solutions have shortcomings. The former (the probabilistic model method, typified by HMM-GMM) cannot use the context information of each frame; that is, it cannot use historical information to assist the current task.

In addition, the model assumes a Gaussian distribution between the frame and the state. This simplifies the model but severely limits it.

The latter (the deep learning method) can achieve better convergence, but because of the recurrent structure of the RNN itself, adding more RNN units lengthens training and makes it difficult to parallelize.


Embodiment Construction

[0022] The present invention will be described in further detail below in conjunction with the accompanying drawings and specific embodiments. It should be understood that the specific embodiments described here are only used to explain the present invention, not to limit the present invention.

[0023] As shown in Figure 1, the end-to-end speech recognition system based on deep learning of the present invention includes:

[0024] An acoustic model comprising, in sequence, a VGG-Net layer, a first fully connected layer, a bidirectional RNN layer, a second fully connected layer, a Softmax layer, and a CTC layer. It extracts the two-dimensional FBank features of the audio and passes them through these layers to obtain the normalized probability distribution of each time step; and then the entropy v...
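The role of the per-time-step probability distribution and its entropy can be illustrated with a short Python sketch. The text above is cut off before it states how the entropy value is used, so the rule below is an assumption: a confident (low-entropy) time step keeps only its most likely pinyin, while an uncertain (high-entropy) step keeps several candidates. The names `softmax`, `entropy`, and `candidates_per_step`, the parameters `threshold` and `top_k`, and the toy pinyin labels are all hypothetical and do not come from the patent.

```python
import math


def softmax(logits):
    """Normalize one time step of raw scores into a probability distribution."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; 0 means the step is fully confident."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def candidates_per_step(step_logits, labels, threshold=0.5, top_k=2):
    """Assumed rule: keep 1 label at low-entropy steps, top_k at high-entropy steps."""
    result = []
    for logits in step_logits:
        probs = softmax(logits)
        # Rank label indices from most to least probable.
        ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
        keep = 1 if entropy(probs) < threshold else top_k
        result.append([labels[i] for i in ranked[:keep]])
    return result


# Toy input: two time steps over three hypothetical pinyin labels.
labels = ["ni3", "li3", "hao3"]
step_logits = [
    [2.0, 1.9, -3.0],   # two labels are nearly tied, so entropy is high
    [-4.0, -4.0, 6.0],  # one label dominates, so entropy is close to zero
]
candidate_lists = candidates_per_step(step_logits, labels)
```

Under this assumed rule, the first step yields two competing pinyin candidates and the second yields one, which gives a downstream language model a small set of candidate sequences to choose between.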


Abstract

The invention discloses an end-to-end speech recognition system based on deep learning. The system includes: an acoustic model, which sequentially comprises a VGG-Net layer, a first fully connected layer, a bidirectional RNN layer, a second fully connected layer, a Softmax layer and a CTC layer, and is used for extracting two-dimensional FBank features of the audio, performing network processing to obtain the probability distribution of each time step, and outputting a candidate pinyin sequence according to the entropy of the probability distribution at each time step; and a language model, which is connected to the acoustic model and comprises a Transformer encoder and an n-gram model. The Transformer encoder outputs a Chinese character sequence of the same length as the input candidate pinyin sequence, and the n-gram model processes the output Chinese character sequence and selects a target Chinese character text for output. With this method, the final recognition result that best fits the current context and human expression habits can be obtained.
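The final selection step performed by the n-gram model can be illustrated with a small bigram example. This is a minimal sketch, not the patented implementation: the abstract does not state the n-gram order, the smoothing method, or the training data, so the bigram order, add-one smoothing, toy corpus, and the names `train_bigram`, `log_prob`, and `pick_best` are all assumptions made for illustration.

```python
import math
from collections import Counter


def train_bigram(corpus):
    """Count bigrams and their left contexts over sentences with boundary markers."""
    contexts, bigrams, vocab = Counter(), Counter(), set()
    for sentence in corpus:
        chars = ["<s>"] + list(sentence) + ["</s>"]
        vocab.update(chars[1:])          # everything that can be predicted
        contexts.update(chars[:-1])      # everything that can be conditioned on
        bigrams.update(zip(chars, chars[1:]))
    return contexts, bigrams, len(vocab)


def log_prob(sentence, contexts, bigrams, vocab_size):
    """Add-one smoothed bigram log-probability of a character sequence."""
    chars = ["<s>"] + list(sentence) + ["</s>"]
    return sum(
        math.log((bigrams[(a, b)] + 1) / (contexts[a] + vocab_size))
        for a, b in zip(chars, chars[1:])
    )


def pick_best(candidates, contexts, bigrams, vocab_size):
    """Return the candidate text that the bigram model scores highest."""
    return max(candidates, key=lambda s: log_prob(s, contexts, bigrams, vocab_size))


# Toy corpus and three same-length candidates for the pinyin "ni hao".
model = train_bigram(["你好", "你好吗", "我很好"])
candidates = ["泥好", "你号", "你好"]
best = pick_best(candidates, *model)
```

Because all candidates have the same length as the pinyin sequence, their log-probabilities are directly comparable without length normalization, which is one reason a same-length output is convenient for this rescoring step.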

Description

Technical field

[0001] The invention relates to the technical field of speech recognition, in particular to an end-to-end speech recognition system based on deep learning.

Background

[0002] Speech recognition converts speech into the corresponding text and generally includes two basic modules: an acoustic module and a language module. For an input speech signal, the acoustic module extracts the features of the signal and calculates the probability of the speech mapping to syllables (or another smallest unit), while the language module uses a language model to convert those smallest units into complete natural language that humans or computers can understand.

[0003] There are currently two types of speech recognition methods: the probabilistic model method and the deep learning method. The most typical example of the former is the speech recognition model based on a hidden Markov model (HMM) and a Gaussian mixture model (GMM), known as HMM-GMM....
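As a point of reference for the HMM-GMM approach named above, the Gaussian assumption is usually written as a mixture density over an acoustic frame for each HMM state. This is the standard textbook form, not a formula quoted from the patent; the symbols $x_t$ (frame at time $t$), $j$ (state), $M$ (number of mixture components), $c_{jm}$, $\mu_{jm}$, and $\Sigma_{jm}$ are notation assumed here.

```latex
b_j(x_t) = \sum_{m=1}^{M} c_{jm}\, \mathcal{N}\!\left(x_t;\, \mu_{jm},\, \Sigma_{jm}\right),
\qquad \sum_{m=1}^{M} c_{jm} = 1, \quad c_{jm} \ge 0
```

In this form each frame is scored on its own given the state, so neighbouring frames contribute nothing to $b_j(x_t)$.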


Application Information


Patent Type & Authority: Applications (China)

IPC(8): G10L15/00, G10L15/02, G10L15/06, G10L15/18, G10L15/26, G10L25/69

CPC: G10L15/02, G10L15/063, G10L15/005, G10L15/1815, G10L15/26, G10L25/69

Inventor: 曹琉, 张大朋, 孙哲南, 张森

Owner: 天津中科智能识别有限公司
