Improved end-to-end speech recognition method

A speech recognition technology, applied in speech recognition, speech analysis, instruments, and related fields. It addresses the complex, time-consuming model construction process and the inability of earlier models to accurately represent the distribution of speech signals, and it achieves excellent robustness together with an improved recognition rate and training efficiency.

Active Publication Date: 2020-04-21

THE 28TH RES INST OF CHINA ELECTRONICS TECH GROUP CORP

Cites: 5; Cited by: 15

AI Technical Summary

Problems solved by technology

The traditional GMM-HMM-based speech recognition model has achieved good results. However, the GMM is a shallow model that cannot accurately represent the distribution of speech signals, and building the HMM is complex and requires operations such as alignment. End-to-end speech recognition models based on neural networks were therefore proposed: the speech signal is mapped directly to the text sequence, with no need for aligned data labels, a pronunciation dictionary, and so on, which simplifies model construction and improves the recognition rate.

[0003] At present, end-to-end models are divided into CTC models and Seq-to-Seq models. The CTC model uses a deep neural network (CNN or RNN) to model the speech signal, which can accurately represent the feature distribution of the speech signal. Traditional speech recognition needs to know the label corresponding to each frame of data in order to train effectively, so the speech must be aligned in a preprocessing step before training; this requires repeated iterations and is time-consuming.
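
To illustrate why CTC training does not need a label for every frame, the following is a minimal sketch of the CTC forward computation. It sums the probability of every frame-level alignment that collapses to the target label sequence, so no single alignment has to be supplied in advance. The function name, the argument layout, and the choice of index 0 for the blank label are assumptions made for this illustration and are not taken from the patent. The sketch works in probability space for readability; practical implementations usually work in log space for numerical stability.

```python
import math


def ctc_neg_log_likelihood(probs, target, blank=0):
    """Return -log p(target | probs) under CTC.

    probs:  list of T per-frame probability distributions over label ids.
    target: list of label ids without blanks.
    blank:  id of the blank label (assumed to be 0 here).
    """
    # Extended target with blanks interleaved: [b, y1, b, y2, ..., yU, b].
    ext = [blank]
    for label in target:
        ext += [label, blank]
    n_states, n_frames = len(ext), len(probs)

    # alpha[s] = total probability of all partial alignments ending in state s.
    alpha = [0.0] * n_states
    alpha[0] = probs[0][ext[0]]
    if n_states > 1:
        alpha[1] = probs[0][ext[1]]

    for t in range(1, n_frames):
        new_alpha = [0.0] * n_states
        for s in range(n_states):
            total = alpha[s]                      # stay in the same state
            if s >= 1:
                total += alpha[s - 1]             # advance by one state
            # Skipping a blank is allowed only between two different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                total += alpha[s - 2]
            new_alpha[s] = total * probs[t][ext[s]]
        alpha = new_alpha

    # Valid alignments end on the last label or on the trailing blank.
    likelihood = alpha[-1] + (alpha[-2] if n_states > 1 else 0.0)
    return -math.log(likelihood) if likelihood > 0.0 else math.inf


# Two frames, labels {0: blank, 1: "a"}, uniform posteriors, target "a".
# The alignments (a, blank), (blank, a) and (a, a) each have probability 0.25.
uniform = [[0.5, 0.5], [0.5, 0.5]]
print(math.exp(-ctc_neg_log_likelihood(uniform, [1])))
```

Minimizing this quantity over a training set is what a CTC loss does; the gradient reaches the network through the per-frame probabilities.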

Embodiment Construction

[0039] The present invention proposes an improved end-to-end speech recognition method which, as shown in Figure 1, includes the following steps:

[0040] Step 1. Obtain a data set of speech and its transcribed text, extract Mel-spectrum features from the speech data to use as the input features, and obtain the label set and dictionary from the transcribed text.

[0041] Step 2. Build a model including a convolutional layer, a self-attention layer, and a fully connected layer. Use the CTC loss function as the loss function of the model, and use the backpropagation algorithm to update the model parameters.

[0042] Step 3. Feed the speech feature sequence into the trained model to obtain its output, and decode the output to obtain the final result.
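
The text does not say which decoding procedure Step 3 uses. As one common possibility, the sketch below shows greedy (best-path) CTC decoding: take the highest-scoring label in each frame, merge consecutive repeats, then drop blanks. The function name, the `id_to_symbol` mapping, and the blank index 0 are assumptions for illustration only.

```python
def ctc_greedy_decode(frame_scores, id_to_symbol, blank=0):
    """Greedy CTC decoding: per-frame argmax, merge repeats, remove blanks.

    frame_scores: list of T lists of per-label scores or probabilities.
    id_to_symbol: dict mapping non-blank label ids to output symbols.
    """
    # Most likely label id in every frame.
    best_path = [
        max(range(len(scores)), key=scores.__getitem__) for scores in frame_scores
    ]

    decoded = []
    previous = None
    for label in best_path:
        # Emit a symbol only when the label changes and is not the blank.
        if label != previous and label != blank:
            decoded.append(id_to_symbol[label])
        previous = label
    return "".join(decoded)


# Hypothetical model output for 7 frames over labels {0: blank, 1: "a", 2: "b"}.
frames = [
    [0.1, 0.8, 0.1],    # a
    [0.2, 0.7, 0.1],    # a (repeat, merged)
    [0.9, 0.05, 0.05],  # blank
    [0.1, 0.8, 0.1],    # a (new, because a blank separated it)
    [0.1, 0.2, 0.7],    # b
    [0.1, 0.1, 0.8],    # b (repeat, merged)
    [0.8, 0.1, 0.1],    # blank
]
print(ctc_greedy_decode(frames, {1: "a", 2: "b"}))  # prints "aab"
```

The blank between the second and fourth frames is what lets the decoder output a doubled "a"; without it the two runs would merge into one symbol.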

[0043] The following is a detailed description with reference to the illustrations.

[0044] Firstly, the speech feature data and its transcription text labels are obtained by using the speech a...
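
As a rough illustration of the data preparation in Step 1, the sketch below computes log Mel-spectrum features from a waveform with NumPy and builds a character-level label dictionary from transcripts. The frame length, hop size, number of Mel bands, the character-level labels, and the reserved `<blank>` entry at index 0 are all assumptions chosen for the example; the visible text does not state the values or tools actually used.

```python
import numpy as np


def hz_to_mel(freq_hz):
    return 2595.0 * np.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, sample_rate):
    """Triangular filters evenly spaced on the Mel scale, shape (n_mels, n_fft // 2 + 1)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    freqs = np.linspace(0.0, sample_rate / 2.0, n_fft // 2 + 1)
    bank = np.zeros((n_mels, freqs.size))
    for i in range(n_mels):
        left, center, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (freqs - left) / (center - left)
        falling = (right - freqs) / (right - center)
        bank[i] = np.maximum(0.0, np.minimum(rising, falling))
    return bank


def log_mel_spectrogram(signal, sample_rate=16000, n_fft=400, hop=160, n_mels=40):
    """Return log Mel-spectrum features of shape (n_frames, n_mels)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack(
        [signal[t * hop : t * hop + n_fft] * window for t in range(n_frames)]
    )
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    mel = power @ mel_filterbank(n_mels, n_fft, sample_rate).T
    return np.log(mel + 1e-10)  # small offset avoids log(0)


def build_vocab(transcripts):
    """Map every character seen in the transcripts to an id; id 0 is the CTC blank."""
    symbols = sorted({ch for text in transcripts for ch in text})
    return {"<blank>": 0, **{ch: i + 1 for i, ch in enumerate(symbols)}}


# One second of a synthetic 440 Hz tone stands in for real speech.
signal = np.sin(2 * np.pi * 440.0 * np.arange(16000) / 16000.0)
features = log_mel_spectrogram(signal)
vocab = build_vocab(["ab", "ba c"])
print(features.shape, vocab)
```

With a 25 ms window and a 10 ms hop at 16 kHz, one second of audio gives 98 frames of 40 values each, which is the kind of two-dimensional time-frequency input that a convolutional front end can consume.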

Abstract

The invention provides an improved end-to-end speech recognition method that combines a convolutional neural network with a self-attention mechanism and trains an end-to-end speech recognition model using the CTC training criterion. The model consists mainly of three parts: (1) a deep two-dimensional convolution part; (2) a self-attention part; (3) a fully connected layer. The first part uses two-dimensional convolution to extract features along the time axis and the frequency axis of the speech signal, achieving translation invariance; the second part uses the self-attention mechanism to let the speech signal draw fully on its context; the third part classifies the features of each speech frame; finally, the model parameters are updated through the CTC training criterion. The model innovatively adds the self-attention mechanism to the neural network-CTC framework, so that end-to-end speech recognition is realized and the recognition effect is improved.
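
The second and third parts of the model can be pictured with a small NumPy forward pass: single-head scaled dot-product self-attention lets every frame attend to all other frames, and a fully connected layer followed by a softmax produces per-frame label posteriors. This is only a sketch under stated assumptions. The single head, the dimensions, and the random weights are illustrative, the convolutional front end is omitted, and none of these choices are confirmed by the patent text.

```python
import numpy as np


def softmax(x, axis=-1):
    shifted = x - x.max(axis=axis, keepdims=True)  # subtract max for stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def self_attention(frames, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over frames of shape (T, d)."""
    q, k, v = frames @ w_q, frames @ w_k, frames @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])  # (T, T) frame-to-frame similarity
    weights = softmax(scores, axis=-1)       # each row sums to 1
    return weights @ v, weights              # context-mixed frames, attention map


def classify_frames(context, w_out, b_out):
    """Fully connected layer plus softmax: per-frame posteriors over labels."""
    return softmax(context @ w_out + b_out, axis=-1)


# Hypothetical sizes: 6 frames, 8 features per frame, 5 labels including blank.
rng = np.random.default_rng(0)
n_frames, dim, n_labels = 6, 8, 5
frames = rng.normal(size=(n_frames, dim))  # stands in for the convolution output
w_q, w_k, w_v = (rng.normal(size=(dim, dim)) * 0.1 for _ in range(3))
context, weights = self_attention(frames, w_q, w_k, w_v)
posteriors = classify_frames(
    context, rng.normal(size=(dim, n_labels)) * 0.1, np.zeros(n_labels)
)
print(weights.shape, posteriors.shape)
```

Each row of `weights` shows how much one frame draws on every other frame, which is the sense in which self-attention combines a frame with its context. The `posteriors` array has the shape a CTC criterion expects: one distribution over labels per frame.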

Description

Technical field

[0001] The invention belongs to the field of speech recognition, and in particular relates to an improved end-to-end speech recognition method.

Background

[0002] Research on speech recognition technology began in the 1950s, with the goal of having a machine receive human speech and understand human intentions. Early work was limited to simple recognition of isolated words and syllables. In the 1960s, a systematic theory of speech recognition began to take shape, and after the emergence of computers, researchers moved from hardware to building speech recognition in simulation software. Speech recognition algorithms have progressed from pattern matching, represented by dynamic time warping, through statistical models, represented by the hidden Markov model, to today's end-to-end algorithms based on machine learning. The traditional GMM-HMM-based speech recognition model has achieved good results, but because the GMM model ...

Application Information

Patent Type & Authority: Applications (China)

IPC(8): G10L15/22, G10L15/183, G10L15/16, G10L15/06

CPC: G10L15/22, G10L15/183, G10L15/16, G10L15/063

Inventor: 严勇杰, 邓科, 陈平, 王煊

Owner: THE 28TH RES INST OF CHINA ELECTRONICS TECH GROUP CORP
