Improved end-to-end speech recognition method

A speech recognition technology, applied in the fields of speech recognition, speech analysis, instruments, etc. It addresses the problems that model construction is complex and time-consuming and that the distribution of the speech signal cannot be represented accurately, and it achieves excellent robustness together with an improved recognition rate and training efficiency.

Active Publication Date: 2020-04-21
THE 28TH RES INST OF CHINA ELECTRONICS TECH GROUP CORP

AI Technical Summary

Problems solved by technology

The traditional GMM-HMM-based speech recognition model has achieved good results. However, the GMM is a shallow model that cannot accurately represent the distribution of speech signals, and constructing the HMM is complex and requires alignment operations. Researchers therefore proposed end-to-end speech recognition models based on neural networks, in which the speech signal is mapped directly to the text sequence without aligned data labels, a pronunciation dictionary, etc. This simplifies the construction process and improves the recognition rate.
[0003] At present, end-to-end models are divided into CTC models and Seq-to-Seq models. The CTC model uses a deep neural network (CNN or RNN) to model the speech signal, which can accurately represent the feature distribution of the speech signal. Traditional speech recognition needs to know the label corresponding to each frame of data in order to train effectively, so the speech must be aligned in a preprocessing step before training. This requires repeated iterations and is time-consuming.


Embodiment Construction

[0039] The present invention proposes an improved end-to-end speech recognition method which, as shown in Figure 1, includes the following steps:

[0040] Step 1. Obtain a data set of speech and its transcribed text, extract Mel-spectrum features from the speech data to use as input features, and obtain the label set and dictionary from the transcribed text.
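
The label-set and dictionary part of Step 1 can be sketched as follows. This is a minimal illustration, not the patent's procedure: it assumes character-level labels, reserves index 0 for the CTC blank symbol, and uses toy transcripts. The names `build_vocab`, `encode`, and `BLANK` are hypothetical. Mel-spectrum extraction is omitted here.

```python
from collections import Counter

BLANK = "<blank>"  # assumption: id 0 is reserved for the CTC blank symbol


def build_vocab(transcripts):
    """Collect the label set (unique characters) and map each label to an integer id."""
    counts = Counter(ch for text in transcripts for ch in text if not ch.isspace())
    labels = sorted(counts)  # sorted so the id assignment is deterministic
    token_to_id = {BLANK: 0}
    for i, ch in enumerate(labels, start=1):
        token_to_id[ch] = i
    return token_to_id


def encode(text, token_to_id):
    """Convert a transcript into the integer label sequence used as a training target."""
    return [token_to_id[ch] for ch in text if not ch.isspace()]


transcripts = ["ab ba", "cab"]  # toy data, not taken from the patent
vocab = build_vocab(transcripts)
targets = [encode(t, vocab) for t in transcripts]
```

Whitespace is dropped in this sketch. A real system might instead keep a dedicated space label or use a different unit such as syllables or words.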

[0041] Step 2. Build a model comprising a convolutional layer, a self-attention layer, and a fully connected layer. Use the CTC loss as the model's loss function and update the model parameters with the backpropagation algorithm.
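
To illustrate what the CTC loss in Step 2 computes, the sketch below evaluates the negative log-likelihood of a label sequence with the standard CTC forward algorithm, which sums over all frame-level alignments so that no pre-aligned labels are needed. It is a toy version under stated assumptions: per-frame probabilities are given as plain Python lists, the blank id is 0, and arithmetic is done in the probability domain rather than the log domain. The function name `ctc_neg_log_likelihood` is hypothetical, and the gradient computation used in training is not shown.

```python
import math


def ctc_neg_log_likelihood(probs, target, blank=0):
    """Negative log-likelihood of `target` under CTC via the forward algorithm.

    probs:  list of T frames, each a list of per-label probabilities.
    target: list of label ids without blanks.
    """
    # Extended sequence: blank, l1, blank, l2, ..., blank
    ext = [blank]
    for label in target:
        ext += [label, blank]
    S = len(ext)

    # alpha[s] = total probability of all partial alignments ending at ext[s]
    alpha = [0.0] * S
    alpha[0] = probs[0][ext[0]]
    if S > 1:
        alpha[1] = probs[0][ext[1]]

    for t in range(1, len(probs)):
        new = [0.0] * S
        for s in range(S):
            total = alpha[s]
            if s >= 1:
                total += alpha[s - 1]
            # A blank may be skipped only between two different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                total += alpha[s - 2]
            new[s] = total * probs[t][ext[s]]
        alpha = new

    # Valid alignments end on the last label or on the trailing blank.
    likelihood = alpha[-1] + (alpha[-2] if S > 1 else 0.0)
    if likelihood == 0.0:
        return math.inf  # e.g. too few frames to emit the target
    return -math.log(likelihood)


# Two frames, labels {blank, 1}, uniform outputs: paths "1-", "-1", "11" are valid.
loss = ctc_neg_log_likelihood([[0.5, 0.5], [0.5, 0.5]], [1])
```

For long utterances the probability-domain recursion underflows, so practical implementations work with log-probabilities and obtain gradients by automatic differentiation.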

[0042] Step 3. Feed the speech feature sequence into the trained model and decode the model output to obtain the final recognition result.
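
The text here does not say which decoding method is used. One common and simple choice for CTC outputs is greedy (best-path) decoding, sketched below as an assumption: take the highest-scoring label in each frame, merge consecutive repeats, then drop blanks. The function name `greedy_ctc_decode` and the toy scores are hypothetical.

```python
def greedy_ctc_decode(frame_scores, id_to_token, blank=0):
    """Best-path CTC decoding: per-frame argmax, merge repeats, remove blanks."""
    best_path = [max(range(len(frame)), key=frame.__getitem__) for frame in frame_scores]
    decoded = []
    previous = None
    for label in best_path:
        # Emit a label only when it differs from the previous frame and is not blank.
        if label != previous and label != blank:
            decoded.append(id_to_token[label])
        previous = label
    return "".join(decoded)


id_to_token = ["<blank>", "a", "b"]  # toy label set, index 0 is the blank
frame_scores = [
    [0.1, 0.8, 0.1],    # a
    [0.2, 0.7, 0.1],    # a (repeat, merged)
    [0.9, 0.05, 0.05],  # blank (separates the two a's)
    [0.1, 0.6, 0.3],    # a
    [0.1, 0.2, 0.7],    # b
    [0.2, 0.1, 0.7],    # b (repeat, merged)
    [0.8, 0.1, 0.1],    # blank
]
text = greedy_ctc_decode(frame_scores, id_to_token)
```

The blank between the two runs of `a` is what allows a doubled character to survive the merge step. Beam search, optionally with a language model, is a frequent alternative when higher accuracy is needed.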

[0043] The following is a detailed description with reference to the illustrations.

[0044] Firstly, the speech feature data and its transcription text labels are obtained by using the speech a...


Abstract

The invention provides an improved end-to-end speech recognition method, which combines a convolutional neural network with a self-attention mechanism and uses the CTC training criterion to train an end-to-end speech recognition model. The model is mainly composed of three parts: (1) a deep two-dimensional convolution part; (2) a self-attention part; (3) a fully connected layer. The first part of the model effectively extracts features along the time axis and the frequency axis of the speech signal through two-dimensional convolution and achieves translation invariance; the second part lets the speech signal be fully combined with its context through the self-attention mechanism; the third part classifies the features of each speech frame; finally, the model parameters are updated through the CTC training criterion. The model innovatively adds the self-attention mechanism to the neural network-CTC framework, thereby realizing end-to-end speech recognition and improving the recognition performance.
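
How a self-attention part lets each frame draw on its context can be sketched with plain scaled dot-product attention. This is a simplified illustration under stated assumptions, not the patent's layer: there is a single head, no learned query/key/value projections, and the frame features themselves serve as queries, keys, and values. The names `softmax` and `self_attention` are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(frames):
    """Scaled dot-product self-attention with queries = keys = values = frames."""
    d = len(frames[0])
    outputs, weights = [], []
    for q in frames:
        # Similarity of this frame to every frame in the utterance.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in frames]
        w = softmax(scores)
        weights.append(w)
        # Each output frame is a weighted mix of all frames (its context).
        outputs.append(
            [sum(w[j] * frames[j][c] for j in range(len(frames))) for c in range(d)]
        )
    return outputs, weights


frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy features: 3 frames, 2 dimensions
outputs, weights = self_attention(frames)
```

Every output frame has the same dimensionality as its input, so a per-frame classifier such as a fully connected layer can be applied directly afterwards.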

Description

Technical field

[0001] The invention belongs to the field of speech recognition, and in particular relates to an improved end-to-end speech recognition method.

Background technique

[0002] Research on speech recognition technology began in the 1950s, with the aim of receiving human speech and letting machines understand human intentions. At first, only simple recognition of isolated words and syllables was attempted. In the 1960s, speech recognition began to develop a systematic theory, and after the emergence of computers, researchers switched from hardware to building software simulations for speech recognition. Speech recognition algorithms have progressed from pattern-matching algorithms, represented by dynamic time warping, through statistical-model algorithms, represented by the hidden Markov model, to today's end-to-end algorithms based on machine learning. The traditional GMM-HMM-based speech recognition model has achieved good results, but because the GMM model ...


Application Information

Patent Type & Authority: Applications (China)
IPC (8): G10L15/22, G10L15/183, G10L15/16, G10L15/06
CPC: G10L15/22, G10L15/183, G10L15/16, G10L15/063
Inventor: 严勇杰, 邓科, 陈平, 王煊
Owner: THE 28TH RES INST OF CHINA ELECTRONICS TECH GROUP CORP