
End-to-end speech recognition method based on constrained structured sparse attention mechanism, and storage medium

A speech recognition and attention technology, applied in the fields of speech recognition, speech analysis, and instruments, which can solve problems such as prediction-irrelevant information interfering with the decoder's recognition process

Active Publication Date: 2021-09-10
HARBIN INST OF TECH

AI Technical Summary

Problems solved by technology

[0003] The present invention aims to solve the problem that a large amount of prediction-irrelevant information exists in the decoding process of existing speech recognition methods based on the Softmax attention mechanism, which seriously interferes with the recognition process of the decoder.



Examples


Specific Embodiment 1

[0053] This embodiment is an end-to-end speech recognition method based on a constrained structured sparse attention mechanism, as shown in Figure 1. In the training phase, the original signals from the training set first undergo sampling, quantization, frame-level feature extraction, high-level acoustic representation extraction, and matching score calculation; then, through matching score sorting, matching score threshold calculation, attention score normalization, and glance vector generation, the glance vector at each decoding moment is obtained; finally, the decoder performs recognition and is used to train the recognizer, yielding a speech recognition model. In the test phase, each original speech signal in the test set is first sampled, quantized, and subjected to frame-level feature extraction; then the trained speech recognition model performs high-level acoustic representation extraction and matching score calculation on the feature matrix; next, through matching score sorting …
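To make the per-decoding-step operations listed above concrete, the NumPy sketch below walks one decoding moment through matching score sorting, threshold calculation, attention score normalization, and glance vector generation. The threshold rule used here is a sparsemax-style projection chosen purely for illustration, and the function and variable names are assumptions; this is a sketch of the general idea, not the patent's exact constrained formulation.

```python
# A minimal NumPy sketch, assuming a sparsemax-style threshold, of one decoding
# step: sort the matching scores, derive a threshold from the sorted scores,
# zero out everything below it, renormalize the survivors, and form the glance
# vector as a weighted sum of encoder states. Names and shapes are hypothetical.
import numpy as np

def sparse_attention_step(scores, encoder_states):
    """scores: (T,) matching scores; encoder_states: (T, D) high-level acoustic representations."""
    z = np.sort(scores)[::-1]                     # matching score sorting (descending)
    k = np.arange(1, len(z) + 1)
    cumsum = np.cumsum(z)
    support = 1 + k * z > cumsum                  # frames whose score survives the threshold
    k_max = k[support][-1]
    tau = (cumsum[k_max - 1] - 1.0) / k_max       # matching score threshold calculation
    attn = np.maximum(scores - tau, 0.0)          # attention score normalization:
    attn /= attn.sum()                            #   sparse weights that sum to one
    glance = attn @ encoder_states                # glance vector for this decoding moment
    return attn, glance

rng = np.random.default_rng(0)
attn, glance = sparse_attention_step(rng.normal(size=8), rng.normal(size=(8, 4)))
print(attn)  # several entries are exactly zero: prediction-irrelevant frames are dropped
```

Because scores below the threshold map to exactly zero, the glance vector is a weighted sum over only the retained frames, which is the sense in which prediction-irrelevant information is excluded from the decoder's input.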

Embodiment

[0077] In order to verify the effect of the present invention, the LibriSpeech data set is processed using the end-to-end speech recognition method based on the constrained structured sparse attention mechanism described in Embodiment 1, and the result is compared with that of a related method (the traditional Softmax attention mechanism) on the same data set, as shown in the accuracy comparison histogram in Figure 2, where CER and WER denote the character error rate and word error rate, respectively, and dev and test denote the accuracy on the development set and the test set, respectively. By comparing the accuracy of the method proposed in the present invention with that of the end-to-end speech recognition method based on the Softmax transformation function, it can be verified that the constrained structured sparse attention mechanism effectively reduces both the character error rate and the word error rate, and the effect …
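For reference, CER and WER are both edit-distance-based error rates: the Levenshtein distance between the recognized and reference transcripts, divided by the reference length, counted over characters and over words respectively. The short sketch below is a generic illustration of these metrics, not the evaluation code used in the experiments above.

```python
# Generic CER/WER sketch: Levenshtein distance over characters or words,
# normalized by the reference length.
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences (rolling 1-D DP row)."""
    d = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, d[0] = d[0], i
        for j, h in enumerate(hyp, 1):
            prev, d[j] = d[j], min(d[j] + 1,         # deletion
                                   d[j - 1] + 1,     # insertion
                                   prev + (r != h))  # substitution (0 if tokens match)
    return d[-1]

def wer(ref, hyp):
    ref, hyp = ref.split(), hyp.split()
    return edit_distance(ref, hyp) / len(ref)

def cer(ref, hyp):
    ref, hyp = ref.replace(" ", ""), hyp.replace(" ", "")
    return edit_distance(list(ref), list(hyp)) / len(ref)

print(wer("the cat sat", "the cat sit"))  # 1 substituted word / 3 words  ≈ 0.33
print(cer("the cat sat", "the cat sit"))  # 1 substituted char / 9 chars ≈ 0.11
```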

Specific Embodiment 2

[0079] This embodiment is a storage medium in which at least one instruction is stored; the at least one instruction is loaded and executed by a processor to implement the end-to-end speech recognition method based on the constrained structured sparse attention mechanism.



Abstract

The invention discloses an end-to-end speech recognition method based on a constrained structured sparse attention mechanism, and a storage medium, belonging to the technical field of speech recognition. The invention aims to solve the problem that a large amount of prediction-irrelevant information exists in the decoding process of an existing speech recognition method based on the Softmax attention mechanism, and consequently the recognition process of a decoder is seriously disturbed. According to the method, firstly, sampling, quantization, frame-level feature extraction, high-level acoustic representation extraction and matching score calculation are carried out on an original signal; then, through matching score sorting, matching score threshold calculation, attention score normalization and glance vector generation, the glance vector of each decoding moment is obtained; and finally, a decoder is used for recognition, and a recognizer is trained, so a speech recognition model is obtained. According to the invention, by generating uniform, continuous and sparse attention score vectors, the proportion of prediction-irrelevant information in the glance vectors is reduced, and the purpose of improving recognition performance is achieved. The method is mainly used for speech recognition.

Description

Technical field

[0001] The invention relates to end-to-end speech recognition technology, belonging to the technical field of speech recognition.

Background technique

[0002] With the continuous development of deep learning, end-to-end speech recognition methods have been successfully applied in practical fields such as mobile phones, tablet computers, and smart homes, and have attracted more and more attention from researchers. Among the many end-to-end speech recognition technologies, the encoder-decoder model based on the attention mechanism has achieved the best performance because it takes into account the context of both the input speech sequence and the recognized text sequence. The method uses an attention mechanism to learn the alignment relationship between the input speech sequence and the recognized text sequence so as to reduce the interference of prediction-irrelevant information with the decoder's prediction process. However, because the Softmax transformation func…
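To make the drawback of the Softmax transformation concrete, the few lines below show that Softmax maps any real-valued score vector to strictly positive weights, so even frames that are clearly irrelevant to the current prediction still receive a share of the attention. The scores used are hypothetical and purely illustrative.

```python
# Softmax over a hypothetical set of matching scores: every frame ends up with a
# strictly positive weight, including the clearly low-scoring ones.
import numpy as np

scores = np.array([4.0, 3.5, 0.2, -1.0, -2.5])  # hypothetical matching scores
weights = np.exp(scores - scores.max())          # numerically stable softmax
weights /= weights.sum()
print(weights)  # ~[0.611 0.370 0.014 0.004 0.001]: no weight is ever exactly zero
```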

Claims


Application Information

Patent Type & Authority: Application (China)
IPC(8): G10L15/02; G10L15/06; G10L15/16; G10L15/22
CPC: G10L15/02; G10L15/16; G10L15/22; G10L15/063
Inventor: 韩纪庆, 薛嘉宾, 郑贵滨, 郑铁然
Owner: HARBIN INST OF TECH