
Lightweight end-to-end speech recognition method based on convolutional self-attention transformation network

A speech recognition method and model, applied in the field of pattern recognition, that addresses the large number of model parameters and the growth in computational complexity while incurring only a small performance degradation.

Active Publication Date: 2021-07-20
NORTHWESTERN POLYTECHNICAL UNIV +1
6 Cites, 4 Cited by

AI Technical Summary

Problems solved by technology

However, the Transformer also has shortcomings: the computational complexity of dot-product self-attention grows quadratically with the length of the input feature sequence, and the model has a large number of parameters.
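To make the quadratic growth concrete, the following minimal sketch implements plain scaled dot-product self-attention in NumPy. The function names and the sequence lengths used are illustrative assumptions and do not come from the patent.

```python
import numpy as np

def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def dot_product_attention(q, k, v):
    """Scaled dot-product attention for one head.

    q, k: (T, d) arrays; v: (T, d_v) array. The intermediate score
    matrix has shape (T, T), which is the source of the quadratic cost.
    """
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)           # (T, T)
    return softmax(scores, axis=-1) @ v     # (T, d_v)

def attention_score_entries(seq_len):
    """Number of entries in the (T, T) score matrix for T = seq_len."""
    return seq_len * seq_len
```

Doubling the sequence length from 1,000 to 2,000 frames quadruples the score matrix from 1,000,000 to 4,000,000 entries, which is the behaviour the passage above describes.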



Examples


Specific Embodiment

[0080] 1. Data preparation:

[0081] In this embodiment, the experimental data come from the public Mandarin speech corpus Aishell-1. The training set contains approximately 150 hours of speech (120,098 utterances) recorded by 340 speakers; the development set contains approximately 20 hours (14,326 utterances) recorded by 40 speakers; and the test set contains approximately 10 hours (7,176 utterances) recorded by 20 speakers.

[0082] 2. Data processing:

[0083] 80-dimensional Mel filter bank features are extracted with a frame length of 25 ms and a frame shift of 10 ms. The features are normalized so that each speaker's features have zero mean and unit variance. In addition, 4233 characters (including a padding symbol, an unknown symbol, and an end-of-sentence symbol) are selected as modeling units.
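The framing and per-speaker normalization described above can be sketched as follows. This is a minimal illustration, not the patent's pipeline: the 16 kHz sample rate, the function names, and the `utt2spk` mapping are assumptions introduced here, and the filter bank extraction itself is left out.

```python
import numpy as np

def num_frames(num_samples, sample_rate=16000, frame_ms=25, shift_ms=10):
    """Count full frames (no padding) for a given frame length and shift.

    sample_rate=16000 is an assumption; the excerpt does not state it.
    """
    frame_len = sample_rate * frame_ms // 1000
    shift = sample_rate * shift_ms // 1000
    if num_samples < frame_len:
        return 0
    return 1 + (num_samples - frame_len) // shift

def per_speaker_cmvn(features_by_utt, utt2spk, eps=1e-8):
    """Normalize features to zero mean and unit variance per speaker.

    features_by_utt: dict mapping utterance id -> (frames, dims) array.
    utt2spk: dict mapping utterance id -> speaker id (hypothetical name).
    """
    # Group each speaker's utterances together.
    by_spk = {}
    for utt, feats in features_by_utt.items():
        by_spk.setdefault(utt2spk[utt], []).append(feats)
    # Compute per-dimension statistics over all frames of each speaker.
    stats = {}
    for spk, chunks in by_spk.items():
        stacked = np.concatenate(chunks, axis=0)
        stats[spk] = (stacked.mean(axis=0), stacked.std(axis=0))
    # Apply the speaker's statistics to each of that speaker's utterances.
    normalized = {}
    for utt, feats in features_by_utt.items():
        mean, std = stats[utt2spk[utt]]
        normalized[utt] = (feats - mean) / (std + eps)
    return normalized
```

Under these assumptions, one second of 16 kHz audio yields 98 frames, each of which would carry an 80-dimensional feature vector.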

[0084] 3. Building the network:

[0085] The model proposed by the present invention and the baseline model are both implemented with the ESPnet toolkit, and the baseline model adopt...



Abstract

The invention discloses a lightweight end-to-end speech recognition method based on a convolutional self-attention transformation network. The method first constructs a lightweight end-to-end speech recognition model that improves the convolutional self-attention transformation network into an efficient convolutional self-attention transformation network. Low-rank decomposition is applied to the feed-forward layer of the network to form a low-rank feed-forward module. A multi-head efficient self-attention (MHESA) is proposed and used to replace dot-product self-attention in the encoder of the network. Finally, a speech recognition model is obtained through training and used to recognize speech. The method reduces the computational complexity of the encoder's self-attention layer to linear and reduces the parameter count of the whole model by about 50%, while performance remains essentially unchanged.
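The two ideas in the abstract can be illustrated with a short sketch. The excerpt does not give the exact form of the low-rank feed-forward module or of MHESA, so nothing below is presented as the patented method: the dimensions and rank are hypothetical, and `efficient_attention` shows one known linear-complexity formulation (normalizing keys over time and queries over features, then forming a small context matrix first) as a stand-in.

```python
import numpy as np

def dense_params(d_in, d_out):
    """Weights in a full linear map W of shape (d_in, d_out), bias excluded."""
    return d_in * d_out

def low_rank_params(d_in, d_out, rank):
    """Weights when W is approximated by U @ V, with U: (d_in, rank), V: (rank, d_out)."""
    return rank * (d_in + d_out)

def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def efficient_attention(q, k, v):
    """A linear-complexity attention variant (illustrative, not MHESA itself).

    q, k: (T, d) arrays; v: (T, d_v) array. No (T, T) matrix is built:
    the (d, d_v) context matrix is computed first, so cost grows linearly in T.
    """
    q = softmax(q, axis=-1)    # each row sums to 1 (over features)
    k = softmax(k, axis=0)     # each column sums to 1 (over time)
    context = k.T @ v          # (d, d_v)
    return q @ context         # (T, d_v)
```

With the hypothetical sizes d_in = 256, d_out = 2048 and rank 64, the factorized layer holds 147,456 weights against 524,288 for the dense layer. How much of the reported 50% overall reduction comes from this step depends on the rank and layer sizes, which the excerpt does not state.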

Description

Technical field

[0001] The present invention belongs to the field of pattern recognition, and more particularly relates to a lightweight end-to-end speech recognition method.

Background technique

[0002] Speech recognition (ASR, Automatic Speech Recognition) aims to convert speech signals into text. It can be likened to a "machine auditory system", is an important research area of human-computer communication and interaction technology, and is also one of the key technologies of artificial intelligence. Speech recognition can be applied in many settings, including voice assistants, autonomous driving, smart homes, and handheld mobile devices. In recent years, end-to-end speech recognition has shown many advantages over traditional methods, such as simple labeling of training data, little dependence on linguistic knowledge, no need for the conditional independence assumption on state transition probabilities made in hidden Markov models, and the training a...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G10L15/26, G10L15/06
CPC: G10L15/063, Y02T10/40
Inventor: 张晓雷, 李盛强, 陈星
Owner: NORTHWESTERN POLYTECHNICAL UNIV