Lightweight end-to-end speech recognition method based on convolutional self-attention transformation network

A speech recognition technology and speech recognition model, applied in the field of pattern recognition, that addresses the large number of model parameters and the growth in computational complexity while incurring only a small performance degradation.

Active Publication Date: 2021-07-20

NORTHWESTERN POLYTECHNICAL UNIV +1


AI Technical Summary

Problems solved by technology

The Transformer has some shortcomings: the computational complexity of dot-product self-attention grows quadratically with the length of the input feature sequence, and the model has a large number of parameters.
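The quadratic growth can be illustrated with a rough operation count. The sketch below is only an approximation under stated assumptions: the function name `attention_macs` and the head dimension of 64 are hypothetical, and only the two T x T matrix products are counted (projections and softmax are ignored).

```python
def attention_macs(seq_len, d_model):
    """Approximate multiply-accumulates for one dot-product self-attention head.

    Counts only the two products that involve a T x T matrix:
    scores = Q K^T and output = scores V. Projections and softmax
    are ignored (a simplifying assumption).
    """
    scores = seq_len * seq_len * d_model    # Q (T x d) times K^T (d x T)
    weighted = seq_len * seq_len * d_model  # scores (T x T) times V (T x d)
    return scores + weighted


# Doubling the sequence length multiplies the cost by four.
for t in (100, 200, 400):
    print(t, attention_macs(t, 64))
```

Under this count, a sequence twice as long costs four times as much, which is why long feature sequences make dot-product self-attention expensive.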

Specific Embodiment

[0080] 1. Data preparation:

[0081] In an embodiment, the experimental data is the public Mandarin corpus Aishell-1. The training set contains approximately 150 hours of speech (120,098 utterances) recorded by 340 speakers; the development set contains about 20 hours (14,326 utterances) recorded by 40 speakers; the test set contains approximately 10 hours (7,176 utterances) recorded by 20 speakers.

[0082] 2. Data processing:

[0083] Extract 80-dimensional Mel filter bank features with a frame length of 25 ms and a frame shift of 10 ms, and normalize the features so that each speaker's features have zero mean and unit variance. In addition, select 4,233 characters (including the padding symbol, the unknown symbol, and the end-of-sentence symbol) as modeling units.
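The per-speaker normalization described in [0083] can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `normalize_per_speaker`, the `eps` guard, and the toy 3-dimensional data are all assumptions (the embodiment uses 80-dimensional features).

```python
import math


def normalize_per_speaker(frames_by_speaker, eps=1e-8):
    """Scale each speaker's features to zero mean and unit variance per dimension.

    frames_by_speaker maps a speaker id to a list of feature frames,
    each frame being a list of floats of equal length.
    """
    normalized = {}
    for speaker, frames in frames_by_speaker.items():
        n = len(frames)
        dim = len(frames[0])
        # Statistics are pooled over all frames of the same speaker.
        means = [sum(f[d] for f in frames) / n for d in range(dim)]
        stds = [
            math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / n)
            for d in range(dim)
        ]
        normalized[speaker] = [
            [(f[d] - means[d]) / (stds[d] + eps) for d in range(dim)]
            for f in frames
        ]
    return normalized


# Toy data: two hypothetical speakers with 3-dimensional features.
toy = {
    "spk_a": [[1.0, 2.0, 3.0], [3.0, 2.5, 5.0], [5.0, 4.5, 10.0]],
    "spk_b": [[-1.0, 0.0, 2.0], [1.0, 4.0, 6.0]],
}
out = normalize_per_speaker(toy)
```

Pooling the statistics per speaker, rather than per utterance or over the whole corpus, removes speaker-specific offsets and scales from the features.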

[0084] 3. Build the network:

[0085] The model proposed by the present invention and the baseline model are both implemented with the ESPnet toolkit, and the baseline model adopt...

Abstract

The invention discloses a lightweight end-to-end speech recognition method based on a convolutional self-attention transformation network. The method first constructs a lightweight end-to-end speech recognition model that improves the convolutional self-attention transformation network into an efficient convolutional self-attention transformation network. Low-rank decomposition is applied to the feed-forward layer of the network to form a low-rank feed-forward module. A multi-head efficient self-attention (MHESA) is proposed and used to replace the dot-product self-attention in the network's encoder. Finally, a speech recognition model is obtained through training and used to recognize speech. The method reduces the computational complexity of the encoder's self-attention layer to linear and reduces the parameter count of the whole model by about 50%, while performance remains essentially unchanged.
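To see why low-rank decomposition shrinks a feed-forward layer, consider replacing one dense weight matrix of shape d_in x d_out with two factors of shape d_in x r and r x d_out. The sketch below only counts weights. The dimensions (256, 2048) and the rank 64 are hypothetical values chosen for illustration and do not come from the patent; biases are ignored.

```python
def dense_params(d_in, d_out):
    """Weights in a single dense projection (biases ignored)."""
    return d_in * d_out


def low_rank_params(d_in, d_out, rank):
    """Weights when the projection is factored as (d_in x rank)(rank x d_out)."""
    return d_in * rank + rank * d_out


# Hypothetical sizes, for illustration only.
d_model, d_ff, rank = 256, 2048, 64
full = dense_params(d_model, d_ff)
factored = low_rank_params(d_model, d_ff, rank)
print(full, factored, factored / full)  # 524288 147456 0.28125
```

The factored form is smaller whenever the rank is below d_in * d_out / (d_in + d_out), so the choice of rank trades parameter count against how closely the factors can approximate the original matrix.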

Description

Technical field

[0001] The present invention belongs to the field of pattern recognition, and more particularly relates to a lightweight end-to-end speech recognition method.

Background art

[0002] Automatic speech recognition (ASR) aims to convert speech signals into text. It can be likened to a "machine auditory system", is an important research area in human-computer communication and interaction, and is also one of the key technologies of artificial intelligence. Speech recognition can be applied in many areas, including voice assistants, autonomous driving, smart homes, and handheld mobile devices. In recent years, end-to-end speech recognition has shown many advantages over traditional methods: the labeling of training data is simple, the dependence on linguistic knowledge is small, the conditional independence assumption on the transition probabilities of a hidden Markov chain is not required, and the training a...

Application Information

Patent Type & Authority: Applications (China)

IPC(8): G10L15/26, G10L15/06

CPC: G10L15/063, Y02T10/40

Inventor: 张晓雷, 李盛强, 陈星

Owner: NORTHWESTERN POLYTECHNICAL UNIV
