Lightweight end-to-end speech recognition method based on a convolutional self-attention Transformer network
A speech recognition method and speech recognition model, applied in the field of pattern recognition, that address the large number of model parameters and the high computational complexity of existing models while incurring only a small performance degradation.
Specific embodiment
[0080] 1. Data preparation:
[0081] In an embodiment, the experimental data come from the public Mandarin speech corpus Aishell-1. The training set contains approximately 150 hours of speech (120,098 utterances) recorded by 340 speakers; the development set contains about 20 hours (14,326 utterances) recorded by 40 speakers; and the test set contains approximately 10 hours (7,176 utterances) recorded by 20 speakers.
[0082] 2. Data processing:
[0083] 80-dimensional Mel filter bank features are extracted with a frame length of 25 ms and a frame shift of 10 ms. The features are normalized so that each speaker's features have a mean of 0 and a variance of 1. In addition, 4,233 characters (including a padding symbol, an unknown symbol, and an end-of-sentence symbol) are selected as modeling units.
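The framing arithmetic and the per-speaker normalization described in this step can be sketched as follows. This is a minimal illustration only, not the implementation used in the embodiment: the 16 kHz sample rate, the no-padding framing rule, and the function names `num_frames` and `normalize_per_speaker` are assumptions introduced here, and small lists stand in for real 80-dimensional filter bank features.

```python
import math

SAMPLE_RATE_HZ = 16000   # assumption: 16 kHz audio
FRAME_LENGTH_MS = 25     # frame length stated in the text
FRAME_SHIFT_MS = 10      # frame shift stated in the text


def num_frames(num_samples, sample_rate=SAMPLE_RATE_HZ):
    """Count full frames, assuming no padding at the signal edges."""
    frame_len = sample_rate * FRAME_LENGTH_MS // 1000    # 400 samples at 16 kHz
    frame_shift = sample_rate * FRAME_SHIFT_MS // 1000   # 160 samples at 16 kHz
    if num_samples < frame_len:
        return 0
    return 1 + (num_samples - frame_len) // frame_shift


def normalize_per_speaker(utterances, eps=1e-8):
    """Scale features to zero mean and unit variance for each speaker.

    utterances: list of (speaker_id, frames) pairs, where frames is a
    list of equal-length feature vectors (80-dimensional in the text).
    Returns a list of (speaker_id, normalized_frames) in the same order.
    """
    # Pass 1: accumulate per-speaker, per-dimension sums and squared sums.
    counts, sums, sq_sums = {}, {}, {}
    for speaker, frames in utterances:
        for frame in frames:
            if speaker not in counts:
                counts[speaker] = 0
                sums[speaker] = [0.0] * len(frame)
                sq_sums[speaker] = [0.0] * len(frame)
            counts[speaker] += 1
            for d, value in enumerate(frame):
                sums[speaker][d] += value
                sq_sums[speaker][d] += value * value

    # Turn the accumulated statistics into means and standard deviations.
    means, stds = {}, {}
    for speaker, n in counts.items():
        means[speaker] = [s / n for s in sums[speaker]]
        stds[speaker] = [
            max(math.sqrt(max(sq / n - m * m, 0.0)), eps)  # guard against zero variance
            for sq, m in zip(sq_sums[speaker], means[speaker])
        ]

    # Pass 2: apply each speaker's statistics to that speaker's frames.
    normalized = []
    for speaker, frames in utterances:
        m, s = means.get(speaker, []), stds.get(speaker, [])
        normalized.append(
            (speaker, [[(x - mu) / sd for x, mu, sd in zip(frame, m, s)]
                       for frame in frames])
        )
    return normalized
```

Under these assumptions, one second of 16 kHz audio yields `num_frames(16000) == 98` frames. Statistics are pooled over all utterances of a speaker rather than computed per utterance, which matches the per-speaker wording of the text; a toolkit may handle edge padding and variance flooring differently.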
[0084] 3. Build the network:
[0085] The model proposed by the present invention and the baseline model are both built with the ESPnet toolkit, and the baseline model adopt...


