End-to-end speech recognition method and system and storage medium

A speech recognition and speech feature technology, applied in speech recognition, speech analysis, neural learning methods, and related fields, which addresses problems such as recognition accuracy inferior to traditional methods, biased fitting of the model to the data, and poor noise robustness.

Pending Publication Date: 2022-06-07
PURPLE MOUNTAIN LAB
  • Summary
  • Abstract
  • Description
  • Claims
  • Application Information

AI Technical Summary

Problems solved by technology

However, when data is limited (less than 100 hours of training data), the model cannot be fully trained: it overfits, generalizes poorly, and is not robust to noise, so its recognition accuracy falls short of traditional speech recognition methods.

Method used



Examples

Experimental program
Comparison scheme
Effect test

Embodiment 1

[0033] Referring to Figure 2, this embodiment provides an end-to-end speech recognition method comprising the following steps:

[0034] S1: Train the initial speech feature extraction model;

[0035] Specifically, in this embodiment, pre-training is performed on a source corpus (a corpus with sufficient data, usually more than 100 hours of training data) to obtain the initial weight values of the model. The source corpus can be a public corpus such as THCHS-30, Aishell, or ST-CMDS.
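The pre-train-then-initialize idea in step S1 can be sketched as follows. This is a minimal stand-in, not the patent's implementation: the pre-training here is simulated with random values rather than actual gradient descent on THCHS-30/Aishell, and the function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def pretrain_on_source(n_params):
    """Stand-in for pre-training on a large source corpus
    (e.g. THCHS-30 / Aishell): returns converged weight values.
    Here we merely simulate them with small random numbers."""
    return rng.normal(scale=0.05, size=n_params)

def init_from_pretrained(weights):
    """S1: use the pre-trained weights as the initial values of the
    speech feature extraction model (transfer-learning init)."""
    return weights.copy()

source_weights = pretrain_on_source(10)
model_init = init_from_pretrained(source_weights)
```

The key point is that the target-corpus training never starts from scratch; it starts from the source-corpus weights, which is what mitigates overfitting when the target corpus is under 100 hours.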

[0036] A deep CNN can enhance the representation of the frequency-domain features of the speech signal, so the VGGNet architecture can be used for speech feature extraction. In this embodiment, the VGG16 model is used to extract the frequency-domain features of the speech signal. Refer to Figure 3, a schematic diagram of the VGG16 structure. The VGG16 model consists of 13 convolutional layers and 3 fully connected layers. A...
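The 13-conv / 3-FC layout the embodiment cites matches the standard VGG16 configuration, which can be written down and checked directly. The config list below is the published VGG16 recipe, not taken from the patent text:

```python
# Standard VGG16 configuration: integers are conv-layer output
# channels, 'M' marks a 2x2 max-pooling stage.
VGG16_CFG = [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M',
             512, 512, 512, 'M', 512, 512, 512, 'M']

def count_conv_layers(cfg):
    """Count convolutional layers in a VGG-style config list."""
    return sum(1 for v in cfg if v != 'M')

conv_layers = count_conv_layers(VGG16_CFG)  # 13
fc_layers = 3  # VGG16's fully connected classifier head
```

Counting the integers confirms 13 convolutional layers across five pooling stages, consistent with the "13 convolutional layers and 3 fully connected layers" stated above.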

Embodiment 2

[0062] This embodiment provides an end-to-end speech recognition system; referring to Figure 5, the system includes the following modules:

[0063] The model training module 101 is used to train an initial speech feature extraction model on the source corpus based on the VGGNet model, remove the fully connected layers of the initial model and freeze a preset number of convolutional-layer parameters, and then train the stripped and frozen model on the target corpus to obtain a frequency-domain feature extraction network. Note that the model training process has already been elaborated in steps S1-S2 of the end-to-end speech recognition method in Embodiment 1, so it is not repeated here.
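The "remove FC layers, freeze the first N conv layers" operation of module 101 can be illustrated with a toy parameter table. The layer names and the three-conv/two-FC shape below are purely illustrative, not the patent's actual model:

```python
# Hypothetical parameter table: layer name -> kind and trainable flag.
model = {
    'conv1': {'kind': 'conv', 'trainable': True},
    'conv2': {'kind': 'conv', 'trainable': True},
    'conv3': {'kind': 'conv', 'trainable': True},
    'fc1':   {'kind': 'fc',   'trainable': True},
    'fc2':   {'kind': 'fc',   'trainable': True},
}

def strip_and_freeze(model, n_freeze):
    """Drop fully connected layers and freeze the first n_freeze
    convolutional layers, as module 101 describes."""
    convs = {k: dict(v) for k, v in model.items() if v['kind'] == 'conv'}
    for i, name in enumerate(convs):
        if i < n_freeze:
            convs[name]['trainable'] = False
    return convs

feature_net = strip_and_freeze(model, n_freeze=2)
```

In a real framework the same effect would come from deleting the classifier head and setting the frozen layers' parameters to non-trainable before fine-tuning on the target corpus.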

[0064] The framework building module 102 is used to construct an end-to-end speech recognition framework. After the frequency domain feature extraction network, a...



Abstract

The invention provides an end-to-end speech recognition method and system. The method comprises the following steps: training an initial speech feature extraction model on a source corpus based on a VGGNet model; removing the fully connected layers of the initial speech feature extraction model and freezing a preset number of convolutional-layer parameters, then training the stripped and frozen model on a target corpus to obtain a frequency-domain feature extraction network; constructing an end-to-end speech recognition framework comprising an encoder and a decoder; and training the end-to-end speech recognition framework on the target corpus, then performing end-to-end speech recognition with the trained framework. The method effectively mitigates model overfitting under limited data, improves speech recognition accuracy, and has good noise robustness.
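The four abstract steps compose into a single pipeline. The sketch below only shows how the stages chain together; every function name and return value is a hypothetical placeholder, not the patent's API:

```python
# Illustrative chaining of the four steps in the abstract.
def pretrain(source_corpus):
    """Step 1: train the initial feature extractor on the source corpus."""
    return {'vgg_weights': f'pretrained-on-{source_corpus}'}

def build_feature_net(model):
    """Step 2: drop the FC head, freeze early conv layers (simulated)."""
    return {'frozen_convs': True, **model}

def build_framework(feature_net):
    """Step 3: wrap the feature network in an encoder-decoder framework."""
    return {'encoder': feature_net, 'decoder': 'attention-decoder'}

def finetune(framework, target_corpus):
    """Step 4: train the full framework on the target corpus."""
    framework['trained_on'] = target_corpus
    return framework

asr = finetune(build_framework(build_feature_net(pretrain('Aishell'))),
               'target-corpus')
```

The design choice worth noting is the two-stage use of data: the large source corpus shapes the feature extractor, and only the smaller target corpus is needed for the end-to-end fine-tuning stage.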

Description

Technical field

[0001] The invention belongs to the field of speech recognition, and in particular relates to an end-to-end speech recognition method, system, and storage medium.

Background technique

[0002] In the past few decades, speech recognition technology has made great progress driven by artificial intelligence. End-to-end speech recognition based on deep learning can complete speech-to-text transcription directly through a single model; its performance exceeds that of traditional speech recognition methods and reaches the state of the art on some tasks. End-to-end speech recognition has therefore received extensive attention from academia and industry.

[0003] In traditional speech recognition, the most common approach is the Gaussian Mixture Model based Hidden Markov Model (GMM-HMM). With the rise of deep neural networks (DNN), the DNN-HMM framework replaces the original GMM with a DNN to model each state without making assum...

Claims


Application Information

IPC (IPC8): G10L15/02; G10L15/06; G10L19/16; G06N3/04; G06N3/08
CPC: G10L15/02; G10L15/063; G10L19/16; G06N3/08; G10L2015/0631; G06N3/044; G06N3/045
Inventor 王丹陶高峰邢凯陈力孙仕康黄超侯晓晖孙羽朱静夏丹丹罗永璨
Owner PURPLE MOUNTAIN LAB