End-to-end speech recognition method and system and storage medium

A speech recognition and speech feature extraction technology, applied in speech recognition, speech analysis, and neural learning methods, that addresses recognition accuracy falling short of traditional methods, biased fitting of the model to the data, and poor noise robustness.

Pending Publication Date: 2022-06-07

PURPLE MOUNTAIN LAB


AI Technical Summary

Problems solved by technology

However, with limited data (less than 100 hours of training data), the model cannot be fully trained: it overfits, generalizes poorly, and is not robust to noise, so its recognition accuracy falls short of traditional speech recognition technology.

Although introducing more, and more complex, network computing layers can better mine the time-frequency information of speech signals, it also introduces more parameters. When data is limited, the knowledge the model can learn is limited as well, which causes the model to fit the data with a bias.

Examples

Embodiment 1

[0033] Referring to Figure 2, this embodiment provides an end-to-end speech recognition method comprising the following steps:

[0034] S1: Train the initial speech feature extraction model;

[0035] Specifically, in this embodiment, the model is pre-trained on a source corpus (a corpus with sufficient data, usually meaning more than 100 hours of training data) to obtain its initial weights. A public corpus such as THCHS-30, Aishell, or ST-CMDS can serve as the source corpus;

[0036] A deep CNN can enhance the representation of the frequency domain features of the speech signal, so the VGGNet model architecture can be used for speech feature extraction. In this embodiment, the VGG16 model is used to extract the frequency domain features of the speech signal. Figure 3 is a schematic diagram of the structure of the VGG16 model, which consists of 13 convolutional layers and 3 fully connected layers. A...
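As a rough illustration of the layer counts mentioned in [0036], the following sketch lists the standard VGG16 layout as published for image classification and counts its layers and convolutional parameters. The single-channel input (a spectrogram-like feature map), the 3x3 kernels, and the names `VGG16_CFG`, `conv_layers`, and `conv_param_count` are assumptions made for illustration; the excerpt does not state the exact input shape or whether the embodiment modifies the layout for speech.

```python
# Standard VGG16 layout (assumed here, not taken from the patent text):
# integers are output channels of 3x3 convolutions, "M" marks 2x2 max pooling.
VGG16_CFG = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
             512, 512, 512, "M", 512, 512, 512, "M"]
NUM_FC_LAYERS = 3  # the three fully connected layers that follow the conv stack


def conv_layers(cfg):
    """Return the output-channel counts of the convolutional layers only."""
    return [c for c in cfg if c != "M"]


def conv_param_count(cfg, in_channels=1, kernel=3):
    """Count weights and biases of the conv stack.

    in_channels=1 is a hypothetical choice for a single-channel
    time-frequency input; image models use 3.
    """
    total = 0
    for out_channels in conv_layers(cfg):
        total += in_channels * out_channels * kernel * kernel + out_channels
        in_channels = out_channels
    return total


num_conv = len(conv_layers(VGG16_CFG))
num_pool = VGG16_CFG.count("M")
print("conv layers:", num_conv, "fc layers:", NUM_FC_LAYERS, "pool layers:", num_pool)
print("conv parameters (1-channel input):", conv_param_count(VGG16_CFG))
```

The parameter count gives a sense of why limited data is a concern: even without the fully connected layers, the convolutional stack alone carries on the order of ten million weights.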

Embodiment 2

[0062] This embodiment provides an end-to-end speech recognition system. Referring to Figure 5, the system includes the following modules:

[0063] The model training module 101 is used to train an initial speech feature extraction model on the source corpus based on the VGGNet model, remove the fully connected layers from the initial speech feature extraction model and freeze the parameters of a preset number of convolutional layers, and then train the resulting model on the target corpus to obtain a frequency domain feature extraction network. The specific model training process has already been described in detail in steps S1-S2 of the end-to-end speech recognition method in Embodiment 1, so it is not repeated here.

[0064] The framework building module 102 is used for constructing an end-to-end speech recognition framework. After the frequency domain feature extraction network, a...
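The removal and freezing step performed by the model training module 101 can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the `Layer` class, the helper names, the choice to freeze the lowest convolutional layers first, and the value `num_frozen=10` are all hypothetical, since the text only speaks of "a preset number" of frozen convolutional layers.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    name: str
    kind: str              # "conv" or "fc"
    trainable: bool = True


def build_initial_model(num_conv=13, num_fc=3):
    """Stand-in for the initial model pre-trained on the source corpus."""
    layers = [Layer(f"conv{i + 1}", "conv") for i in range(num_conv)]
    layers += [Layer(f"fc{i + 1}", "fc") for i in range(num_fc)]
    return layers


def prepare_for_target_corpus(layers, num_frozen):
    """Drop the fully connected layers and freeze the first num_frozen conv layers.

    Assumption: "first" means closest to the input. The input list is copied,
    so the pre-trained model description is left unchanged.
    """
    convs = [Layer(l.name, l.kind, l.trainable) for l in layers if l.kind == "conv"]
    if not 0 <= num_frozen <= len(convs):
        raise ValueError("num_frozen must be between 0 and the number of conv layers")
    for layer in convs[:num_frozen]:
        layer.trainable = False
    return convs


initial = build_initial_model()
feature_net = prepare_for_target_corpus(initial, num_frozen=10)  # hypothetical preset
trainable = [l.name for l in feature_net if l.trainable]
print("layers still updated on the target corpus:", trainable)
```

Only the layers left trainable would be updated when training on the target corpus, which reduces the number of parameters that the limited target data has to fit.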

Abstract

The invention provides an end-to-end speech recognition method and system. The method comprises the following steps: training an initial speech feature extraction model on a source corpus based on a VGGNet model; removing the fully connected layers from the initial speech feature extraction model, freezing the parameters of a preset number of convolutional layers, and training the resulting model on a target corpus to obtain a frequency domain feature extraction network; constructing an end-to-end speech recognition framework comprising an encoder and a decoder; and training the end-to-end speech recognition framework on the target corpus, then performing end-to-end speech recognition with the trained framework. The method can effectively address model overfitting when data is limited, improves the accuracy of speech recognition, and has good noise robustness.

Description

Technical field

[0001] The invention belongs to the field of speech recognition, and in particular relates to an end-to-end speech recognition method, system and storage medium.

Background technique

[0002] In the past few decades, speech recognition technology has made great progress, driven by artificial intelligence technology. End-to-end speech recognition based on deep learning can complete speech-to-text transcription directly with a single model, and its performance exceeds that of traditional speech recognition methods, reaching the state of the art in some tasks. End-to-end speech recognition has therefore received extensive attention from academia and industry.

[0003] In traditional speech recognition, the most common algorithm is the Hidden Markov Model based on the Gaussian Mixture Model (GMM-HMM). With the rise of the deep neural network (DNN), the DNN-HMM framework uses a DNN model in place of the original GMM model to model each state without making assum...

Application Information

IPC(8): G10L15/02, G10L15/06, G10L19/16, G06N3/04, G06N3/08

CPC: G10L15/02, G10L15/063, G10L19/16, G06N3/08, G10L2015/0631, G06N3/044, G06N3/045

Inventor 王丹陶高峰邢凯陈力孙仕康黄超侯晓晖孙羽朱静夏丹丹罗永璨

Owner PURPLE MOUNTAIN LAB
