Chinese speech recognition method combining Transformer and CNN-DFSMN-CTC

A speech recognition technology based on CNN-DFSMN-CTC, applied in speech recognition, speech analysis, instruments, and related fields. It addresses the problem that the Transformer has not been applied to speech recognition.

Status: Inactive. Publication date: 2020-11-20

CHONGQING UNIV OF POSTS & TELECOMM

Cites: 3. Cited by: 25.

AI Technical Summary

Problems solved by technology

The Transformer is widely used as a language model in natural language processing, but it has not been applied to speech recognition.

Embodiment Construction

[0073] The technical solutions in the embodiments of the present invention are described clearly and in detail below with reference to the accompanying drawings. The described embodiments are only some of the embodiments of the invention.

[0074] The technical solution by which the present invention solves the above technical problems is as follows:

[0075] As shown in Figure 1, the present invention provides a speech recognition method that uses CNN-DFSMN-CTC as the acoustic model and a Transformer as the language model. The method comprises the following steps:

[0076] S1: Preprocess the speech signal in combination with a low frame rate (LFR) scheme. The signal is first pre-emphasized, then analyzed with a 25 ms Hamming window at a fixed 10 ms frame shift, and 80 mel filters are used to extract 80-dimensional log mel filter bank (Fbank) features;

[0077] S2, the extracted 80-dime...
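
The feature extraction in step S1 can be sketched in NumPy as follows. This is a minimal illustration, not the patent's implementation: the 25 ms Hamming window, 10 ms frame shift, and 80 mel filters come from the text, while the 16 kHz sample rate, the 0.97 pre-emphasis coefficient, the 512-point FFT, and the LFR stacking and skipping factors are assumptions chosen for the example. The function names are hypothetical.

```python
import numpy as np


def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2), n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    fft_freqs = np.linspace(0.0, sample_rate / 2, n_fft // 2 + 1)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(n_mels):
        left, center, right = hz_points[m], hz_points[m + 1], hz_points[m + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        fbank[m] = np.maximum(0.0, np.minimum(rising, falling))
    return fbank


def log_fbank(signal, sample_rate=16000, n_mels=80, win_ms=25.0,
              shift_ms=10.0, preemph=0.97, n_fft=512):
    """Pre-emphasis -> Hamming-windowed frames -> power spectrum -> log mel energies."""
    # sample_rate, preemph and n_fft are assumed values, not taken from the patent.
    emphasized = np.append(signal[0], signal[1:] - preemph * signal[:-1])
    win_len = int(round(sample_rate * win_ms / 1000))    # 400 samples at 16 kHz
    shift = int(round(sample_rate * shift_ms / 1000))    # 160 samples at 16 kHz
    n_frames = 1 + (len(emphasized) - win_len) // shift
    idx = np.arange(win_len)[None, :] + shift * np.arange(n_frames)[:, None]
    frames = emphasized[idx] * np.hamming(win_len)
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2 / n_fft
    mel_energy = power @ mel_filterbank(n_mels, n_fft, sample_rate).T
    return np.log(mel_energy + 1e-10)                    # small floor avoids log(0)


def low_frame_rate(feats, stack=3, skip=3):
    """Stack `stack` consecutive frames and keep every `skip`-th group.

    The stack and skip factors are illustrative; the text only says LFR is used.
    """
    n_out = (len(feats) - stack) // skip + 1
    return np.stack([feats[i * skip:i * skip + stack].reshape(-1) for i in range(n_out)])


# Usage: one second of synthetic noise standing in for a speech signal.
rng = np.random.default_rng(0)
features = log_fbank(rng.standard_normal(16000))
print(features.shape)                  # (98, 80)
print(low_frame_rate(features).shape)  # (32, 240)
```

Stacking frames this way lowers the number of time steps the acoustic model has to process, which is the usual motivation for a low frame rate front end.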

Abstract

The invention claims a Chinese speech recognition method combining Transformer and CNN-DFSMN-CTC. The method comprises the following steps: S1, preprocessing the speech signal and extracting 80-dimensional log mel Fbank features; S2, convolving the extracted 80-dimensional Fbank features with a CNN; S3, feeding the resulting features into a DFSMN network; S4, using CTC loss as the loss function of the acoustic model, decoding with a beam search algorithm, and optimizing with the Adam optimizer; S5, introducing a strong language model, the Transformer, and training it iteratively until the best model structure is reached; and S6, combining the Transformer with the CNN-DFSMN-CTC acoustic model, adapting them to each other, and validating on multiple data sets to obtain the best recognition result. The method achieves higher recognition accuracy and faster decoding: the character error rate reaches 11.8% after validation on multiple data sets, and the best character error rate, 7.8%, is obtained on the Aidatang data set.
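
Two parts of the abstract lend themselves to a short illustration: beam search decoding over CTC outputs (S4) and the character error rate used to report results. The sketch below uses CTC prefix beam search, one common variant; the abstract does not say which variant the invention uses, and no language model score is included here. The function names, the toy vocabulary, and the posterior values are hypothetical.

```python
from collections import defaultdict

import numpy as np


def ctc_prefix_beam_search(probs, beam_size=4, blank=0):
    """Decode a (T, V) matrix of per-frame posteriors into a label sequence.

    Each prefix tracks two probabilities: paths ending in blank and paths
    ending in a non-blank label, so repeated labels are merged correctly.
    """
    beams = {(): (1.0, 0.0)}  # prefix -> (p_blank, p_non_blank)
    for frame in probs:
        next_beams = defaultdict(lambda: [0.0, 0.0])
        for prefix, (p_b, p_nb) in beams.items():
            for s, p in enumerate(frame):
                if s == blank:
                    # Blank keeps the prefix unchanged.
                    next_beams[prefix][0] += (p_b + p_nb) * p
                elif prefix and s == prefix[-1]:
                    # Same label again: it extends the prefix only after a blank,
                    # otherwise it collapses into the existing last label.
                    next_beams[prefix + (s,)][1] += p_b * p
                    next_beams[prefix][1] += p_nb * p
                else:
                    next_beams[prefix + (s,)][1] += (p_b + p_nb) * p
        ranked = sorted(next_beams.items(), key=lambda kv: kv[1][0] + kv[1][1], reverse=True)
        beams = {prefix: tuple(score) for prefix, score in ranked[:beam_size]}
    best_prefix = max(beams.items(), key=lambda kv: sum(kv[1]))[0]
    return list(best_prefix)


def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (row-by-row dynamic programming)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (r != h)))
        prev = curr
    return prev[-1]


def cer(ref, hyp):
    """Character error rate: edit distance divided by reference length."""
    if len(ref) == 0:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)


# Usage with a toy 3-symbol vocabulary (index 0 is the CTC blank).
vocab = ["<blank>", "你", "好"]
posteriors = np.array([[0.1, 0.8, 0.1],
                       [0.8, 0.1, 0.1],
                       [0.1, 0.1, 0.8]])
labels = ctc_prefix_beam_search(posteriors, beam_size=4)
text = "".join(vocab[i] for i in labels)
print(text, cer("你好", text))  # 你好 0.0
```

A character-level metric suits Chinese because the text has no whitespace word boundaries, so errors are counted per character rather than per word.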

Description

Technical field

[0001] The invention belongs to the field of speech recognition, and in particular relates to a Chinese speech recognition method combining Transformer and CNN-DFSMN-CTC.

Background technique

[0002] In the development of speech recognition, researchers have aimed to convert speech into text as completely and accurately as possible. Speech recognition rests on two key parts: the acoustic model and the language model. Before deep learning rose and was applied to speech recognition, acoustic modeling already had a mature family of models, some of which had been deployed successfully in practical systems, for example the classic Gaussian mixture model (GMM) and hidden Markov model (HMM). After the rise of neural networks and deep learning, acoustic models and language models based on deep learning such as recurrent neural networks (Recurrent Neural...

Application Information

Patent Type & Authority: Applications (China)

IPC(8): G10L15/16, G10L15/183, G10L15/02

CPC: G10L15/16, G10L15/183, G10L15/02

Inventors: 胡章芳, 蹇芳, 唐珊珊, 明子平, 姜博文

Owner: CHONGQING UNIV OF POSTS & TELECOMM
