Chinese speech recognition method combining Transformer and CNN-DFSMN-CTC

A speech recognition method based on CNN-DFSMN-CTC, applicable to speech recognition, speech analysis, and related fields. It addresses the problem that the Transformer has not been applied to speech recognition.

Status: Inactive. Publication Date: 2020-11-20
CHONGQING UNIV OF POSTS & TELECOMM

AI Technical Summary

Problems solved by technology

The Transformer is widely used as a language model in natural language processing, but it has not yet been applied to speech recognition.



Embodiment Construction

[0073] The technical solutions in the embodiments of the present invention will be described clearly and in detail below with reference to the drawings in the embodiments of the present invention. The described embodiments are only some of the embodiments of the invention.

[0074] The technical solution by which the present invention solves the above technical problems is as follows:

[0075] As shown in Figure 1, the present invention provides a speech recognition method that uses CNN-DFSMN-CTC as the acoustic model and a Transformer as the language model, comprising the following steps:

[0076] S1: the speech signal is preprocessed in combination with low frame rate (LFR) processing. The signal is first pre-emphasized and then analyzed with a 25 ms Hamming window at a fixed 10 ms frame shift, and 80 mel filter banks are used to extract 80-dimensional log-mel filter bank (Fbank) features;

[0077] S2, the extracted 80-dime...
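
The feature extraction in step S1 can be sketched as follows. This is a minimal NumPy illustration, not the patent's implementation. The 25 ms Hamming window, the 10 ms frame shift, and the 80 mel filters come from the text. The 16 kHz sampling rate, the pre-emphasis coefficient of 0.97, the 512-point FFT, and all function names are assumptions made for illustration. The LFR frame stacking is not shown because the text does not give its parameters.

```python
import numpy as np


def hz_to_mel(hz):
    """Convert frequency in Hz to the mel scale."""
    return 2595.0 * np.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    """Convert mel-scale values back to Hz."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale from 0 Hz to Nyquist."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    fft_freqs = np.linspace(0.0, sample_rate / 2.0, n_fft // 2 + 1)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(n_mels):
        left, center, right = hz_points[m], hz_points[m + 1], hz_points[m + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        fbank[m] = np.maximum(0.0, np.minimum(rising, falling))
    return fbank


def log_mel_fbank(signal, sample_rate=16000, n_mels=80, frame_len_ms=25.0,
                  frame_shift_ms=10.0, preemph=0.97, n_fft=512):
    """Return log-mel Fbank features of shape (n_frames, n_mels).

    sample_rate, preemph and n_fft are assumed values, not taken from the patent.
    """
    signal = np.asarray(signal, dtype=np.float64)
    # Pre-emphasis: boost high frequencies with a first-order difference filter.
    emphasized = np.append(signal[0], signal[1:] - preemph * signal[:-1])

    frame_len = int(round(sample_rate * frame_len_ms / 1000.0))
    frame_shift = int(round(sample_rate * frame_shift_ms / 1000.0))
    if len(emphasized) < frame_len:
        raise ValueError("signal is shorter than one analysis frame")

    # Slice the signal into overlapping frames and apply a Hamming window.
    n_frames = 1 + (len(emphasized) - frame_len) // frame_shift
    idx = np.arange(frame_len)[None, :] + frame_shift * np.arange(n_frames)[:, None]
    frames = emphasized[idx] * np.hamming(frame_len)

    # Power spectrum, mel filtering, then logarithm (floored to avoid log(0)).
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2 / n_fft
    mel_energy = power @ mel_filterbank(n_mels, n_fft, sample_rate).T
    return np.log(np.maximum(mel_energy, 1e-10))
```

Under these assumptions, one second of audio sampled at 16 kHz gives 98 frames, each with 80 features.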


Abstract

The invention claims a Chinese speech recognition method combining Transformer and CNN-DFSMN-CTC. The method comprises the following steps: S1, preprocessing the speech signal and extracting 80-dimensional log-mel Fbank features; S2, convolving the extracted 80-dimensional Fbank features with a CNN; S3, feeding the resulting features into a DFSMN network; S4, using CTC loss as the loss function of the acoustic model, decoding with a beam search algorithm, and optimizing with the Adam optimizer; S5, introducing a strong language model, the Transformer, and training it iteratively until an optimal model structure is reached; and S6, combining the Transformer with the CNN-DFSMN-CTC acoustic model, adapting the two to each other, and verifying on multiple data sets to obtain the best recognition result. The method achieves higher recognition accuracy and faster decoding: after verification on multiple data sets the character error rate reaches 11.8%, and the best character error rate, on the Aidatang data set, reaches 7.8%.
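
The character error rate quoted in the abstract is conventionally computed as the edit distance between the reference and recognized character sequences, divided by the number of reference characters. The sketch below shows that standard calculation. The function names are hypothetical, and the patent does not state which scoring tool produced its figures.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (insert, delete, substitute)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_error_rate(references, hypotheses):
    """Corpus-level CER: total character errors over total reference characters."""
    total_errors = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    total_chars = sum(len(r) for r in references)
    if total_chars == 0:
        raise ValueError("references contain no characters")
    return total_errors / total_chars
```

For example, a six-character reference recognized with one substituted character gives a CER of 1/6, or about 16.7%.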

Description

Technical field

[0001] The invention belongs to the field of speech recognition, and in particular relates to a Chinese speech recognition method combining Transformer and CNN-DFSMN-CTC.

Background

[0002] In speech recognition research, the goal is to convert speech into text as completely and accurately as possible. Speech recognition depends on two key components: the acoustic model and the language model. Before deep learning rose to prominence and was applied to speech recognition, acoustic modeling already had a mature family of models, some of which had been deployed successfully in practical systems. Classic examples are the Gaussian Mixture Model (GMM) and the Hidden Markov Model (HMM). After the rise of neural networks and deep learning, acoustic models and language models based on deep learning such as recurrent neural networks (Recurrent Neural...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G10L15/16, G10L15/183, G10L15/02
CPC: G10L15/16, G10L15/183, G10L15/02
Inventor: 胡章芳, 蹇芳, 唐珊珊, 明子平, 姜博文
Owner: CHONGQING UNIV OF POSTS & TELECOMM