Speech recognition method based on convolutional neural network

A speech recognition method based on a convolutional neural network, applied in the field of speech recognition, which addresses the problems of existing acoustic models such as long training time, complex modeling process and limited application, and achieves the effects of a simple modeling process and easy training.

Active Publication Date: 2019-01-25
JIANGNAN UNIV

AI Technical Summary

Problems solved by technology

[0003] In order to solve the problems of long training time, complex modeling process and limited application of existing acoustic models, the present invention provides a speech recognition method based on a convolutional neural network, which is better at extracting high-level features; the modeling process is simple, the model is easy to train, its generalization performance is better, and it can be applied more widely to various speech recognition scenarios.


Embodiment Construction

[0049] As shown in Figures 1 to 5, the technical solution of the present invention realizes an end-to-end acoustic model on the basis of the DCNN (Deep Convolutional Neural Network) model and the CTC (Connectionist Temporal Classification) method; it comprises the following steps:

[0050] S1: Input the original speech, preprocess the original speech signal, and perform the related transformation processing;
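For illustration only: the patent excerpt does not specify which transformations step S1 performs, but a common preprocessing chain for speech signals is pre-emphasis, framing and windowing. The sketch below assumes those operations and illustrative parameter values (0.97 pre-emphasis coefficient, 25 ms frames, 10 ms shift); none of them are taken from the invention.

```python
# Hypothetical preprocessing sketch for S1: pre-emphasis, framing, Hamming window.
# All parameter values are assumed defaults, not values specified by the patent.
import numpy as np

def preprocess(signal, sample_rate=16000, frame_ms=25, shift_ms=10, pre_emph=0.97):
    # Pre-emphasis: boost the high-frequency part of the spectrum.
    emphasized = np.append(signal[0], signal[1:] - pre_emph * signal[:-1])

    frame_len = int(sample_rate * frame_ms / 1000)    # 400 samples at 16 kHz
    frame_shift = int(sample_rate * shift_ms / 1000)  # 160 samples at 16 kHz
    num_frames = 1 + (len(emphasized) - frame_len) // frame_shift  # assumes >= one frame of audio

    # Cut the signal into overlapping frames and apply a Hamming window to each frame.
    window = np.hamming(frame_len)
    frames = np.stack([
        emphasized[i * frame_shift : i * frame_shift + frame_len] * window
        for i in range(num_frames)
    ])
    return frames  # shape: (num_frames, frame_len)
```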

[0051] S2: Extract the key feature parameters reflecting the characteristics of the speech signal to form a feature vector sequence;
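The excerpt does not name a specific feature type for step S2. As one hedged illustration, the sketch below extracts MFCC features with librosa to form a feature vector sequence; the choice of MFCCs and all parameter values are assumptions, not part of the invention.

```python
# Hypothetical feature-extraction sketch for S2 using librosa; MFCC is an assumed
# feature choice and the parameter values are illustrative defaults.
import librosa
import numpy as np

def extract_features(wav_path, sr=16000, n_mfcc=13):
    # Load the waveform at the assumed sampling rate.
    y, sr = librosa.load(wav_path, sr=sr)
    # 13 MFCCs per 25 ms frame with a 10 ms shift (assumed values).
    mfcc = librosa.feature.mfcc(
        y=y, sr=sr, n_mfcc=n_mfcc,
        n_fft=int(0.025 * sr), hop_length=int(0.010 * sr),
    )
    # Transpose so each row is one frame's feature vector: (num_frames, n_mfcc).
    return mfcc.T.astype(np.float32)
```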

[0052] S3: Build an acoustic model: taking the DCNN network model as the basis and connectionist temporal classification (CTC) as the loss function, construct an end-to-end acoustic model;

[0053] The structure of the acoustic model comprises multiple convolutional layers, two fully connected layers and the CTC loss function, arranged in sequence; the structure of the convolutional layers is: the first layer and...
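For illustration, the sketch below builds an acoustic model of the kind described above, a stack of convolutional layers followed by two fully connected layers trained with the CTC loss, using PyTorch. The number of layers, channel widths, hidden size and vocabulary size are assumed values, since the exact configuration is not reproduced in this excerpt.

```python
# Hypothetical DCNN + CTC acoustic-model sketch in PyTorch.  Only the overall
# shape (conv stack -> two fully connected layers -> CTC loss) follows the
# description above; all layer sizes and the vocabulary size are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DCNNCTCAcousticModel(nn.Module):
    def __init__(self, n_feats=80, n_labels=1000):
        super().__init__()
        # Convolutional stack over (batch, 1, time, feature) inputs; pooling
        # only along the feature axis so the time resolution is preserved.
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(kernel_size=(1, 2)),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(kernel_size=(1, 2)),
        )
        # Two fully connected layers; the last output unit is the CTC blank.
        self.fc1 = nn.Linear(64 * (n_feats // 4), 512)
        self.fc2 = nn.Linear(512, n_labels + 1)
        self.blank = n_labels

    def forward(self, feats):
        # feats: (batch, time, n_feats)
        x = self.conv(feats.unsqueeze(1))          # (batch, 64, time, n_feats // 4)
        x = x.permute(0, 2, 1, 3).flatten(2)       # (batch, time, 64 * n_feats // 4)
        x = F.relu(self.fc1(x))
        return F.log_softmax(self.fc2(x), dim=-1)  # per-frame log-probabilities

# One training step with CTC as the loss function (dummy data for illustration).
model = DCNNCTCAcousticModel()
ctc = nn.CTCLoss(blank=model.blank, zero_infinity=True)
feats = torch.randn(4, 200, 80)                    # 4 utterances, 200 frames each
log_probs = model(feats).permute(1, 0, 2)          # CTCLoss expects (time, batch, classes)
targets = torch.randint(0, 1000, (4, 20))          # dummy label sequences
loss = ctc(log_probs, targets,
           input_lengths=torch.full((4,), 200, dtype=torch.long),
           target_lengths=torch.full((4,), 20, dtype=torch.long))
loss.backward()
```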



Abstract

The invention provides a speech recognition method based on a convolutional neural network, which is better at extracting high-level features, has a simple modeling process, is easy to train, has better model generalization performance, and can be applied more widely to various speech recognition scenes. The method comprises the following steps: S1, preprocessing the input original speech signal; S2, extracting the key feature parameters reflecting the characteristics of the speech signal to form a feature vector sequence; S3, on the basis of the DCNN network model and taking the connectionist temporal classifier CTC as a loss function, constructing an end-to-end acoustic model; S4, training the acoustic model to obtain a trained acoustic model; S5, inputting the feature vector sequence to be recognized obtained in step S2 into the trained acoustic model to obtain a recognition result; and S6, performing subsequent operations on the basis of the recognition result obtained in step S5, that is, obtaining the word string that the speech signal outputs with maximum probability, i.e. the language text after the original speech is recognized.
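To make steps S5 and S6 concrete: one simple way to turn the acoustic model's per-frame CTC outputs into a label string is best-path (greedy) decoding, which picks the most probable symbol at each frame, collapses repeats, and removes blanks. The patent excerpt does not state which decoding strategy is actually used, so the sketch below is illustrative only.

```python
# Hypothetical best-path (greedy) CTC decoding sketch for steps S5/S6.
# `log_probs` is a (time, num_classes) array of per-frame log-probabilities from
# the acoustic model; `blank` is the CTC blank index (assumed here to be the last class).
import numpy as np

def greedy_ctc_decode(log_probs, blank):
    best_path = np.argmax(log_probs, axis=-1)  # most probable symbol per frame
    decoded, prev = [], None
    for symbol in best_path:
        # Collapse repeated symbols, then drop blanks.
        if symbol != prev and symbol != blank:
            decoded.append(int(symbol))
        prev = symbol
    return decoded  # label indices; mapped to text via the recognition lexicon
```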

Description

Technical field

[0001] The invention relates to the technical field of speech recognition, in particular to a speech recognition method based on a convolutional neural network.

Background technique

[0002] In speech recognition technology, the GMM-HMM (Gaussian Mixture Model - Hidden Markov Model) has long occupied a dominant position as the acoustic model for speech. However, owing to the characteristics of the GMM-HMM model itself, the GMM-HMM acoustic model first requires an alignment operation in which the data of every frame must be aligned with its corresponding label; this alignment process is cumbersome and complicated and results in a long training time. Moreover, because the model is a combination of a GMM and an HMM, the concrete modeling process is relatively complex, and there are certain limitations in specific applications of speech recognition technology.

Contents of the invention

[0003] In order to solve the problems of long tra...


Application Information

Patent Type & Authority: Application (China)
IPC(8): G10L15/06; G10L15/16
CPC: G10L15/063; G10L15/16; G10L2015/0631
Inventor: 曹毅, 张威, 翟明浩, 刘晨, 黄子龙, 李巍
Owner: JIANGNAN UNIV