Speech emotion recognition method based on long short time memory network and convolutional neural network

A speech emotion recognition technology based on convolutional neural networks, applied in speech recognition, neural learning methods, biological neural network models, etc. It addresses problems such as limited ability to process speech sequences, high feature dimensionality, and reliance on a single network model, improving accuracy and robustness while avoiding cumbersome manual feature selection and extraction.

Active Publication Date: 2017-05-31
NANJING UNIV OF POSTS & TELECOMM

AI Technical Summary

Problems solved by technology

The main disadvantage shared by the three methods above is that no single one of them combines the strengths of the different network models.
For example, a deep belief network can take a one-dimensional sequence as input but cannot exploit the correlation within the sequence; a long short-term memory network can exploit that correlation, but the features it extracts are high-dimensional; t...




Example Embodiment

[0045] In order to make the objectives, technical solutions and advantages of the present invention clearer, the present invention will be described in detail below with reference to the accompanying drawings and specific embodiments.

[0046] Figure 1 is a flowchart of the speech emotion recognition method based on LSTM and CNN of the present invention. The implementation of the method mainly includes the following steps:

[0047] Step 1: Select a suitable speech emotion database and collect the speech segments it contains;

[0048] In practice, the AFEW database is selected. It provides original video clips, all cut from film works. Compared with commonly used laboratory databases, the speech and emotional expressions in the AFEW database are closer to real-life environments and more general. The speakers' ages range from 1 to 70 years, covering all age groups, which c...
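A minimal sketch of this collection step, assuming the AFEW clips are video files and that ffmpeg and librosa are used to strip and load the audio track; neither tool nor the 16 kHz sample rate is specified in the patent and both are illustrative assumptions.

```python
# Hypothetical preprocessing for Step 1: extract mono 16 kHz audio from each AFEW
# video clip. ffmpeg/librosa and the 16 kHz rate are assumptions, not from the patent.
import subprocess
import librosa

def extract_audio(video_path, wav_path, sr=16000):
    # -vn drops the video stream, -ac 1 downmixes to mono, -ar sets the sample rate.
    subprocess.run(
        ["ffmpeg", "-y", "-i", video_path, "-vn", "-ac", "1", "-ar", str(sr), wav_path],
        check=True,
    )
    y, _ = librosa.load(wav_path, sr=sr)  # waveform for later feature extraction
    return y

# e.g. waveform = extract_audio("afew_clip_001.avi", "afew_clip_001.wav")
```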



Abstract

The invention discloses a speech emotion recognition method based on a long short-term memory network and a convolutional neural network. According to the method, a speech emotion recognition system based on LSTM (long short-term memory network) and CNN (convolutional neural network) is constructed; with a speech sequence as the input of the system, the LSTM and the CNN are trained with the back-propagation algorithm and the network parameters are optimized, yielding an optimized network model; the trained network model then classifies a newly input speech sequence into one of six emotions: sad, happy, disgusted, fearful, scared, and neutral. By jointly considering the two network models, LSTM and CNN, the method avoids cumbersome manual feature selection and extraction and improves the accuracy of emotion recognition.
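As a rough illustration of the kind of system the abstract describes, the sketch below combines an LSTM branch and a CNN branch over a spectrogram-like speech input and classifies it into six emotions, trained by back-propagation. It assumes PyTorch; the input shape, layer sizes, and feature-fusion scheme are illustrative assumptions rather than the patented design.

```python
# Hypothetical sketch of an LSTM + CNN speech emotion classifier (not the patented
# architecture itself; layer sizes, input shape, and fusion are assumptions).
import torch
import torch.nn as nn

class LstmCnnEmotionNet(nn.Module):
    def __init__(self, n_mels=40, n_classes=6):
        super().__init__()
        # CNN branch: treats the spectrogram as a 1-channel image.
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)),
        )
        # LSTM branch: consumes the spectrogram frame by frame as a sequence.
        self.lstm = nn.LSTM(input_size=n_mels, hidden_size=64, batch_first=True)
        # Classifier over the concatenated branch features.
        self.fc = nn.Linear(32 * 4 * 4 + 64, n_classes)

    def forward(self, x):                                # x: (batch, frames, n_mels)
        cnn_feat = self.cnn(x.unsqueeze(1)).flatten(1)   # (batch, 512)
        _, (h_n, _) = self.lstm(x)                       # h_n: (1, batch, 64)
        lstm_feat = h_n.squeeze(0)                       # (batch, 64)
        return self.fc(torch.cat([cnn_feat, lstm_feat], dim=1))

# Training by back-propagation, as the abstract describes, with a standard loss:
model = LstmCnnEmotionNet()
logits = model(torch.randn(8, 100, 40))                  # 8 clips, 100 frames, 40 bins
loss = nn.CrossEntropyLoss()(logits, torch.randint(0, 6, (8,)))
loss.backward()
```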

Description

Technical field
[0001] The invention relates to the fields of image processing and pattern recognition, and in particular to a speech emotion recognition method based on long short-term memory networks and convolutional neural networks.
Background technique
[0002] In interpersonal communication there are many channels of information exchange, including voice, body language, and facial expression. Among them, the voice signal is the fastest and most primitive means of communication and is considered by researchers to be one of the most effective ways to realize human-computer interaction. For nearly half a century, scholars have extensively studied speech recognition, that is, how to convert speech sequences into text. Despite significant progress in speech recognition, there is still a long way to go toward natural human-machine interaction because machines cannot understand the emotional state of the speaker. This has also led to another as...


Application Information

IPC(8): G10L25/30, G10L25/27, G10L25/63, G10L15/16, G06N3/04, G06N3/08
CPC: G06N3/049, G06N3/084, G10L15/16, G10L25/27, G10L25/30, G10L25/63, G06N3/045
Inventor: 袁亮, 卢官明, 闫静杰
Owner: NANJING UNIV OF POSTS & TELECOMM