Speech emotion recognition method based on long short time memory network and convolutional neural network

A speech emotion recognition technology based on convolutional neural networks, applied in speech recognition, neural learning methods, biological neural network models, etc. It addresses problems such as limited ability to process speech sequences, high feature dimensionality, and reliance on a single network model, improving accuracy and robustness while avoiding cumbersome manual feature selection and extraction.

Active Publication Date: 2017-05-31
NANJING UNIV OF POSTS & TELECOMM

AI Technical Summary

Problems solved by technology

The main disadvantage shared by the three methods above is that no single one of them combines the strengths of the different network models.
For example, a deep belief network can take a one-dimensional sequence as input but cannot exploit the correlation within the sequence; a long short-term memory network can exploit that correlation, but the features it extracts are high-dimensional; t...




Example Embodiment

[0045] In order to make the objectives, technical solutions and advantages of the present invention clearer, the present invention will be described in detail below with reference to the accompanying drawings and specific embodiments.

[0046] Figure 1 is a flowchart of the speech emotion recognition method based on LSTM and CNN of the present invention. The implementation of the method mainly includes the following steps:

[0047] Step 1: Select a suitable speech emotion database and collect the speech segments it contains;

[0048] In practice, the AFEW database is selected. It provides original video clips, all cut from film works. Compared with commonly used laboratory databases, the speech and emotional expressions in the AFEW database are closer to real-life environments and more general. The speakers' ages range from 1 to 70 years, covering all age groups, which c...
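A minimal sketch of this collection step, assuming the AFEW clips are video files and that ffmpeg and librosa are used to strip and load the audio track; neither tool nor the 16 kHz sample rate is specified in the patent and both are illustrative assumptions.

```python
# Hypothetical preprocessing for Step 1: extract mono 16 kHz audio from each AFEW
# video clip. ffmpeg/librosa and the 16 kHz rate are assumptions, not from the patent.
import subprocess
import librosa

def extract_audio(video_path, wav_path, sr=16000):
    # -vn drops the video stream, -ac 1 downmixes to mono, -ar sets the sample rate.
    subprocess.run(
        ["ffmpeg", "-y", "-i", video_path, "-vn", "-ac", "1", "-ar", str(sr), wav_path],
        check=True,
    )
    y, _ = librosa.load(wav_path, sr=sr)  # waveform for later feature extraction
    return y

# e.g. waveform = extract_audio("afew_clip_001.avi", "afew_clip_001.wav")
```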



Abstract

The invention discloses a speech emotion recognition method based on a long short-term memory network and a convolutional neural network. According to the method, a speech emotion recognition system based on LSTM (long short-term memory network) and CNN (convolutional neural network) is constructed; with a speech sequence as the input of the system, the LSTM and the CNN are trained with the back-propagation algorithm and the network parameters are optimized, yielding an optimized network model; the trained network model then classifies a newly input speech sequence into one of six emotions: sad, happy, disgusted, fearful, scared, and neutral. By jointly considering the two network models, LSTM and CNN, the method avoids cumbersome manual feature selection and extraction and improves the accuracy of emotion recognition.
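As a rough illustration of the kind of system the abstract describes, the sketch below combines an LSTM branch and a CNN branch over a spectrogram-like speech input and classifies it into six emotions, trained by back-propagation. It assumes PyTorch; the input shape, layer sizes, and feature-fusion scheme are illustrative assumptions rather than the patented design.

```python
# Hypothetical sketch of an LSTM + CNN speech emotion classifier (not the patented
# architecture itself; layer sizes, input shape, and fusion are assumptions).
import torch
import torch.nn as nn

class LstmCnnEmotionNet(nn.Module):
    def __init__(self, n_mels=40, n_classes=6):
        super().__init__()
        # CNN branch: treats the spectrogram as a 1-channel image.
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)),
        )
        # LSTM branch: consumes the spectrogram frame by frame as a sequence.
        self.lstm = nn.LSTM(input_size=n_mels, hidden_size=64, batch_first=True)
        # Classifier over the concatenated branch features.
        self.fc = nn.Linear(32 * 4 * 4 + 64, n_classes)

    def forward(self, x):                                # x: (batch, frames, n_mels)
        cnn_feat = self.cnn(x.unsqueeze(1)).flatten(1)   # (batch, 512)
        _, (h_n, _) = self.lstm(x)                       # h_n: (1, batch, 64)
        lstm_feat = h_n.squeeze(0)                       # (batch, 64)
        return self.fc(torch.cat([cnn_feat, lstm_feat], dim=1))

# Training by back-propagation, as the abstract describes, with a standard loss:
model = LstmCnnEmotionNet()
logits = model(torch.randn(8, 100, 40))                  # 8 clips, 100 frames, 40 bins
loss = nn.CrossEntropyLoss()(logits, torch.randint(0, 6, (8,)))
loss.backward()
```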

Description

Technical field
[0001] The invention relates to the fields of image processing and pattern recognition, and in particular to a speech emotion recognition method based on long short-term memory networks and convolutional neural networks.
Background technique
[0002] In interpersonal communication there are many channels of information exchange, including voice, body language, and facial expression. Among them, the voice signal is the fastest and most primitive means of communication and is considered by researchers to be one of the most effective ways to realize human-computer interaction. For nearly half a century, scholars have extensively studied speech recognition, that is, how to convert speech sequences into text. Despite significant progress in speech recognition, there is still a long way to go toward natural human-machine interaction because machines cannot understand the emotional state of the speaker. This has also led to another as...


Application Information

IPC(8): G10L25/30, G10L25/27, G10L25/63, G10L15/16, G06N3/04, G06N3/08
CPC: G06N3/049, G06N3/084, G10L15/16, G10L25/27, G10L25/30, G10L25/63, G06N3/045
Inventor: 袁亮, 卢官明, 闫静杰
Owner: NANJING UNIV OF POSTS & TELECOMM