
Speech emotion recognition method based on multi-scale deep convolution recurrent neural network

A speech emotion recognition technology based on recurrent neural networks, applied in speech analysis, instruments, etc., addressing the problem that existing methods ignore the differing discriminative power of speech spectrum segments of different lengths for different emotion types.

Active Publication Date: 2018-10-30
TAIZHOU UNIV
Cites: 3 · Cited by: 36

AI Technical Summary

Problems solved by technology

However, existing speech emotion recognition methods based on deep learning ignore the fact that speech spectrum segments of different lengths carry different discriminative power for different emotion types (see: Mao Q, et al. Learning salient features for speech emotion recognition using convolutional neural networks. IEEE Transactions on Multimedia, 2014, 16(8): 2203-2213.)




Detailed Description of the Embodiments

[0045] The technical solutions of the present invention will be further described below in conjunction with the accompanying drawings and embodiments.

[0046] Figure 1 is a flowchart of the present invention, which mainly comprises:

[0047] Step 1: Generation of three-channel speech spectrum segments;
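This excerpt does not spell out how the three channels are built. A common construction in spectrogram-based emotion recognition (and the one assumed in this sketch) stacks the static log-power spectrogram with its first- and second-order temporal derivatives (delta and delta-delta) as three channels, then cuts the result into fixed-length segments whose frame count sets the time scale. The helper names below are illustrative, not from the patent:

```python
import numpy as np

def log_spectrogram(signal, n_fft=256, hop=128):
    """Short-time log-power spectrogram via a Hann-windowed STFT."""
    window = np.hanning(n_fft)
    frames = [signal[i:i + n_fft] * window
              for i in range(0, len(signal) - n_fft + 1, hop)]
    spec = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    return np.log(spec + 1e-10).T            # (freq_bins, time_frames)

def delta(spec, width=2):
    """First-order temporal derivative (delta) of a spectrogram."""
    T = spec.shape[1]
    padded = np.pad(spec, ((0, 0), (width, width)), mode="edge")
    num = sum(k * (padded[:, width + k:T + width + k]
                   - padded[:, width - k:T + width - k])
              for k in range(1, width + 1))
    return num / (2 * sum(k * k for k in range(1, width + 1)))

def three_channel_segments(signal, seg_frames):
    """Stack static/delta/delta-delta channels, then cut fixed-length
    segments; seg_frames controls the time scale of each segment."""
    static = log_spectrogram(signal)
    d1 = delta(static)
    d2 = delta(d1)
    stacked = np.stack([static, d1, d2])     # (3, freq, time)
    n = stacked.shape[2] // seg_frames
    return [stacked[:, :, i * seg_frames:(i + 1) * seg_frames]
            for i in range(n)]

signal = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)  # 1 s test tone
segments = three_channel_segments(signal, seg_frames=16)
print(len(segments), segments[0].shape)      # 7 segments of (3, 129, 16)
```

Running the same utterance through `three_channel_segments` with several `seg_frames` values yields the segment sequences at different scales used by the later steps.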

[0048] Step 2: Use a deep convolutional neural network (CNN) to extract features from the speech spectrum segments at different scales;
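The excerpt names a deep CNN but gives no architecture, so the following only illustrates the core operation applied to one three-channel segment: a convolution stage with ReLU and max pooling, flattened into a per-segment feature vector. The filter bank here is random and purely illustrative:

```python
import numpy as np

def conv2d(x, kernels):
    """Valid 2-D convolution of a (C, H, W) input with a bank of
    (K, C, kh, kw) kernels; returns (K, H-kh+1, W-kw+1)."""
    K, C, kh, kw = kernels.shape
    _, H, W = x.shape
    out = np.zeros((K, H - kh + 1, W - kw + 1))
    for k in range(K):
        for i in range(H - kh + 1):
            for j in range(W - kw + 1):
                out[k, i, j] = np.sum(x[:, i:i + kh, j:j + kw] * kernels[k])
    return out

def max_pool(x, p=2):
    """Non-overlapping p x p max pooling over each channel."""
    K, H, W = x.shape
    H2, W2 = H // p, W // p
    return x[:, :H2 * p, :W2 * p].reshape(K, H2, p, W2, p).max(axis=(2, 4))

def cnn_features(segment, kernels):
    """One conv + ReLU + pool stage, flattened to a feature vector."""
    h = np.maximum(conv2d(segment, kernels), 0.0)   # ReLU
    return max_pool(h).ravel()

rng = np.random.default_rng(0)
segment = rng.standard_normal((3, 32, 16))          # one 3-channel segment
kernels = rng.standard_normal((4, 3, 5, 5)) * 0.1   # 4 illustrative filters
feat = cnn_features(segment, kernels)
print(feat.shape)                                   # (336,)
```

Applying `cnn_features` to every segment of an utterance produces the feature sequence that Step 3 models over time.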

[0049] Step 3: Use a long short-term memory (LSTM) network to perform temporal modeling of the spectrum segment sequences at different scales, and output an emotion recognition result for the whole utterance;
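The temporal modeling of Step 3 can be sketched as a single-layer LSTM stepping over the per-segment feature vectors of one utterance, with the final hidden state mapped to per-emotion softmax scores. The weights here are random stand-ins; the gate layout and output head are standard LSTM practice, not details taken from the patent:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_utterance_scores(xs, Wx, Wh, b, Wy, by):
    """Run a single-layer LSTM over the per-segment features of one
    utterance and map the final hidden state to per-emotion softmax
    scores.  Gate order in the stacked weights: input, forget, cell,
    output."""
    H = Wh.shape[1]
    h = np.zeros(H)
    c = np.zeros(H)
    for x in xs:                       # one step per spectrogram segment
        z = Wx @ x + Wh @ h + b
        i, f = sigmoid(z[:H]), sigmoid(z[H:2 * H])
        g, o = np.tanh(z[2 * H:3 * H]), sigmoid(z[3 * H:])
        c = f * c + i * g
        h = o * np.tanh(c)
    logits = Wy @ h + by
    e = np.exp(logits - logits.max())
    return e / e.sum()                 # posterior over emotion classes

rng = np.random.default_rng(0)
D, H, E = 8, 6, 4                      # feature dim, hidden dim, emotions
xs = rng.standard_normal((5, D))       # 5 segments from one utterance
scores = lstm_utterance_scores(
    xs, rng.standard_normal((4 * H, D)) * 0.1,
    rng.standard_normal((4 * H, H)) * 0.1, np.zeros(4 * H),
    rng.standard_normal((E, H)) * 0.1, np.zeros(E))
print(scores.shape)                    # (4,) posterior, sums to 1
```

One such CNN+LSTM pass is run per time scale, yielding one score vector per scale for Step 4 to fuse.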

[0050] Step 4: Use a score-level fusion method to fuse the recognition results obtained by CNN+LSTM at different scales, and output the final speech emotion recognition result.
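Score-level fusion itself is simple: combine the per-scale posterior vectors (a weighted average is the common choice, assumed here since the excerpt gives no fusion weights) and take the argmax as the final emotion label. The example posteriors below are hypothetical:

```python
import numpy as np

def fuse_scores(score_lists, weights=None):
    """Score-level fusion: weighted average of per-scale posterior
    vectors, then argmax for the final emotion label."""
    scores = np.asarray(score_lists)            # (n_scales, n_emotions)
    if weights is None:
        weights = np.full(len(scores), 1.0 / len(scores))
    fused = weights @ scores
    return fused, int(np.argmax(fused))

# Hypothetical posteriors from three time scales over 4 emotion classes
s1 = [0.10, 0.60, 0.20, 0.10]
s2 = [0.05, 0.55, 0.30, 0.10]
s3 = [0.20, 0.40, 0.30, 0.10]
fused, label = fuse_scores([s1, s2, s3])
print(np.round(fused, 4), label)                # class 1 wins after fusion
```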

[0051] 1. The implementation of each step of the flowchart of the present invention is described in detail below in conjunction with...



Abstract

The invention discloses a speech emotion recognition method based on a multi-scale deep convolutional recurrent neural network. The method comprises the following steps: (1) three-channel speech spectrum segments are generated; (2) features of the spectrum segments at different scales are extracted with a convolutional neural network (CNN); (3) temporal modeling of the spectrum segment sequences at different scales is performed with a long short-term memory (LSTM) network, which outputs an emotion recognition result for the whole utterance; (4) the recognition results obtained by CNN+LSTM at different scales are fused by a score-level fusion method, and the final speech emotion recognition result is output. The method can effectively improve natural speech emotion recognition performance in real-world environments, and can be applied to fields such as artificial intelligence, robotics, and natural human-computer interaction.

Description

Technical Field

[0001] The invention relates to the fields of speech signal processing and pattern recognition, and in particular to a speech emotion recognition method based on a multi-scale deep convolutional recurrent neural network.

Background

[0002] Human language not only contains rich text information but also carries audio information that expresses the speaker's emotions, such as changes in the pitch, intensity, and cadence of the voice. Enabling a computer to automatically recognize the speaker's emotional state from the voice signal, the so-called "speech emotion recognition" problem, has become a hot research topic in artificial intelligence, pattern recognition, and affective computing. This research aims to enable the computer to acquire, recognize, and respond to a user's emotional information by analyzing the speaker's voice signal, so as to achieve a more harmonious and natural interaction between the user and the comput...


Application Information

Patent Type & Authority: Application (China)
IPC(8): G10L25/63; G10L25/30
CPC: G10L25/30; G10L25/63
Inventors: 张石清, 赵小明
Owner: TAIZHOU UNIV