A Speech Emotion Recognition Method Based on Multi-scale Deep Convolutional Recurrent Neural Network

A speech emotion recognition technology based on a recurrent neural network, applied in speech analysis, instruments, and related fields. It addresses the problem that existing methods ignore differences in discriminative power, and it helps alleviate the lack of samples.

Active Publication Date: 2022-03-08
TAIZHOU UNIV

AI Technical Summary

Problems solved by technology

However, existing speech emotion recognition methods based on deep learning ignore the fact that speech spectrogram segments of different lengths differ in how well they discriminate between emotion types (see: Mao Q, et al. Learning salient features for speech emotion recognition using convolutional neural networks. IEEE Transactions on Multimedia, 2014, 16(8): 2203-2213).



Embodiment Construction

[0045] The technical solutions of the present invention will be further described below in conjunction with the accompanying drawings and embodiments.

[0046] Figure 1 is a flowchart of the present invention, which mainly comprises:

[0047] Step 1: Generate three-channel speech spectrogram segments;

[0048] Step 2: Use a deep convolutional neural network (CNN) to extract features from speech spectrogram segments at different scales;

[0049] Step 3: Use a long short-term memory network (LSTM) to model the temporal structure of the speech spectrogram segment sequences at each scale, and output the emotion recognition result for the entire utterance;

[0050] Step 4: Use score-level fusion to combine the recognition results obtained by CNN+LSTM at the different scales, and output the final speech emotion recognition result.
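Step 1 and the notion of "different scales" can be illustrated with a short sketch. The excerpt does not say what the three channels contain or which segment lengths are used, so the code below assumes the three channels are a static spectrogram plus its first- and second-order temporal differences, and that a "scale" is a segment length in frames. The function names, the hop size, and the lengths 4 and 8 are all hypothetical and chosen only to keep the example small.

```python
from typing import Dict, List, Sequence

Spectrogram = List[List[float]]  # frames x frequency bands


def delta(frames: Spectrogram) -> Spectrogram:
    """Central temporal difference; edge frames are replicated."""
    n = len(frames)
    out = []
    for t in range(n):
        prev_f = frames[max(t - 1, 0)]
        next_f = frames[min(t + 1, n - 1)]
        out.append([(b - a) / 2.0 for a, b in zip(prev_f, next_f)])
    return out


def three_channel(frames: Spectrogram) -> List[Spectrogram]:
    """ASSUMPTION: channels = static, delta, delta-delta."""
    d1 = delta(frames)
    d2 = delta(d1)
    return [frames, d1, d2]


def segment(channels: List[Spectrogram], length: int, hop: int) -> List[List[Spectrogram]]:
    """Cut every channel into overlapping windows of `length` frames."""
    n = len(channels[0])
    return [
        [ch[start:start + length] for ch in channels]
        for start in range(0, n - length + 1, hop)
    ]


def multi_scale_segments(
    frames: Spectrogram, lengths: Sequence[int], hop: int
) -> Dict[int, List[List[Spectrogram]]]:
    """Map each segment length (scale) to its sequence of 3-channel segments."""
    channels = three_channel(frames)
    return {length: segment(channels, length, hop) for length in lengths}


# Toy input: 10 frames x 4 bands (a real system would use a mel spectrogram).
toy = [[float(t + b) for b in range(4)] for t in range(10)]
scales = multi_scale_segments(toy, lengths=(4, 8), hop=2)
for length, segs in scales.items():
    print(length, len(segs))
```

Each scale yields its own segment sequence, which is what Steps 2 and 3 would then pass through a per-scale CNN and LSTM.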

[0051] I. The implementation of each step of the flowchart of the present invention is described as follows in conjunction wit...


Abstract

The invention discloses a speech emotion recognition method based on a multi-scale deep convolutional recurrent neural network. The implementation steps are: (1) generate three-channel speech spectrogram segments; (2) use a deep convolutional neural network (CNN) to extract features from speech spectrogram segments at different scales; (3) use a long short-term memory network (LSTM) to model the temporal structure of the speech spectrogram segment sequences at each scale, and output the emotion recognition result for the entire utterance; (4) use score-level fusion to combine the recognition results obtained by CNN+LSTM at the different scales, and output the final speech emotion recognition result. The invention can effectively improve the performance of natural speech emotion recognition in real environments, and can be used in fields such as artificial intelligence, robotics, and natural human-computer interaction.
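The fusion in step (4) can be sketched as follows. The excerpt does not state the fusion rule, so this example assumes a weighted sum of per-scale class scores, which is one common form of score-level fusion. The scale keys (64 and 128), the weights, and the emotion labels are hypothetical.

```python
from typing import Dict, List, Sequence


def fuse_scores(
    scores_per_scale: Dict[int, Sequence[float]],
    weights: Dict[int, float],
) -> List[float]:
    """Weighted average of class-score vectors, one vector per scale."""
    total = sum(weights[scale] for scale in scores_per_scale)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    n_classes = len(next(iter(scores_per_scale.values())))
    fused = [0.0] * n_classes
    for scale, scores in scores_per_scale.items():
        if len(scores) != n_classes:
            raise ValueError("all scales must score the same classes")
        w = weights[scale] / total  # normalise so the weights sum to 1
        for i, p in enumerate(scores):
            fused[i] += w * p
    return fused


def predict(fused: Sequence[float], labels: Sequence[str]) -> str:
    """Return the label with the highest fused score."""
    best = max(range(len(fused)), key=lambda i: fused[i])
    return labels[best]


labels = ["angry", "happy", "neutral", "sad"]  # hypothetical label set
scores = {
    64: [0.5, 0.2, 0.2, 0.1],   # utterance-level output at one scale
    128: [0.2, 0.5, 0.2, 0.1],  # utterance-level output at another scale
}
weights = {64: 1.0, 128: 3.0}   # hypothetical, e.g. tuned on validation data
fused = fuse_scores(scores, weights)
print(predict(fused, labels))
```

With these toy numbers the longer scale carries three times the weight, so its preferred class wins even though the two scales disagree.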

Description

Technical field

[0001] The invention relates to the fields of speech signal processing and pattern recognition, and in particular to a speech emotion recognition method based on a multi-scale deep convolutional recurrent neural network.

Background

[0002] Human speech not only carries rich textual information, but also audio information that conveys the speaker's emotional expression, such as changes in pitch, intensity, and cadence. How to let a computer automatically recognize the speaker's emotional state from the speech signal, known as "speech emotion recognition", has become a hot research topic in artificial intelligence, pattern recognition, and affective computing. The research aims to enable the computer to acquire, recognize, and respond to the user's emotional information by analyzing the speaker's speech signal, so as to achieve a more harmonious and natural interaction between the user and the comput...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G10L25/63, G10L25/30
CPC: G10L25/30, G10L25/63
Inventor: 张石清, 赵小明
Owner: TAIZHOU UNIV