A multimodal speech emotion recognition method based on enhanced residual neural network

A neural network and emotion recognition technology, applied to biological neural network models, neural learning methods, character and pattern recognition, etc. It addresses the problem of unequal input dimensions across modalities and reduces the dimensionality of the speech data.

Inactive Publication Date: 2019-03-12
SICHUAN UNIV
Cites: 12 · Cited by: 25

AI Technical Summary

Problems solved by technology

However, since speech and image information are often high-dimensional data, traditional computing methods cannot perform feature learning well.




Embodiment Construction

[0047] Detailed description of the embodiments.

[0048] The present invention will be further described below in conjunction with the accompanying drawings and embodiments.

[0049] Referring to Figure 1: a multimodal speech emotion recognition method based on an enhanced deep residual neural network. The core model is a cross-enhanced deep residual neural network that accepts data from multiple modalities with differing dimensions, such as speech and video. The basic residual-convolution structure extracts features from the data, while the cross-type residual convolution structure and a fusion function allow the multimodal data to be fully fused, effectively improving the accuracy of emotion recognition.
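The idea of modality-specific residual branches whose features are fused before classification can be sketched as follows. This is an illustrative reconstruction, not the patent's implementation: the feature width, the identity-shortcut form `relu(x + xW)`, and the averaging fusion rule are all assumptions.

```python
import numpy as np

def relu(x):
    """Elementwise rectified linear unit."""
    return np.maximum(0.0, x)

def residual_block(x, w):
    """Identity shortcut plus a learned linear transform: relu(x + x @ w)."""
    return relu(x + x @ w)

rng = np.random.default_rng(0)
d = 8                                   # shared feature width (assumed)
speech = rng.standard_normal((1, d))    # features from the speech branch
video = rng.standard_normal((1, d))     # features from the video branch

w_s = 0.1 * rng.standard_normal((d, d))  # speech-branch weights
w_v = 0.1 * rng.standard_normal((d, d))  # video-branch weights

h_s = residual_block(speech, w_s)
h_v = residual_block(video, w_v)
fused = 0.5 * (h_s + h_v)               # cross-branch fusion (assumed: averaging)
print(fused.shape)                      # (1, 8)
```

In a real network the fusion would be a learned cross-convolution layer as the patent describes; averaging merely shows where in the data flow the branches meet.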

[0050] Referring to Figure 2, the overall data flow of the multimodal speech emotion recognition method based on an enhanced deep residual neural network proceeds as follows:

[0051] (11) Audio preprocessing: the spectrogram featur... is extracted from the original speech signal
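Spectrogram extraction of the kind this preprocessing step describes can be sketched as a framed FFT (STFT magnitude). The frame length, hop size, and Hann window below are assumptions; the text only states that the speech signal is converted to a spectrogram.

```python
import numpy as np

def spectrogram(signal, frame_len=256, hop=128):
    """Magnitude STFT: slice the signal into windowed frames and FFT each one."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    # One-sided FFT magnitude: shape (n_frames, frame_len // 2 + 1)
    return np.abs(np.fft.rfft(frames, axis=1))

sr = 8000
t = np.arange(sr) / sr                   # 1 second of synthetic audio
sig = np.sin(2 * np.pi * 440 * t)        # a 440 Hz tone stands in for speech
spec = spectrogram(sig)
print(spec.shape)                        # (61, 129): time frames x frequency bins
```

The resulting 2-D time-frequency map is what lets a convolutional network treat speech like an image in later steps.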



Abstract

The invention discloses a multimodal speech emotion recognition method based on an enhanced deep residual neural network, relating to the technical fields of video-stream image processing and speech signal analysis, and addressing the emotion recognition problem in human-computer interaction. The method mainly comprises the following steps: extracting feature representations of the video (sequence data) and the speech, including converting the speech data into the corresponding spectrogram representation and encoding the time-sequence data; and using a convolutional neural network to extract the emotional features of the raw data for classification. The model accepts multiple inputs of differing dimensions. A cross-convolution layer is proposed to fuse the data features of the different modalities, and the overall network structure of the model is an enhanced deep residual neural network. After initialization, the multi-class model is trained with the spectrograms, the sequential video information, and the corresponding emotion labels. After training, unlabeled speech and video are fed to the model to obtain emotion prediction probabilities, and the class with the maximum probability is selected as the emotion category of the multimodal data. The invention improves recognition accuracy on the multimodal emotion recognition problem.
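The final step of the abstract — taking the class with maximum predicted probability as the emotion category — can be shown in a minimal sketch. The label set and the logits below are hypothetical; the patent does not enumerate its emotion categories.

```python
import numpy as np

# Hypothetical emotion categories and an illustrative raw model output.
labels = ["angry", "happy", "neutral", "sad"]

def softmax(z):
    """Convert raw scores into a probability distribution (numerically stable)."""
    e = np.exp(z - z.max())
    return e / e.sum()

logits = np.array([0.2, 1.7, 0.5, -0.3])   # made-up final-layer scores
probs = softmax(logits)
predicted = labels[int(probs.argmax())]     # class with maximum probability
print(predicted)                            # happy
```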

Description

Technical field

[0001] A multimodal speech emotion recognition method based on an enhanced deep residual neural network, involving the technical fields of video-stream image processing and speech signal analysis, and addressing the problem of emotion recognition in human-computer interaction.

Background technique

[0002] With the rapid development of computer technology, humans depend on and demand ever more of computers. Making computers more anthropomorphic has become a research hotspot, and giving them "emotion" has become a research goal for the next generation of computers. Emotions can be conveyed through various channels, such as text, voice, and video. A single channel is often insufficient for understanding emotion well, so emotion recognition from multimodal data is currently a major focus of pattern recognition.

[0003] Traditional multimodal research methods mainly rely on facial exp...

Claims


Application Information

IPC(8): G06K9/00; G06K9/62; G06N3/04; G06N3/08
CPC: G06N3/084; G06V40/20; G06N3/045; G06F18/253
Inventor: 陈盈科, 毛华, 吴雨
Owner: SICHUAN UNIV