A multimodal speech emotion recognition method based on enhanced residual neural network

A neural network and emotion recognition technology, applied to biological neural network models, neural learning methods, character and pattern recognition, etc. It addresses the problem of unequal input dimensions across modalities and reduces the dimensionality of the speech data.

Inactive Publication Date: 2019-03-12
SICHUAN UNIV
Cites: 12 · Cited by: 25

AI Technical Summary

Problems solved by technology

However, since speech and image information are often high-dimensional data, traditional computing methods cannot perform feature learning well.




Embodiment Construction

[0047] Detailed description of the embodiments.

[0048] The present invention will be further described below in conjunction with the accompanying drawings and embodiments.

[0049] Referring to Figure 1: a multimodal speech emotion recognition method based on an enhanced deep residual neural network. The core model is a cross-enhanced deep residual neural network that accepts data from multiple modalities with differing dimensions, such as speech and video. The basic residual-convolution structure extracts features from the data, while the cross-type residual convolution structure and a fusion function allow the multimodal data to be fully fused, effectively improving the accuracy of emotion recognition.
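The idea of modality-specific residual branches whose features are fused before classification can be sketched as follows. This is an illustrative reconstruction, not the patent's implementation: the feature width, the identity-shortcut form `relu(x + xW)`, and the averaging fusion rule are all assumptions.

```python
import numpy as np

def relu(x):
    """Elementwise rectified linear unit."""
    return np.maximum(0.0, x)

def residual_block(x, w):
    """Identity shortcut plus a learned linear transform: relu(x + x @ w)."""
    return relu(x + x @ w)

rng = np.random.default_rng(0)
d = 8                                   # shared feature width (assumed)
speech = rng.standard_normal((1, d))    # features from the speech branch
video = rng.standard_normal((1, d))     # features from the video branch

w_s = 0.1 * rng.standard_normal((d, d))  # speech-branch weights
w_v = 0.1 * rng.standard_normal((d, d))  # video-branch weights

h_s = residual_block(speech, w_s)
h_v = residual_block(video, w_v)
fused = 0.5 * (h_s + h_v)               # cross-branch fusion (assumed: averaging)
print(fused.shape)                      # (1, 8)
```

In a real network the fusion would be a learned cross-convolution layer as the patent describes; averaging merely shows where in the data flow the branches meet.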

[0050] Referring to Figure 2, the overall data flow of the multimodal speech emotion recognition method based on an enhanced deep residual neural network proceeds as follows:

[0051] (11) Audio preprocessing: the spectrogram featur... is extracted from the original speech signal
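Spectrogram extraction of the kind this preprocessing step describes can be sketched as a framed FFT (STFT magnitude). The frame length, hop size, and Hann window below are assumptions; the text only states that the speech signal is converted to a spectrogram.

```python
import numpy as np

def spectrogram(signal, frame_len=256, hop=128):
    """Magnitude STFT: slice the signal into windowed frames and FFT each one."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    # One-sided FFT magnitude: shape (n_frames, frame_len // 2 + 1)
    return np.abs(np.fft.rfft(frames, axis=1))

sr = 8000
t = np.arange(sr) / sr                   # 1 second of synthetic audio
sig = np.sin(2 * np.pi * 440 * t)        # a 440 Hz tone stands in for speech
spec = spectrogram(sig)
print(spec.shape)                        # (61, 129): time frames x frequency bins
```

The resulting 2-D time-frequency map is what lets a convolutional network treat speech like an image in later steps.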



Abstract

The invention discloses a multimodal speech emotion recognition method based on an enhanced deep residual neural network, relating to the technical fields of video-stream image processing and speech signal analysis, and addressing the emotion recognition problem in human-computer interaction. The method mainly comprises the following steps: extracting feature representations of the video (sequence data) and the speech, including converting the speech data into the corresponding spectrogram representation and encoding the time-sequence data; and using a convolutional neural network to extract the emotional features of the raw data for classification. The model accepts multiple inputs of differing dimensions. A cross-convolution layer is proposed to fuse the data features of the different modalities, and the overall network structure of the model is an enhanced deep residual neural network. After initialization, the multi-class model is trained with the spectrograms, the sequential video information, and the corresponding emotion labels. After training, unlabeled speech and video are fed to the model to obtain emotion prediction probabilities, and the class with the maximum probability is selected as the emotion category of the multimodal data. The invention improves recognition accuracy on the multimodal emotion recognition problem.
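The final step of the abstract — taking the class with maximum predicted probability as the emotion category — can be shown in a minimal sketch. The label set and the logits below are hypothetical; the patent does not enumerate its emotion categories.

```python
import numpy as np

# Hypothetical emotion categories and an illustrative raw model output.
labels = ["angry", "happy", "neutral", "sad"]

def softmax(z):
    """Convert raw scores into a probability distribution (numerically stable)."""
    e = np.exp(z - z.max())
    return e / e.sum()

logits = np.array([0.2, 1.7, 0.5, -0.3])   # made-up final-layer scores
probs = softmax(logits)
predicted = labels[int(probs.argmax())]     # class with maximum probability
print(predicted)                            # happy
```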

Description

Technical field

[0001] A multimodal speech emotion recognition method based on an enhanced deep residual neural network, involving the technical fields of video-stream image processing and speech signal analysis, and addressing the problem of emotion recognition in human-computer interaction.

Background technique

[0002] With the rapid development of computer technology, humans depend on and demand ever more of computers. Making computers more anthropomorphic has become a research hotspot, and giving them "emotion" has become a research goal for the next generation of computers. Emotions can be conveyed through various channels, such as text, voice, and video. A single channel is often insufficient for understanding emotion well, so emotion recognition from multimodal data is currently a major focus of pattern recognition.

[0003] Traditional multimodal research methods mainly rely on facial exp...

Claims


Application Information

IPC(8): G06K9/00; G06K9/62; G06N3/04; G06N3/08
CPC: G06N3/084; G06V40/20; G06N3/045; G06F18/253
Inventor: 陈盈科, 毛华, 吴雨
Owner: SICHUAN UNIV