Multimodal Emotion Recognition Method

An emotion recognition and multimodal technology, applied in the field of data processing, that addresses problems such as context modeling over long sequences.

Active Publication Date: 2021-09-14

INST OF AUTOMATION CHINESE ACAD OF SCI


Problems solved by technology

Beyond multimodal fusion, in terms of model architecture, current multimodal emotion recognition methods mainly use recurrent neural networks to capture temporal context information.


Examples


Embodiment 1

[0059] As shown in figure 1, the multimodal emotion recognition method provided in this embodiment of the present application comprises:

[0060] S1: Input the audio file, video file, and text file of the sample to be tested, and perform feature extraction on each of them to obtain frame-level audio features, frame-level video features, and word-level text features.

[0061] In some specific embodiments, the feature extraction on the audio file, video file, and text file comprises:

[0062] The audio file is segmented to obtain frame-level short audio clips; the short audio clips are each input to a pre-trained audio feature extraction network to obtain the frame-level audio features;

[0063] A face detection tool is used to extract frame-level face images from the video file; the frame-level face images are each input to a pre-trained facial expression feature extraction network to obtain the frame-level video features;

[0064] Segmenta...
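The audio branch of the feature extraction in [0062] can be sketched as below. This is only an illustration: the frame length, hop length, and the `mean_abs_amplitude` function are hypothetical stand-ins chosen for this example, and the pre-trained audio feature extraction network that the patent refers to is not specified in the text shown here.

```python
def split_into_frames(samples, frame_len, hop_len):
    """Split a 1-D sequence of audio samples into fixed-length frames.

    Trailing samples that do not fill a whole frame are dropped.
    """
    if frame_len <= 0 or hop_len <= 0:
        raise ValueError("frame_len and hop_len must be positive")
    frames = []
    start = 0
    while start + frame_len <= len(samples):
        frames.append(samples[start:start + frame_len])
        start += hop_len
    return frames


def extract_frame_features(frames, feature_fn):
    """Apply a per-frame feature extractor to every frame."""
    return [feature_fn(frame) for frame in frames]


def mean_abs_amplitude(frame):
    """Hypothetical stand-in for a pre-trained network: a 1-D feature."""
    return [sum(abs(x) for x in frame) / len(frame)]


# Toy waveform (assumed values, not from the patent).
samples = [0.0, 0.5, -0.5, 1.0, -1.0, 0.25, -0.25, 0.0]
frames = split_into_frames(samples, frame_len=4, hop_len=2)
features = extract_frame_features(frames, mean_abs_amplitude)
print(len(frames), features)
```

In a real system `mean_abs_amplitude` would be replaced by a forward pass through the pre-trained network, and the video branch would follow the same pattern with face crops in place of audio frames.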

Embodiment 2

[0110] The present application also discloses an electronic apparatus, comprising a memory, a processor, and a computer program stored in the memory and runnable on the processor, wherein the processor, when executing the computer program, implements the steps of the method described in the above embodiment.

Embodiment 3

[0112] A multimodal emotion recognition method, comprising the following steps:

[0113] S1-1: Input the audio to be tested, the video to be tested, and the text to be tested; the audio to be tested, the video to be tested, and the text to be tested are three different modalities.

[0114] In this embodiment, the audio to be tested and the video to be tested are the audio and video of the same clip, and the text to be tested corresponds to the audio and video to be tested; the audio, video, and text are the three modalities of this clip.

[0115] In this embodiment, the data of these three modalities need to be analyzed to detect which emotional state the person in the input clip is in.

[0116] According to the above embodiment, further, the input may be a clip of a person speaking: the consecutive video frames of the person talking are the video to be tested, the audio in the clip is the audio to be tested, and the text corresponding to the video and audio in the clip is the text to be tested. For example, for a sentence spoken in the clip, the images of the speaker are the video to be tested, th...


Abstract

This application relates to a multimodal emotion recognition method, including: extracting frame-level audio features, frame-level video features, and word-level text features; inputting the extracted features into their respective feature encoders for modeling to obtain encoded audio, video, and text features; first passing each encoded feature through its own self-attention module to model the interactions within that modality, then arranging the encoded features into ordered pairs and inputting each pair into a cross-modal attention module to model the interactions between the two modalities; applying temporal pooling to the outputs of the self-attention modules and the cross-modal attention modules to obtain the global interaction features within each modality and the global interaction features between each pair of modalities; using an attention mechanism to weight and fuse the global intra-modal interaction features and the global inter-modal interaction features, respectively, to obtain the intra-modal and inter-modal feature representations of the entire sample to be tested; and concatenating the two representations and passing them through a fully connected network to obtain the final emotion classification result.
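The data flow described in the abstract can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the attention here has no learned projections, mean pooling stands in for the unspecified temporal pooling, and the encoded features, `scoring` vector, and classifier weights are made-up toy values that a real model would learn.

```python
import math
from itertools import permutations


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attend(queries, context):
    """Scaled dot-product attention without learned projections.

    Each query vector becomes a weighted average of the context vectors.
    attend(x, x) is self-attention; attend(x, y) is cross-modal attention.
    """
    scale = math.sqrt(len(queries[0]))
    out = []
    for q in queries:
        weights = softmax([dot(q, c) / scale for c in context])
        out.append([sum(w * c[d] for w, c in zip(weights, context))
                    for d in range(len(q))])
    return out


def mean_pool(seq):
    """Temporal pooling (assumed to be a mean over time steps)."""
    return [sum(v[d] for v in seq) / len(seq) for d in range(len(seq[0]))]


def attention_fuse(vectors, scoring):
    """Weight each global feature by softmax(score) and sum them."""
    weights = softmax([dot(v, scoring) for v in vectors])
    return [sum(w * v[d] for w, v in zip(weights, vectors))
            for d in range(len(vectors[0]))]


def linear(x, weights, bias):
    return [dot(row, x) + b for row, b in zip(weights, bias)]


# Toy encoder outputs: sequences of 2-D vectors with different lengths.
encoded = {
    "audio": [[0.1, 0.3], [0.2, 0.1], [0.4, 0.0]],
    "video": [[0.5, 0.2], [0.1, 0.1]],
    "text": [[0.3, 0.3], [0.0, 0.2], [0.2, 0.4], [0.1, 0.0]],
}

# Intra-modal: self-attention per modality, then temporal pooling.
intra = [mean_pool(attend(x, x)) for x in encoded.values()]
# Inter-modal: cross-modal attention over every ordered pair of modalities.
inter = [mean_pool(attend(encoded[a], encoded[b]))
         for a, b in permutations(encoded, 2)]

scoring = [1.0, 0.0]  # hypothetical; learned in a real model
sample_repr = attention_fuse(intra, scoring) + attention_fuse(inter, scoring)

# Fully connected layer over the concatenation (toy weights, 3 classes).
weights = [[0.2, -0.1, 0.4, 0.0], [0.0, 0.3, -0.2, 0.1], [-0.3, 0.1, 0.1, 0.2]]
probs = softmax(linear(sample_repr, weights, [0.0, 0.0, 0.0]))
print(probs)
```

Because cross-modal attention lets each query position attend over the whole sequence of the other modality, the three sequences do not need to have the same length, which is why the toy inputs above use 3, 2, and 4 steps.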

Description

Technical field

[0001] The present application relates to data processing, and more particularly to a multimodal emotion recognition method.

Background

[0002] Traditional emotion recognition is often limited to a single modality, such as speech emotion recognition, expression-based emotion recognition, and text emotion analysis. With the development of computer science and technology, multimodal emotion recognition based on audio, video, and text has emerged, and it is expected to have a wide range of applications in smart home, education, finance, and other fields. Conventional multimodal emotion recognition methods generally use feature-level fusion or decision-level fusion to integrate information from multiple modalities. Each approach has its advantages and disadvantages. Feature-level fusion can model the interaction between modalities, but it requires the features of the different modalities to be aligned in time beforehand; decision-level fusion, by contrast, does not need modal alignment informa...


Application Information


Patent Type & Authority: Patents (China)

IPC(8): G06F16/906, G06K9/62, G06N3/04, G06N3/08

CPC: G06F16/906, G06N3/049, G06N3/08, G06N3/048, G06N3/045, G06F18/241, G06F18/25

Inventor: 陶建华, 孙立才, 刘斌, 柳雪飞

Owner: INST OF AUTOMATION CHINESE ACAD OF SCI
