Emotion recognition method based on multimode voice information complementation and gate control

A speech-information emotion recognition technique, applicable to character and pattern recognition, neural learning methods, and biological neural network models, that addresses the rarely considered problem of proportion and balance in modality fusion representations.

Pending Publication Date: 2022-05-13

SHANGHAI UNIVERSITY OF INTERNATIONAL BUSINESS AND ECONOMICS +1

Problems solved by technology

[0005] Despite the improvements made in previous work, the proportion and balance of the modality fusion representations are rarely considered.

Embodiment Construction

[0017] The present invention will be further described below in conjunction with the accompanying drawings and specific embodiments, but not as a limitation of the present invention.

[0018] In the prior art, most speech emotion recognition models consider only the speech modality and ignore the text, that is, its semantic information, so they lack a balanced fusion of semantic and audio information. In addition, most current networks rely on large-scale pre-trained models, so their parameter counts are huge and they are difficult to deploy in scenarios that require real-time, lightweight operation.

[0019] In the emotion recognition method based on multimodal speech information complementation and gate control provided by the present invention, as shown in Figure 1, the audio features and text features in the target video are first extracted. For the text modality, the pre-trained GloVe word embedding...
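The text-side step in [0019] is truncated in this listing, so the following is only a minimal sketch of how a pre-trained GloVe table is commonly used to turn tokens into text features. The function names `load_glove` and `embed_tokens`, the lowercasing, the zero vector for unknown tokens, and the toy 3-dimensional table are all assumptions made for illustration and are not taken from the patent. The only fact relied on is the usual GloVe text format: one token per line, followed by its vector components separated by spaces.

```python
import io

def load_glove(handle):
    """Parse GloVe text format: a token followed by its space-separated vector."""
    table = {}
    for line in handle:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        table[parts[0]] = [float(v) for v in parts[1:]]
    return table

def embed_tokens(tokens, table, dim):
    """Map each token to its vector; unknown tokens get zeros (an assumption)."""
    return [table.get(tok.lower(), [0.0] * dim) for tok in tokens]

# Toy 3-dimensional table standing in for a real pre-trained GloVe file.
toy_file = io.StringIO("i 0.1 0.2 0.3\nam 0.0 -0.1 0.4\nhappy 0.9 0.8 -0.2\n")
table = load_glove(toy_file)

# One vector per token: a (sequence length x dim) text feature matrix.
text_features = embed_tokens("I am happy today".split(), table, dim=3)
```

In practice the handle would be an opened pre-trained vector file, and the resulting sequence of vectors would feed whatever text encoder the method uses.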

Abstract

The invention provides an emotion recognition method based on multimodal speech information complementation and gate control, belonging to the technical field of multimodal emotion recognition. The method comprises the following steps: S1, extracting the audio features and text features in a target video; S2, performing bidirectional feature fusion on the audio features and the text features; S3, adjusting the proportion of the fusion representations in the bidirectional fusion result of S2 through a learnable gate control mechanism and outputting the result; and S4, concatenating the outputs of the learnable gate control mechanism in S3 to obtain the final emotion category output. In the invention, the gating mechanism is applied to the cross-attention module to decide whether to retain the source modality information or overwrite the target modality information, and it adjusts the proportion between the two, so that the recognition accuracy and the parameter count of the model are balanced.
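Steps S2 to S4 can be illustrated with a small pure-Python sketch. It is not the patented network: the abstract does not give the gate equation, so the blend `g * attended + (1 - g) * target` with a sigmoid gate computed from the concatenated features is an assumption. The same goes for the unprojected scaled dot-product attention, the mean pooling, the shared feature width `d`, the four output classes, and every function and parameter name.

```python
import math
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(x):
    # Branching keeps math.exp from overflowing on large negative inputs.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def cross_attention(target, source):
    """S2: each target step attends over all source steps (no learned projections)."""
    d = len(target[0])
    source_t = [list(col) for col in zip(*source)]
    scores = matmul(target, source_t)
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, source)

def gated_fusion(target, source, gate_w, gate_b):
    """S3: a learnable gate sets the share of source vs. target information."""
    attended = cross_attention(target, source)
    fused = []
    for t_row, a_row in zip(target, attended):
        concat = t_row + a_row
        gate = [sigmoid(sum(w * c for w, c in zip(w_row, concat)) + b)
                for w_row, b in zip(gate_w, gate_b)]
        fused.append([g * a + (1.0 - g) * t for g, a, t in zip(gate, a_row, t_row)])
    return fused

def mean_pool(seq):
    return [sum(col) / len(seq) for col in zip(*seq)]

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def init_params(d, num_classes, seed=0):
    """Random stand-ins for weights that would normally be learned."""
    rng = random.Random(seed)
    return {
        "gate_w_text": rand_matrix(d, 2 * d, rng), "gate_b_text": [0.0] * d,
        "gate_w_audio": rand_matrix(d, 2 * d, rng), "gate_b_audio": [0.0] * d,
        "cls_w": rand_matrix(num_classes, 2 * d, rng), "cls_b": [0.0] * num_classes,
    }

def predict(audio, text, p):
    text_fused = gated_fusion(text, audio, p["gate_w_text"], p["gate_b_text"])
    audio_fused = gated_fusion(audio, text, p["gate_w_audio"], p["gate_b_audio"])
    joint = mean_pool(text_fused) + mean_pool(audio_fused)  # S4: concatenation
    logits = [sum(w * v for w, v in zip(row, joint)) + b
              for row, b in zip(p["cls_w"], p["cls_b"])]
    return softmax(logits)

# Toy inputs: 5 audio frames and 3 text tokens, both already of width d = 4.
rng = random.Random(1)
audio = rand_matrix(5, 4, rng)
text = rand_matrix(3, 4, rng)
params = init_params(d=4, num_classes=4)
probs = predict(audio, text, params)  # one probability per emotion class
```

With the gate driven toward 0 the target modality passes through unchanged, and toward 1 it is replaced by the attended source information, which is one way to read the "retain or overwrite" behaviour described in the abstract.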

Description

Technical field

[0001] The invention relates to the technical field of multimodal emotion recognition, and in particular to an emotion recognition method based on multimodal speech information complementation and gate control.

Background

[0002] Emotion plays a key role in interpersonal communication: both verbal content and vocal cues convey an individual's emotional state. In many fields, such as human-computer interaction, healthcare, and cognitive science, great emphasis has been placed on developing tools to recognize emotion in human vocal expressions. The recent rapid progress of deep learning has advanced emotion recognition, and application needs have driven the development of high-performance lightweight models.

[0003] Many existing works improve the performance of speech emotion recognition based on pure audio features. Representations based on low-level descriptors (LLDs) are extracted by deep learning networks, such ...

Application Information

Patent Type & Authority: Applications (China)

IPC(8): G06V20/40, G06V10/80, G06V10/82, G06K9/62, G06N3/04, G06N3/08

CPC: G06N3/08, G06N3/047, G06N3/044, G06N3/045, G06F18/253

Inventor: 刘峰, 李知函, 齐佳音, 周爱民, 李志斌

Owner: SHANGHAI UNIVERSITY OF INTERNATIONAL BUSINESS AND ECONOMICS
