Emotion recognition method based on multimode voice information complementation and gate control

A speech emotion recognition technology, applied in character and pattern recognition, neural learning methods, biological neural network models, and related fields, that addresses the rarely considered problem of proportion and balance in modality fusion representations.

Pending Publication Date: 2022-05-13
SHANGHAI UNIVERSITY OF INTERNATIONAL BUSINESS AND ECONOMICS +1

AI Technical Summary

Problems solved by technology

[0005] Despite the improvements made in previous work, the issue of scale and balance in modality fusion representations is rarely considered.

Examples

Embodiment Construction

[0017] The present invention is further described below in conjunction with the accompanying drawings and specific embodiments, which do not limit the present invention.

[0018] In the prior art, most speech emotion recognition models consider only the speech modality and ignore the text, that is, its semantic information, so they lack a balanced fusion of semantic and audio information. In addition, most current networks rely on large-scale pre-trained models, so their parameter counts are huge and they are difficult to deploy in scenarios that require real-time, lightweight operation.

[0019] In the emotion recognition method based on multimodal speech information complementation and gate control provided by the present invention, as shown in Figure 1, the audio features and text features in the target video are first extracted. For the text modality, the pre-trained GloVe word embedding...

Abstract

The invention provides an emotion recognition method based on multimodal speech information complementation and gate control, and belongs to the technical field of multimodal emotion recognition. The method comprises the following steps: S1, extracting audio features and text features from a target video; S2, performing bidirectional feature fusion on the audio features and the text features; S3, adjusting the proportion of the fusion representation in the bidirectional fusion result of S2 through a learnable gating mechanism, and outputting the result; and S4, concatenating the outputs of the learnable gating mechanism in S3 to obtain the final emotion category output. According to the invention, the gating mechanism is applied to the cross-attention module to determine whether to retain the source-modality information or overwrite the target-modality information, and to adjust the proportion between the two, so that the recognition accuracy and the parameter count of the model are balanced.
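Steps S2 to S4 can be illustrated with a minimal, dependency-free sketch. This is not the patented implementation: the text shown here does not give the gate formula, so the scalar sigmoid gate, the absence of learned query/key/value projections, the mean pooling, and every name (`cross_attention`, `gated_fusion`, `classify`, `params`) are assumptions made for illustration. The sketch also assumes that S1 has already produced audio and text feature sequences projected to a shared dimension.

```python
import math
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(target, source):
    """For each target vector, return an attention-weighted mix of source vectors.
    Assumption: plain scaled dot-product attention with no learned projections."""
    dim = len(source[0])
    scale = math.sqrt(dim)
    attended = []
    for q in target:
        weights = softmax([dot(q, k) / scale for k in source])
        attended.append([sum(w * k[d] for w, k in zip(weights, source))
                         for d in range(dim)])
    return attended

def gated_fusion(target, attended, gate_w, gate_b):
    """Blend each target vector with its attended source vector.
    Assumption: one scalar sigmoid gate per position, computed from the
    concatenation [target; attended]. A gate near 0 keeps the target
    vector, a gate near 1 replaces it with the attended source vector."""
    fused = []
    for t, a in zip(target, attended):
        g = 1.0 / (1.0 + math.exp(-(dot(gate_w, t + a) + gate_b)))
        fused.append([g * ai + (1.0 - g) * ti for ti, ai in zip(t, a)])
    return fused

def mean_pool(seq):
    return [sum(col) / len(seq) for col in zip(*seq)]

def classify(audio, text, params):
    # S2: bidirectional fusion (audio attends to text, text attends to audio)
    audio_att = cross_attention(audio, text)
    text_att = cross_attention(text, audio)
    # S3: gates adjust how much fused information enters each stream
    audio_out = gated_fusion(audio, audio_att, params["gate_w_audio"], params["gate_b_audio"])
    text_out = gated_fusion(text, text_att, params["gate_w_text"], params["gate_b_text"])
    # S4: concatenate the pooled gate outputs and classify
    joint = mean_pool(audio_out) + mean_pool(text_out)
    logits = [dot(row, joint) + b for row, b in zip(params["cls_w"], params["cls_b"])]
    return softmax(logits)

# Toy usage with random features and random (untrained) parameters.
random.seed(0)
dim, n_classes = 4, 3
audio = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(5)]  # 5 audio frames
text = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(7)]   # 7 word vectors
params = {
    "gate_w_audio": [random.gauss(0, 0.1) for _ in range(2 * dim)],
    "gate_b_audio": 0.0,
    "gate_w_text": [random.gauss(0, 0.1) for _ in range(2 * dim)],
    "gate_b_text": 0.0,
    "cls_w": [[random.gauss(0, 0.1) for _ in range(2 * dim)] for _ in range(n_classes)],
    "cls_b": [0.0] * n_classes,
}
probs = classify(audio, text, params)
print(probs)  # one probability per hypothetical emotion class
```

In a trained model the gate weights and classifier weights would be learned jointly. The point of the sketch is only that a gate of this kind lets the model decide, per position, how much of the other modality to admit without adding many parameters.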

Description

Technical field

[0001] The invention relates to the technical field of multimodal emotion recognition, in particular to an emotion recognition method based on multimodal speech information complementation and gate control.

Background

[0002] Emotion plays a key role in interpersonal communication: both verbal information and sound information convey an individual's emotional state. In many fields, such as human-computer interaction, healthcare, and cognitive science, great emphasis has been placed on developing tools to recognize emotion in human vocal expressions. The recent rapid progress of deep learning has also advanced emotion recognition, and application needs have driven the development of high-performance lightweight models.

[0003] Many existing works improve the performance of speech emotion recognition based on pure audio features. Representations based on LLDs are extracted by deep learning networks, such ...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06V20/40, G06V10/80, G06V10/82, G06K9/62, G06N3/04, G06N3/08
CPC: G06N3/08, G06N3/047, G06N3/044, G06N3/045, G06F18/253
Inventor: 刘峰, 李知函, 齐佳音, 周爱民, 李志斌
Owner SHANGHAI UNIVERSITY OF INTERNATIONAL BUSINESS AND ECONOMICS