Multi-modal emotion recognition method and device, electronic equipment and storage medium

A multimodal emotion recognition technology applied in the fields of speech recognition and image processing. It addresses the problems of noisy speech-recognized text, the loss of irony in text-only input, and the resulting inaccurate emotion recognition, thereby reducing noise interference and improving the accuracy and robustness of emotion recognition.

Pending Publication Date: 2021-02-26
Applicant: SHANGHAI YUANMENG INTELLIGENT TECH CO LTD +1
Cites: 0, Cited by: 17

AI Technical Summary

Problems solved by technology

[0003] Most existing emotion recognition methods are based on text. In virtual human interaction, the text is usually the result of speech recognition; since the accuracy of speech recognition is not necessarily high, the obtained text contains a certain amount of noise.
[0004] At the same time, when a human says an ironic sentence, once it is converted into text through speech recognition, all of the irony is lost, resulting in inaccurate emotion recognition.



Examples


Embodiment 1

[0083] In an embodiment of the present invention, as shown in Figure 1, a multi-modal emotion recognition method includes:

[0084] S100 Deduplicate the video data of the object to be recognized, and acquire time-series image data of the face of the object to be recognized.

[0085] Specifically, at the data input layer in the training phase, the input includes video data and text dialogue data collected in real time during the chat process. The video data is captured in real time by the virtual human's camera observing the chat partner, and the time-series face images are then extracted from it algorithmically.

[0086] Deduplicating the video data of the object to be recognized and obtaining the time-series image data of the face of the object to be recognized specifically includes the steps of:

[0087] Use the ViBe algorithm to model the background, extract the binary gray-scale contour map against the relatively static background, and perform the corresponding morpholog...
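
A minimal sketch of the background-modelling step in [0087]. The patent names the ViBe algorithm, which OpenCV does not ship, so MOG2 background subtraction is substituted here as a stand-in; the morphological cleanup mirrors the step the paragraph describes. The video path and kernel size are illustrative assumptions, not values from the patent.

```python
import cv2

def extract_foreground_masks(video_path: str):
    """Yield (frame, binary foreground mask) pairs after morphological cleanup."""
    cap = cv2.VideoCapture(video_path)
    # MOG2 stands in for ViBe: both model the relatively static background per pixel.
    subtractor = cv2.createBackgroundSubtractorMOG2(history=500, detectShadows=False)
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        mask = subtractor.apply(frame)                          # binary gray-scale contour map
        mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)   # opening: remove speckle noise
        mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)  # closing: fill small holes
        yield frame, mask
    cap.release()
```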

Embodiment 2

[0101] This embodiment is based on the above embodiment, and parts identical to the above embodiment will not be described again. As shown in Figure 2, this embodiment provides a multimodal emotion recognition method, specifically including:

[0102] S100 Deduplicate the video data of the object to be recognized, and acquire time-series image data of the face of the object to be recognized.

[0103] S201 Acquire voice data input by the object to be recognized in each round of dialogue.

[0104] S202 Translate the voice data into text data in real time through a speech recognition interface.

[0105] Exemplarily, at the text data input layer of the multimodal emotion recognition model:

[0106] 1. During the chat process, obtain every sentence entered by the user, and wait for the user to finish a round of dialogue in order to obtain a complete single-round dialogue, since a single round of dialogue may contain multiple sentences. ...
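
A minimal sketch of the buffering described in [0106]: per-sentence speech recognition results are accumulated until the user's turn ends, then emitted as one complete single-round dialogue. The end-of-turn signal and all names here are illustrative assumptions; the patent does not specify how turn boundaries are detected.

```python
from typing import Iterable, Iterator

def rounds_of_dialogue(sentences: Iterable[str],
                       end_of_turn: Iterable[bool]) -> Iterator[str]:
    """Join per-sentence ASR results into complete single-round dialogues."""
    buffer: list[str] = []
    for sentence, turn_done in zip(sentences, end_of_turn):
        buffer.append(sentence.strip())
        if turn_done:                  # the user finished this round of dialogue
            yield " ".join(buffer)     # one round may contain several sentences
            buffer.clear()
    if buffer:                         # flush a trailing, unterminated round
        yield " ".join(buffer)

# Example: three recognized sentences forming two rounds of dialogue.
rounds = list(rounds_of_dialogue(
    ["Oh great.", "Another Monday.", "How are you?"],
    [False, True, True]))
# rounds == ["Oh great. Another Monday.", "How are you?"]
```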

Embodiment 3

[0112] Based on the above embodiments, in this embodiment, inputting the aligned face time-series image data and the text data into a multimodal emotion recognition model to perform multimodal emotion recognition of the object to be recognized includes the steps of:

[0113] Use the multi-modal emotion recognition model to extract a first dual-modal feature with the image as the core and a second dual-modal feature with the text as the core; combine the first dual-modal feature and the second dual-modal feature to obtain a target feature; and input the target feature into the softmax classifier of the multi-modal emotion recognition model for classification and loss calculation, so as to obtain the multi-modal emotion of the object to be recognized.
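
A minimal PyTorch sketch of the fusion and classification step in [0113]. The patent states only that the two dual-modal features are combined and fed to a softmax classifier with a loss calculation; the concatenation-based fusion, feature dimension, and emotion class count below are illustrative assumptions.

```python
import torch
import torch.nn as nn

class FusionClassifier(nn.Module):
    def __init__(self, feat_dim: int = 256, num_emotions: int = 7):
        super().__init__()
        # Combine the image-core and text-core dual-modal features by concatenation.
        self.fc = nn.Linear(2 * feat_dim, num_emotions)
        self.loss_fn = nn.CrossEntropyLoss()  # applies log-softmax internally

    def forward(self, img_core_feat, text_core_feat, labels=None):
        target_feat = torch.cat([img_core_feat, text_core_feat], dim=-1)  # target feature
        logits = self.fc(target_feat)
        probs = torch.softmax(logits, dim=-1)  # softmax classification
        loss = self.loss_fn(logits, labels) if labels is not None else None
        return probs, loss

# Example: a batch of 4 samples, 256-d features, 7 emotion classes.
model = FusionClassifier()
probs, loss = model(torch.randn(4, 256), torch.randn(4, 256),
                    labels=torch.tensor([0, 3, 6, 2]))
```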

[0114] Preferably, using the multi-modal emotion recognition model to extract the first dual-modal feature with the image as the core specifically includes the steps of:

[0115] The image semantic sequence vector in ...
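
The visible text of [0115] is cut off, so the following is only a guess at how an "image as the core" dual-modal feature might be formed: a cross-modal attention layer in which the image semantic sequence supplies the queries and the text sequence supplies the keys and values. The use of multi-head attention, the mean pooling, and all shapes are assumptions, not patent text.

```python
import torch
import torch.nn as nn

class ImageCoreDualModal(nn.Module):
    def __init__(self, dim: int = 256, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, img_seq, text_seq):
        # Image tokens act as queries (the "core"); text tokens supply keys/values.
        fused, _ = self.attn(query=img_seq, key=text_seq, value=text_seq)
        return fused.mean(dim=1)  # pool the sequence into one dual-modal feature

# Example: 16 image frames and 20 text tokens, both embedded in 256 dimensions.
feat = ImageCoreDualModal()(torch.randn(4, 16, 256), torch.randn(4, 20, 256))
```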



Abstract

The invention relates to the technical fields of speech recognition and image processing, and provides a multi-modal emotion recognition method and device, electronic equipment and a storage medium. The method comprises the steps of: deduplicating video data of an object to be recognized, and obtaining time-series image data of the face of the object to be recognized; while the video data of the object to be recognized is acquired, acquiring text data of the object to be recognized in real time; and inputting the aligned face time-series image data and text data into a multi-modal emotion recognition model so as to carry out multi-modal emotion recognition on the object to be recognized. The method and device obtain, in real time, the user's expression and the text content of the dialogue between the virtual human and the user, and obtain rich multi-dimensional features through the combined input of image and text signals, thereby improving the accuracy and robustness of emotion classification and detection. In particular, the method and device achieve high accuracy in sarcastic, ironic and similar scenes.

Description

Technical field

[0001] The invention relates to the technical fields of speech recognition and image processing, in particular to a multimodal emotion recognition method, device, electronic equipment and storage medium.

Background technique

[0002] In point-to-point chat between virtual humans and humans, it is necessary to recognize human emotions in real time, generate corresponding answers based on the emotion recognition results, and guide the multi-dimensional output of voice, text and actions. Emotion recognition is therefore very important for improving the emotional companionship experience of virtual humans.

[0003] Most existing emotion recognition methods are based on text. In virtual human interaction, the text is usually the result of speech recognition; since the accuracy of speech recognition is not necessarily high, the obtained text contains a certain amount of noise.

[0004] At the same time, when a human says an ironic sentence, once it is converted into text through speech recognition, all of the irony is lost, resulting in inaccurate emotion recognition.


Application Information

Patent Type & Authority: Application (China)
IPC(8): G06K9/00; G06F40/30; G06K9/62; G10L15/26; G10L25/63
CPC: G06F40/30; G10L25/63; G10L15/26; G06V40/174; G06F18/2411
Inventors: 曾祥云, 顾文元, 张雪源
Owner: SHANGHAI YUANMENG INTELLIGENT TECH CO LTD