Speaker emotion perception method fusing multi-dimensional information

A multi-dimensional-information emotion perception technology, applied in the field of deep learning and human emotion perception. It addresses problems such as the lack of speaker image information, the inability to eliminate ambiguity, and the limited improvement offered by existing methods, achieving the effects of eliminating ambiguity, good economic benefits, and enhanced feature-fusion ability.

Pending Publication Date: 2021-12-24
XIAMEN UNIV

AI Technical Summary

Problems solved by technology

No matter which method is used, in the complex and changeable interactive scenarios of the real world there are problems such as insufficient precision and an inability to eliminate ambiguity. There are also some methods that combine ...

Method used




Embodiment Construction

[0020] The present invention will be further explained below in conjunction with specific embodiments.

[0021] Referring to Figures 1 to 3, this embodiment proposes a speaker emotion perception method that fuses multi-dimensional information, comprising the following steps:

[0022] S1: Input a video of the speaker and extract the speaker's image and voice from the video;
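As a concrete illustration of S1, the sketch below splits a video into an image stream and an audio track. This is not the patent's specified implementation; it assumes OpenCV and the ffmpeg CLI are available, and the function name and parameters are illustrative.

```python
# Minimal sketch of S1 (assumed tooling: OpenCV + ffmpeg CLI).
import subprocess
import cv2

def extract_image_and_voice(video_path, audio_path="speech.wav", frame_step=5):
    # Pull the audio track out as 16 kHz mono WAV, a common input rate for speech models.
    subprocess.run(
        ["ffmpeg", "-y", "-i", video_path, "-vn", "-ac", "1", "-ar", "16000", audio_path],
        check=True,
    )
    # Sample every `frame_step`-th frame as the speaker image stream.
    frames = []
    cap = cv2.VideoCapture(video_path)
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % frame_step == 0:
            frames.append(frame)
        idx += 1
    cap.release()
    return frames, audio_path
```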

[0023] S2: Input the speaker's image and voice into the multi-dimensional feature extraction network; extract the language content feature feature_text and the language emotion feature feature_audio from the voice, and extract the speaker's facial expression feature feature_face from the image information;
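The patent does not specify the backbone networks for S2, so the sketch below uses placeholder encoders; TextEncoder/AudioEncoder/FaceEncoder stand in for whatever speech-recognition, prosody, and facial-expression models an implementation would use.

```python
# Sketch of the multi-dimensional feature extraction network of S2,
# with illustrative stand-in encoders (not the patent's architectures).
import torch
import torch.nn as nn

class MultiDimFeatureExtractor(nn.Module):
    def __init__(self, text_dim=768, audio_dim=128, face_dim=512):
        super().__init__()
        # Real systems would use e.g. a pretrained language model,
        # a spectrogram CNN, and a face-expression CNN here.
        self.text_encoder = nn.LazyLinear(text_dim)
        self.audio_encoder = nn.LazyLinear(audio_dim)
        self.face_encoder = nn.LazyLinear(face_dim)

    def forward(self, text_inputs, audio_inputs, face_inputs):
        feature_text = self.text_encoder(text_inputs)     # language content
        feature_audio = self.audio_encoder(audio_inputs)  # language emotion / prosody
        feature_face = self.face_encoder(face_inputs)     # facial expression
        return feature_text, feature_audio, feature_face
```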

[0024] S3: Use the multi-dimensional feature encoding algorithm to encode the feature results produced by the multi-dimensional feature extraction network, mapping the multi-dimensional information into a shared coding space Shared-Space(feature_text, feature_audio, feature_face);
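One plausible reading of S3 is a learned projection of each modality into a common dimension, sketched below. The per-modality linear projections and the shared dimension (256) are assumptions; the patent text only states that the features are mapped to Shared-Space(feature_text, feature_audio, feature_face).

```python
# Sketch of S3: map each modality's features into one shared coding space.
import torch
import torch.nn as nn

class SharedSpaceEncoder(nn.Module):
    def __init__(self, text_dim=768, audio_dim=128, face_dim=512, shared_dim=256):
        super().__init__()
        self.proj_text = nn.Linear(text_dim, shared_dim)
        self.proj_audio = nn.Linear(audio_dim, shared_dim)
        self.proj_face = nn.Linear(face_dim, shared_dim)

    def forward(self, feature_text, feature_audio, feature_face):
        # After projection, all three features live in the same shared_dim
        # space, so later fusion can compare and combine them directly.
        return (self.proj_text(feature_text),
                self.proj_audio(feature_audio),
                self.proj_face(feature_face))
```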

[0025] S4: Use the multi-dimensional feature fusion algorithm to fuse the features in the coding space from low dimension to high dimension, obtaining in the high-dimensional feature space feature vectors of multi-dimensional information highly related to the speaker's emotion; ...
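The step-by-step text is truncated in this excerpt; per the Abstract, the remaining steps fuse the shared-space features from low to high dimension and feed the fused vector to an emotion perception network that outputs an emotion distribution. Concatenation followed by a widening layer, as sketched below, is one plausible reading, not the patent's confirmed fusion algorithm; the number of emotion classes (7) is also an assumption.

```python
# Sketch of the fusion and emotion perception steps (assumed design:
# concatenate modalities, lift to a higher dimension, classify).
import torch
import torch.nn as nn

class FusionAndPerception(nn.Module):
    def __init__(self, shared_dim=256, high_dim=1024, num_emotions=7):
        super().__init__()
        # Low-to-high fusion: lift the concatenated modalities into a
        # higher-dimensional space where emotion-relevant structure separates.
        self.fuse = nn.Sequential(
            nn.Linear(3 * shared_dim, high_dim),
            nn.ReLU(),
        )
        self.perceive = nn.Linear(high_dim, num_emotions)

    def forward(self, feature_text, feature_audio, feature_face):
        fused = self.fuse(torch.cat([feature_text, feature_audio, feature_face], dim=-1))
        # Softmax yields the speaker's emotion perception distribution.
        return torch.softmax(self.perceive(fused), dim=-1)
```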



Abstract

The invention discloses a speaker emotion perception method fusing multi-dimensional information, and relates to the technical field of deep learning and human emotion perception. The method includes: inputting a video of a speaker and extracting an image and voice of the speaker from the video; inputting the image and voice of the speaker into a multi-dimensional feature extraction network, extracting language content and language emotion features from the voice, and extracting facial expression features of the speaker from the image information; using a multi-dimensional feature coding algorithm to encode the feature results of the multi-dimensional feature extraction network, mapping the multi-dimensional information to a shared coding space; fusing the features in the coding space from low dimension to high dimension with a multi-dimensional feature fusion algorithm, obtaining feature vectors of multi-dimensional information highly related to the speaker's emotion in a high-dimensional feature space; and inputting the fused multi-dimensional information into an emotion perception network for prediction, outputting the emotion perception distribution of the speaker. By means of the method, ambiguity can be effectively eliminated on the basis of the multi-dimensional information, and the emotion perception distribution of the speaker can be accurately predicted.
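For orientation, the sketches given under Embodiment Construction compose into the pipeline the abstract describes. All class names below are the illustrative ones defined earlier, not components named by the patent, and the dummy inputs stand in for tokenized text, audio features, and a face-crop embedding.

```python
# End-to-end wiring of the illustrative sketches above.
import torch

extractor = MultiDimFeatureExtractor()
shared = SharedSpaceEncoder()
head = FusionAndPerception()

# Dummy per-modality inputs for one video clip (batch size 1).
text_in, audio_in, face_in = torch.randn(1, 300), torch.randn(1, 640), torch.randn(1, 2048)
f_text, f_audio, f_face = extractor(text_in, audio_in, face_in)
emotion_dist = head(*shared(f_text, f_audio, f_face))
print(emotion_dist)  # a distribution over the assumed 7 emotion classes
```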

Description

Technical field

[0001] The invention relates to the technical field of deep learning and human emotion perception, in particular to a speaker emotion perception method that fuses multi-dimensional information.

Background technique

[0002] Traditional deep learning algorithms estimate emotion from language content alone. Language content is inherently ambiguous and needs to be combined with intonation information when the content is expressed, so such methods lack the associations and constraints between language content and vocal emotion information. Other methods rely only on simple image-based detection of faces for emotion estimation; they lack adaptability to language content and vocal emotion, cannot be used in the complex and changeable human-computer interaction scenarios of real situations, and are of limited practical value.

[0003] Traditional emotion perception estimation methods based on deep learning can be divided into three parts: (1) ...

Claims


Application Information

IPC(8): G06K9/00; G06F16/75; G06F16/783; G06N3/04; G06N3/08
CPC: G06N3/08; G06F16/75; G06F16/7834; G06F16/784; G06F16/7847; G06N3/045
Inventors: 曾鸣, 丁艺伟, 邓文晋, 刘鹏飞
Owner: XIAMEN UNIV