Speaker emotion perception method fusing multi-dimensional information

A multi-dimensional-information emotion perception technology, applied in the field of deep learning and human emotion perception. It addresses problems such as the lack of speaker image information, the inability to eliminate ambiguity, and the limited improvement offered by existing methods, achieving the effects of eliminating ambiguity, good economic benefits, and enhanced feature-fusion ability.

Pending Publication Date: 2021-12-24
XIAMEN UNIV

AI Technical Summary

Problems solved by technology

No matter which method is used, in the complex and changeable interactive scenarios of the real world there are problems such as insufficient precision and an inability to eliminate ambiguity. There are also some methods that combine ...

Method used




Embodiment Construction

[0020] The present invention will be further explained below in conjunction with specific embodiments.

[0021] Referring to Figures 1 to 3, this embodiment proposes a speaker emotion perception method that fuses multi-dimensional information, comprising the following steps:

[0022] S1: Input a video of the speaker and extract the speaker's image and voice from the video;
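As a concrete illustration of S1, the sketch below splits a video into an image stream and an audio track. This is not the patent's specified implementation; it assumes OpenCV and the ffmpeg CLI are available, and the function name and parameters are illustrative.

```python
# Minimal sketch of S1 (assumed tooling: OpenCV + ffmpeg CLI).
import subprocess
import cv2

def extract_image_and_voice(video_path, audio_path="speech.wav", frame_step=5):
    # Pull the audio track out as 16 kHz mono WAV, a common input rate for speech models.
    subprocess.run(
        ["ffmpeg", "-y", "-i", video_path, "-vn", "-ac", "1", "-ar", "16000", audio_path],
        check=True,
    )
    # Sample every `frame_step`-th frame as the speaker image stream.
    frames = []
    cap = cv2.VideoCapture(video_path)
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % frame_step == 0:
            frames.append(frame)
        idx += 1
    cap.release()
    return frames, audio_path
```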

[0023] S2: Input the speaker's image and voice into the multi-dimensional feature extraction network; extract the language content feature feature_text and the language emotion feature feature_audio from the voice, and extract the speaker's facial expression feature feature_face from the image information;
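The patent does not specify the backbone networks for S2, so the sketch below uses placeholder encoders; TextEncoder/AudioEncoder/FaceEncoder stand in for whatever speech-recognition, prosody, and facial-expression models an implementation would use.

```python
# Sketch of the multi-dimensional feature extraction network of S2,
# with illustrative stand-in encoders (not the patent's architectures).
import torch
import torch.nn as nn

class MultiDimFeatureExtractor(nn.Module):
    def __init__(self, text_dim=768, audio_dim=128, face_dim=512):
        super().__init__()
        # Real systems would use e.g. a pretrained language model,
        # a spectrogram CNN, and a face-expression CNN here.
        self.text_encoder = nn.LazyLinear(text_dim)
        self.audio_encoder = nn.LazyLinear(audio_dim)
        self.face_encoder = nn.LazyLinear(face_dim)

    def forward(self, text_inputs, audio_inputs, face_inputs):
        feature_text = self.text_encoder(text_inputs)     # language content
        feature_audio = self.audio_encoder(audio_inputs)  # language emotion / prosody
        feature_face = self.face_encoder(face_inputs)     # facial expression
        return feature_text, feature_audio, feature_face
```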

[0024] S3: Use the multi-dimensional feature encoding algorithm to encode the feature results produced by the multi-dimensional feature extraction network, mapping the multi-dimensional information into a shared coding space Shared-Space(feature_text, feature_audio, feature_face);
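One plausible reading of S3 is a learned projection of each modality into a common dimension, sketched below. The per-modality linear projections and the shared dimension (256) are assumptions; the patent text only states that the features are mapped to Shared-Space(feature_text, feature_audio, feature_face).

```python
# Sketch of S3: map each modality's features into one shared coding space.
import torch
import torch.nn as nn

class SharedSpaceEncoder(nn.Module):
    def __init__(self, text_dim=768, audio_dim=128, face_dim=512, shared_dim=256):
        super().__init__()
        self.proj_text = nn.Linear(text_dim, shared_dim)
        self.proj_audio = nn.Linear(audio_dim, shared_dim)
        self.proj_face = nn.Linear(face_dim, shared_dim)

    def forward(self, feature_text, feature_audio, feature_face):
        # After projection, all three features live in the same shared_dim
        # space, so later fusion can compare and combine them directly.
        return (self.proj_text(feature_text),
                self.proj_audio(feature_audio),
                self.proj_face(feature_face))
```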

[0025] S4: Use the multi-dimensional feature fusion algorithm to fuse the features in the coding space from low dimension to high dimension, obtaining in the high-dimensional feature space feature vectors of multi-dimensional information highly related to the speaker's emotion; ...
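The step-by-step text is truncated in this excerpt; per the Abstract, the remaining steps fuse the shared-space features from low to high dimension and feed the fused vector to an emotion perception network that outputs an emotion distribution. Concatenation followed by a widening layer, as sketched below, is one plausible reading, not the patent's confirmed fusion algorithm; the number of emotion classes (7) is also an assumption.

```python
# Sketch of the fusion and emotion perception steps (assumed design:
# concatenate modalities, lift to a higher dimension, classify).
import torch
import torch.nn as nn

class FusionAndPerception(nn.Module):
    def __init__(self, shared_dim=256, high_dim=1024, num_emotions=7):
        super().__init__()
        # Low-to-high fusion: lift the concatenated modalities into a
        # higher-dimensional space where emotion-relevant structure separates.
        self.fuse = nn.Sequential(
            nn.Linear(3 * shared_dim, high_dim),
            nn.ReLU(),
        )
        self.perceive = nn.Linear(high_dim, num_emotions)

    def forward(self, feature_text, feature_audio, feature_face):
        fused = self.fuse(torch.cat([feature_text, feature_audio, feature_face], dim=-1))
        # Softmax yields the speaker's emotion perception distribution.
        return torch.softmax(self.perceive(fused), dim=-1)
```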



Abstract

The invention discloses a speaker emotion perception method fusing multi-dimensional information, and relates to the technical field of deep learning and human emotion perception. The method includes: inputting a video of a speaker and extracting an image and voice of the speaker from the video; inputting the image and voice of the speaker into a multi-dimensional feature extraction network, extracting language content and language emotion features from the voice, and extracting facial expression features of the speaker from the image information; using a multi-dimensional feature coding algorithm to encode the feature results of the multi-dimensional feature extraction network, mapping the multi-dimensional information to a shared coding space; fusing the features in the coding space from low dimension to high dimension with a multi-dimensional feature fusion algorithm, obtaining feature vectors of multi-dimensional information highly related to the speaker's emotion in a high-dimensional feature space; and inputting the fused multi-dimensional information into an emotion perception network for prediction, outputting the emotion perception distribution of the speaker. By means of the method, ambiguity can be effectively eliminated on the basis of the multi-dimensional information, and the emotion perception distribution of the speaker can be accurately predicted.
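For orientation, the sketches given under Embodiment Construction compose into the pipeline the abstract describes. All class names below are the illustrative ones defined earlier, not components named by the patent, and the dummy inputs stand in for tokenized text, audio features, and a face-crop embedding.

```python
# End-to-end wiring of the illustrative sketches above.
import torch

extractor = MultiDimFeatureExtractor()
shared = SharedSpaceEncoder()
head = FusionAndPerception()

# Dummy per-modality inputs for one video clip (batch size 1).
text_in, audio_in, face_in = torch.randn(1, 300), torch.randn(1, 640), torch.randn(1, 2048)
f_text, f_audio, f_face = extractor(text_in, audio_in, face_in)
emotion_dist = head(*shared(f_text, f_audio, f_face))
print(emotion_dist)  # a distribution over the assumed 7 emotion classes
```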

Description

Technical field

[0001] The invention relates to the technical field of deep learning and human emotion perception, in particular to a speaker emotion perception method that fuses multi-dimensional information.

Background technique

[0002] Traditional deep learning algorithms estimate emotion from language content alone. Language content is inherently ambiguous and needs to be combined with intonation information when the content is expressed, so such methods lack the associations and constraints between language content and vocal emotion information. Other methods rely only on simple image-based detection of faces for emotion estimation; they lack adaptability to language content and vocal emotion, cannot be used in the complex and changeable human-computer interaction scenarios of real situations, and are of limited practical value.

[0003] Traditional emotion perception estimation methods based on deep learning can be divided into three parts: (1) ...

Claims


Application Information

IPC(8): G06K9/00; G06F16/75; G06F16/783; G06N3/04; G06N3/08
CPC: G06N3/08; G06F16/75; G06F16/7834; G06F16/784; G06F16/7847; G06N3/045
Inventors: 曾鸣, 丁艺伟, 邓文晋, 刘鹏飞
Owner: XIAMEN UNIV