Video emotion recognition method integrating facial expression recognition and speech emotion recognition

A technique that combines facial expression recognition and speech emotion recognition, applied in speech analysis, character and pattern recognition, and the acquisition and recognition of facial features. It addresses problems such as features that are difficult to collect, low usability, and inconsistent recognition results.

Active Publication Date: 2020-12-01

HEBEI UNIV OF TECH

10 Cites, 0 Cited by


Problems solved by technology

Existing decision-level fusion methods have two main shortcomings. First, the proportional scoring mechanism and weight allocation strategy lack unified, authoritative standards: different researchers often apply different proportional scoring mechanisms and weight allocation strategies to the same research problem and obtain different recognition results. Second, decision-level fusion focuses on fusing the results of face recognition and speech recognition and ignores the internal relationship between facial features and speech features.
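The first shortcoming can be illustrated with a minimal sketch of weighted decision-level fusion. The label set, the score vectors, and the `weighted_fusion` helper are all hypothetical and are not taken from the patent; the point is only that the same per-channel scores can yield different fused labels when the weights change.

```python
# Hypothetical label set and scores, used only for illustration.
EMOTIONS = ["happy", "sad", "angry"]


def weighted_fusion(face_scores, speech_scores, face_weight):
    """Fuse two per-class score lists with a fixed weight on the face channel."""
    fused = [
        face_weight * f + (1.0 - face_weight) * s
        for f, s in zip(face_scores, speech_scores)
    ]
    # Pick the class with the highest fused score.
    best_index = max(range(len(fused)), key=fused.__getitem__)
    return EMOTIONS[best_index]


face_scores = [0.50, 0.30, 0.20]    # face classifier favours "happy"
speech_scores = [0.20, 0.60, 0.20]  # speech classifier favours "sad"

# Two researchers choosing different weights reach different decisions.
label_a = weighted_fusion(face_scores, speech_scores, face_weight=0.7)
label_b = weighted_fusion(face_scores, speech_scores, face_weight=0.3)
print(label_a, label_b)
```

With a face weight of 0.7 the fused decision follows the face channel, and with 0.3 it follows the speech channel, which is the inconsistency the text describes.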

[0007] CN106529504A discloses a dual-mode video emotion recognition method with composite spatio-temporal features. It extends the existing volume local binary pattern algorithm into a spatio-temporal ternary pattern and obtains spatio-temporal local ternary pattern moment texture features of facial expression and upper-body posture. It then integrates a three-dimensional histogram of gradient orientations feature to enhance the description of emotional video, and combines the two features into a composite spatio-temporal feature. The algorithm is adversely affected when the upper-body posture of the person in the video changes rapidly or when upper-body posture frames are missing. The dual-modal video emotion recognition method that combines facial expressions and upper-body postures therefore has certain limitations in feature extraction.
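For readers unfamiliar with ternary patterns, the following is a minimal sketch of the ordinary two-dimensional local ternary pattern on a single 3x3 patch. It is only the basic building block; the spatio-temporal extension and the moment features of CN106529504A are not described in this text and are not reproduced here. The patch values, the threshold, and the function name are assumptions chosen for illustration.

```python
def local_ternary_pattern(patch, threshold):
    """Compute the ternary codes of a 3x3 patch around its centre pixel.

    Each neighbour is coded +1 if it is at least `threshold` above the
    centre, -1 if it is at least `threshold` below, and 0 otherwise.
    The ternary code is then split into two binary codes (upper, lower).
    """
    center = patch[1][1]
    # Neighbours visited clockwise, starting at the top-left corner.
    coords = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    ternary = []
    for row, col in coords:
        diff = patch[row][col] - center
        if diff >= threshold:
            ternary.append(1)
        elif diff <= -threshold:
            ternary.append(-1)
        else:
            ternary.append(0)
    # Bit i of each binary code corresponds to the i-th neighbour.
    upper = sum(1 << i for i, t in enumerate(ternary) if t == 1)
    lower = sum(1 << i for i, t in enumerate(ternary) if t == -1)
    return ternary, upper, lower


# Hypothetical grey-level patch.
patch = [
    [54, 60, 70],
    [48, 55, 58],
    [40, 52, 66],
]
ternary, upper, lower = local_ternary_pattern(patch, threshold=5)
print(ternary, upper, lower)
```

Histograms of the upper and lower codes over a region are what a texture descriptor of this family would typically collect.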

However, this algorithm achieves a high recognition rate only when classifying three types of video emotion data, which limits its usability.

[0009] CN103400145B discloses a voice-visual fusion emotion recognition method based on a clue neural network. The method first uses feature data from three channels (frontal facial expression, side facial expression, and voice) to train a separate neural network for each channel to recognize discrete emotion categories. During training, four clue nodes are added to the output layer of each neural network model; they carry the clue information of four coarse-grained categories in the activation-evaluation space. A multimodal fusion model then fuses the outputs of the three neural networks, and this fusion model is also a neural network trained with clue information. However, in most videos few frames show side-face expressions, and they are difficult to collect effectively, which greatly limits the method in practical operation.

This method only extracts the features of video key frames when extracting visual emotional features, and ignores the relationship between video frames and features to a certain extent.

Examples

Embodiment 1

[0125] The video emotion recognition method that integrates facial expression recognition and speech emotion recognition in this embodiment is a two-process progressive audio-visual emotion recognition method based on decision level, and the specific steps are as follows:

[0126] Process A. Use facial image expression recognition as the first classification recognition:

[0127] The process A includes the extraction of facial expression features, the grouping of facial expressions and the first classification of facial expression recognition. The steps are as follows:

[0128] The first step is to extract the video frame and voice signal from the video signal:

[0129] The video in the database is decomposed into an image frame sequence; the open source FormatFactory software is used for video frame extraction, and the voice signal in the video is extracted and saved in MP3 format;

[0130] The second step is the preprocessing of image frame sequence and speech signal:

...
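The first step above (splitting a video into an image frame sequence and an MP3 audio track) can be sketched with command lines for the `ffmpeg` tool. This is a substitute chosen for illustration, since the embodiment uses FormatFactory rather than `ffmpeg`; the file names, the output layout, and the `build_extraction_commands` helper are hypothetical. The sketch only builds the commands and does not run them.

```python
from pathlib import Path


def build_extraction_commands(video_path, out_dir):
    """Build ffmpeg command lines for frame and audio extraction.

    Returns (frames_cmd, audio_cmd) as argument lists suitable for
    subprocess.run. ffmpeg stands in for FormatFactory here.
    """
    video = Path(video_path)
    out = Path(out_dir)
    # Numbered PNG frames: frame_00001.png, frame_00002.png, ...
    frame_pattern = out / "frames" / "frame_%05d.png"
    audio_path = out / (video.stem + ".mp3")

    frames_cmd = ["ffmpeg", "-i", str(video), str(frame_pattern)]
    # -vn drops the video stream; libmp3lame encodes the audio as MP3
    # (this assumes an ffmpeg build that includes libmp3lame).
    audio_cmd = [
        "ffmpeg", "-i", str(video),
        "-vn", "-codec:a", "libmp3lame",
        str(audio_path),
    ]
    return frames_cmd, audio_cmd


frames_cmd, audio_cmd = build_extraction_commands("clip_001.avi", "out")
print(" ".join(frames_cmd))
print(" ".join(audio_cmd))
```

To actually run the extraction, one would create the `out/frames` directory first and pass each list to `subprocess.run(cmd, check=True)`.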

Abstract

The present invention is a video emotion recognition method that integrates facial expression recognition and speech emotion recognition. It relates to the processing of record carriers for recognizing graphics and is a two-process progressive audio-visual emotion recognition method based on the decision level. The method separates facial expression recognition and speech emotion recognition in video, adopts a two-process progressive emotion recognition approach, and performs speech emotion recognition on the basis of facial expression recognition by calculating conditional probability. The steps are: process A, facial image expression recognition as the first classification recognition; process B, speech emotion recognition as the second classification recognition; process C, fusion of facial expression recognition and speech emotion recognition. The invention overcomes the defects of the prior art, which ignores the internal connection between facial features and speech features in human emotion recognition and suffers from slow recognition speed and a low recognition rate in video emotion recognition.
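One way to picture the progressive, conditional-probability idea is the sketch below. It assumes the face classifier outputs a probability for each coarse expression group and the speech classifier outputs a probability for each emotion within a group, and it multiplies the two. The group names, emotion labels, probabilities, and the product rule itself are all assumptions made for illustration; the patent's actual grouping and formulas are not given in this excerpt.

```python
# Hypothetical grouping of emotions produced by the first (face) stage.
GROUPS = {
    "positive": ["happy", "surprised"],
    "negative": ["sad", "angry"],
}


def progressive_fusion(face_group_probs, speech_probs_by_group):
    """Combine a first-stage group probability with a second-stage
    conditional probability: P(emotion) = P(group) * P(emotion | group).

    Returns the most probable emotion and the full joint distribution.
    """
    joint = {}
    for group, p_group in face_group_probs.items():
        for emotion in GROUPS[group]:
            joint[emotion] = p_group * speech_probs_by_group[group][emotion]
    best = max(joint, key=joint.get)
    return best, joint


# Stage 1 (face): probability of each coarse group.
face_group_probs = {"positive": 0.25, "negative": 0.75}
# Stage 2 (speech): probability of each emotion, conditioned on the group.
speech_probs_by_group = {
    "positive": {"happy": 0.5, "surprised": 0.5},
    "negative": {"sad": 0.25, "angry": 0.75},
}

best, joint = progressive_fusion(face_group_probs, speech_probs_by_group)
print(best, joint)
```

In this arrangement the speech stage only has to separate emotions inside the group that the face stage proposes, instead of the two channels being scored independently and then weighted.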

Description

Technical field

[0001] The technical solution of the present invention relates to the processing of record carriers for recognizing graphics, and specifically to a video emotion recognition method that integrates facial expression recognition and speech emotion recognition.

Background technique

[0002] With the rapid development of artificial intelligence, computer vision, and human-computer interaction technology, computer-based human emotion recognition has received extensive attention. How to make computers recognize human emotions more quickly and accurately has become a research hotspot in the field of machine vision.

[0003] Humans express emotion in various ways, mainly through facial expressions, voice emotions, upper-body gestures, and language text. Among them, facial expressions and speech emotions are the two most typical ways of expressing emotions. Since the texture and geometric feature...

Application Information

Patent Type & Authority: Patents (China)

IPC(8): G06K9/00, G06K9/62, G10L25/63, G10L25/57

CPC: G10L25/57, G10L25/63, G06V40/172, G06V40/168, G06V40/174, G06F18/25

Inventor: 于明, 张冰, 郭迎春, 于洋, 师硕, 郝小可, 朱叶, 阎刚

Owner: HEBEI UNIV OF TECH
