A video emotion recognition method integrating facial expression recognition and voice emotion recognition

A facial expression recognition and speech emotion recognition technology, applied in speech analysis, character and pattern recognition, and acquisition/recognition of facial features. It addresses problems such as the small number of usable facial expression frames and the neglect of the internal connection between facial features and voice features.

Active Publication Date: 2019-03-01

HEBEI UNIV OF TECH

10 Cites, 27 Cited by

Problems solved by technology

Existing decision-level fusion methods have two main shortcomings. First, the proportional scoring mechanism and the weight allocation strategy lack a unified, authoritative standard: different researchers often apply different scoring mechanisms and weight allocation strategies to the same research problem and obtain different recognition results. Second, the decision-level fusion method concentrates on fusing the results of face recognition and speech recognition, and ignores the internal relationship between face features and speech features.
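The first shortcoming can be illustrated with a minimal sketch of weighted-sum decision-level fusion. The function name, the emotion labels, the scores, and the weights below are all hypothetical and are not taken from the patent; the sketch only shows that the same classifier outputs can yield different fused labels under different weight allocations.

```python
def fuse_scores(face_scores, speech_scores, face_weight):
    """Weighted-sum decision-level fusion of two per-class score dicts.

    face_weight is the (hypothetical) weight given to the face classifier;
    the speech classifier receives 1 - face_weight.
    """
    speech_weight = 1.0 - face_weight
    fused = {
        label: face_weight * face_scores[label] + speech_weight * speech_scores[label]
        for label in face_scores
    }
    # The fused decision is the label with the highest combined score.
    return max(fused, key=fused.get), fused


# Made-up classifier outputs for a single video clip.
face_scores = {"happy": 0.6, "angry": 0.3, "sad": 0.1}
speech_scores = {"happy": 0.2, "angry": 0.7, "sad": 0.1}

# Two weight allocations applied to the same scores.
label_a, _ = fuse_scores(face_scores, speech_scores, face_weight=0.7)
label_b, _ = fuse_scores(face_scores, speech_scores, face_weight=0.3)
print(label_a, label_b)
```

With these made-up numbers the two weight settings disagree on the final label, which is the kind of inconsistency the paragraph above describes.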

[0007] CN106529504A discloses a dual-mode video emotion recognition method with composite spatio-temporal features. It extends the existing volume local binary pattern algorithm into a spatio-temporal ternary pattern, obtains the spatio-temporal local ternary pattern moment texture features of the facial expression and the upper body posture, and further integrates a three-dimensional histogram of oriented gradients feature to strengthen the description of emotional video; the two features are combined into a composite spatio-temporal feature. The algorithm is adversely affected when the upper body posture of the person in the video changes rapidly or when upper body images are missing. A dual-modal video emotion recognition method that combines facial expressions with upper body postures therefore has certain limitations in feature extraction.
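As background for the ternary pattern mentioned above, the following is a minimal sketch of the basic two-dimensional local ternary pattern on a single 3x3 patch. It is a simplification for illustration only: the cited method works on spatio-temporal volumes and adds moment features, neither of which is shown here, and the function name, patch values, and threshold are hypothetical.

```python
def local_ternary_pattern(patch, threshold):
    """Return (upper, lower) binary codes of the LTP at the centre of a 3x3 patch.

    A neighbour contributes to the upper code when it exceeds the centre by at
    least `threshold`, to the lower code when it falls below the centre by at
    least `threshold`, and to neither code otherwise.
    """
    center = patch[1][1]
    # Neighbours visited clockwise, starting at the top-left corner.
    offsets = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    upper = 0
    lower = 0
    for bit, (row, col) in enumerate(offsets):
        diff = patch[row][col] - center
        if diff >= threshold:
            upper |= 1 << bit
        elif diff <= -threshold:
            lower |= 1 << bit
    return upper, lower


# Made-up grey-level patch.
patch = [
    [52, 60, 49],
    [47, 50, 58],
    [40, 51, 50],
]
upper, lower = local_ternary_pattern(patch, threshold=5)
print(upper, lower)
```

Splitting the ternary code into an upper and a lower binary code is the usual way to keep the histogram size manageable; a texture descriptor would then histogram these codes over a region.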

However, this algorithm achieves a high recognition rate only when classifying three types of video emotion data, which gives it low usability.

[0009] CN103400145B discloses a voice-visual fusion emotion recognition method based on a clue neural network. The method first uses the feature data of three channels (frontal facial expression, side facial expression, and voice) to train one neural network per channel independently for the recognition of discrete emotion categories. During training, four clue nodes are added to the output layer of each neural network model; they carry the clue information of four coarse-grained categories in the activity-evaluation space. A multimodal fusion model then fuses the outputs of the three neural networks, and this fusion model is itself a neural network trained with clue information. However, in most videos there are few frames showing the side of the face, and they are difficult to collect effectively, so the method has great limitations in practical operation.

This method only extracts the features of video key frames when extracting visual emotional features, and ignores the relationship between video frames and features to a certain extent.

Examples

Embodiment 1

[0125] The video emotion recognition method of this embodiment, which fuses facial expression recognition and voice emotion recognition, is a decision-level, two-process progressive audio-visual emotion recognition method. The specific steps are as follows:

[0126] Process A. Facial image expression recognition as the first classification recognition:

[0127] Process A comprises the extraction of facial expression features, the grouping of facial expressions, and the first classification of facial expression recognition. The steps are as follows:

[0128] In the first step, video frames and the voice signal are extracted from the video signal:

[0129] Decompose each video in the database into a sequence of image frames, using the open source FormatFactory software to extract the video frames; extract the voice signal from the video and save it in MP3 format;

[0130] The second step is the preprocessing of image frame sequence and speech...
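The first step above (frame extraction and voice signal extraction) can also be scripted. The sketch below swaps the FormatFactory tool named in the embodiment for the ffmpeg command-line tool, which is an assumption made for illustration and not the patent's procedure. The function name, file names, and output naming pattern are hypothetical, and the MP3 encoder `libmp3lame` is only available if the local ffmpeg build includes it.

```python
from pathlib import Path


def build_extraction_commands(video_path, out_dir):
    """Build ffmpeg argument lists for frame and audio extraction.

    Returns (frames_cmd, audio_cmd). Nothing is executed here; the caller
    decides whether and how to run the commands.
    """
    out = Path(out_dir)
    stem = Path(video_path).stem
    # One numbered PNG per decoded video frame.
    frames_cmd = ["ffmpeg", "-i", str(video_path), str(out / f"{stem}_%05d.png")]
    # -vn drops the video stream; -c:a libmp3lame encodes the audio as MP3.
    audio_cmd = [
        "ffmpeg", "-i", str(video_path),
        "-vn", "-c:a", "libmp3lame",
        str(out / f"{stem}.mp3"),
    ]
    return frames_cmd, audio_cmd


frames_cmd, audio_cmd = build_extraction_commands("clip.avi", "extracted")
print(frames_cmd)
print(audio_cmd)
```

To actually run the extraction, each list can be passed to `subprocess.run(cmd, check=True)` on a machine where ffmpeg is installed and the output directory already exists.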

Abstract

The invention relates to a video emotion recognition method integrating facial expression recognition and voice emotion recognition, and belongs to the processing of record carriers used for recognizing graphics. The method is a decision-level, two-process progressive audio-visual emotion recognition method: it separates facial expression recognition from voice emotion recognition in a video and, by calculating a conditional probability, performs voice emotion recognition on the basis of the facial expression recognition result. The method comprises: A, facial image expression recognition as the first classification recognition; B, voice emotion recognition as the second classification recognition; and C, fusion of facial expression recognition and voice emotion recognition. The method overcomes the defects of the prior art, in which the internal relationship between facial features and voice features is ignored in human emotion recognition and video emotion recognition has a low recognition speed and a low recognition rate.
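One way to read the two-process progressive idea is as a product of probabilities: the face stage assigns a probability to each expression group, and the speech stage assigns a probability to each emotion conditional on its group. The sketch below follows that reading only; this excerpt does not give the patent's actual grouping or probability computation, so the group names, emotion labels, numbers, and function name are all hypothetical, and each emotion is assumed to belong to exactly one group.

```python
def progressive_recognition(group_probs, speech_probs_by_group):
    """Combine a first-stage group estimate with second-stage conditional estimates.

    group_probs: P(group | face), one entry per expression group.
    speech_probs_by_group: P(emotion | group, speech), one dict per group.
    Returns the most probable emotion and the joint probability of every emotion.
    """
    joint = {}
    for group, p_group in group_probs.items():
        for emotion, p_conditional in speech_probs_by_group[group].items():
            # Assumes each emotion appears in exactly one group.
            joint[emotion] = p_group * p_conditional
    best = max(joint, key=joint.get)
    return best, joint


# Made-up first-stage (face) and second-stage (speech) outputs.
group_probs = {"positive": 0.3, "negative": 0.7}
speech_probs_by_group = {
    "positive": {"happy": 0.8, "surprise": 0.2},
    "negative": {"angry": 0.5, "sad": 0.4, "fear": 0.1},
}

label, joint = progressive_recognition(group_probs, speech_probs_by_group)
print(label)
```

In this reading the second classifier only has to separate emotions inside a group, which is one plausible way the speech stage could build on the face stage rather than being fused with it by a fixed weight.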

Description

Technical field

[0001] The technical solution of the present invention relates to the processing of record carriers used for recognizing graphics, in particular to a video emotion recognition method that integrates facial expression recognition and voice emotion recognition.

Background technique

[0002] With the rapid development of artificial intelligence and computer vision technology, human-computer interaction technology is changing with each passing day. Human emotion recognition by computer has received extensive attention, and how to make computers recognize human emotions more quickly and accurately has become a current research hotspot in the field of machine vision.

[0003] Human beings express their emotions in a variety of ways, mainly including facial expressions, voice emotions, upper body postures, and language texts. Among them, facial expressions and voice emotions are the two most typical ways of expressing emotions. Since the te...

Application Information

IPC(8): G06K9/00, G06K9/62, G10L25/63, G10L25/57

CPC: G10L25/57, G10L25/63, G06V40/172, G06V40/168, G06V40/174, G06F18/25

Inventor: 于明, 张冰, 郭迎春, 于洋, 师硕, 郝小可, 朱叶, 阎刚

Owner: HEBEI UNIV OF TECH
