Video emotion recognition method integrating facial expression recognition and speech emotion recognition

A video emotion recognition technology that combines facial expression recognition and speech emotion recognition, applied in speech analysis, character and pattern recognition, and the acquisition/recognition of facial features. It addresses problems in existing methods such as features that are difficult to collect, low usability, and inconsistent recognition results.

Active Publication Date: 2020-12-01
HEBEI UNIV OF TECH

AI Technical Summary

Problems solved by technology

Existing decision-level fusion methods have two main shortcomings. First, the proportional scoring mechanism and the weight allocation strategy lack unified, authoritative standards: different researchers often apply different scoring mechanisms and weight allocation strategies to the same research project and obtain different recognition results. Second, decision-level fusion focuses on fusing the results of face recognition and speech recognition and ignores the internal relationship between facial features and speech features.
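The first shortcoming can be illustrated with a minimal sketch of weighted decision-level fusion. The emotion labels, the per-class scores, and the two weight settings below are all invented for illustration and do not come from the patent or from any cited method; the point is only that the same classifier outputs can yield different fused decisions under different weight allocations.

```python
# Illustrative only: labels, scores, and weights are made up for this sketch.
EMOTIONS = ["angry", "happy", "sad"]


def weighted_fusion(face_scores, speech_scores, face_weight):
    """Fuse per-class scores from two classifiers with a fixed weight.

    face_weight is the share given to the face classifier; the speech
    classifier receives the remainder (1 - face_weight).
    """
    speech_weight = 1.0 - face_weight
    fused = [
        face_weight * f + speech_weight * s
        for f, s in zip(face_scores, speech_scores)
    ]
    # Pick the class with the highest fused score.
    best = max(range(len(fused)), key=fused.__getitem__)
    return EMOTIONS[best], fused


# Hypothetical classifier outputs for one video clip.
face_scores = [0.2, 0.5, 0.3]
speech_scores = [0.6, 0.1, 0.3]

# Two different weight allocation strategies on the same scores.
label_a, _ = weighted_fusion(face_scores, speech_scores, face_weight=0.7)
label_b, _ = weighted_fusion(face_scores, speech_scores, face_weight=0.3)
print(label_a, label_b)
```

With these made-up numbers, favoring the face classifier selects one class and favoring the speech classifier selects another, which is the inconsistency described above.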
[0007] CN106529504A discloses a dual-mode video emotion recognition method with composite spatio-temporal features. It extends the existing volume local binary pattern algorithm into a spatio-temporal ternary pattern, obtains spatio-temporal local ternary pattern moment texture features of the facial expression and upper body posture, and further integrates a three-dimensional histogram of oriented gradients feature to strengthen the description of emotional video, combining the two features into a composite spatio-temporal feature. The algorithm is adversely affected when the upper body posture of the person in the video changes rapidly or when the upper body posture picture is missing. A dual-modal video emotion recognition method that combines facial expressions and upper body postures therefore has certain limitations in feature extraction.
However, this algorithm achieves a high recognition rate only when classifying three types of video emotion data, so its usability is low.
[0009] CN103400145B discloses a voice-visual fusion emotion recognition method based on a clue neural network. The method first uses the feature data of three channels (frontal facial expression, side facial expression, and voice) to train one neural network per channel independently for the recognition of discrete emotional categories. During training, 4 clue nodes are added to the output layer of each neural network model; they carry the clue information of 4 coarse-grained categories in the activity-evaluation space. A multimodal fusion model, itself a neural network trained with clue information, then fuses the output results of the three neural networks. However, in most videos the number of frames showing side facial expressions is small and such frames are difficult to collect effectively, which greatly limits the method in practical operation.
When extracting visual emotional features, this method extracts features only from video key frames and, to a certain extent, ignores the relationship between video frames and features.

Examples

Embodiment 1

[0125] The video emotion recognition method of this embodiment, which integrates facial expression recognition and speech emotion recognition, is a two-process progressive audio-visual emotion recognition method based on the decision level. The specific steps are as follows:

[0126] Process A. Facial image expression recognition as the first classification recognition:

[0127] Process A includes the extraction of facial expression features, the grouping of facial expressions, and the first classification of facial expression recognition. The steps are as follows:

[0128] The first step is to extract the video frames and the voice signal from the video signal:

[0129] The video in the database is decomposed into an image frame sequence, using the open source FormatFactory software for video frame extraction, and the voice signal in the video is extracted and saved in MP3 format;
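The extraction in this step can also be scripted. The sketch below is not the FormatFactory workflow named in the embodiment; it swaps in the ffmpeg command-line tool as an assumption, and only builds the two command lines (one for the frame sequence, one for the MP3 audio) without running them. The function name, the output layout, and the file names are hypothetical.

```python
from pathlib import Path


def build_extraction_commands(video_path, out_dir):
    """Build ffmpeg command lines for frame and audio extraction.

    Assumption: ffmpeg is used in place of FormatFactory. The directory
    layout (out_dir/frames, out_dir/<name>.mp3) is hypothetical.
    """
    video = Path(video_path)
    out = Path(out_dir)
    frame_pattern = out / "frames" / "frame_%05d.png"
    audio_path = out / (video.stem + ".mp3")

    # Decode every video frame into a numbered PNG image.
    frames_cmd = ["ffmpeg", "-i", str(video), str(frame_pattern)]
    # Drop the video stream (-vn) and encode the audio track as MP3.
    audio_cmd = [
        "ffmpeg", "-i", str(video),
        "-vn", "-codec:a", "libmp3lame",
        str(audio_path),
    ]
    return frames_cmd, audio_cmd


frames_cmd, audio_cmd = build_extraction_commands("clip01.mp4", "extracted")
print(" ".join(frames_cmd))
print(" ".join(audio_cmd))
```

To execute the commands, each list can be passed to `subprocess.run(cmd, check=True)` once the `frames` directory exists, since ffmpeg does not create missing output directories.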

[0130] The second step is to preprocess the image frame sequence and the speech signal:

...

Abstract

The present invention is a video emotion recognition method that integrates facial expression recognition and speech emotion recognition. It relates to the processing of record carriers used for recognizing graphics and is a two-process progressive audio-visual emotion recognition method based on the decision level. The method separates facial expression recognition and speech emotion recognition in video, adopts a two-process progressive emotion recognition approach, and performs speech emotion recognition on the basis of facial expression recognition by calculating a conditional probability. The steps are: process A, facial image expression recognition as the first classification recognition; process B, speech emotion recognition as the second classification recognition; process C, fusion of facial expression recognition and speech emotion recognition. The invention overcomes the defects of the prior art, which ignores the internal connection between facial features and speech features in human emotion recognition and whose video emotion recognition is slow and has a low recognition rate.
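The progressive idea in the abstract can be sketched as follows. This is an assumption-laden illustration, not the patent's algorithm: the excerpt does not give the conditional probability formula, so the sketch assumes that the facial stage outputs a probability for each expression group, that the speech stage outputs a probability for each emotion within a group, and that the two are combined by a simple product. The group names, group membership, and all numbers are hypothetical.

```python
# Hypothetical grouping of emotions produced by the facial stage (process A).
GROUPS = {
    "group_1": ["happy", "surprise"],
    "group_2": ["angry", "sad"],
}


def progressive_recognition(group_probs, speech_probs_by_group):
    """Pick the emotion maximising P(group | face) * P(emotion | speech, group).

    group_probs: assumed output of the first (facial) classification.
    speech_probs_by_group: assumed output of the second (speech)
    classification, conditioned on each facial group.
    """
    best_emotion, best_score = None, -1.0
    for group, p_group in group_probs.items():
        for emotion in GROUPS[group]:
            score = p_group * speech_probs_by_group[group][emotion]
            if score > best_score:
                best_emotion, best_score = emotion, score
    return best_emotion, best_score


# Made-up probabilities for a single video clip.
group_probs = {"group_1": 0.3, "group_2": 0.7}
speech_probs_by_group = {
    "group_1": {"happy": 0.8, "surprise": 0.2},
    "group_2": {"angry": 0.4, "sad": 0.6},
}

emotion, score = progressive_recognition(group_probs, speech_probs_by_group)
print(emotion, round(score, 2))
```

In this sketch the speech stage only refines the decision inside the groups proposed by the facial stage, so the second classification depends on the first rather than being fused with it by independent weights.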

Description

Technical field

[0001] The technical solution of the present invention relates to the processing of a record carrier for recognizing graphics, and specifically relates to a video emotion recognition method that integrates facial expression recognition and speech emotion recognition.

Background technique

[0002] With the rapid development of artificial intelligence, computer vision, and human-computer interaction technology, computer-based human emotion recognition has received extensive attention. How to make computers recognize human emotions more quickly and accurately has become a research hotspot in the current field of machine vision.

[0003] There are various ways of expressing human emotions, mainly facial expressions, voice emotions, upper body gestures, and language texts. Among them, facial expressions and speech emotions are the two most typical ways of expressing emotions. Since the texture and geometric feature...

Application Information

Patent Type & Authority: Patents (China)
IPC (8): G06K9/00, G06K9/62, G10L25/63, G10L25/57
CPC: G10L25/57, G10L25/63, G06V40/172, G06V40/168, G06V40/174, G06F18/25
Inventor: 于明, 张冰, 郭迎春, 于洋, 师硕, 郝小可, 朱叶, 阎刚
Owner: HEBEI UNIV OF TECH