Bimodal fusion emotion recognition method based on video and voice information
A voice-information and emotion-recognition technology, applied in the field of emotion recognition, which addresses problems such as the low accuracy of extracted emotional features, lack of objectivity, and loss of emotional information.
Embodiment 1
[0055] A bimodal fusion emotion recognition method based on video and voice information, as shown in Figure 1, comprises the following steps:
[0056] 1) Acquisition of voice signals and facial images: natural voice and facial images are acquired in a non-contact manner using a microphone and a camera;
[0057] The camera is a CMOS digital camera whose output electrical signal is directly amplified and converted into a digital signal;
[0058] The microphone is a digital MEMS microphone that outputs a 1/2-cycle pulse-density-modulated (PDM) digital signal;
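The PDM output described above is a 1-bit oversampled stream that must be low-pass filtered and decimated before it can be used as ordinary PCM audio. The following is a minimal sketch of that conversion; the boxcar filter and the decimation factor of 64 are assumptions for illustration (real front-ends typically use CIC filters), not part of the patent.

```python
import numpy as np

def pdm_to_pcm(pdm_bits: np.ndarray, decimation: int = 64) -> np.ndarray:
    """Convert a 1-bit PDM stream (values 0/1) to PCM by low-pass
    filtering (simple moving average) and decimating."""
    # Map bits {0, 1} to bipolar samples {-1.0, +1.0}
    signal = pdm_bits.astype(np.float64) * 2.0 - 1.0
    # One output sample per `decimation` input bits (boxcar average)
    n = len(signal) // decimation
    return signal[: n * decimation].reshape(n, decimation).mean(axis=1)

# A stream that is 75% ones encodes a positive amplitude of about 0.5
bits = np.array([1, 1, 1, 0] * 16)  # 64 bits
pcm = pdm_to_pcm(bits, decimation=64)
```

A production implementation would replace the boxcar with a multi-stage CIC/FIR decimator, but the density-to-amplitude principle is the same.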
[0059] 2) Signal preprocessing: the video signal and the voice signal are preprocessed separately so that each meets the input requirements of the model for its modality;
[0060] 3) Emotion feature extraction: feature extraction is performed separately on the facial image signal and the voice signal preprocessed in step 2), obtaining the corresponding feature...
Embodiment 2
[0065] The video information processing flow, as shown in Figure 2, comprises the following steps:
[0066] 1) Obtain the video file to be processed; parse the video file into video frames; filter the video frames based on their pixel information, and use the frames remaining after filtering as the facial-emotion images to be recognized;
[0067] 2) Based on the pixel information of each video frame, generate the corresponding histogram and determine the frame's sharpness; cluster the video frames according to the histograms and an edge-detection operator to obtain at least one class; within each class, filter out duplicate frames and frames whose sharpness is below the sharpness threshold;
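The frame-filtering step above can be sketched as follows. This is an illustrative simplification, not the patent's method: sharpness is estimated with the variance of a 4-neighbour Laplacian (an edge-detection operator), duplicates are detected by the L1 distance between normalized histograms rather than full clustering, and the two thresholds are arbitrary placeholder values.

```python
import numpy as np

def sharpness(frame: np.ndarray) -> float:
    """Variance of a 4-neighbour Laplacian response; low values mean blur."""
    lap = (frame[1:-1, :-2] + frame[1:-1, 2:]
           + frame[:-2, 1:-1] + frame[2:, 1:-1]
           - 4.0 * frame[1:-1, 1:-1])
    return float(lap.var())

def histogram(frame: np.ndarray, bins: int = 32) -> np.ndarray:
    """Normalized grey-level histogram of a frame."""
    h, _ = np.histogram(frame, bins=bins, range=(0, 256))
    return h / h.sum()

def filter_frames(frames, sharpness_threshold=10.0, dup_threshold=0.05):
    """Keep frames that are sharp enough and not near-duplicates
    (small histogram L1 distance) of an already-kept frame."""
    kept = []
    for f in frames:
        if sharpness(f) < sharpness_threshold:
            continue  # too blurry
        h = histogram(f)
        if any(np.abs(h - histogram(k)).sum() < dup_threshold for k in kept):
            continue  # near-duplicate of a kept frame
        kept.append(f)
    return kept

# Example: a noisy (sharp) frame, its duplicate, and a flat (blurry) frame
rng = np.random.default_rng(0)
noisy = rng.integers(0, 256, size=(32, 32)).astype(np.float64)
kept = filter_frames([noisy, noisy.copy(), np.full((32, 32), 128.0)])
```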
[0068] 3) Based on the filtered video frames, a convolutional-neural-network-based method is used to perform face detection, alignment, rotation, and resizing on each frame to ob...
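The detection step itself requires a trained CNN (e.g., an MTCNN-style detector), which cannot be reproduced here, but the subsequent alignment, rotation, and resizing can be sketched given two detected eye landmarks. The eye coordinates, output size, and nearest-neighbour sampling below are illustrative assumptions, not the patent's implementation.

```python
import numpy as np

def align_face(img, left_eye, right_eye, out_size=48):
    """Rotate the image so the eyes lie on a horizontal line, then
    sample an out_size x out_size crop centred on the eye midpoint
    (nearest-neighbour warp)."""
    (lx, ly), (rx, ry) = left_eye, right_eye
    angle = np.arctan2(ry - ly, rx - lx)       # tilt of the eye line
    cx, cy = (lx + rx) / 2.0, (ly + ry) / 2.0  # rotation centre
    cos_a, sin_a = np.cos(angle), np.sin(angle)
    # Map each output pixel back into the source image
    ys, xs = np.mgrid[0:out_size, 0:out_size].astype(np.float64)
    xs -= out_size / 2.0
    ys -= out_size / 2.0
    src_x = cos_a * xs - sin_a * ys + cx
    src_y = sin_a * xs + cos_a * ys + cy
    src_x = np.clip(np.round(src_x), 0, img.shape[1] - 1).astype(int)
    src_y = np.clip(np.round(src_y), 0, img.shape[0] - 1).astype(int)
    return img[src_y, src_x]

# Example: eyes already level, so the crop is centred without rotation
img = np.arange(64 * 64, dtype=np.float64).reshape(64, 64)
face = align_face(img, left_eye=(10, 20), right_eye=(30, 20))
```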
Embodiment 3
[0072] The voice information processing flow, as shown in Figure 3, comprises the following steps:
[0073] 1) Acquire the human voice signal with a digital MEMS microphone, pre-emphasize it with a first-order high-pass FIR digital filter, and output the pre-emphasized voice data;
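The first-order high-pass FIR pre-emphasis filter above is conventionally y[n] = x[n] - a·x[n-1]; the coefficient a = 0.97 below is a common default assumed for illustration, as the patent text does not state it.

```python
import numpy as np

def pre_emphasis(x: np.ndarray, alpha: float = 0.97) -> np.ndarray:
    """First-order high-pass FIR filter: y[n] = x[n] - alpha * x[n-1].
    Boosts high frequencies to compensate for spectral tilt in speech."""
    y = np.empty(len(x), dtype=np.float64)
    y[0] = x[0]
    y[1:] = x[1:] - alpha * x[:-1]
    return y

# A constant (DC) signal is almost entirely suppressed after the first sample
out = pre_emphasis(np.ones(3))
```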
[0074] 2) Apply short-term analysis to divide the pre-emphasized voice data into frames, obtaining a time series of voice feature parameters;
[0075] 3) Apply a Hamming window function to the voice feature-parameter time series to obtain windowed voice data;
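Steps 2) and 3) together can be sketched as overlapping framing followed by Hamming windowing. The frame length of 400 samples and hop of 160 (25 ms / 10 ms at an assumed 16 kHz sampling rate) are conventional values, not taken from the patent.

```python
import numpy as np

def frame_and_window(x: np.ndarray, frame_len: int = 400, hop: int = 160):
    """Split the signal into overlapping short-time frames and apply a
    Hamming window to each; returns an array of shape (n_frames, frame_len)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    # Index matrix: row i selects samples [i*hop, i*hop + frame_len)
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx] * np.hamming(frame_len)

frames = frame_and_window(np.arange(1000.0))
```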
[0076] 4) Perform endpoint detection on the windowed voice data using the double-threshold comparison method to obtain the preprocessed voice data;
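A minimal sketch of double-threshold endpoint detection on short-time energy: frames above a high threshold mark definite speech, and the segment is then extended outward while energy stays above a low threshold. The classical method also consults the zero-crossing rate, and the threshold ratios below are placeholder assumptions.

```python
import numpy as np

def endpoint_detect(frames, high_ratio=0.5, low_ratio=0.1):
    """Return (start_frame, end_frame) of the detected speech segment,
    or None if no frame exceeds the high energy threshold."""
    energy = (frames ** 2).sum(axis=1)        # short-time energy per frame
    high = high_ratio * energy.max()
    low = low_ratio * energy.max()
    active = np.flatnonzero(energy > high)    # frames of definite speech
    if active.size == 0:
        return None
    start, end = active[0], active[-1]
    # Extend outward while energy stays above the low threshold
    while start > 0 and energy[start - 1] > low:
        start -= 1
    while end < len(energy) - 1 and energy[end + 1] > low:
        end += 1
    return start, end

# Example: silence, a weak onset frame, four loud frames, then silence
frames = np.zeros((10, 4))
frames[3:7] = 1.0   # clear speech (energy 4.0)
frames[2] = 0.4     # weak onset (energy 0.64, above the low threshold only)
segment = endpoint_detect(frames)
```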
[0077] 5) Apply a short-time Fourier transform to the preprocessed voice data and draw the speech spectrogram;
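The spectrogram step can be sketched as an STFT: window each frame, take its FFT, and stack the log-magnitudes into a time-frequency image. Frame length, hop, and the 16 kHz test signal are assumed values consistent with the framing sketch above, not figures from the patent.

```python
import numpy as np

def spectrogram(x: np.ndarray, frame_len: int = 400, hop: int = 160):
    """Short-time Fourier transform log-magnitude in dB.
    Returns an array of shape (n_frames, frame_len // 2 + 1)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = x[idx] * np.hamming(frame_len)   # windowed frames
    mag = np.abs(np.fft.rfft(frames, axis=1)) # one-sided spectrum per frame
    return 20.0 * np.log10(mag + 1e-10)       # dB scale, small floor for log(0)

# Example: a 1 kHz tone sampled at 16 kHz peaks in bin 1000/(16000/400) = 25
tone = np.sin(2 * np.pi * 1000 * np.arange(16000) / 16000.0)
S = spectrogram(tone)
```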
[0078] 6) The spectrogram is input into ...