Multi-modal emotion classification method based on text, voice and video fusion

A technology for emotion classification with video fusion, applied in character and pattern recognition, other database clustering/classification, instruments, etc. It addresses the problems of unstable accuracy and high cost, and offers good flexibility, improved accuracy, and ease of implementation.

Status: Pending; Publication Date: 2019-09-27
NANJING UNIV OF SCI & TECH
Cites: 3; Cited by: 28

AI Technical Summary

Problems solved by technology

[0003] Before the rise of machine learning methods, sentiment analysis was mainly done manually, which was costly and gave unstable accuracy. Traditional machine learning and traditional multimodal methods rely mainly on feature engineering, using hand-crafted features on the voice and video sides. However, because emotional expression is ambiguous, hand-crafted features often fail to capture deep emotional cues, so there is still considerable room for improvement in the accuracy of emotion recognition.

Examples

Embodiment

[0055] As shown in Figure 4, this embodiment takes the MOSI data set of Carnegie Mellon University as an example: the raw data for the three modalities is obtained first and then preprocessed.

[0056] Each segment is annotated with an emotion label, and the corresponding video subtitle data (text modality), synchronized audio data (audio modality), and video data (video modality) are aligned. For example:

[0057] Ordinary sample: "I love this movie." From the semantics alone, the emotion category can be labelled directly as positive.

[0058] Semantically ambiguous sample: "The movie is sick." Combined with a loud voice and an obvious frown in the video, the emotion category can be labelled as negative.
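
One way to picture an aligned, labelled segment is as a single record holding all three modalities plus the label. The following is a minimal sketch only: the class name `AlignedSegment`, its field names, and the numeric feature values are hypothetical placeholders and are not taken from the patent or from the MOSI data set.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AlignedSegment:
    """One labelled segment with its three aligned modalities (hypothetical layout)."""
    text: str            # subtitle text of the segment (text modality)
    audio: List[float]   # audio features for the same segment (placeholder values)
    video: List[float]   # video features for the same segment (placeholder values)
    label: str           # emotion category, e.g. "positive" or "negative"


# The two example sentences from the embodiment; feature values are invented.
samples = [
    AlignedSegment("I love this movie.", [0.2, 0.1], [0.7, 0.3], "positive"),
    AlignedSegment("The movie is sick.", [0.9, 0.8], [0.1, 0.9], "negative"),
]
```

Keeping the three modalities in one record makes it harder for text, audio, and video to drift out of alignment during later preprocessing.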

[0059] In the training phase, the original samples are fed to the multimodal emotion classification model based on tensor fusion for training, yielding the emotion classification model that is used to judge the emotion category of test samples during testing; In...
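
The excerpt does not spell out how the tensor fusion step is computed. The sketch below assumes a commonly used formulation, in which each modality vector is extended with a constant 1 and the three vectors are combined by an outer product. The function name `tensor_fusion` and the toy inputs are hypothetical, and the patent's actual model may differ.

```python
from typing import List


def tensor_fusion(text_vec: List[float],
                  audio_vec: List[float],
                  video_vec: List[float]) -> List[float]:
    """Flattened three-way outer product of the modality vectors.

    Appending 1.0 to each vector (an assumption, not stated in the source)
    keeps the single-modality and pairwise terms alongside the three-way terms.
    """
    t = text_vec + [1.0]
    a = audio_vec + [1.0]
    v = video_vec + [1.0]
    # One product per (text, audio, video) index triple, in row-major order.
    return [ti * ai * vi for ti in t for ai in a for vi in v]


fused = tensor_fusion([2.0], [3.0], [5.0])
print(fused)  # [30.0, 6.0, 10.0, 2.0, 15.0, 3.0, 5.0, 1.0]
```

In the toy output, 2, 3 and 5 are the single-modality terms, 6, 10 and 15 are the pairwise interactions, and 30 is the three-way interaction. The fused vector has length (d_text + 1)(d_audio + 1)(d_video + 1), so it grows quickly with the input dimensions.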

Abstract

The invention discloses a multimodal emotion classification method based on the fusion of text, voice and video. The method comprises: step 1, obtaining multimodal data, preprocessing it, and dividing it into a training set and a test set; step 2, constructing an end-to-end multimodal emotion classification model based on tensor fusion and training it on the training set; and step 3, applying the preprocessing of step 1 to the test set and classifying its sentiment with the tensor fusion model obtained in step 2. With this multimodal emotion classification model, ambiguous, deep emotional information can be captured better.
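
The division into a training set and a test set in step 1 can be sketched as a seeded shuffle followed by a cut. This is an illustration under stated assumptions: the helper name `split_train_test`, the 20% test ratio, and the fixed seed are hypothetical choices, since the abstract does not state how the split is made.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_train_test(samples: Sequence[T],
                     test_ratio: float = 0.2,   # assumed ratio, not from the source
                     seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of the samples reproducibly and cut it into (train, test)."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]


# Ten stand-in sample IDs; real inputs would be the preprocessed segments.
train, test = split_train_test(list(range(10)))
print(len(train), len(test))  # 8 2
```

Using a private `random.Random(seed)` instance keeps the split reproducible without touching the global random state.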

Description

Technical field

[0001] The invention belongs to natural language processing technology, and specifically concerns a multimodal emotion classification method based on the fusion of text, voice and video.

Background

[0002] At present, social media sites produce a large amount of video data rich in emotional information every day, which has given rise to many multimodal opinion mining and sentiment analysis techniques for text, voice, and video. This is not only a frontier academic problem and an active research topic in natural language processing and sentiment analysis, but also an important problem that urgently needs solving in applications. It has great application value and social significance, but also poses great challenges.

[0003] Before the rise of machine learning methods, sentiment analysis was mainly done manually, which was costly and unstable. Traditional machine learning and traditional multimodal methods ...

Application Information

IPC(8): G06F16/906, G06K9/62
CPC: G06F16/906, G06F18/241
Inventors: 夏睿, 李晟华
Owner: NANJING UNIV OF SCI & TECH