Multimodal Emotion Recognition Method
An emotion recognition and multimodal technology, applied in the field of data processing, that addresses problems such as long-sequence context modeling.
Active Publication Date: 2021-09-14
INST OF AUTOMATION CHINESE ACAD OF SCI
3 Cites, 1 Cited by
AI Technical Summary
Problems solved by technology
Beyond multimodal fusion, in terms of model architecture, current multimodal emotion recognition methods mainly rely on recurrent neural networks to capture temporal context information. …
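To make concrete how a recurrent network captures temporal context, the following is a minimal Elman-style RNN forward pass in NumPy. It is an illustrative sketch, not the architecture of any particular prior method; the feature sizes and random weights are hypothetical. The point it shows is that each hidden state depends on the previous one, so the time steps are processed strictly in sequence.

```python
import numpy as np

def rnn_forward(x, w_in, w_rec, b):
    """Minimal Elman RNN: h_t = tanh(x_t @ W_in + h_{t-1} @ W_rec + b).

    x: (T, d_in) frame sequence; returns all hidden states, shape (T, d_h).
    """
    T = x.shape[0]
    d_h = w_rec.shape[0]
    h = np.zeros(d_h)                  # initial hidden state h_0 = 0
    states = np.empty((T, d_h))
    # Step t needs h_{t-1}, so the T steps run one after another,
    # and context from early frames must survive many updates of h.
    for t in range(T):
        h = np.tanh(x[t] @ w_in + h @ w_rec + b)
        states[t] = h
    return states

rng = np.random.default_rng(0)
x = rng.standard_normal((50, 8))       # 50 frames of 8-dim features (hypothetical)
w_in = rng.standard_normal((8, 16)) * 0.1
w_rec = rng.standard_normal((16, 16)) * 0.1
b = np.zeros(16)
states = rnn_forward(x, w_in, w_rec, b)
print(states.shape)  # (50, 16)
```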
Examples
Embodiment 1
[0059] As shown in Figure 1, the multimodal emotion recognition method provided in this embodiment of the present application comprises:
[0060] S1: Input the audio file, the video file, and the corresponding text file of the sample to be tested, and perform feature extraction on each of them to obtain frame-level audio features, frame-level video features, and word-level text features.
[0061] In some specific embodiments, the feature extraction on each of the audio file, video file, and text file comprises:
[0062] The audio file is segmented into frame-level short audio clips; the short audio clips are each input into a pre-trained audio feature extraction network to obtain the frame-level audio features;
[0063] A face detection tool is used to extract frame-level face images from the video file; the frame-level face images are each input into a pre-trained facial expression feature extraction network to obtain the frame-level video features;
[0064] Segmenta...
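The segmentation of the audio file into frame-level short clips ([0062]) can be sketched as a sliding window over the waveform. The 25 ms clip length and 10 ms hop below are hypothetical defaults chosen for illustration; the text does not state the clip length, and the pre-trained feature extraction network that would consume each clip is not shown.

```python
import numpy as np

def segment_audio(signal, sample_rate, clip_ms=25.0, hop_ms=10.0):
    """Split a 1-D waveform into overlapping short clips (frames).

    clip_ms and hop_ms are illustrative assumptions. Trailing samples
    that do not fill a whole clip are dropped.
    """
    clip_len = int(round(sample_rate * clip_ms / 1000))
    hop_len = int(round(sample_rate * hop_ms / 1000))
    if len(signal) < clip_len:
        return np.empty((0, clip_len), dtype=signal.dtype)
    n_clips = 1 + (len(signal) - clip_len) // hop_len
    # Row i holds samples [i * hop_len, i * hop_len + clip_len)
    idx = np.arange(clip_len)[None, :] + hop_len * np.arange(n_clips)[:, None]
    return signal[idx]

sr = 16000
wave = np.random.default_rng(0).standard_normal(sr)  # 1 s of noise as a stand-in
clips = segment_audio(wave, sr)
print(clips.shape)  # (98, 400)
```

Each row would then be passed through the audio feature extractor, and stacking the per-clip outputs yields the frame-level audio feature sequence.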
Embodiment 2
[0110] The present application also discloses an electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the computer program, it implements the steps of the method described in the above embodiments.
Embodiment 3
[0112] A multimodal emotion recognition method, comprising the following steps:
[0113] S1-1: Input the audio to be tested, the video to be tested, and the text to be tested; the audio, video, and text to be tested are three different modalities.
[0114] In this embodiment, the audio to be tested and the video to be tested come from the same video segment, and the text to be tested corresponds to that audio and video; the video, audio, and text are the three modalities of this segment.
[0115] This embodiment analyzes the data of these three modalities to detect the emotional state of the person in the input segment.
[0116] Further to the above embodiment, the input can be a segment of a person speaking: the video to be tested consists of the consecutive frames of that person talking, the audio to be tested is the audio of the segment, and the text to be tested is the text corresponding to the video and audio of the segment. For example, in a segment where a person speaks a sentence, the footage of the speaker is the video to be tested, th…
Abstract
This application relates to a multimodal emotion recognition method, comprising: extracting frame-level audio features, frame-level video features, and word-level text features; inputting the extracted features into respective feature encoders for modeling to obtain encoded audio, video, and text features; first passing each encoded feature through its own self-attention module to model the interactions within that modality, then arranging the modalities into pairwise combinations and inputting each pair into a cross-modal attention module to model the interaction between the two modalities; applying temporal pooling to the outputs of the self-attention and cross-modal attention modules to obtain a global intra-modal interaction feature for each modality and a global inter-modal interaction feature for each pair of modalities; using an attention mechanism to perform weighted fusion of these intra-modal and inter-modal global interaction features to obtain the intra-modal and inter-modal feature representations of the entire sample to be tested; and concatenating the two representations and passing the result through a fully connected network to obtain the final emotion classification result.
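The pipeline in the abstract can be sketched end to end in NumPy with random, untrained weights. This is a simplified illustration under several assumptions not stated in the abstract: the feature encoders are single linear projections to a shared width `D`, attention is single-head scaled dot-product attention without learned query/key/value projections, cross-modal attention is applied to every ordered pair (query from the first modality, key and value from the second), temporal pooling is a mean over time, and the weighted fusion scores each global vector with a hypothetical learned vector. All sizes are hypothetical.

```python
import numpy as np
from itertools import permutations

rng = np.random.default_rng(0)
D = 32          # shared model width (hypothetical)
N_CLASSES = 4   # number of emotion classes (hypothetical)

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)   # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    """Single-head scaled dot-product attention; q: (Tq, D), k and v: (Tk, D)."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v        # (Tq, D)

def attentive_fusion(feats, w):
    """Weight a list of global D-dim vectors by softmax(feat . w) and sum them."""
    g = np.stack(feats)                        # (n, D)
    alpha = softmax(g @ w)                     # (n,) fusion weights
    return alpha @ g                           # (D,)

# Frame-level audio/video and word-level text inputs (hypothetical lengths/dims).
inputs = {"audio": rng.standard_normal((120, 40)),
          "video": rng.standard_normal((30, 64)),
          "text": rng.standard_normal((12, 300))}

# 1) Feature encoders: a linear projection to width D (assumption).
enc_w = {m: rng.standard_normal((x.shape[1], D)) / np.sqrt(x.shape[1])
         for m, x in inputs.items()}
enc = {m: x @ enc_w[m] for m, x in inputs.items()}

# 2) Intra-modal interaction: self-attention within each modality.
intra = {m: attention(h, h, h) for m, h in enc.items()}

# 3) Inter-modal interaction: cross-modal attention for each ordered pair (assumption).
inter = {(a, b): attention(enc[a], enc[b], enc[b]) for a, b in permutations(enc, 2)}

# 4) Temporal pooling (mean over time, assumption) gives one global vector each.
g_intra = [h.mean(axis=0) for h in intra.values()]
g_inter = [h.mean(axis=0) for h in inter.values()]

# 5) Attention-weighted fusion within modalities and between modalities.
r_intra = attentive_fusion(g_intra, rng.standard_normal(D))
r_inter = attentive_fusion(g_inter, rng.standard_normal(D))

# 6) Concatenate both representations and classify with a fully connected layer.
fc_w = rng.standard_normal((2 * D, N_CLASSES)) / np.sqrt(2 * D)
probs = softmax(np.concatenate([r_intra, r_inter]) @ fc_w)
print(len(inter), probs.shape)  # 6 (4,)
```

Because attention and temporal pooling both work on sequences of any length, the 120 audio frames, 30 video frames, and 12 words never need to be aligned to a common time axis in this sketch.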
Description
Technical field [0001] The present application relates to data processing, and more particularly to a multimodal emotion recognition method. Background [0002] Traditional emotion recognition is often limited to a single modality, such as speech emotion recognition, facial expression recognition, or text analysis. With the development of computer science and technology, multimodal emotion recognition based on text, audio, and video has emerged, and it is expected to have a wide range of applications in fields such as smart homes, education, and finance. Conventional multimodal emotion recognition generally integrates information from multiple modalities through feature-level fusion or decision-level fusion. Each approach has its advantages and disadvantages. Although feature-level fusion can model the interaction between modalities, it requires the features of different modalities to be aligned in time beforehand; decision-level fusion, by contrast, does not need modal alignment informa…
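The contrast between the two conventional strategies can be illustrated with a small sketch. The function names, shapes, and the plain averaging rule for decision-level fusion are illustrative assumptions, not taken from this application; the sketch only shows why frame-by-frame feature concatenation needs time-aligned inputs while combining per-modality predictions does not.

```python
import numpy as np

def feature_fusion(audio, video):
    """Feature-level fusion: concatenate the two modalities frame by frame.

    Only defined when both sequences have the same number of time steps,
    i.e. they have already been aligned.
    """
    if audio.shape[0] != video.shape[0]:
        raise ValueError("feature-level fusion needs time-aligned inputs")
    return np.concatenate([audio, video], axis=1)

def decision_fusion(prob_list, weights=None):
    """Decision-level fusion: (weighted) average of per-modality class probabilities.

    Each modality is classified separately, so sequence lengths never meet.
    """
    return np.average(np.stack(prob_list), axis=0, weights=weights)

audio = np.zeros((120, 40))   # 120 audio frames (hypothetical)
video = np.zeros((30, 64))    # 30 video frames (hypothetical)
try:
    feature_fusion(audio, video)
    aligned = True
except ValueError:
    aligned = False           # lengths differ, so alignment would be needed first

p_audio = np.array([0.7, 0.2, 0.1])
p_video = np.array([0.4, 0.4, 0.2])
p_text = np.array([0.1, 0.6, 0.3])
fused = decision_fusion([p_audio, p_video, p_text])
print(aligned, fused)  # False, then values close to [0.4, 0.4, 0.2]
```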