Video depth relation analysis method based on multi-modal feature fusion

A feature fusion and relationship analysis technology, applied in the field of computer vision, addressing problems such as the inability of existing short-video analysis methods to handle multiple entities, the difficulty of merging short-video analyses, and the resulting failure to perform deep relationship analysis on long videos.

Pending Publication Date: 2021-01-05
NANJING UNIV


Problems solved by technology

[0004] 1) A short video contains relatively little content, often only one scene and few characters. Existing short-video analysis techniques therefore cannot handle multiple entities, including relationship prediction between characters and scenes;
[0005] 2) It is difficult to merge the analyses of individual short videos, so relationships across an entire long video cannot be predicted.




Embodiment Construction

[0027] The present invention proposes a video depth relationship analysis method based on multimodal feature fusion and establishes a multimodal feature fusion network for identifying entity relationship graphs in videos, as shown in figure 1. The network input comprises the video, scene screenshots with scene names, and character screenshots with character names; the output is a relationship graph between the corresponding scenes and characters, which is used to answer the question types of long-video analysis. The network input is given by the dataset during the training phase and supplied manually during testing and use. The multimodal feature fusion network is realized as follows: first, the input video is divided into multiple segments according to scene, visual and sound models, each segment being one act, and sound and text features are extracted in each act as the act features; then, the positions at which each entity appears in each act are identified from the input scene and character screenshots, and the corresponding entity visual features are extracted.
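As a concrete illustration of this flow, the following is a minimal Python sketch of the per-act, per-pair feature fusion described above. All class, function, and field names are hypothetical placeholders rather than the patent's actual implementation, and the few-shot/zero-shot relation classifier is stubbed out.

```python
# Minimal sketch of the per-act, per-pair fusion flow described in [0027].
# All names here are hypothetical placeholders, not the patent's code.
import itertools
from dataclasses import dataclass

@dataclass
class Entity:
    name: str           # scene or character name given as network input
    box: tuple          # (x1, y1, x2, y2) location found in the act
    visual_feat: list   # appearance features extracted at `box`

@dataclass
class Act:
    sound_feat: list    # audio features pooled over the segment
    text_feat: list     # subtitle/dialogue features for the segment
    entities: list      # Entity instances located in this segment

def predict_relation(fused_feat):
    # Placeholder for the few-shot / zero-shot relation head.
    return "unknown"

def build_relation_graph(acts):
    """Fuse act, entity, and entity-pair features and collect
    per-act relation predictions into one video-level graph."""
    graph = {}
    for act in acts:
        for a, b in itertools.combinations(act.entities, 2):
            # Concatenate act features with both entities' features.
            fused = act.sound_feat + act.text_feat + a.visual_feat + b.visual_feat
            graph.setdefault((a.name, b.name), []).append(predict_relation(fused))
    return graph

act = Act(sound_feat=[0.1], text_feat=[0.2],
          entities=[Entity("Alice", (0, 0, 40, 80), [0.3]),
                    Entity("Cafe", (0, 0, 200, 100), [0.4])])
print(build_relation_graph([act]))  # {('Alice', 'Cafe'): ['unknown']}
```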


Abstract

The invention relates to a video depth relation analysis method based on multi-modal feature fusion, built on a visual, sound and text feature fusion network for video segmentation into acts, scene recognition and character recognition. The method comprises the following steps: firstly, dividing an input video into a plurality of acts according to scene, visual and sound models, and extracting the corresponding sound and text features for each act; secondly, identifying the positions at which each entity appears in each act according to the input scene screenshots and character screenshots, extracting the corresponding entity visual features for scenes and characters, and computing the visual features of a joint area for every entity pair; then, for each entity pair, concatenating the act features, the entity features and the entity-pair features, predicting the relationship of each entity pair in an act through few-shot learning combined with zero-shot learning, and constructing an entity relationship graph over the whole video by merging the entity relationships on each act. Using the entity relationship graph, the method can answer three types of deep video analysis questions: knowledge graph filling, question answering and entity relationship paths.
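The "visual features of a joint area" for an entity pair can be read as features extracted over the smallest region covering both entities' bounding boxes. Below is a minimal sketch of that geometric step under this assumption; the box format and function name are illustrative, not taken from the patent.

```python
def union_box(box_a, box_b):
    """Joint area of an entity pair: the smallest axis-aligned box
    covering both entity boxes, each given as (x1, y1, x2, y2)."""
    return (min(box_a[0], box_b[0]), min(box_a[1], box_b[1]),
            max(box_a[2], box_b[2]), max(box_a[3], box_b[3]))

# e.g. a character box and a scene region detected in the same act
print(union_box((10, 20, 60, 120), (40, 10, 200, 90)))  # (10, 10, 200, 120)
```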

Description

Technical field

[0001] The invention belongs to the technical field of computer vision, relates to entity relationship detection in video, and specifically relates to a video depth relationship analysis method based on multimodal feature fusion.

Background technique

[0002] In-depth relationship analysis between different entities in long videos helps with the deep understanding of long videos, which often requires inferring hidden information from known information. In-depth relationship analysis on long videos is dedicated to constructing relationship graphs between two types of entities, scenes and characters. Through the entity relationship graph, various deep video analysis questions can be answered.

[0003] Related work on video understanding includes video summarization, action recognition, visual relationship detection, and social relationship recognition, but these works are generally applicable to short videos and lack in-depth analysis of relationships in long videos.
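To illustrate how the entity relationship graph serves the third question type, entity relationship paths, here is a minimal breadth-first search sketch; the graph encoding and the entity names are illustrative assumptions, not the patent's specification.

```python
from collections import deque

def relation_path(graph, src, dst):
    """Shortest chain of related entities linking src to dst.
    `graph` maps entity -> {neighbor: relation}."""
    queue, seen = deque([[src]]), {src}
    while queue:
        path = queue.popleft()
        if path[-1] == dst:
            return path
        for nxt in graph.get(path[-1], {}):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # the entities are not connected in the graph

# Toy graph built from per-act relation predictions (hypothetical names).
movie_graph = {
    "Alice": {"Bob": "friend"},
    "Bob":   {"Cafe": "appears_in"},
    "Cafe":  {},
}
print(relation_path(movie_graph, "Alice", "Cafe"))  # ['Alice', 'Bob', 'Cafe']
```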


Application Information

IPC(8): G06K9/00; G06K9/62; G06T7/246; G10L25/24; G06F40/30; G06F16/36; G06F16/35
CPC: G06F16/367; G06T7/246; G10L25/24; G06F40/30; G06F16/35; G06T2207/30241; G06V40/161; G06V20/48; G06V20/46; G06V20/49; G06F18/253
Inventor: REN Tongwei, WU Gangshan, YU Fan, WANG Dandan, ZHANG Beibei
Owner: NANJING UNIV