Video depth relation analysis method based on multi-modal feature fusion

A feature fusion and relationship analysis technology, applied in the field of computer vision, addressing problems such as the inability of existing short-video analysis methods to handle multiple entities, the difficulty of merging short-video analyses, and the resulting failure to perform deep relationship analysis on long videos.

Pending Publication Date: 2021-01-05
NANJING UNIV


Problems solved by technology

[0004] 1) A short video contains relatively little content, often only one scene and few characters. Existing short-video analysis techniques therefore cannot handle multiple entities, including relationship prediction between characters and scenes;
[0005] 2) It is difficult to merge the analyses of individual short videos, so relationships across an entire long video cannot be predicted.




Embodiment Construction

[0027] The present invention proposes a video depth relationship analysis method based on multimodal feature fusion and establishes a multimodal feature fusion network for identifying entity relationship graphs in videos, as shown in figure 1. The network input comprises the video, scene screenshots with scene names, and character screenshots with character names; the output is a relationship graph between the corresponding scenes and characters, which is used to answer the question types of long-video analysis. The network input is given by the dataset during the training phase and supplied manually during testing and use. The multimodal feature fusion network is realized as follows: first, the input video is divided into multiple segments according to scene, visual and sound models, each segment being one act, and sound and text features are extracted in each act as the act features; then, the positions at which each entity appears in each act are identified from the input scene and character screenshots, and the corresponding entity visual features are extracted.
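As a concrete illustration of this flow, the following is a minimal Python sketch of the per-act, per-pair feature fusion described above. All class, function, and field names are hypothetical placeholders rather than the patent's actual implementation, and the few-shot/zero-shot relation classifier is stubbed out.

```python
# Minimal sketch of the per-act, per-pair fusion flow described in [0027].
# All names here are hypothetical placeholders, not the patent's code.
import itertools
from dataclasses import dataclass

@dataclass
class Entity:
    name: str           # scene or character name given as network input
    box: tuple          # (x1, y1, x2, y2) location found in the act
    visual_feat: list   # appearance features extracted at `box`

@dataclass
class Act:
    sound_feat: list    # audio features pooled over the segment
    text_feat: list     # subtitle/dialogue features for the segment
    entities: list      # Entity instances located in this segment

def predict_relation(fused_feat):
    # Placeholder for the few-shot / zero-shot relation head.
    return "unknown"

def build_relation_graph(acts):
    """Fuse act, entity, and entity-pair features and collect
    per-act relation predictions into one video-level graph."""
    graph = {}
    for act in acts:
        for a, b in itertools.combinations(act.entities, 2):
            # Concatenate act features with both entities' features.
            fused = act.sound_feat + act.text_feat + a.visual_feat + b.visual_feat
            graph.setdefault((a.name, b.name), []).append(predict_relation(fused))
    return graph

act = Act(sound_feat=[0.1], text_feat=[0.2],
          entities=[Entity("Alice", (0, 0, 40, 80), [0.3]),
                    Entity("Cafe", (0, 0, 200, 100), [0.4])])
print(build_relation_graph([act]))  # {('Alice', 'Cafe'): ['unknown']}
```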


Abstract

The invention relates to a video depth relation analysis method based on multi-modal feature fusion, built on a visual, sound and text feature fusion network for video segmentation into acts, scene recognition and character recognition. The method comprises the following steps: firstly, dividing an input video into a plurality of acts according to scene, visual and sound models, and extracting the corresponding sound and text features for each act; secondly, identifying the positions at which each entity appears in each act according to the input scene screenshots and character screenshots, extracting the corresponding entity visual features for scenes and characters, and computing the visual features of a joint area for every entity pair; then, for each entity pair, concatenating the act features, the entity features and the entity-pair features, predicting the relationship of each entity pair in an act through few-shot learning combined with zero-shot learning, and constructing an entity relationship graph over the whole video by merging the entity relationships on each act. Using the entity relationship graph, the method can answer three types of deep video analysis questions: knowledge graph filling, question answering and entity relationship paths.
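The "visual features of a joint area" for an entity pair can be read as features extracted over the smallest region covering both entities' bounding boxes. Below is a minimal sketch of that geometric step under this assumption; the box format and function name are illustrative, not taken from the patent.

```python
def union_box(box_a, box_b):
    """Joint area of an entity pair: the smallest axis-aligned box
    covering both entity boxes, each given as (x1, y1, x2, y2)."""
    return (min(box_a[0], box_b[0]), min(box_a[1], box_b[1]),
            max(box_a[2], box_b[2]), max(box_a[3], box_b[3]))

# e.g. a character box and a scene region detected in the same act
print(union_box((10, 20, 60, 120), (40, 10, 200, 90)))  # (10, 10, 200, 120)
```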

Description

Technical field

[0001] The invention belongs to the technical field of computer vision, relates to entity relationship detection in video, and specifically relates to a video depth relationship analysis method based on multimodal feature fusion.

Background technique

[0002] In-depth relationship analysis between different entities in long videos helps with the deep understanding of long videos, which often requires inferring hidden information from known information. In-depth relationship analysis on long videos is dedicated to constructing relationship graphs between two types of entities, scenes and characters. Through the entity relationship graph, various deep video analysis questions can be answered.

[0003] Related work on video understanding includes video summarization, action recognition, visual relationship detection, and social relationship recognition, but these works are generally applicable to short videos and lack in-depth analysis of relationships in long videos.
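To illustrate how the entity relationship graph serves the third question type, entity relationship paths, here is a minimal breadth-first search sketch; the graph encoding and the entity names are illustrative assumptions, not the patent's specification.

```python
from collections import deque

def relation_path(graph, src, dst):
    """Shortest chain of related entities linking src to dst.
    `graph` maps entity -> {neighbor: relation}."""
    queue, seen = deque([[src]]), {src}
    while queue:
        path = queue.popleft()
        if path[-1] == dst:
            return path
        for nxt in graph.get(path[-1], {}):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # the entities are not connected in the graph

# Toy graph built from per-act relation predictions (hypothetical names).
movie_graph = {
    "Alice": {"Bob": "friend"},
    "Bob":   {"Cafe": "appears_in"},
    "Cafe":  {},
}
print(relation_path(movie_graph, "Alice", "Cafe"))  # ['Alice', 'Bob', 'Cafe']
```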


Application Information

IPC(8): G06K9/00; G06K9/62; G06T7/246; G10L25/24; G06F40/30; G06F16/36; G06F16/35
CPC: G06F16/367; G06T7/246; G10L25/24; G06F40/30; G06F16/35; G06T2207/30241; G06V40/161; G06V20/48; G06V20/46; G06V20/49; G06F18/253
Inventor: REN Tongwei, WU Gangshan, YU Fan, WANG Dandan, ZHANG Beibei
Owner: NANJING UNIV