Video-paragraph retrieval method and system based on local-overall graph inference network

A technology combining local graphs and video segments, applied in the field of cross-modal retrieval; it addresses the performance degradation caused by directly encoding long sequences and achieves more comprehensive interactive information between modalities.

Active Publication Date: 2021-08-03
HANGZHOU YIWISE INTELLIGENT TECH CO LTD
9 Cites · 3 Cited by

AI Technical Summary

Problems solved by technology

[0006] The purpose of the present invention is to solve the problems in the prior art. To overcome the performance degradation caused by directly encoding long sequences and to learn more fine-grained information, the present invention decomposes the video and text into four levels: an overall level, a segment level, an action level, and an object level.


Image

  • Video-paragraph retrieval method and system based on local-overall graph inference network

Examples


Embodiment Construction

[0030] The present invention will be further described below in conjunction with the accompanying drawings.

[0031] The present invention mainly comprises four parts. First, the video and text (paragraph) are preprocessed. Second, a local-overall graph inference network encodes the given video and text separately to obtain the final video features and text features. Third, cosine similarity is used to measure the similarity between the video features and the text features. Finally, retrieval is performed according to the similarity measurement results. Within the local-overall graph inference network, the present invention decomposes the video and text into four-level semantic structures, constructs a local graph and an overall graph from them, and then applies a graph convolutional network to perform graph reasoning.
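The similarity-measurement and retrieval steps described above can be sketched as follows. This is a minimal illustration under assumed inputs; the function name, feature matrices, and dimensions are hypothetical and not taken from the patent.

```python
import numpy as np

def cosine_retrieval(video_feats, text_feats, top_k=5):
    """Rank videos for each paragraph query by cosine similarity.

    video_feats: (n_videos, d) final video features
    text_feats:  (n_queries, d) final paragraph features
    Returns the indices of the top_k videos per query.
    """
    # L2-normalize rows so that the dot product equals cosine similarity
    v = video_feats / np.linalg.norm(video_feats, axis=1, keepdims=True)
    t = text_feats / np.linalg.norm(text_feats, axis=1, keepdims=True)
    sim = t @ v.T  # (n_queries, n_videos) similarity matrix
    # Sort each row in descending similarity and keep the top_k indices
    return np.argsort(-sim, axis=1)[:, :top_k]

# Toy example: 3 videos, 2 paragraph queries, 4-dim features
videos = np.array([[1.0, 0.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0, 0.0],
                   [0.7, 0.7, 0.0, 0.0]])
queries = np.array([[1.0, 0.1, 0.0, 0.0],
                    [0.0, 1.0, 0.2, 0.0]])
print(cosine_retrieval(videos, queries, top_k=2))  # → [[0 2] [1 2]]
```

In practice the feature matrices would come from the local-overall graph inference network; the ranking step itself is independent of how the features are produced.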

[0032] Figure 1 shows a schematic diagram of the video-paragraph retrieval process of the present invention...



Abstract

The invention discloses a video-paragraph retrieval method and a video-paragraph retrieval system based on a local-overall graph inference network, which belong to the field of cross-modal retrieval. The method mainly comprises the following steps: 1) preprocessing the videos and texts (paragraphs); 2) encoding the given videos and texts with a local-overall graph inference network to obtain final video features and text features; 3) calculating the similarity between the video features and the text features using cosine similarity; 4) performing retrieval according to the similarity measurement results. Compared with traditional video-paragraph retrieval methods, the invention decomposes the video and the text into four-level semantic structures, constructs a local graph and an overall graph from these structures, and then uses a graph convolutional network to perform graph reasoning, yielding better video-paragraph retrieval results than traditional methods.
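The graph-reasoning step that the abstract refers to can be sketched as a single graph-convolution layer propagating information among node features (e.g. segment, action, and object nodes). The adjacency matrix, dimensions, and random initialization below are illustrative assumptions, not details from the patent.

```python
import numpy as np

def gcn_layer(H, A, W):
    """One graph-convolution step: H' = ReLU(D^-1/2 (A+I) D^-1/2 H W).

    H: (n_nodes, d_in) node features (e.g. segment/action/object nodes)
    A: (n_nodes, n_nodes) binary adjacency of the local or overall graph
    W: (d_in, d_out) learnable weight matrix
    """
    A_hat = A + np.eye(A.shape[0])          # add self-loops
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))  # symmetric degree normalization
    return np.maximum(D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W, 0.0)

# Toy graph: 3 fully connected nodes, 4-dim features projected to 2 dims
rng = np.random.default_rng(0)
H = rng.standard_normal((3, 4))
A = np.ones((3, 3)) - np.eye(3)
W = rng.standard_normal((4, 2))
print(gcn_layer(H, A, W).shape)  # → (3, 2)
```

Stacking such layers lets each node aggregate information from increasingly distant neighbors in the local or overall graph before the final video and text features are pooled.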

Description

technical field [0001] The invention relates to the field of cross-modal retrieval, in particular to a video-paragraph retrieval method and system based on a local-overall graph inference network. Background technique [0002] As a cross-modal retrieval task between videos and paragraphs, video-paragraph retrieval is a very important task that has attracted the attention of many researchers. [0003] This task spans the two fields of computer vision and natural language processing; it requires the system to encode both the video and the text, calculate the similarity from the encodings, and then perform retrieval. At present, video-paragraph retrieval is still a novel task, and current research on it is not yet mature. [0004] Existing video-paragraph retrieval methods either directly encode the entire video and the entire paragraph, or only consider multiple segments of the video and paragraph. As a result, it is difficult to...

Claims


Application Information

Patent Timeline
Patent Type & Authority: Application (China)
IPC(8): G06F16/783; G06F40/30; G06N5/04
CPC: G06N5/04; G06F16/7844; G06F40/30
Inventor 张鹏程
Owner HANGZHOU YIWISE INTELLIGENT TECH CO LTD