Video-paragraph retrieval method and system based on local-overall graph inference network

A technology combining local graphs and video segments, applied in the field of cross-modal retrieval; it addresses the performance degradation caused by directly encoding long sequences and achieves more comprehensive interactive information between modalities.

Active Publication Date: 2021-08-03
HANGZHOU YIWISE INTELLIGENT TECH CO LTD
9 Cites · 3 Cited by

AI Technical Summary

Problems solved by technology

[0006] The purpose of the present invention is to solve the problems in the prior art. To overcome the performance degradation caused by directly encoding long sequences and to learn more fine-grained information, the present invention decomposes the video and text into four levels: an overall level, a segment level, an action level, and an object level.


Image

  • Video-paragraph retrieval method and system based on local-overall graph inference network

Examples


Embodiment Construction

[0030] The present invention will be further described below in conjunction with the accompanying drawings.

[0031] The present invention mainly comprises four parts. First, the video and text (paragraph) are preprocessed. Second, a local-overall graph inference network encodes the given video and text separately to obtain the final video features and text features. Third, cosine similarity is used to measure the similarity between the video features and the text features. Finally, retrieval is performed according to the similarity measurement results. Within the local-overall graph inference network, the present invention decomposes the video and text into four-level semantic structures, constructs a local graph and an overall graph from them, and then applies a graph convolutional network to perform graph reasoning.
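The similarity-measurement and retrieval steps described above can be sketched as follows. This is a minimal illustration under assumed inputs; the function name, feature matrices, and dimensions are hypothetical and not taken from the patent.

```python
import numpy as np

def cosine_retrieval(video_feats, text_feats, top_k=5):
    """Rank videos for each paragraph query by cosine similarity.

    video_feats: (n_videos, d) final video features
    text_feats:  (n_queries, d) final paragraph features
    Returns the indices of the top_k videos per query.
    """
    # L2-normalize rows so that the dot product equals cosine similarity
    v = video_feats / np.linalg.norm(video_feats, axis=1, keepdims=True)
    t = text_feats / np.linalg.norm(text_feats, axis=1, keepdims=True)
    sim = t @ v.T  # (n_queries, n_videos) similarity matrix
    # Sort each row in descending similarity and keep the top_k indices
    return np.argsort(-sim, axis=1)[:, :top_k]

# Toy example: 3 videos, 2 paragraph queries, 4-dim features
videos = np.array([[1.0, 0.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0, 0.0],
                   [0.7, 0.7, 0.0, 0.0]])
queries = np.array([[1.0, 0.1, 0.0, 0.0],
                    [0.0, 1.0, 0.2, 0.0]])
print(cosine_retrieval(videos, queries, top_k=2))  # → [[0 2] [1 2]]
```

In practice the feature matrices would come from the local-overall graph inference network; the ranking step itself is independent of how the features are produced.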

[0032] Figure 1 shows a schematic diagram of the video-paragraph retrieval process of the present invention...



Abstract

The invention discloses a video-paragraph retrieval method and a video-paragraph retrieval system based on a local-overall graph inference network, which belong to the field of cross-modal retrieval. The method mainly comprises the following steps: 1) preprocessing the videos and texts (paragraphs); 2) encoding the given videos and texts with a local-overall graph inference network to obtain final video features and text features; 3) calculating the similarity between the video features and the text features using cosine similarity; 4) performing retrieval according to the similarity measurement results. Compared with traditional video-paragraph retrieval methods, the invention decomposes the video and the text into four-level semantic structures, constructs a local graph and an overall graph from these structures, and then uses a graph convolutional network to perform graph reasoning, yielding better video-paragraph retrieval results than traditional methods.
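The graph-reasoning step that the abstract refers to can be sketched as a single graph-convolution layer propagating information among node features (e.g. segment, action, and object nodes). The adjacency matrix, dimensions, and random initialization below are illustrative assumptions, not details from the patent.

```python
import numpy as np

def gcn_layer(H, A, W):
    """One graph-convolution step: H' = ReLU(D^-1/2 (A+I) D^-1/2 H W).

    H: (n_nodes, d_in) node features (e.g. segment/action/object nodes)
    A: (n_nodes, n_nodes) binary adjacency of the local or overall graph
    W: (d_in, d_out) learnable weight matrix
    """
    A_hat = A + np.eye(A.shape[0])          # add self-loops
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))  # symmetric degree normalization
    return np.maximum(D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W, 0.0)

# Toy graph: 3 fully connected nodes, 4-dim features projected to 2 dims
rng = np.random.default_rng(0)
H = rng.standard_normal((3, 4))
A = np.ones((3, 3)) - np.eye(3)
W = rng.standard_normal((4, 2))
print(gcn_layer(H, A, W).shape)  # → (3, 2)
```

Stacking such layers lets each node aggregate information from increasingly distant neighbors in the local or overall graph before the final video and text features are pooled.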

Description

technical field [0001] The invention relates to the field of cross-modal retrieval, in particular to a video-paragraph retrieval method and system based on a local-overall graph inference network. Background technique [0002] As a cross-modal retrieval task between videos and paragraphs, video-paragraph retrieval is a very important task that has attracted the attention of many researchers. [0003] This task spans the two fields of computer vision and natural language processing; it requires the system to encode both the video and the text, calculate the similarity from the encodings, and then perform retrieval. At present, video-paragraph retrieval is still a novel task, and current research on it is not yet mature. [0004] Existing video-paragraph retrieval methods either directly encode the entire video and the entire paragraph, or only consider multiple segments of the video and paragraph. As a result, it is difficult to...

Claims


Application Information

Patent Timeline
Patent Type & Authority: Application (China)
IPC(8): G06F16/783; G06F40/30; G06N5/04
CPC: G06N5/04; G06F16/7844; G06F40/30
Inventor 张鹏程
Owner HANGZHOU YIWISE INTELLIGENT TECH CO LTD