Video-paragraph retrieval method and system based on local-whole graph reasoning network

A local graph and video segment technique, applied in the field of cross-modal retrieval. It addresses problems such as the performance degradation that occurs when long sequences are encoded directly, and it aims to capture interaction information more comprehensively.

Active Publication Date: 2021-09-17
HANGZHOU YIWISE INTELLIGENT TECH CO LTD

AI Technical Summary

Problems solved by technology

[0006] The purpose of the present invention is to solve the problems in the prior art. To overcome the performance degradation caused by directly encoding long sequences, and to learn more fine-grained information, the present invention decomposes video and text into four levels: the overall level, the segment level, the action level and the object level.

Examples

Embodiment Construction

[0030] The present invention will be further described below in conjunction with the accompanying drawings.

[0031] The present invention mainly comprises four parts. First, the video and the text (paragraph) are preprocessed. Second, the given video and text are each encoded with a local-whole graph reasoning network to obtain the final video features and text features. Third, the similarity between the video features and the text features is calculated using cosine similarity. Finally, retrieval is performed according to the similarity results. Within the local-whole graph reasoning network, the present invention decomposes video and text into four-layer semantic structures, constructs a local graph and an overall graph for each, and then uses a graph convolutional network to perform graph reasoning.
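The similarity and retrieval steps described in [0031] can be illustrated with a minimal sketch. It assumes the encoder has already produced one fixed-length feature vector per video and per paragraph. The function names `cosine_similarity` and `rank_videos`, the dictionary layout, and the handling of zero vectors are illustrative assumptions, not details taken from the patent.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length feature vectors.

    Returns 0.0 if either vector is all zeros (an assumption made here
    to avoid division by zero; the source does not discuss this case).
    """
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_videos(
    paragraph_feature: Sequence[float],
    video_features: Dict[str, Sequence[float]],
) -> List[Tuple[str, float]]:
    """Score every candidate video against the paragraph, best match first."""
    scored = [
        (video_id, cosine_similarity(paragraph_feature, feature))
        for video_id, feature in video_features.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical 2-dimensional features, for illustration only.
    candidates = {"a": [0.0, 1.0], "b": [2.0, 0.0], "c": [1.0, 1.0]}
    for video_id, score in rank_videos([1.0, 0.0], candidates):
        print(video_id, round(score, 4))
```

Because cosine similarity ignores vector magnitude, video "b" in the example matches the query perfectly even though its feature vector is twice as long. Paragraph-to-video retrieval is shown; video-to-paragraph retrieval would rank paragraph features against a video feature in the same way.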

[0032] A schematic diagram of the video-paragraph retrieval process of the present invention is shown in Figure 1. It mainly includes t...

Abstract

The invention discloses a video-paragraph retrieval method and system based on a local-whole graph reasoning network, which belongs to the field of cross-modal retrieval and mainly includes the following steps: 1) Preprocess the video and the text (paragraph). 2) Encode the given video and text, each with the local-whole graph reasoning network, to obtain the final video features and text features. 3) Calculate the similarity between the video features and the text features using cosine similarity. 4) Retrieve according to the similarity results. In contrast to traditional video-paragraph retrieval methods, the present invention decomposes video and text into four-layer semantic structures, constructs local graphs and overall graphs for each, and then uses graph convolutional networks to perform graph reasoning. The results obtained in video-paragraph retrieval are better than those of traditional methods.
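The graph reasoning step can be illustrated with a single graph convolution layer. The text available here does not give the propagation rule, so this sketch assumes the widely used symmetric-normalized form, H' = ReLU(D^{-1/2} (A + I) D^{-1/2} H W). The helper names, the plain-list matrix representation, and the toy graph are all hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain dense matrix product of an (n x k) and a (k x m) matrix."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def normalize_adjacency(adj: Matrix) -> Matrix:
    """Add self-loops, then scale entry (i, j) by 1 / sqrt(deg_i * deg_j).

    Assumes non-negative edge weights, so every degree is at least 1
    once self-loops are added.
    """
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    degree = [sum(row) for row in a_hat]
    return [
        [a_hat[i][j] / math.sqrt(degree[i] * degree[j]) for j in range(n)]
        for i in range(n)
    ]


def gcn_layer(adj: Matrix, features: Matrix, weights: Matrix) -> Matrix:
    """One graph convolution: aggregate neighbours, project, apply ReLU."""
    propagated = matmul(normalize_adjacency(adj), features)
    projected = matmul(propagated, weights)
    return [[max(0.0, value) for value in row] for row in projected]


if __name__ == "__main__":
    # Toy graph: two connected nodes, e.g. two segment-level nodes (assumed).
    adjacency = [[0.0, 1.0], [1.0, 0.0]]
    node_features = [[1.0, 0.0], [3.0, -2.0]]
    identity_weights = [[1.0, 0.0], [0.0, 1.0]]
    print(gcn_layer(adjacency, node_features, identity_weights))
```

In a setup like the one described, such a layer would presumably be applied separately to the local graph and the overall graph of each modality. How the nodes and edges of those graphs are built from the four semantic levels is not stated in this excerpt and is left out of the sketch.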

Description

Technical field

[0001] The invention relates to the field of cross-modal retrieval, in particular to a video-paragraph retrieval method and system based on a local-whole graph reasoning network.

Background

[0002] Video-paragraph retrieval, a cross-modal retrieval task between videos and paragraphs, is an important task that has attracted the attention of many researchers.

[0003] This task spans the two fields of computer vision and natural language processing. It requires the system to encode both video and text, calculate the similarity between the encodings, and then perform retrieval. Video-paragraph retrieval is still a novel task, and current research on it is not yet mature.

[0004] Existing video-paragraph retrieval methods either directly encode the entire video and the entire paragraph, or directly encode only multiple segments of the video and paragraph. However, such encoding meth...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/783, G06F40/30, G06N5/04
CPC: G06N5/04, G06F16/7844, G06F40/30
Inventor: 张鹏程
Owner: HANGZHOU YIWISE INTELLIGENT TECH CO LTD