Video-paragraph retrieval method and system based on local-overall graph inference network

A local graph and video segment technique, applied in the field of cross-modal retrieval, that addresses problems such as the performance degradation caused by directly encoding long sequences, and that yields better retrieval results and more comprehensive interaction information.
CN113204674A · Active · Publication Date: 2021-08-03 · HANGZHOU YIWISE INTELLIGENT TECH CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications (China)
Current Assignee / Owner
HANGZHOU YIWISE INTELLIGENT TECH CO LTD
Publication Date
2021-08-03


Abstract

The invention discloses a video-paragraph retrieval method and system based on a local-overall graph inference network, belonging to the field of cross-modal retrieval. The method mainly comprises the following steps: 1) preprocessing the videos and texts (paragraphs); 2) encoding the given videos and texts with a local-overall graph inference network to obtain the final video features and text features; 3) calculating the similarity between the video features and the text features using cosine similarity; 4) performing retrieval according to the similarity measurement result. Unlike traditional video-paragraph retrieval methods, this method decomposes the video and the text each into a four-layer semantic structure, constructs these structures into a local graph and an overall graph respectively, and then performs graph reasoning with a graph convolutional network. The resulting video-paragraph retrieval results are better than those of traditional methods.
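The graph reasoning step mentioned in the abstract can be illustrated with a single graph convolution layer. This is a minimal sketch only: the text available here does not state the propagation rule, so the code assumes the widely used symmetric-normalized form ReLU(D^{-1/2}(A + I)D^{-1/2} H W). The three-node graph, the feature values, the identity weight matrix, and the function names `normalize_adjacency`, `matmul`, and `gcn_layer` are all hypothetical.

```python
import math


def normalize_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2} for a square adjacency matrix."""
    n = len(adj)
    # Add self-loops so each node keeps its own feature during propagation.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def gcn_layer(adj, features, weight):
    """One propagation step: ReLU(A_norm @ features @ weight)."""
    propagated = matmul(normalize_adjacency(adj), features)
    projected = matmul(propagated, weight)
    return [[max(0.0, v) for v in row] for row in projected]


# Hypothetical local graph: three nodes (e.g. three video segments)
# connected in a chain 0 - 1 - 2.
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# Identity weights (an assumption) so the output shows pure neighbor mixing.
weight = [[1.0, 0.0], [0.0, 1.0]]

updated = gcn_layer(adj, features, weight)
for node, row in enumerate(updated):
    print(node, [round(v, 3) for v in row])
```

In a trained network the weight matrix would be learned, and the local and overall graphs would each get their own adjacency matrix. The sketch shows only how one layer mixes each node's feature with those of its neighbors.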

Description

Technical field

[0001] The invention relates to the field of cross-modal retrieval, in particular to a video-paragraph retrieval method and system based on a local-overall graph inference network.

Background technique

[0002] Video-paragraph retrieval, a cross-modal retrieval task between videos and paragraphs, is an important task that has attracted the attention of many researchers.

[0003] This task spans the two fields of computer vision and natural language processing. It requires the system to encode both the video and the text, calculate their similarity from the encodings, and then perform retrieval. At present, video-paragraph retrieval is still a novel task, and research on it is not yet mature.
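The encode, compare, retrieve flow described above can be sketched as follows, using cosine similarity as the abstract specifies. This is an illustration under stated assumptions, not the patented implementation: the feature vectors are hand-written stand-ins for encoder outputs, and the names `cosine_similarity`, `rank_videos`, and the video identifiers are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat an all-zero vector as matching nothing
    return dot / (norm_u * norm_v)


def rank_videos(text_feature, video_features):
    """Return (video_id, score) pairs sorted from most to least similar."""
    scores = {vid: cosine_similarity(text_feature, feat)
              for vid, feat in video_features.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical encoded features; a real system would obtain these
# from the video and text encoders.
video_features = {
    "video_a": [0.9, 0.1, 0.0],
    "video_b": [0.1, 0.8, 0.3],
    "video_c": [0.0, 0.2, 0.9],
}
paragraph_feature = [0.2, 0.9, 0.2]

ranking = rank_videos(paragraph_feature, video_features)
for video_id, score in ranking:
    print(f"{video_id}: {score:.3f}")
```

Retrieval in the opposite direction (finding paragraphs for a query video) uses the same scoring with the roles of the two feature sets swapped.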

[0004] Existing video-paragraph retrieval methods either directly encode the entire video and the entire paragraph, or consider only multiple segments of the video and the paragraph. Then it is difficult to get...
