Video-paragraph retrieval method and system based on local-whole graph reasoning network

A local-whole graph reasoning technique applied to cross-modal retrieval. It addresses problems such as the performance degradation caused by directly encoding long sequences, and it yields better retrieval results and more comprehensive interaction information.
CN113204674B | Active | Publication Date: 2021-09-17 | HANGZHOU YIWISE INTELLIGENT TECH CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
HANGZHOU YIWISE INTELLIGENT TECH CO LTD
Publication Date
2021-09-17


Abstract

The invention discloses a video-paragraph retrieval method and system based on a local-whole graph reasoning network, belonging to the field of cross-modal retrieval. The method mainly includes the following steps: 1) preprocess the video and the text (paragraph); 2) encode the given video and text separately with the local-whole graph reasoning network to obtain the final video features and text features; 3) compute the similarity between the video features and the text features using cosine similarity; 4) perform retrieval according to the similarity results. Unlike traditional video-paragraph retrieval methods, the invention decomposes the video and the text each into a four-layer semantic structure, constructs local graphs and whole graphs for them, and then performs graph reasoning with graph convolutional networks. The resulting video-paragraph retrieval performance is better than that of traditional methods.
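
Steps 3) and 4) of the abstract can be illustrated with a minimal sketch. It assumes the encoder has already produced one fixed-length feature vector per video and per paragraph. The function names `cosine_similarity` and `rank_videos`, the video identifiers, and the toy feature values are hypothetical and are not taken from the patent.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Convention chosen for this sketch: an all-zero vector matches nothing.
        return 0.0
    return dot / (norm_a * norm_b)


def rank_videos(text_feature, video_features):
    """Return (video_id, score) pairs sorted by descending similarity."""
    scored = [
        (video_id, cosine_similarity(text_feature, feature))
        for video_id, feature in video_features.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical hand-made features standing in for the encoder output.
video_features = {
    "video_a": [1.0, 0.0, 0.0],
    "video_b": [0.0, 1.0, 0.0],
    "video_c": [1.0, 1.0, 0.0],
}
paragraph_feature = [2.0, 0.1, 0.0]

ranking = rank_videos(paragraph_feature, video_features)
for video_id, score in ranking:
    print(video_id, round(score, 3))
```

Because cosine similarity depends only on the angle between the vectors, the magnitude of the paragraph feature does not affect the ranking. Retrieving paragraphs for a given video works the same way with the roles swapped.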

Description

Technical Field

[0001] The invention relates to the field of cross-modal retrieval, in particular to a video-paragraph retrieval method and system based on a local-whole graph reasoning network.

Background

[0002] Video-paragraph retrieval, a cross-modal retrieval task between videos and paragraphs, is an important task that has attracted the attention of many researchers.

[0003] This task spans the two fields of computer vision and natural language processing. It requires the system to encode both the video and the text, compute a similarity from the encodings, and then perform retrieval. Video-paragraph retrieval is still a new task, and research on it is not yet mature.

[0004] Existing video-paragraph retrieval methods either directly encode the entire video and the entire paragraph, or directly encode only multiple segments of the video and the paragraph. However, such encoding meth...
