Video and text cross-modal retrieval method based on relational reasoning network

A cross-modal retrieval technology based on relational reasoning, applied to the mutual retrieval of video and text. It addresses the problems that existing approaches ignore the relationships among information within a single modality, extract temporal information poorly, and represent single-modality information incompletely, and it thereby achieves a better cross-modal retrieval effect.
CN113239159A · Active · Publication Date: 2021-08-10 · CHENGDU KOALA URAN TECH CO LTD

Patent Information

Authority / Receiving Office
CN · China
Current Assignee / Owner
CHENGDU KOALA URAN TECH CO LTD
Publication Date
2021-08-10


Abstract

The invention relates to the field of cross-modal retrieval and discloses a video and text cross-modal retrieval method based on a relational reasoning network. The method comprises the following steps: extracting video data features and text data features; obtaining video global features and text global features by using a recurrent neural network; constructing video local relation features and text local relation features by using a multi-scale relational reasoning network; fusing the global features and the local relation features of each single modality to obtain video fusion features and text fusion features; mapping the video fusion features and the text fusion features into a common space and aligning the distribution of video fusion features with the distribution of text fusion features in that space; and training the whole network. Because the method attends to global features and local relation features at the same time, it can focus more effectively on the key information in single-modality data, and cross-modal retrieval is achieved on that basis.
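The steps listed in the abstract can be illustrated with a small, untrained sketch. Everything here is an assumption made for illustration: mean pooling stands in for the recurrent network, the relation module simply scores ordered tuples of 2 and 3 elements, fusion is plain concatenation, the weights are random, and the distribution-alignment and training steps are omitted. The function names (`mean_pool`, `multi_scale_relation`, `encode`) are hypothetical and do not come from the patent.

```python
import itertools
import math
import random


def mean_pool(vectors):
    """Average equal-length vectors (stand-in for a recurrent global encoder)."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def linear(weights, x):
    """Apply a weight matrix (list of rows) to vector x, followed by tanh."""
    return [math.tanh(sum(w * v for w, v in zip(row, x))) for row in weights]


def random_matrix(rng, rows, cols):
    """Random weights; a real model would learn these during training."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def multi_scale_relation(vectors, scale_weights):
    """Average relation features over ordered k-tuples, summed across scales k.

    scale_weights maps k -> weight matrix of shape (out_dim, k * in_dim).
    """
    out_dim = len(next(iter(scale_weights.values())))
    total = [0.0] * out_dim
    for k, weights in scale_weights.items():
        # combinations() keeps the original (temporal) order inside each tuple.
        groups = list(itertools.combinations(vectors, k))
        for group in groups:
            concat = [v for vec in group for v in vec]
            rel = linear(weights, concat)
            total = [t + r / len(groups) for t, r in zip(total, rel)]
    return total


def encode(vectors, scale_weights, proj):
    """Fuse global and local relation features, then map to the common space."""
    fused = mean_pool(vectors) + multi_scale_relation(vectors, scale_weights)
    return linear(proj, fused)


def cosine(a, b):
    """Cosine similarity between two vectors in the common space."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


rng = random.Random(0)
in_dim, rel_dim, common_dim = 4, 3, 2

# Hypothetical pre-extracted features: 5 video frames and 6 text tokens.
video_frames = [[rng.random() for _ in range(in_dim)] for _ in range(5)]
text_words = [[rng.random() for _ in range(in_dim)] for _ in range(6)]

video_scales = {k: random_matrix(rng, rel_dim, k * in_dim) for k in (2, 3)}
text_scales = {k: random_matrix(rng, rel_dim, k * in_dim) for k in (2, 3)}
video_proj = random_matrix(rng, common_dim, in_dim + rel_dim)
text_proj = random_matrix(rng, common_dim, in_dim + rel_dim)

video_emb = encode(video_frames, video_scales, video_proj)
text_emb = encode(text_words, text_scales, text_proj)
similarity = cosine(video_emb, text_emb)
```

Each modality gets its own relation weights and projection, so only the final embeddings share a space. In a trained system the similarity in that space would drive retrieval; with random weights the value printed here carries no meaning.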

Description

Technical field

[0001] The invention relates to the field of cross-modal retrieval, in particular to a video and text cross-modal retrieval method based on a relational reasoning network.

Background technique

[0002] Cross-media retrieval means that a user can submit query data of any media type and retrieve semantically related data of all media types. In the present invention, it refers specifically to the mutual retrieval of video and text. In general, the data set provides videos together with their corresponding description texts. The task of cross-media retrieval is: for any video, retrieve the description text most relevant to its content, or, for any description text, retrieve the video most relevant to that description. With the growing amount of multimedia data such as text, images, and videos on the Internet, retrieval across different modalities has become a new trend in information retrieval. The difficulty of this problem lies in how...
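The two retrieval directions described in this paragraph can be made concrete with a toy example. The similarity scores below are invented for illustration, the helper names are hypothetical, and Recall@K is used only as a common evaluation convention for this kind of task; the source text does not specify a metric. The sketch assumes that video i and text i form the matching pair.

```python
def rank_candidates(similarity_row):
    """Return candidate indices sorted from most to least similar."""
    return sorted(range(len(similarity_row)), key=lambda j: -similarity_row[j])


def recall_at_k(similarity, k):
    """Fraction of queries whose matching item (same index) is in the top k."""
    hits = sum(
        1 for i, row in enumerate(similarity) if i in rank_candidates(row)[:k]
    )
    return hits / len(similarity)


def transpose(matrix):
    """Swap query and candidate roles to reverse the retrieval direction."""
    return [list(col) for col in zip(*matrix)]


# Hypothetical similarity scores: rows are videos, columns are description texts.
video_to_text = [
    [0.9, 0.2, 0.1],
    [0.3, 0.4, 0.8],
    [0.1, 0.2, 0.7],
]
text_to_video = transpose(video_to_text)

v2t_r1 = recall_at_k(video_to_text, 1)  # video query, text candidates
t2v_r1 = recall_at_k(text_to_video, 1)  # text query, video candidates
v2t_r2 = recall_at_k(video_to_text, 2)
```

Both directions reuse the same similarity matrix; only the roles of query and candidate change, which is why a single shared embedding space is enough for mutual retrieval.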
