Cross-modal retrieval method based on modal relation learning

A cross-modal retrieval technology based on modal relation learning, applied in the field of cross-modal retrieval, which addresses the problem of inconvenient retrieval of useful information across modalities, achieves good mutual retrieval performance between images and texts, and improves retrieval accuracy.

Status: Inactive | Publication Date: 2022-07-29
HUAQIAO UNIVERSITY

AI Technical Summary

Problems solved by technology

[0002] In recent years, data of different modalities, such as images and texts, have become widespread in people's Internet life. Traditional single-modal retrieval can no longer meet the growing retrieval needs...




Embodiment Construction

[0052] The present invention proposes a cross-modal retrieval method based on modal relationship learning. By constructing a modality-specific multi-modal deep learning network, a dual inter-modal and intra-modal fusion mechanism is established to perform modal relationship learning: multi-scale features are fused within each modality, and label relation information between modalities is used to directly learn the complementary relationships of the fused features. In addition, an inter-modal attention mechanism is added for joint feature embedding, so that the fused features retain as much inter-modal invariance and intra-modal discriminability as possible, further improving cross-modal retrieval performance.
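As a rough illustration of this dual fusion idea, the PyTorch sketch below pairs intra-modal multi-scale fusion with an inter-modal attention step for joint embedding. The module names, the simple averaging fusion, and the head count are hypothetical placeholders for illustration, not the network actually disclosed in this patent.

import torch
import torch.nn as nn

class IntraModalFusion(nn.Module):
    # Intra-modal step: fuse multi-scale features of one modality
    # (e.g. several image or text feature scales) into one vector.
    def __init__(self, scale_dims, out_dim):
        super().__init__()
        # One projection per scale so all scales share out_dim.
        self.projs = nn.ModuleList(nn.Linear(d, out_dim) for d in scale_dims)

    def forward(self, feats):
        # feats: list of (batch, scale_dim) tensors, one per scale
        projected = [proj(f) for proj, f in zip(self.projs, feats)]
        return torch.stack(projected, dim=0).mean(dim=0)  # average fusion

class CrossModalAttention(nn.Module):
    # Inter-modal step: each modality attends to the other before
    # the fused features are jointly embedded.
    def __init__(self, dim, num_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, img, txt):
        # img, txt: (batch, dim) fused single-vector features
        q, kv = img.unsqueeze(1), txt.unsqueeze(1)
        img_ctx, _ = self.attn(q, kv, kv)   # image attends to text
        txt_ctx, _ = self.attn(kv, q, q)    # text attends to image
        return img_ctx.squeeze(1), txt_ctx.squeeze(1)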

[0053] As shown in Figure 1, the present invention is a cross-modal retrieval method based on modal relationship learning. The method comprises a training process and a retrieval process. Specif...
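A minimal sketch of what the training process (step S1 in the abstract) might look like, assuming image-text pairs with class labels and a combined relation-learning loss. The model, criterion, and optimizer settings are hypothetical placeholders; the excerpt does not disclose the actual loss or hyperparameters.

import torch

def train(model, loader, criterion, epochs=50, lr=1e-4, device="cuda"):
    # S1: feed same-semantics image-text pairs and their class labels
    # into the network and train until the model converges.
    model.to(device).train()
    optim = torch.optim.Adam(model.parameters(), lr=lr)
    for epoch in range(epochs):
        total = 0.0
        for images, texts, labels in loader:
            images = images.to(device)
            texts = texts.to(device)
            labels = labels.to(device)
            img_emb, txt_emb = model(images, texts)
            # Placeholder loss: label relations guide the fused features.
            loss = criterion(img_emb, txt_emb, labels)
            optim.zero_grad()
            loss.backward()
            optim.step()
            total += loss.item()
        print(f"epoch {epoch}: mean loss {total / len(loader):.4f}")
    return model  # the converged network model M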



Abstract

The invention provides a cross-modal retrieval method based on modal relation learning, comprising the following steps. S1: input image-text pairs with the same semantics in a data set, together with the class labels to which they belong, into a cross-modal retrieval network model based on modal relation learning, and train until the model converges, obtaining a network model M. S2: use the network model M obtained by training in S1 to extract feature vectors of the image/text to be queried and of each text/image in the candidate library, compute the similarity between the query and each candidate, sort the candidates in descending order of similarity, and return the retrieval result with the highest similarity. An inter-modal and intra-modal dual fusion mechanism is established for inter-modal relation learning: multi-scale features are fused within each modality, complementary relation learning is performed directly on the fused features using label relation information between modalities, and an inter-modal attention mechanism is added for joint feature embedding, so that multi-scale feature fusion is realized and cross-modal retrieval performance is further improved.
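Step S2 reduces to nearest-neighbor ranking in the learned common embedding space. A minimal sketch follows, assuming cosine similarity as the similarity measure (the excerpt does not specify which measure the method uses):

import torch
import torch.nn.functional as F

@torch.no_grad()
def retrieve(query_emb, candidate_embs, top_k=5):
    # query_emb: (dim,) feature of the image/text to be queried
    # candidate_embs: (num_candidates, dim) features of the candidate library
    q = F.normalize(query_emb.unsqueeze(0), dim=1)
    c = F.normalize(candidate_embs, dim=1)
    sims = (q @ c.t()).squeeze(0)          # cosine similarities
    order = sims.argsort(descending=True)  # descending sort, as in S2
    return order[:top_k], sims[order[:top_k]]

The returned indices identify the candidates with the highest similarity to the query, in ranked order.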

Description

Technical field

[0001] The invention relates to the fields of multimodal learning and information retrieval, in particular to a cross-modal retrieval method based on modal relationship learning.

Background technique

[0002] In recent years, data of different modalities, such as images and texts, have become widespread in people's Internet life. Traditional single-modal retrieval can no longer meet the increasing retrieval needs of users and makes it inconvenient for people to retrieve useful information across different modalities on the Internet, so cross-modal retrieval has become an important research problem. It aims to retrieve data across different modalities (image, text, voice, video, etc.), for example retrieving text with an image, retrieving audio with text, or retrieving video with audio. Cross-modal retrieval is widely used in medical data analysis, big data management, public opinion detection, and other fields.

[0003] Modal data generally has the characteristics of low-level...


Application Information

IPC(8): G06F16/908, G06F16/906, G06V10/764, G06V10/80
CPC: G06F16/908, G06V10/80, G06V10/765, G06F16/906
Inventor: 曾焕强, 王欣唯, 朱建清, 陈婧, 黄德天, 温廷羲, 郭荣新
Owner: HUAQIAO UNIVERSITY