Cross-modal retrieval method based on modal relation learning

A cross-modal retrieval technology based on modal relation learning, applied in the field of cross-modal retrieval, which addresses the problem of inconvenient retrieval of useful information across modalities, achieves good mutual retrieval performance between images and texts, and improves retrieval accuracy.

Status: Inactive | Publication Date: 2022-07-29
HUAQIAO UNIVERSITY

AI Technical Summary

Problems solved by technology

[0002] In recent years, data of different modalities, such as images and texts, have become widespread in people's Internet life. Traditional single-modal retrieval can no longer meet the growing retrieval needs...




Embodiment Construction

[0052] The present invention proposes a cross-modal retrieval method based on modal relationship learning. By constructing a modality-specific multi-modal deep learning network, a dual inter-modal and intra-modal fusion mechanism is established to perform modal relationship learning: multi-scale features are fused within each modality, and label relation information between modalities is used to directly learn the complementary relationships of the fused features. In addition, an inter-modal attention mechanism is added for joint feature embedding, so that the fused features retain as much inter-modal invariance and intra-modal discriminability as possible, further improving cross-modal retrieval performance.
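As a rough illustration of this dual fusion idea, the PyTorch sketch below pairs intra-modal multi-scale fusion with an inter-modal attention step for joint embedding. The module names, the simple averaging fusion, and the head count are hypothetical placeholders for illustration, not the network actually disclosed in this patent.

import torch
import torch.nn as nn

class IntraModalFusion(nn.Module):
    # Intra-modal step: fuse multi-scale features of one modality
    # (e.g. several image or text feature scales) into one vector.
    def __init__(self, scale_dims, out_dim):
        super().__init__()
        # One projection per scale so all scales share out_dim.
        self.projs = nn.ModuleList(nn.Linear(d, out_dim) for d in scale_dims)

    def forward(self, feats):
        # feats: list of (batch, scale_dim) tensors, one per scale
        projected = [proj(f) for proj, f in zip(self.projs, feats)]
        return torch.stack(projected, dim=0).mean(dim=0)  # average fusion

class CrossModalAttention(nn.Module):
    # Inter-modal step: each modality attends to the other before
    # the fused features are jointly embedded.
    def __init__(self, dim, num_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, img, txt):
        # img, txt: (batch, dim) fused single-vector features
        q, kv = img.unsqueeze(1), txt.unsqueeze(1)
        img_ctx, _ = self.attn(q, kv, kv)   # image attends to text
        txt_ctx, _ = self.attn(kv, q, q)    # text attends to image
        return img_ctx.squeeze(1), txt_ctx.squeeze(1)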

[0053] As shown in Figure 1, the present invention is a cross-modal retrieval method based on modal relationship learning. The method comprises a training process and a retrieval process. Specif...
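A minimal sketch of what the training process (step S1 in the abstract) might look like, assuming image-text pairs with class labels and a combined relation-learning loss. The model, criterion, and optimizer settings are hypothetical placeholders; the excerpt does not disclose the actual loss or hyperparameters.

import torch

def train(model, loader, criterion, epochs=50, lr=1e-4, device="cuda"):
    # S1: feed same-semantics image-text pairs and their class labels
    # into the network and train until the model converges.
    model.to(device).train()
    optim = torch.optim.Adam(model.parameters(), lr=lr)
    for epoch in range(epochs):
        total = 0.0
        for images, texts, labels in loader:
            images = images.to(device)
            texts = texts.to(device)
            labels = labels.to(device)
            img_emb, txt_emb = model(images, texts)
            # Placeholder loss: label relations guide the fused features.
            loss = criterion(img_emb, txt_emb, labels)
            optim.zero_grad()
            loss.backward()
            optim.step()
            total += loss.item()
        print(f"epoch {epoch}: mean loss {total / len(loader):.4f}")
    return model  # the converged network model M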



Abstract

The invention provides a cross-modal retrieval method based on modal relation learning, comprising the following steps. S1: input image-text pairs with the same semantics in a data set, together with the class labels to which they belong, into a cross-modal retrieval network model based on modal relation learning, and train until the model converges, obtaining a network model M. S2: use the network model M obtained by training in S1 to extract feature vectors of the image/text to be queried and of each text/image in the candidate library, compute the similarity between the query and each candidate, sort the candidates in descending order of similarity, and return the retrieval result with the highest similarity. An inter-modal and intra-modal dual fusion mechanism is established for inter-modal relation learning: multi-scale features are fused within each modality, complementary relation learning is performed directly on the fused features using label relation information between modalities, and an inter-modal attention mechanism is added for joint feature embedding, so that multi-scale feature fusion is realized and cross-modal retrieval performance is further improved.
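Step S2 reduces to nearest-neighbor ranking in the learned common embedding space. A minimal sketch follows, assuming cosine similarity as the similarity measure (the excerpt does not specify which measure the method uses):

import torch
import torch.nn.functional as F

@torch.no_grad()
def retrieve(query_emb, candidate_embs, top_k=5):
    # query_emb: (dim,) feature of the image/text to be queried
    # candidate_embs: (num_candidates, dim) features of the candidate library
    q = F.normalize(query_emb.unsqueeze(0), dim=1)
    c = F.normalize(candidate_embs, dim=1)
    sims = (q @ c.t()).squeeze(0)          # cosine similarities
    order = sims.argsort(descending=True)  # descending sort, as in S2
    return order[:top_k], sims[order[:top_k]]

The returned indices identify the candidates with the highest similarity to the query, in ranked order.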

Description

Technical field

[0001] The invention relates to the fields of multimodal learning and information retrieval, in particular to a cross-modal retrieval method based on modal relationship learning.

Background technique

[0002] In recent years, data of different modalities, such as images and texts, have become widespread in people's Internet life. Traditional single-modal retrieval can no longer meet the increasing retrieval needs of users and makes it inconvenient for people to retrieve useful information across different modalities on the Internet, so cross-modal retrieval has become an important research problem. It aims to retrieve data across different modalities (image, text, voice, video, etc.), for example retrieving text with an image, retrieving audio with text, or retrieving video with audio. Cross-modal retrieval is widely used in medical data analysis, big data management, public opinion detection, and other fields.

[0003] Modal data generally has the characteristics of low-level...


Application Information

IPC(8): G06F16/908, G06F16/906, G06V10/764, G06V10/80
CPC: G06F16/908, G06V10/80, G06V10/765, G06F16/906
Inventor: 曾焕强, 王欣唯, 朱建清, 陈婧, 黄德天, 温廷羲, 郭荣新
Owner: HUAQIAO UNIVERSITY