Unsupervised cross-modal retrieval method based on attention mechanism enhancement

An attention-enhanced cross-modal retrieval technology, applied in the field of artificial intelligence smart community applications. It addresses the problems of unequal semantic information between modalities, an indirectly widened heterogeneous semantic gap, and the resulting failure to retrieve data of different modalities, and achieves rich visual semantic information, robust representations, and a narrowed semantic gap.

Active Publication Date: 2022-01-25
QINGDAO SONLI SOFTWARE INFORMATION TECH

AI Technical Summary

Problems solved by technology

However, a non-negligible problem with these methods is that the semantic information obtained from images and texts is not equal. This widens the heterogeneous semantic gap between the modalities and, in turn, causes retrieval across modalities to fail.



Examples


Embodiment

[0034] The workflow of the embodiment of the present invention is shown in Figure 1 and mainly includes the following seven parts:

[0035] (1) Preprocess the image data and text data: resize each image to 224×224 and cut it into nine pieces; convert the text data into word vectors of the corresponding dimension;
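A minimal sketch of step (1), assuming PyTorch and torchvision (the patent names no libraries). The crop to 222 pixels before splitting and the 300-dimensional word vectors are illustrative assumptions; the patent only fixes the 224×224 size and the nine-piece count.

import torch
import torch.nn as nn
from torchvision import transforms
from PIL import Image

resize = transforms.Compose([
    transforms.Resize((224, 224)),  # step (1): fix the image size to 224x224
    transforms.ToTensor(),          # -> (3, 224, 224) float tensor
])

def nine_patches(img: Image.Image) -> torch.Tensor:
    """Cut one resized image into a 3x3 grid of nine patches."""
    x = resize(img)
    # 224 is not divisible by 3, so crop to 222 first (an assumption;
    # the patent does not say how the nine pieces are cut).
    x = x[:, :222, :222]
    p = x.unfold(1, 74, 74).unfold(2, 74, 74)            # (3, 3, 3, 74, 74)
    return p.reshape(3, 9, 74, 74).permute(1, 0, 2, 3)   # (9, 3, 74, 74)

# Text side: token ids -> word vectors of "the corresponding dimension";
# the vocabulary size and the 300-d embedding are hypothetical.
embedding = nn.Embedding(num_embeddings=10000, embedding_dim=300)
token_ids = torch.tensor([[12, 48, 901]])  # a hypothetical tokenised caption
word_vectors = embedding(token_ids)        # (1, 3, 300)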

[0036] (2) Perform feature extraction on the image and text data processed in step (1): input the processed image into the attention mechanism network and use the self-attention module to extract features, obtaining image features and forming a set of image feature vectors; pass the text data through linear layers for further feature extraction, forming a set of text feature vectors;
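Step (2) names a self-attention module for images and linear layers for text but gives no architecture details; the sketch below assumes PyTorch's nn.MultiheadAttention over the nine flattened patches, with 512-dimensional features chosen to match the fusion dimension mentioned in step (3). All layer sizes are illustrative.

import torch
import torch.nn as nn

class PatchSelfAttention(nn.Module):
    """Self-attention over the nine patch embeddings of one image."""
    def __init__(self, patch_dim: int, feat_dim: int = 512, heads: int = 8):
        super().__init__()
        self.proj = nn.Linear(patch_dim, feat_dim)  # embed each flattened patch
        self.attn = nn.MultiheadAttention(feat_dim, heads, batch_first=True)

    def forward(self, patches: torch.Tensor) -> torch.Tensor:
        # patches: (batch, 9, patch_dim)
        x = self.proj(patches)
        out, _ = self.attn(x, x, x)   # self-attention: query = key = value
        return out.mean(dim=1)        # pooled image feature, (batch, feat_dim)

# Text branch: "linear layers for further feature extraction" (sizes assumed).
text_encoder = nn.Sequential(
    nn.Linear(300, 512), nn.ReLU(), nn.Linear(512, 512))

patches = torch.randn(4, 9, 3 * 74 * 74)                    # flattened nine-patch batch
image_features = PatchSelfAttention(3 * 74 * 74)(patches)   # (4, 512)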

[0037] (3) Input the image and text feature vector sets extracted in step (2) into the multimodal feature fusion module; that is, first fuse the extracted image and text feature vector sets in the 512-dimensional intermediate dimension to obtain multimod...
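Step (3) is cut off mid-sentence, so only its visible part can be sketched: fusing the two 512-dimensional feature sets in a shared intermediate space. Concatenation followed by a linear layer is an assumption here, not the patent's stated fusion rule.

import torch
import torch.nn as nn

class MultimodalFusion(nn.Module):
    """Fuse 512-d image and text features into one 512-d multimodal feature."""
    def __init__(self, dim: int = 512):
        super().__init__()
        self.fuse = nn.Linear(2 * dim, dim)  # assumed fusion: concat + project

    def forward(self, img_feat: torch.Tensor, txt_feat: torch.Tensor) -> torch.Tensor:
        # img_feat, txt_feat: (batch, 512) outputs of step (2)
        return torch.tanh(self.fuse(torch.cat([img_feat, txt_feat], dim=-1)))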



Abstract

The invention belongs to the technical field of artificial intelligence smart community applications and relates to an unsupervised cross-modal retrieval method based on attention mechanism enhancement. The method comprises the following steps: enhancing the visual semantic features of an image; aggregating the feature information of the different modalities; mapping the fused multi-modal features into the same semantic feature space; performing, on the basis of a generative adversarial network, adversarial learning between the image-modality and text-modality features and the shared semantic features obtained from multi-modal fusion, so as to align the semantic features of the different modalities; and finally generating hash codes from the modality features aligned by the generative adversarial network. Similarity measurement learning is carried out between features and hash codes within each modality and between modalities, so that the heterogeneous semantic gap between modalities is reduced, the dependency relationship between features of different modalities is strengthened, the semantic gap between data of different modalities is narrowed, and the semantic commonalities shared across modalities are represented more robustly.
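The abstract outlines adversarial alignment against the fused features followed by hash-code generation, without architectural details; the discriminator and tanh-relaxed hash layer below are therefore illustrative PyTorch stand-ins for those two components, not the patent's exact design.

import torch
import torch.nn as nn

class Discriminator(nn.Module):
    """Adversary: distinguish single-modality features from fused multimodal
    features, pushing image and text features toward the shared space."""
    def __init__(self, dim: int = 512):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 256), nn.ReLU(), nn.Linear(256, 1))

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        return self.net(feat)  # real/fake logit for adversarial learning

class HashLayer(nn.Module):
    """Map aligned features to K-bit codes; tanh relaxes sign() for training."""
    def __init__(self, dim: int = 512, bits: int = 64):
        super().__init__()
        self.fc = nn.Linear(dim, bits)

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        return torch.tanh(self.fc(feat))  # binarise with sign() at retrieval time

The intra- and inter-modal similarity measurement learning the abstract mentions would then compare feature similarities against hash-code similarities; that loss is not specified in the visible text and is not sketched here.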

Description

Technical field

[0001] The invention belongs to the technical field of artificial intelligence smart community applications and relates to an unsupervised cross-modal retrieval method based on attention mechanism enhancement, which can effectively handle cross-modal retrieval between large-scale images and texts in a smart community.

Background technique

[0002] Cross-modal retrieval uses data from one modality to search for related data in another modality; for example, a text description can be used to retrieve the images related to that description from an image database. The technology is common in daily life, for example in Baidu image search and Taobao shopping. Traditional cross-modal retrieval is divided into supervised and unsupervised cross-modal retrieval. Owing to the remarkable results of deep neural networks in the field of computer vision, deep cross-modal retrieval has become the mainstream of current research. With the rapid development ...


Application Information

Patent Type & Authority: Application (China)
IPC (8): G06F16/33; G06F16/31; G06F16/583; G06F16/58; G06F16/51; G06V10/74; G06V10/80; G06V10/82; G06K9/62; G06N3/04; G06N3/08
CPC: G06F16/334; G06F16/325; G06F16/583; G06F16/5866; G06F16/51; G06N3/08; G06N3/045; G06F18/22; G06F18/253
Inventor: 刘寒松, 王永, 王国强, 刘瑞, 翟贵乾
Owner: QINGDAO SONLI SOFTWARE INFORMATION TECH