Unsupervised cross-modal retrieval method based on attention mechanism enhancement

An attention-enhanced cross-modal retrieval technology, applied in the field of artificial intelligence smart community applications. It addresses the problems of unequal semantic information between modalities, an indirectly widened heterogeneous semantic gap, and the resulting failure to retrieve data of different modalities, and achieves rich visual semantic information, robust representations, and a narrowed semantic gap.

Active Publication Date: 2022-01-25
QINGDAO SONLI SOFTWARE INFORMATION TECH

AI Technical Summary

Problems solved by technology

However, a non-negligible problem with these methods is that the semantic information obtained from images and texts is not equal. This widens the heterogeneous semantic gap between the modalities and, in turn, causes retrieval across modalities to fail.



Examples


Embodiment

[0034] The workflow of the embodiment of the present invention is shown in Figure 1 and mainly includes the following seven parts:

[0035] (1) Preprocess the image data and text data: resize each image to 224×224 and cut it into nine pieces; convert the text data into word vectors of the corresponding dimension;
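A minimal sketch of step (1), assuming PyTorch and torchvision (the patent names no libraries). The crop to 222 pixels before splitting and the 300-dimensional word vectors are illustrative assumptions; the patent only fixes the 224×224 size and the nine-piece count.

import torch
import torch.nn as nn
from torchvision import transforms
from PIL import Image

resize = transforms.Compose([
    transforms.Resize((224, 224)),  # step (1): fix the image size to 224x224
    transforms.ToTensor(),          # -> (3, 224, 224) float tensor
])

def nine_patches(img: Image.Image) -> torch.Tensor:
    """Cut one resized image into a 3x3 grid of nine patches."""
    x = resize(img)
    # 224 is not divisible by 3, so crop to 222 first (an assumption;
    # the patent does not say how the nine pieces are cut).
    x = x[:, :222, :222]
    p = x.unfold(1, 74, 74).unfold(2, 74, 74)            # (3, 3, 3, 74, 74)
    return p.reshape(3, 9, 74, 74).permute(1, 0, 2, 3)   # (9, 3, 74, 74)

# Text side: token ids -> word vectors of "the corresponding dimension";
# the vocabulary size and the 300-d embedding are hypothetical.
embedding = nn.Embedding(num_embeddings=10000, embedding_dim=300)
token_ids = torch.tensor([[12, 48, 901]])  # a hypothetical tokenised caption
word_vectors = embedding(token_ids)        # (1, 3, 300)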

[0036] (2) Perform feature extraction on the image and text data processed in step (1): input the processed image into the attention mechanism network and use the self-attention module to extract features, obtaining image features and forming a set of image feature vectors; pass the text data through linear layers for further feature extraction, forming a set of text feature vectors;
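Step (2) names a self-attention module for images and linear layers for text but gives no architecture details; the sketch below assumes PyTorch's nn.MultiheadAttention over the nine flattened patches, with 512-dimensional features chosen to match the fusion dimension mentioned in step (3). All layer sizes are illustrative.

import torch
import torch.nn as nn

class PatchSelfAttention(nn.Module):
    """Self-attention over the nine patch embeddings of one image."""
    def __init__(self, patch_dim: int, feat_dim: int = 512, heads: int = 8):
        super().__init__()
        self.proj = nn.Linear(patch_dim, feat_dim)  # embed each flattened patch
        self.attn = nn.MultiheadAttention(feat_dim, heads, batch_first=True)

    def forward(self, patches: torch.Tensor) -> torch.Tensor:
        # patches: (batch, 9, patch_dim)
        x = self.proj(patches)
        out, _ = self.attn(x, x, x)   # self-attention: query = key = value
        return out.mean(dim=1)        # pooled image feature, (batch, feat_dim)

# Text branch: "linear layers for further feature extraction" (sizes assumed).
text_encoder = nn.Sequential(
    nn.Linear(300, 512), nn.ReLU(), nn.Linear(512, 512))

patches = torch.randn(4, 9, 3 * 74 * 74)                    # flattened nine-patch batch
image_features = PatchSelfAttention(3 * 74 * 74)(patches)   # (4, 512)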

[0037] (3) Input the image and text feature vector sets extracted in step (2) into the multimodal feature fusion module; that is, first fuse the extracted image and text feature vector sets in the 512-dimensional intermediate dimension to obtain multimod...
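Step (3) is cut off mid-sentence, so only its visible part can be sketched: fusing the two 512-dimensional feature sets in a shared intermediate space. Concatenation followed by a linear layer is an assumption here, not the patent's stated fusion rule.

import torch
import torch.nn as nn

class MultimodalFusion(nn.Module):
    """Fuse 512-d image and text features into one 512-d multimodal feature."""
    def __init__(self, dim: int = 512):
        super().__init__()
        self.fuse = nn.Linear(2 * dim, dim)  # assumed fusion: concat + project

    def forward(self, img_feat: torch.Tensor, txt_feat: torch.Tensor) -> torch.Tensor:
        # img_feat, txt_feat: (batch, 512) outputs of step (2)
        return torch.tanh(self.fuse(torch.cat([img_feat, txt_feat], dim=-1)))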



Abstract

The invention belongs to the technical field of artificial intelligence smart community applications and relates to an unsupervised cross-modal retrieval method based on attention mechanism enhancement. The method comprises the following steps: enhancing the visual semantic features of an image; aggregating the feature information of the different modalities; mapping the fused multi-modal features into the same semantic feature space; performing, on the basis of a generative adversarial network, adversarial learning between the image-modality and text-modality features and the shared semantic features obtained from multi-modal fusion, so as to align the semantic features of the different modalities; and finally generating hash codes from the modality features aligned by the generative adversarial network. Similarity measurement learning is carried out between features and hash codes within each modality and between modalities, so that the heterogeneous semantic gap between modalities is reduced, the dependency relationship between features of different modalities is strengthened, the semantic gap between data of different modalities is narrowed, and the semantic commonalities shared across modalities are represented more robustly.
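The abstract outlines adversarial alignment against the fused features followed by hash-code generation, without architectural details; the discriminator and tanh-relaxed hash layer below are therefore illustrative PyTorch stand-ins for those two components, not the patent's exact design.

import torch
import torch.nn as nn

class Discriminator(nn.Module):
    """Adversary: distinguish single-modality features from fused multimodal
    features, pushing image and text features toward the shared space."""
    def __init__(self, dim: int = 512):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 256), nn.ReLU(), nn.Linear(256, 1))

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        return self.net(feat)  # real/fake logit for adversarial learning

class HashLayer(nn.Module):
    """Map aligned features to K-bit codes; tanh relaxes sign() for training."""
    def __init__(self, dim: int = 512, bits: int = 64):
        super().__init__()
        self.fc = nn.Linear(dim, bits)

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        return torch.tanh(self.fc(feat))  # binarise with sign() at retrieval time

The intra- and inter-modal similarity measurement learning the abstract mentions would then compare feature similarities against hash-code similarities; that loss is not specified in the visible text and is not sketched here.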

Description

Technical field

[0001] The invention belongs to the technical field of artificial intelligence smart community applications and relates to an unsupervised cross-modal retrieval method based on attention mechanism enhancement, which can effectively handle cross-modal retrieval between large-scale images and texts in a smart community.

Background technique

[0002] Cross-modal retrieval uses data from one modality to search for related data in another modality; for example, a text description can be used to retrieve the images related to that description from an image database. The technology is common in daily life, for example in Baidu image search and Taobao shopping. Traditional cross-modal retrieval is divided into supervised and unsupervised cross-modal retrieval. Owing to the remarkable results of deep neural networks in the field of computer vision, deep cross-modal retrieval has become the mainstream of current research. With the rapid development ...


Application Information

Patent Type & Authority: Application (China)
IPC (8): G06F16/33; G06F16/31; G06F16/583; G06F16/58; G06F16/51; G06V10/74; G06V10/80; G06V10/82; G06K9/62; G06N3/04; G06N3/08
CPC: G06F16/334; G06F16/325; G06F16/583; G06F16/5866; G06F16/51; G06N3/08; G06N3/045; G06F18/22; G06F18/253
Inventor: 刘寒松, 王永, 王国强, 刘瑞, 翟贵乾
Owner: QINGDAO SONLI SOFTWARE INFORMATION TECH