A Deeply Supervised Cross-modal Adversarial Learning Method Based on Attention Mechanism

A learning method based on attention technology, applied in the fields of multimodal learning and information retrieval. It addresses the poor internal correlation and low semantic association between modalities in existing cross-modal retrieval, improving retrieval accuracy and achieving good image-text mutual retrieval performance.

Active Publication Date: 2022-07-01
HUAQIAO UNIVERSITY

AI Technical Summary

Problems solved by technology

[0003] However, multimodal data is characterized by low-level feature heterogeneity and high-level semantic correlation. The shortcoming of existing cross-modal retrieval technology is therefore that, when data representations are inconsistent, the internal correlation between different modalities is poor and the degree of semantic association is low; as a result, cross-modal similarity measurement poses a great challenge for the prior art.

Method used



Examples

Embodiment Construction

[0055] The present invention will be further described in detail below with reference to the accompanying drawings and embodiments.

[0056] The attention-mechanism-based deeply supervised cross-modal adversarial learning method of the present invention, as shown in Figure 1 and Figure 2, comprises a training process and a retrieval process, as follows:

[0057] 1) Training process: Input the paired first-type objects, second-type objects and their class label information with the same semantics in the dataset D into the attention mechanism-based deep supervised adversarial network model for training, until the model Convergence to obtain the network model M. The first type of object is an image and the second type of object is text, or the first type of object is text and the second type of object is an image.
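The paragraph above describes feeding paired images, text, and their class labels into an attention-based deeply supervised adversarial network. Purely as an illustration, the following PyTorch sketch shows one plausible shape for a single modality branch: an encoder into the common subspace with a simple attention block, plus a classifier head for deep supervision in the label space. All module names, dimensions, and the attention design are assumptions, not the patent's actual architecture.

import torch.nn as nn

class AttentionBlock(nn.Module):
    """Learned gates that reweight feature dimensions (illustrative attention)."""
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(dim, dim), nn.Sigmoid())

    def forward(self, x):
        return x * self.score(x)  # element-wise attention weighting

class ModalityBranch(nn.Module):
    """Hypothetical branch for one modality (image or text features): maps
    inputs into the common subspace and predicts class labels so that label
    information can supervise learning in the label space."""
    def __init__(self, in_dim, common_dim, num_classes):
        super().__init__()
        self.encode = nn.Sequential(
            nn.Linear(in_dim, common_dim),
            nn.ReLU(),
            AttentionBlock(common_dim),
        )
        self.classify = nn.Linear(common_dim, num_classes)

    def forward(self, x):
        z = self.encode(x)          # common-subspace representation
        return z, self.classify(z)  # embedding + label logits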

[0058] The training process is as follows:

[0059] 1.1) The data of the first-type objects of different categories are input into the feature extraction network ...
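The remaining training steps are truncated in this excerpt. As a hedged sketch only, the following shows how the three signals named in the abstract (label supervision in the label space, similarity measurement in the common subspace, and adversarial cross-modal discrimination) could be combined in one training step. It assumes modality branches like the ModalityBranch above and a discriminator mapping common-subspace features to a single logit; the loss combination and modality-label convention are illustrative assumptions, not the patent's specification.

import torch
import torch.nn.functional as F

def train_step(img_branch, txt_branch, discriminator, opt_g, opt_d,
               img_feat, txt_feat, labels):
    # Generator side: encode both modalities into the common subspace.
    z_img, logits_img = img_branch(img_feat)
    z_txt, logits_txt = txt_branch(txt_feat)

    # Deep supervision: classify both modalities in the label space.
    label_loss = (F.cross_entropy(logits_img, labels)
                  + F.cross_entropy(logits_txt, labels))
    # Common subspace: pull paired image/text embeddings together.
    sim_loss = (1 - F.cosine_similarity(z_img, z_txt)).mean()
    # Adversarial part: try to make image embeddings indistinguishable
    # from text embeddings (0 = image, 1 = text, by assumed convention).
    ones = torch.ones(z_img.size(0), 1)
    zeros = torch.zeros(z_img.size(0), 1)
    adv_loss = F.binary_cross_entropy_with_logits(discriminator(z_img), ones)

    opt_g.zero_grad()
    (label_loss + sim_loss + adv_loss).backward()
    opt_g.step()

    # Discriminator side: learn to tell the two modalities apart.
    d_loss = (F.binary_cross_entropy_with_logits(discriminator(z_img.detach()), zeros)
              + F.binary_cross_entropy_with_logits(discriminator(z_txt.detach()), ones))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()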



Abstract

The invention relates to a deeply supervised cross-modal adversarial learning method based on an attention mechanism. A deep learning network is constructed for each modality to obtain deep features, and a generative adversarial network is introduced; cross-modal discrimination, aided by an attention mechanism, continuously refines the features generated by each modality's feature network. Label information is used to perform deeply supervised learning on the modal data in the label space, while the heterogeneous data are measured in the common subspace. The network constructed in this way gives the trained attention-based deeply supervised cross-modal adversarial model good image-text mutual retrieval performance. In the retrieval process, the trained network model M performs feature extraction and cosine-distance computation on the query image (text) and on the text (image) data in the candidate database, so as to obtain the candidate text (image) data with the highest similarity to the query image (text), thereby realizing cross-modal retrieval.
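The retrieval step described above reduces to nearest-neighbour search under cosine similarity in the common subspace. A minimal sketch, assuming the query and all candidates have already been embedded by the trained model M (the function name and shapes are illustrative):

import torch
import torch.nn.functional as F

def retrieve(query_emb, candidate_embs, top_k=10):
    """query_emb: (D,) embedding of the query image (or text);
    candidate_embs: (N, D) embeddings of the candidate text (or image) set.
    Returns the indices of the top_k most similar candidates."""
    sims = F.cosine_similarity(query_emb.unsqueeze(0), candidate_embs)  # (N,)
    return torch.topk(sims, k=min(top_k, sims.numel())).indices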

Description

Technical Field

[0001] The invention relates to the technical field of multimodal learning and information retrieval, and more particularly to a deeply supervised cross-modal adversarial learning method based on an attention mechanism.

Background

[0002] Data of different modalities, such as images and text, are ubiquitous on the Internet. However, the "heterogeneity gap" leads to inconsistent data distributions and representations across modalities and makes semantic correlation difficult to establish, so it is inconvenient for users to retrieve useful information across modalities from massive Internet data. In the prior art, cross-modal retrieval can be used to retrieve data across modalities (image, text, voice, video, etc.), such as retrieving text with images, retrieving audio with text, retrieving video with audio, and so on. Cross-modal retrieval is commonly used in search engines and big data management. ...

Claims


Application Information

Patent Type & Authority Patents(China)
IPC IPC(8): G06F16/55G06F16/583G06F16/35G06F16/33G06K9/62G06N3/04G06N3/08
CPCG06F16/55G06F16/583G06F16/3344G06F16/35G06N3/084G06N3/045G06F18/241
Inventor 曾焕强王欣唯朱建清廖昀刘青松陈虢
Owner HUAQIAO UNIVERSITY