A Deeply Supervised Cross-modal Adversarial Learning Method Based on Attention Mechanism

A learning method based on attention technology, applied in the fields of multimodal learning and information retrieval. It addresses the poor internal correlation and low semantic association between modalities in existing cross-modal retrieval, improving retrieval accuracy and achieving good image-text mutual retrieval performance.

Active Publication Date: 2022-07-01
HUAQIAO UNIVERSITY

AI Technical Summary

Problems solved by technology

[0003] However, multimodal data is characterized by low-level feature heterogeneity and high-level semantic correlation. The shortcoming of existing cross-modal retrieval technology is therefore that, when data representations are inconsistent, the internal correlation between different modalities is poor and the degree of semantic association is low; as a result, cross-modal similarity measurement poses a great challenge for the prior art.

Method used



Examples

Embodiment Construction

[0055] The present invention will be further described in detail below with reference to the accompanying drawings and embodiments.

[0056] The attention-mechanism-based deeply supervised cross-modal adversarial learning method of the present invention, as shown in Figure 1 and Figure 2, comprises a training process and a retrieval process, as follows:

[0057] 1) Training process: Input the paired first-type objects, second-type objects and their class label information with the same semantics in the dataset D into the attention mechanism-based deep supervised adversarial network model for training, until the model Convergence to obtain the network model M. The first type of object is an image and the second type of object is text, or the first type of object is text and the second type of object is an image.
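The paragraph above describes feeding paired images, text, and their class labels into an attention-based deeply supervised adversarial network. Purely as an illustration, the following PyTorch sketch shows one plausible shape for a single modality branch: an encoder into the common subspace with a simple attention block, plus a classifier head for deep supervision in the label space. All module names, dimensions, and the attention design are assumptions, not the patent's actual architecture.

import torch.nn as nn

class AttentionBlock(nn.Module):
    """Learned gates that reweight feature dimensions (illustrative attention)."""
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(dim, dim), nn.Sigmoid())

    def forward(self, x):
        return x * self.score(x)  # element-wise attention weighting

class ModalityBranch(nn.Module):
    """Hypothetical branch for one modality (image or text features): maps
    inputs into the common subspace and predicts class labels so that label
    information can supervise learning in the label space."""
    def __init__(self, in_dim, common_dim, num_classes):
        super().__init__()
        self.encode = nn.Sequential(
            nn.Linear(in_dim, common_dim),
            nn.ReLU(),
            AttentionBlock(common_dim),
        )
        self.classify = nn.Linear(common_dim, num_classes)

    def forward(self, x):
        z = self.encode(x)          # common-subspace representation
        return z, self.classify(z)  # embedding + label logits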

[0058] The training process is as follows:

[0059] 1.1) The data of the first-type objects of different categories are input into the feature extraction network ...
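The remaining training steps are truncated in this excerpt. As a hedged sketch only, the following shows how the three signals named in the abstract (label supervision in the label space, similarity measurement in the common subspace, and adversarial cross-modal discrimination) could be combined in one training step. It assumes modality branches like the ModalityBranch above and a discriminator mapping common-subspace features to a single logit; the loss combination and modality-label convention are illustrative assumptions, not the patent's specification.

import torch
import torch.nn.functional as F

def train_step(img_branch, txt_branch, discriminator, opt_g, opt_d,
               img_feat, txt_feat, labels):
    # Generator side: encode both modalities into the common subspace.
    z_img, logits_img = img_branch(img_feat)
    z_txt, logits_txt = txt_branch(txt_feat)

    # Deep supervision: classify both modalities in the label space.
    label_loss = (F.cross_entropy(logits_img, labels)
                  + F.cross_entropy(logits_txt, labels))
    # Common subspace: pull paired image/text embeddings together.
    sim_loss = (1 - F.cosine_similarity(z_img, z_txt)).mean()
    # Adversarial part: try to make image embeddings indistinguishable
    # from text embeddings (0 = image, 1 = text, by assumed convention).
    ones = torch.ones(z_img.size(0), 1)
    zeros = torch.zeros(z_img.size(0), 1)
    adv_loss = F.binary_cross_entropy_with_logits(discriminator(z_img), ones)

    opt_g.zero_grad()
    (label_loss + sim_loss + adv_loss).backward()
    opt_g.step()

    # Discriminator side: learn to tell the two modalities apart.
    d_loss = (F.binary_cross_entropy_with_logits(discriminator(z_img.detach()), zeros)
              + F.binary_cross_entropy_with_logits(discriminator(z_txt.detach()), ones))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()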



Abstract

The invention relates to a deeply supervised cross-modal adversarial learning method based on an attention mechanism. A deep learning network is constructed for each modality to obtain deep features, and a generative adversarial network is introduced; cross-modal discrimination, aided by an attention mechanism, continuously refines the features generated by each modality's feature network. Label information is used to perform deeply supervised learning on the modal data in the label space, while the heterogeneous data are measured in the common subspace. The network constructed in this way gives the trained attention-based deeply supervised cross-modal adversarial model good image-text mutual retrieval performance. In the retrieval process, the trained network model M performs feature extraction and cosine-distance computation on the query image (text) and on the text (image) data in the candidate database, so as to obtain the candidate text (image) data with the highest similarity to the query image (text), thereby realizing cross-modal retrieval.
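The retrieval step described above reduces to nearest-neighbour search under cosine similarity in the common subspace. A minimal sketch, assuming the query and all candidates have already been embedded by the trained model M (the function name and shapes are illustrative):

import torch
import torch.nn.functional as F

def retrieve(query_emb, candidate_embs, top_k=10):
    """query_emb: (D,) embedding of the query image (or text);
    candidate_embs: (N, D) embeddings of the candidate text (or image) set.
    Returns the indices of the top_k most similar candidates."""
    sims = F.cosine_similarity(query_emb.unsqueeze(0), candidate_embs)  # (N,)
    return torch.topk(sims, k=min(top_k, sims.numel())).indices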

Description

Technical Field

[0001] The invention relates to the technical field of multimodal learning and information retrieval, and more particularly to a deeply supervised cross-modal adversarial learning method based on an attention mechanism.

Background

[0002] Data of different modalities, such as images and text, are ubiquitous on the Internet. However, the "heterogeneity gap" leads to inconsistent data distributions and representations across modalities and makes semantic correlation difficult to establish, so it is inconvenient for users to retrieve useful information across modalities from massive Internet data. In the prior art, cross-modal retrieval can be used to retrieve data across modalities (image, text, voice, video, etc.), such as retrieving text with images, retrieving audio with text, retrieving video with audio, and so on. Cross-modal retrieval is commonly used in search engines and big data management. ...

Claims


Application Information

Patent Type & Authority Patents(China)
IPC IPC(8): G06F16/55G06F16/583G06F16/35G06F16/33G06K9/62G06N3/04G06N3/08
CPCG06F16/55G06F16/583G06F16/3344G06F16/35G06N3/084G06N3/045G06F18/241
Inventor 曾焕强王欣唯朱建清廖昀刘青松陈虢
Owner HUAQIAO UNIVERSITY