Image-text cross-modal feature disentanglement method based on deep mutual information constraint

A mutual-information-based cross-modal technology, applied in digital data information retrieval, special data processing applications, instruments, etc., which can solve problems such as model performance degradation.

Active Publication Date: 2020-02-18
ZHEJIANG UNIV +1

Problems solved by technology

Existing image-text cross-modal retrieval methods often map these two types of information into the learned feature representation...


Examples


Embodiment Construction

[0068] The technical solutions of the present invention will be further described below in conjunction with the accompanying drawings and specific embodiments.

[0069] As shown in Figure 1, the implementation process of the present invention is as follows:

[0070] Step 1: Organize the text and images in the database into the prescribed data mode.

[0071] The data mode is a sample composed of text, image, and category label. During reading, a sample class is first constructed whose member variables are the text data, the image data, and the category-label data; the raw data is then read in a specific format using reading tools.

[0072] For an image file, the corresponding text data can be one sentence, several sentences, or a longer description, depending on the specific dataset.

[0073] Taking the MSCOCO dataset as an example, each sample consists of an image, a piece of text, and a label, expressed as a triple (image, text, label) and stored as a unit in the dataset.
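
A minimal Python sketch of this data mode (the class and function names, and the tab-separated file format, are illustrative assumptions, not specified by the patent):

    from dataclasses import dataclass
    from typing import List

    @dataclass
    class Sample:
        # Member variables mirror the data mode: text, image, and category label.
        text: str        # one or more descriptive sentences
        image_path: str  # path to the image file
        label: int       # category label

    def load_samples(annotation_lines: List[str]) -> List[Sample]:
        # Assumes one sample per line: image_path <TAB> caption <TAB> label.
        samples = []
        for line in annotation_lines:
            image_path, caption, label = line.rstrip("\n").split("\t")
            samples.append(Sample(text=caption, image_path=image_path,
                                  label=int(label)))
        return samples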

[0074] Step 2: Using deep networks, extract the original features of the image and text data (per the abstract, ResNet for images and BiGRU for text) ...
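
A hedged PyTorch sketch of this extraction step (the backbone depth, embedding size, and hidden size are assumptions; the patent names only ResNet and BiGRU):

    import torch
    import torch.nn as nn
    import torchvision.models as models

    class ImageEncoder(nn.Module):
        # ResNet backbone; the classification head is replaced by identity
        # so the network outputs one feature vector per image.
        def __init__(self):
            super().__init__()
            self.backbone = models.resnet50(weights=None)
            self.backbone.fc = nn.Identity()  # 2048-dim features

        def forward(self, images):        # images: (B, 3, H, W)
            return self.backbone(images)  # (B, 2048)

    class TextEncoder(nn.Module):
        # Bidirectional GRU over word embeddings; the final hidden states of
        # the two directions are concatenated into one text feature.
        def __init__(self, vocab_size=30000, embed_dim=300, hidden_dim=512):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, embed_dim)
            self.bigru = nn.GRU(embed_dim, hidden_dim, batch_first=True,
                                bidirectional=True)

        def forward(self, token_ids):                 # token_ids: (B, T)
            _, h = self.bigru(self.embed(token_ids))  # h: (2, B, hidden_dim)
            return torch.cat([h[0], h[1]], dim=-1)    # (B, 2*hidden_dim)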



Abstract

The invention discloses an image-text cross-modal feature disentanglement method based on a deep mutual information constraint. The method comprises the following steps: first, reading text and image files in a specified data mode; second, respectively extracting original features of the text and image data using BiGRU and ResNet; then, under the deep mutual information constraint, mapping the original features to a mixed feature space; finally, using a generative adversarial network to reconstruct the data to different degrees. By controlling the reconstruction process, disentanglement of the cross-modal features is realized, so that modal-common information and modal-specific information are mapped to separate feature spaces. With this method, disentangled features can be learned on large-scale image-text data; because the features are disentangled, retrieval accuracy is improved and the deep features gain better interpretability.
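
To make the feature-space split concrete, here is a minimal PyTorch sketch of projecting each modality's original features into separate modal-common and modal-specific spaces, with an InfoNCE-style bound standing in for the deep mutual information constraint (the patent does not specify this estimator; it is an illustrative assumption):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class Disentangler(nn.Module):
        # Two projection heads per modality: one into the modal-common space,
        # one into the modal-specific space.
        def __init__(self, in_dim, out_dim=256):
            super().__init__()
            self.common = nn.Linear(in_dim, out_dim)
            self.specific = nn.Linear(in_dim, out_dim)

        def forward(self, feats):
            return self.common(feats), self.specific(feats)

    def info_nce(img_common, txt_common, temperature=0.07):
        # Mutual-information lower bound: paired image/text common features
        # should be more similar than mismatched pairs within the batch.
        img = F.normalize(img_common, dim=-1)
        txt = F.normalize(txt_common, dim=-1)
        logits = img @ txt.t() / temperature  # (B, B) similarity matrix
        targets = torch.arange(img.size(0), device=img.device)
        return F.cross_entropy(logits, targets)

Minimizing this loss pulls matched image-text pairs together in the common space, while the reconstruction process described in the abstract leaves modality-private detail to the specific heads.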

Description

Technical Field

[0001] The invention belongs to the field of image-text cross-modal computing, and in particular relates to an image-text cross-modal feature disentanglement method constrained by deep mutual information.

Background Technique

[0002] Due to the rapid rise and development of social networks and short-video platforms in recent years, multimedia data on the Internet has exploded, and people urgently need appropriate and effective methods to process such multimodal data. Cross-modal retrieval is the most basic and representative type of cross-modal data computing method.

[0003] The task of cross-modal information retrieval is that, given query data from one modality (such as images), the retrieval algorithm returns, through the processing and computation of hardware devices, results from another modality (such as text) that are related to the query data. However, there is large heterogeneity between data from different modalities. T...


Application Information

IPC(8): G06F16/583
CPC: G06F16/5846
Inventor: 孔祥维, 郭维廓
Owner: ZHEJIANG UNIV