Zero-sample cross-modal retrieval method combining automatic encoder and generative adversarial network

An autoencoder-based cross-modal retrieval technology, applied in the fields of digital data information retrieval, instruments, and special data processing applications. It addresses the problem that existing methods do not consider adaptability to the retrieval task, and achieves the effects of stable training and effective knowledge transfer.

Pending Publication Date: 2020-11-20
CHENGDU KOALA URAN TECH CO LTD

AI Technical Summary

Problems solved by technology

Some recently proposed zero-shot cross-modal retrieval methods directly apply zero-shot learning methods to the field of multimodal retrieval. The resulting models contain many components unrelated to the retrieval task, and the adaptability of these methods to the retrieval task is not considered.



Examples


Embodiment 1

[0067] The present invention is a zero-sample cross-modal retrieval method that combines an autoencoder and a generative adversarial network. It first extracts the features used for training, then constructs an overall model, which is used for cross-modal retrieval after training. This embodiment mainly includes steps S1 to S6.

[0068] Step S1: Use the pre-trained model to extract features of each modality.

[0069] This embodiment contains data in three modalities: image, text, and category label. The raw data is represented in a way that humans can understand, but computers cannot process raw data directly. Therefore, pre-trained models are used to extract features that computers can process and understand from the raw data.

[0070] For image data, this embodiment uses the VGG-16 model to extract 4096-dimensional image features. For text data, this embodiment uses the Doc2Vec model to extract 300-dimensional text features. For category label dat...
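As a concrete illustration of step S1, the sketch below extracts 4096-dimensional image features and 300-dimensional text features with off-the-shelf pre-trained models. It assumes torchvision's VGG-16 and gensim's Doc2Vec; the exact checkpoints, layers, and preprocessing used in the patent are not specified, so these choices are illustrative.

```python
# Minimal sketch of step S1: extracting modality features with pre-trained
# models. Assumes torchvision's VGG-16 and gensim's Doc2Vec; the patent does
# not specify the exact checkpoints or preprocessing.
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

# --- 4096-dimensional image features from VGG-16 ---
vgg = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).eval()
# Drop the final classification layer so the output is the 4096-d fc7 vector.
vgg.classifier = torch.nn.Sequential(*list(vgg.classifier.children())[:-1])

preprocess = T.Compose([
    T.Resize((224, 224)),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def image_feature(path: str) -> torch.Tensor:
    x = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        return vgg(x).squeeze(0)           # shape: (4096,)

# --- 300-dimensional text features from Doc2Vec ---
# A toy corpus stands in for the real training text.
corpus = [TaggedDocument(words=doc.split(), tags=[i])
          for i, doc in enumerate(["a dog on the grass", "a city at night"])]
d2v = Doc2Vec(corpus, vector_size=300, min_count=1, epochs=20)

def text_feature(text: str):
    return d2v.infer_vector(text.split())  # shape: (300,)
```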

Embodiment 2

[0114] In this embodiment, experimental verification is carried out on the basis of Embodiment 1. Four mainstream datasets in the field of cross-modal retrieval are used as training and testing datasets: Wikipedia, Pascal Sentence, NUS-WIDE, and PKU-XMediaNet. All of them contain image modality data, text modality data, and category labels for image-text retrieval tasks. In the experiments, mean average precision (MAP) is adopted as the evaluation metric: the performance of the model is checked on the image-to-text retrieval task and the text-to-image retrieval task, and the average of the two scores is reported as the final performance evaluation, reflecting the retrieval performance of the overall model.
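A minimal sketch of this evaluation protocol, assuming retrieval is performed by cosine similarity in the shared latent space (as described in the abstract) and that an item is relevant when its category label matches the query's; function and variable names are illustrative, not from the patent.

```python
# Sketch of the MAP evaluation described above. Inputs are numpy arrays:
# embeddings of shape (n, d) and integer label vectors of shape (n,).
import numpy as np

def mean_average_precision(query_emb, gallery_emb, query_labels, gallery_labels):
    # Cosine similarity between every query and every gallery item.
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    g = gallery_emb / np.linalg.norm(gallery_emb, axis=1, keepdims=True)
    sims = q @ g.T

    aps = []
    for i in range(len(q)):
        order = np.argsort(-sims[i])                      # best match first
        relevant = (gallery_labels[order] == query_labels[i]).astype(float)
        if relevant.sum() == 0:
            continue
        precision_at_k = np.cumsum(relevant) / (np.arange(len(relevant)) + 1)
        aps.append((precision_at_k * relevant).sum() / relevant.sum())
    return float(np.mean(aps))

# The final score averages both retrieval directions:
# final = (map_image_to_text + map_text_to_image) / 2
```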


Abstract

The invention discloses a zero-sample cross-modal retrieval method combining an autoencoder and a generative adversarial network, belonging to the field of cross-modal retrieval in computer vision. The method comprises the following steps: extracting features of each modality with a pre-trained model; constructing a corresponding encoder for the features of each modality to generate a low-dimensional latent embedding representation, and performing cross-distribution alignment on the latent embedding representations; constructing a corresponding decoder for each encoder to reconstruct the original features of each modality from the low-dimensional latent embedding representation; constructing a corresponding discriminator to evaluate whether a generated feature distribution is consistent with the real feature distribution, and training the whole network by combining the autoencoder and the generative adversarial network; and carrying out zero-sample cross-modal retrieval in the low-dimensional latent embedding space. The method thereby realizes zero-sample cross-modal retrieval.
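To make the architecture in the abstract concrete, here is a minimal PyTorch sketch of per-modality encoder/decoder pairs with a discriminator. The latent dimension, layer sizes, and the losses shown are assumptions for illustration; the patent's exact network configuration, cross-distribution alignment, and loss weighting are not reproduced here.

```python
# Sketch of the network family the abstract describes: per-modality encoders
# into a shared low-dimensional latent space, decoders that reconstruct the
# original features, and a discriminator for adversarial training.
import torch
import torch.nn as nn

LATENT_DIM = 100  # assumed latent size, not specified here

def mlp(in_dim, out_dim, hidden=1024):
    return nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                         nn.Linear(hidden, out_dim))

class ModalityBranch(nn.Module):
    """Encoder/decoder pair for one modality (e.g. 4096-d image features)."""
    def __init__(self, feat_dim):
        super().__init__()
        self.encoder = mlp(feat_dim, LATENT_DIM)
        self.decoder = mlp(LATENT_DIM, feat_dim)

    def forward(self, x):
        z = self.encoder(x)       # low-dimensional latent embedding
        x_rec = self.decoder(z)   # reconstruction of the original feature
        return z, x_rec

def discriminator(feat_dim):
    # Scores whether a feature looks real or decoder-generated.
    return nn.Sequential(nn.Linear(feat_dim, 512), nn.ReLU(),
                         nn.Linear(512, 1))

image_branch, text_branch = ModalityBranch(4096), ModalityBranch(300)
img, txt = torch.randn(8, 4096), torch.randn(8, 300)

z_img, img_rec = image_branch(img)
z_txt, txt_rec = text_branch(txt)

# Autoencoder (reconstruction) part of the objective; the adversarial and
# cross-distribution alignment losses of the full method are omitted here.
recon_loss = (nn.functional.mse_loss(img_rec, img) +
              nn.functional.mse_loss(txt_rec, txt))
```

At test time, queries and gallery items from different modalities are encoded into the same latent space, where retrieval reduces to nearest-neighbor search, e.g. by cosine similarity.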

Description

Technical Field
[0001] The invention relates to the field of cross-modal retrieval in computer vision, in particular to a zero-sample cross-modal retrieval method combining an autoencoder and a generative adversarial network.
Background Art
[0002] With the rapid development of Internet technology, multimodal data such as images, texts, videos, and audios have exploded. Because of the cross-modal correlation among different modalities, cross-modal retrieval has become a research hotspot. The basic task of cross-modal retrieval is to use query data from any one modality to retrieve data from other modalities, for example, text-image retrieval, image-sketch retrieval, and video retrieval. [0003] However, cross-modal retrieval faces a major problem called the "heterogeneous gap": the data distributions of the querying modality and the queried modality are inconsistent, so it is difficult to establish relationships between the modalities, and it is difficult to me...


Application Information

Patent Type & Authority Applications(China)
IPC IPC(8): G06F16/9532G06F16/432
CPCG06F16/9532G06F16/432Y02D10/00
Inventor 徐行田加林沈复民邵杰申恒涛
Owner CHENGDU KOALA URAN TECH CO LTD