
Image-text cross-modal retrieval method based on self-supervised adversarial learning

An image-text cross-modal retrieval technology, applied in the field of image-text cross-modal retrieval based on self-supervised adversarial learning, solving problems such as the image-text cross-modal retrieval effect being degraded, samples not being successfully discriminated by the discriminator, and cross-modal retrieval being hindered.

Active Publication Date: 2021-03-12
GUIZHOU UNIV
8 Cites · 4 Cited by

AI Technical Summary

Problems solved by technology

Adversarial training alternately trains the generator and the discriminator so that the samples produced by the generator cannot be successfully discriminated by the discriminator.
[0004] Because of the large gap between image features and text features, traditional image-text cross-modal retrieval methods struggle to map the features of the two modalities into the same space for retrieval, which hinders efficient cross-modal retrieval. In particular, image features contain redundant information, so features from different modalities cannot be mapped well into the same space, which degrades the image-text cross-modal retrieval effect.

Method used




Embodiment Construction

[0050] Specific embodiments of the present invention are described below in conjunction with the accompanying drawings, so that those skilled in the art can better understand the present invention. It should be noted that, in the following description, detailed descriptions of known functions and designs are omitted where they would obscure the main subject matter of the present invention.

[0051] Figure 1 is a flow chart of a specific embodiment of the self-supervised adversarial image-text cross-modal retrieval method of the present invention.

[0052] In this example, as shown in Figure 1, the self-supervised adversarial image-text cross-modal retrieval method of the present invention comprises the following steps:

[0053] Step S1: Build the Embedding Generative Network

[0054] In this example, as shown in Figure 2, two independent single-layer feedforward neural networks are constructed as the image network...
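The truncated step above specifies one single-layer feedforward network per modality. The following is a minimal sketch of that idea; the feature and code dimensions, the tanh activation, and the variable names are illustrative assumptions, not values taken from the patent.

```python
import numpy as np

rng = np.random.default_rng(0)

IMG_DIM, TXT_DIM, CODE_DIM = 4096, 300, 128  # assumed dimensions

# Single-layer feedforward image network: tanh(W_img @ x + b_img).
W_img = rng.standard_normal((CODE_DIM, IMG_DIM)) * 0.01
b_img = np.zeros(CODE_DIM)

# Single-layer feedforward text network: tanh(W_txt @ x + b_txt).
W_txt = rng.standard_normal((CODE_DIM, TXT_DIM)) * 0.01
b_txt = np.zeros(CODE_DIM)

def embed_image(x):
    """Map a raw image feature vector into the shared code space."""
    return np.tanh(W_img @ x + b_img)

def embed_text(x):
    """Map a raw text feature vector into the shared code space."""
    return np.tanh(W_txt @ x + b_txt)

# Both modalities now live in the same CODE_DIM-dimensional subspace.
img_code = embed_image(rng.standard_normal(IMG_DIM))
txt_code = embed_text(rng.standard_normal(TXT_DIM))
```

Because both networks output vectors of the same length, the image and text codes are directly comparable in one shared subspace.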



Abstract

The invention discloses an image-text cross-modal retrieval method based on self-supervised adversarial learning, comprising the following steps. A shared network is constructed within an embedding generation network to map image features and text features into a shared subspace of image codes and text codes respectively, reducing the separation between the two. At the same time, a self-supervised adversarial network is constructed, and the embedding generation network is adversarially trained with its assistance, filtering redundant information out of the image features. Features of different modalities are thereby mapped into the same space and aligned more accurately, which improves the image-text cross-modal retrieval effect.
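Once both modalities are embedded as codes in the shared subspace, cross-modal retrieval reduces to ranking one modality's codes by similarity to a query code from the other modality. The sketch below illustrates this with cosine similarity on toy codes; the specific similarity measure and the `retrieve` helper are illustrative assumptions, not the patent's exact procedure.

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between two shared-space codes."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def retrieve(query_code, gallery_codes, top_k=3):
    """Return indices of the top_k gallery codes most similar to the query."""
    scores = [cosine_sim(query_code, g) for g in gallery_codes]
    return sorted(range(len(gallery_codes)), key=lambda i: -scores[i])[:top_k]

# Toy shared-space codes: a text query and a small image gallery.
query = np.array([1.0, 0.0, 0.0])
gallery = [np.array([0.9, 0.1, 0.0]),   # close to the query
           np.array([0.0, 1.0, 0.0]),   # orthogonal to the query
           np.array([1.0, 0.05, 0.0])]  # closest to the query

print(retrieve(query, gallery, top_k=2))  # → [2, 0]
```

The better the adversarial training aligns the two modalities, the more reliably this similarity ranking returns the matching items.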

Description

Technical field
[0001] The invention belongs to the technical field of image-text cross-modal retrieval and, more specifically, relates to an image-text cross-modal retrieval method based on self-supervised adversarial learning.
Background technique
[0002] With the development of cloud computing, the Internet of Things, mobile phones, social media and other information technologies, data on the Internet is growing explosively, and the era of big data has arrived. In this era, how to perform fast image-text cross-modal retrieval is a focus of attention.
[0003] Self-supervised learning trains a model with labels derived from the data's own attributes to obtain image features, avoiding expensive manual labeling, but the learned features are often related only to the visual information of the image. Adversarial training alternately trains the generator and the discriminator, so that the samples generated by the generator cannot be successfully discriminated...
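The alternating generator/discriminator schedule described in paragraph [0003] can be sketched as follows. Here a logistic-regression discriminator tries to tell image codes from text codes, while linear embedding projections are updated by gradient ascent on the discriminator's loss to fool it; all dimensions, learning rates, and the linear generator are illustrative assumptions, not the patent's actual networks.

```python
import numpy as np

rng = np.random.default_rng(1)
D_IN, CODE, N = 8, 4, 32

W_img = rng.standard_normal((CODE, D_IN)) * 0.1   # image projection (generator side)
W_txt = rng.standard_normal((CODE, D_IN)) * 0.1   # text projection (generator side)
w_d = np.zeros(CODE)                              # modality discriminator weights
lr = 0.05

def sigmoid(z):
    # Clip logits so the exponential cannot overflow.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))

x_img = rng.standard_normal((N, D_IN))  # toy image features
x_txt = rng.standard_normal((N, D_IN))  # toy text features

for step in range(100):
    z_img = x_img @ W_img.T              # image codes
    z_txt = x_txt @ W_txt.T              # text codes

    # Discriminator step: label images 1 and texts 0, descend the BCE loss.
    p_img, p_txt = sigmoid(z_img @ w_d), sigmoid(z_txt @ w_d)
    grad_d = ((p_img - 1.0) @ z_img + p_txt @ z_txt) / (2 * N)
    w_d -= lr * grad_d

    # Generator step: *ascend* the same loss so the codes fool the
    # discriminator (the adversarial alternation from paragraph [0003]).
    p_img, p_txt = sigmoid(z_img @ w_d), sigmoid(z_txt @ w_d)
    gz_img = np.outer(p_img - 1.0, w_d)  # dBCE/dz for image codes
    gz_txt = np.outer(p_txt, w_d)        # dBCE/dz for text codes
    W_img += lr * (gz_img.T @ x_img) / N
    W_txt += lr * (gz_txt.T @ x_txt) / N
```

At equilibrium the discriminator cannot separate the modalities, which is exactly the "samples cannot be successfully discriminated" condition that the background section describes.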

Claims


Application Information

Patent Type & Authority: Application (China)
IPC(8): G06K9/46, G06K9/62, G06N3/04, G06N3/08
CPC: G06N3/084, G06V10/44, G06N3/044, G06N3/045, G06F18/213
Inventors: 杨阳, 何仕远, 王阳超
Owner: GUIZHOU UNIV