A method and system for adversarial cross-modal retrieval based on dictionary learning

Dictionary learning and adversarial learning are applied to cross-modal retrieval to address three problems of existing methods: they ignore the statistical characteristics of multi-modal data, they cannot preserve the inherent statistical characteristics of each modality's original features, and the projected features are not maximally correlated.

Inactive Publication Date: 2019-02-01
SHANDONG NORMAL UNIV

AI Technical Summary

Problems solved by technology

[0005] However, the above methods have two disadvantages. First, most of them learn a common representation space for data of different modalities while ignoring the complex statistical properties of multimodal data. Second, the features projected into the common space cannot preserve the inherent statistical properties of each modality's original features, and the projected features are not maximally correlated.

Examples

Embodiment 1

[0059] This embodiment provides an adversarial cross-modal retrieval method based on dictionary learning, and the specific steps are as follows:

[0060] Step S1: Obtain the underlying features of images and texts, construct a data set including image modalities and text modalities and their semantic labels, and divide them into image training set, text training set, image test set and text test set.

[0061] The image training set is denoted as X, where d_v is the image feature dimension and m is the number of samples. The text training set is denoted as Y, where d_t is the text feature dimension and m is the number of samples. X and Y are feature matrices. The image-text pairs in the training set are denoted as P = {X, Y}. The test sets of images and text are obtained in the same way and are denoted as X_te and Y_te.

[0062] Taking the Wikipedia-CNN dataset as an example, the Wikipedia-CNN dataset contains 2866 image-text pairs and their corresponding semantic labels. 2173 image-text pairs are randomly...
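The random split described in Step S1 can be sketched as follows. This is only an illustration under stated assumptions: the function name `split_pairs`, the column-per-sample layout, the feature sizes (128 and 10), the 10 label classes and the random placeholder data are all hypothetical. Using 2173 as the training-set size is also an assumption, because the sentence above is truncated; under that assumption 2866 - 2173 = 693 pairs would remain for testing.

```python
import numpy as np

def split_pairs(X, Y, labels, n_train, seed=0):
    """Randomly split paired image/text features into training and test sets.

    X: (d_v, m) image features, Y: (d_t, m) text features, labels: (m,).
    Columns are samples; this layout is an assumption of the sketch.
    """
    m = X.shape[1]
    if Y.shape[1] != m or labels.shape[0] != m:
        raise ValueError("X, Y and labels must describe the same m samples")
    if not 0 < n_train < m:
        raise ValueError("n_train must be between 1 and m - 1")
    rng = np.random.default_rng(seed)
    perm = rng.permutation(m)
    tr, te = perm[:n_train], perm[n_train:]
    # The same indices are applied to both modalities so pairs stay aligned.
    return (X[:, tr], Y[:, tr], labels[tr]), (X[:, te], Y[:, te], labels[te])

# Placeholder data with the pair count quoted above; all sizes are made up.
rng = np.random.default_rng(0)
X_all = rng.normal(size=(128, 2866))
Y_all = rng.normal(size=(10, 2866))
labels_all = rng.integers(0, 10, size=2866)
(X, Y, labels), (X_te, Y_te, labels_te) = split_pairs(
    X_all, Y_all, labels_all, n_train=2173
)
```

Drawing one permutation and indexing both matrices with it is what keeps each image column paired with its text column and label after the split.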

Embodiment 2

[0104] The purpose of this embodiment is to provide a computing system.

[0105] An adversarial cross-modal retrieval system based on dictionary learning, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the following steps when executing the program:

[0106] Obtaining the underlying features of image data and text data, and constructing a training set and a test set of images and text respectively based on the underlying features;

[0107] Constructing a dictionary learning model and training it on the image and text training sets to obtain an image dictionary, a text dictionary, image reconstruction coefficients and text reconstruction coefficients;

[0108] According to the image dictionary and the text dictionary, calculate the image reconstruction coefficient and the text reconstruction coefficient of the test set;

[0109] The image reconstruction coefficient and text reconstr...
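The dictionary-training and coefficient steps in [0107] and [0108] can be sketched for a single modality as follows. The excerpt does not state the dictionary learning objective, so this sketch assumes a plain ridge-regularized reconstruction objective, min ||X - D A||_F^2 + lam ||A||_F^2, solved by alternating least squares, and it treats each modality independently. The function names, `lam`, `k`, `n_iter` and the column-per-sample layout are hypothetical and should not be read as the patent's formulation.

```python
import numpy as np

def code_with_dictionary(X, D, lam=0.1):
    """Reconstruction coefficients A minimizing ||X - D A||_F^2 + lam ||A||_F^2.

    X: (d, m) features with samples as columns, D: (d, k) dictionary.
    Returns A with shape (k, m).
    """
    k = D.shape[1]
    return np.linalg.solve(D.T @ D + lam * np.eye(k), D.T @ X)

def learn_dictionary(X, k, lam=0.1, n_iter=20, seed=0):
    """Alternate between coding (fix D, solve A) and a least-squares update of D."""
    rng = np.random.default_rng(seed)
    D = rng.normal(size=(X.shape[0], k))
    D /= np.linalg.norm(D, axis=0, keepdims=True)
    for _ in range(n_iter):
        A = code_with_dictionary(X, D, lam)
        # D = X A^T (A A^T + eps I)^{-1}; the small eps keeps the solve stable.
        D = np.linalg.solve(A @ A.T + 1e-8 * np.eye(k), A @ X.T).T
        # Rescale every atom (column) to unit length.
        D /= np.maximum(np.linalg.norm(D, axis=0, keepdims=True), 1e-12)
    return D, code_with_dictionary(X, D, lam)
```

Under these assumptions, `learn_dictionary` would be called once per modality on the training features, and `code_with_dictionary` would then reuse the learned dictionary to compute the reconstruction coefficients of the test features.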

Embodiment 3

[0114] The purpose of this embodiment is to provide a computer-readable storage medium.

[0115] A computer-readable storage medium, on which a computer program is stored, and when the program is executed by a processor, the following steps are performed:

[0116] Obtaining the underlying features of image data and text data, and constructing a training set and a test set of images and text respectively based on the underlying features;

[0117] Constructing a dictionary learning model and training it on the image and text training sets to obtain an image dictionary, a text dictionary, image reconstruction coefficients and text reconstruction coefficients;

[0118] According to the image dictionary and the text dictionary, calculate the image reconstruction coefficient and the text reconstruction coefficient of the test set;

[0119] The image reconstruction coefficient and text reconstruction coefficient of the training set, and the transposed form of the image reconstruction coeffic...

Abstract

The invention discloses a method and a system for adversarial cross-modal retrieval based on dictionary learning. The method comprises the following steps: obtaining the underlying features of image data and text data, and constructing a training set and a test set for images and for text based on these features; constructing a dictionary learning model, training it on the image and text training sets, and constructing new training and test sets from the resulting image dictionary and text dictionary; projecting the new image and text training sets onto a common representation space; learning, from the image and text feature data in the common representation space, a feature preserver (feature discrimination and triplet ranking) and a modality classifier; and training the feature preserver and the modality classifier adversarially to optimize the common representation space, after which cross-modal retrieval is performed on the test set. Using dictionary learning to extract features and adversarial learning to learn the common space of the image and text modalities can greatly improve the accuracy of cross-modal retrieval.
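Two pieces of the pipeline summarized above lend themselves to a short sketch: the triplet-style ranking term of the feature preserver, and test-time retrieval in the common representation space. The excerpt gives no formula for either, so the squared Euclidean distance, the `margin` value, the cosine-similarity ranking and all function names below are assumptions made for illustration; the modality classifier and the adversarial training loop are not shown. Rows are samples in this sketch.

```python
import numpy as np

def l2_normalize(Z, eps=1e-12):
    """Scale each row of Z to unit Euclidean length."""
    return Z / np.maximum(np.linalg.norm(Z, axis=1, keepdims=True), eps)

def triplet_ranking_loss(anchor, positive, negative, margin=0.2):
    """Mean hinge loss max(0, margin + d(a, p) - d(a, n)), d = squared Euclidean.

    anchor, positive, negative: (n, d) common-space features. The positive
    shares the anchor's semantic label, the negative does not.
    """
    d_pos = np.sum((anchor - positive) ** 2, axis=1)
    d_neg = np.sum((anchor - negative) ** 2, axis=1)
    return float(np.mean(np.maximum(0.0, margin + d_pos - d_neg)))

def rank_by_cosine(query, gallery):
    """Return gallery row indices ordered from most to least similar to query."""
    q = l2_normalize(query[None, :])[0]
    G = l2_normalize(gallery)
    return np.argsort(-(G @ q), kind="stable")
```

In a cross-modal setting the anchor would typically come from one modality (for example an image feature) and the positive and negative from the other (text features), so that minimizing the loss pulls same-label pairs together across modalities.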

Description

Technical field

[0001] The present invention relates to the fields of cross-modal retrieval and deep learning, and more specifically to a method and system for adversarial cross-modal retrieval based on dictionary learning.

Background

[0002] With the rapid development of Internet technology, multi-modal data (such as text, image, audio and video) is being produced continuously, and traditional single-modal retrieval can no longer meet the needs of users. Cross-modal retrieval is gradually becoming the mainstream of information retrieval because it can fuse and complement information from multiple modalities.

[0003] Because the underlying features of multimodal data differ across modalities, have complex organizational structures, and are mostly unstructured or semi-structured, multimodal data is difficult to store or retrieve in a structured manner. In order to bridge the heterogeneity gap between multimodal data, scholars at home and...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/903, G06K9/62
CPC: G06F18/2413, G06F18/24147
Inventor: 张化祥, 尚菲, 李静, 刘丽, 孟丽丽, 谭艳艳, 王强
Owner: SHANDONG NORMAL UNIV