Cross-modal retrieval method and system based on pseudo label learning and semantic consistency

A cross-modal retrieval technology based on semantic consistency, applied in the field of cross-modal retrieval. It addresses problems of existing methods, such as the difficulty of obtaining the best performance and the failure to consider unlabeled data. It achieves good retrieval results by also exploiting unlabeled data, which costs little to obtain, whereas labeled data is difficult to obtain.

Active Publication Date: 2019-05-21
SHANDONG JIANZHU UNIV
Cites: 7; Cited by: 12

AI Technical Summary

Problems solved by technology

[0006] The various methods described above either do not consider unlabeled data, or learn only one set of projections for each retrieval task (text-to-image or image-to-text). Both text-to-image retrieval and image-to-text retrieval are b...

Method used

Examples

Embodiment 1

[0055] In multimodal data retrieval, the similarity between data of different modalities cannot be measured directly. To relate data from one modality to data from other modalities, we learn a projection matrix using both labeled and unlabeled data. This embodiment discloses a cross-modal retrieval method based on pseudo-label learning and semantic consistency which, as shown in Figure 2, includes the following steps:

[0056] Step 1: Receive an image dataset and a text dataset, which include labeled image-text pairs and unlabeled image data;

[0057] Image-text pairs in the training set carry special semantic information, namely class labels. This semantic information can serve as the third dimension of the learned subspace and can be used to measure the similarity between semantically similar data of different modalities in the shared subspace. This embodiment also uses class labels to obtain a better similarity measure between data points. Unlike previous methods, the dimens...
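As a hedged illustration of how class labels can define a shared semantic space, the sketch below assumes, for illustration only, that the space has one coordinate per class, and maps each modality into it by closed-form ridge regression onto one-hot label vectors. Semantically similar image and text samples then land near the same class coordinate. The least-squares objective is a stand-in, not the objective disclosed in the patent (the paragraph above is truncated before it is stated), and the names `one_hot`, `ridge_to_labels`, the weight `lam`, and the toy data are hypothetical.

```python
import numpy as np


def one_hot(labels, num_classes):
    """Encode integer class labels as an (n x num_classes) one-hot matrix."""
    labels = np.asarray(labels)
    Y = np.zeros((labels.size, num_classes))
    Y[np.arange(labels.size), labels] = 1.0
    return Y


def ridge_to_labels(X, Y, lam=1e-2):
    """Closed-form ridge regression W = (X^T X + lam*I)^-1 X^T Y, mapping features X into label space."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)


# Hypothetical toy labeled pairs: 2-D image features, 3-D text features, 2 classes.
labels = np.array([0, 1, 0, 1])
X_img = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.1], [0.1, 1.0]])
# For simplicity, pairs of the same class share one text vector here.
X_txt = np.array([[1.0, 0.0, 1.0], [0.0, 1.0, 1.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]])

Y = one_hot(labels, num_classes=2)
W_img = ridge_to_labels(X_img, Y)  # image space -> semantic (label) space
W_txt = ridge_to_labels(X_txt, Y)  # text space  -> semantic (label) space

# In the shared space, each sample's largest coordinate should match its class.
img_pred = (X_img @ W_img).argmax(axis=1)
txt_pred = (X_txt @ W_txt).argmax(axis=1)
```

Because both projections target the same label coordinates, an image and a text of the same class become directly comparable even though their raw feature dimensions differ (2 versus 3 here).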

Embodiment 2

[0132] The purpose of this embodiment is to provide a computer system.

[0133] A computer system comprises a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the program, the following steps are implemented:

[0134] Receive an image dataset and a text dataset, which include labeled image-text pairs as well as unlabeled image data;

[0135] Learn a projection matrix from the image space to the text space, and project the unlabeled image data into the text space;

[0136] Calculate the class centers of the labeled text data;

[0137] Assign pseudo-labels to the unlabeled image data according to the similarity between their projections and the class centers of the text data, and use the text data corresponding to the nearest class center as each image's corresponding text modality;

[0138] Use the labeled image data and the pseudo-labeled image data, together with the corresponding text data, as...
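Steps [0135] to [0137] can be sketched as follows. This is a minimal sketch under stated assumptions, since the text does not specify the regression objective or the similarity measure: the image-to-text projection is learned by closed-form ridge regression on the labeled pairs, similarity is cosine similarity, and the class-center vector itself stands in for "the text data corresponding to the class center". The helper names (`ridge`, `class_centers`, `cosine`, `assign_pseudo_labels`), the weight `lam`, and the toy data are hypothetical.

```python
import numpy as np


def ridge(X, T, lam=1e-2):
    """Closed-form ridge regression: W = (X^T X + lam*I)^-1 X^T T, so that X @ W approximates T."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ T)


def class_centers(T, labels, num_classes):
    """Mean text feature of each class; assumes every class has at least one labeled sample."""
    labels = np.asarray(labels)
    return np.stack([T[labels == c].mean(axis=0) for c in range(num_classes)])


def cosine(A, B):
    """Pairwise cosine similarity between the rows of A and the rows of B (rows must be non-zero)."""
    A = A / np.linalg.norm(A, axis=1, keepdims=True)
    B = B / np.linalg.norm(B, axis=1, keepdims=True)
    return A @ B.T


def assign_pseudo_labels(X_lab, T_lab, y_lab, X_unl, num_classes, lam=1e-2):
    """Project unlabeled images into text space and label each by its most similar text class center."""
    W = ridge(X_lab, T_lab, lam)                      # image space -> text space
    centers = class_centers(T_lab, y_lab, num_classes)
    sims = cosine(X_unl @ W, centers)                 # (n_unlabeled x num_classes)
    pseudo = sims.argmax(axis=1)
    # Assumption: the nearest class center is used as the paired text feature.
    paired_text = centers[pseudo]
    return pseudo, paired_text


# Hypothetical toy data: 2-D image features, 3-D text features, 2 classes.
X_lab = np.array([[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]])
T_lab = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
y_lab = np.array([0, 1, 0, 1])
X_unl = np.array([[0.8, 0.2], [0.2, 0.8]])

pseudo, paired_text = assign_pseudo_labels(X_lab, T_lab, y_lab, X_unl, num_classes=2)
```

The pseudo-labeled images and their paired text features can then be stacked with the labeled pairs to form the enlarged training set that step [0138] describes.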

Embodiment 3

[0141] The purpose of this embodiment is to provide a computer-readable storage medium.

[0142] A computer-readable storage medium stores a computer program; when the program is executed by a processor, the following steps are performed:

[0143] Receive an image dataset and a text dataset, which include labeled image-text pairs as well as unlabeled image data;

[0144] Learn a projection matrix from the image space to the text space, and project the unlabeled image data into the text space;

[0145] Calculate the class centers of the labeled text data;

[0146] Assign pseudo-labels to the unlabeled image data according to the similarity between their projections and the class centers of the text data, and use the text data corresponding to the nearest class center as each image's corresponding text modality;

[0147] Use the labeled image data and the pseudo-labeled image data, together with the corresponding text data, as the training dataset, and learn the pro...

Abstract

The invention discloses a cross-modal retrieval method and system based on pseudo-label learning and semantic consistency. The method comprises the following steps: receiving an image dataset and a text dataset, which include labeled image-text pairs and unlabeled image data; learning a projection matrix from the image space to the text space, and projecting the unlabeled image data into the text space; calculating the class centers of the labeled text data; assigning pseudo-labels to the unlabeled image data according to the similarity between their projections and the class centers of the text data, and taking the text data corresponding to the nearest class center as the corresponding text modality; taking the labeled image data and the pseudo-labeled image data, together with the corresponding text data, as the training dataset, and learning projection matrices that project images and texts into a common semantic space; and performing cross-modal retrieval. Because unlabeled data is introduced into the training dataset, a more effective projection matrix can be obtained.
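The final retrieval step can be sketched as nearest-neighbor ranking in the common semantic space. The sketch assumes both modalities have already been projected with the learned matrices, ranks gallery items by cosine similarity, and scores the ranking with mean average precision (mAP), counting a retrieved item as relevant when it shares the query's class label. These choices, the function names `rank_by_cosine` and `mean_average_precision`, and the toy embeddings are illustrative assumptions; the abstract does not name a similarity measure or an evaluation metric.

```python
import numpy as np


def rank_by_cosine(queries, gallery):
    """For each query row, return gallery indices sorted from most to least cosine-similar."""
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    return np.argsort(-(q @ g.T), axis=1)


def mean_average_precision(ranking, query_labels, gallery_labels):
    """mAP over all queries; an item is relevant when it shares the query's class label."""
    query_labels = np.asarray(query_labels)
    gallery_labels = np.asarray(gallery_labels)
    aps = []
    for qi, order in enumerate(ranking):
        relevant = (gallery_labels[order] == query_labels[qi]).astype(float)
        if relevant.sum() == 0:
            continue  # queries with no relevant item are skipped
        hits = np.cumsum(relevant)
        precision_at_k = hits / np.arange(1, len(order) + 1)
        aps.append((precision_at_k * relevant).sum() / relevant.sum())
    return float(np.mean(aps)) if aps else 0.0


# Hypothetical image queries and text gallery, already projected into a 2-D common space.
img_emb = np.array([[1.0, 0.0], [0.0, 1.0]])
img_labels = np.array([0, 1])
txt_emb = np.array([[0.9, 0.1], [0.2, 0.8], [0.7, 0.3]])
txt_labels = np.array([0, 1, 0])

ranking = rank_by_cosine(img_emb, txt_emb)  # image-to-text retrieval
score = mean_average_precision(ranking, img_labels, txt_labels)
```

Swapping the arguments (text embeddings as queries, image embeddings as gallery) gives text-to-image retrieval with the same two functions.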

Description

Technical Field

[0001] The disclosure belongs to the technical field of cross-modal retrieval, and relates in particular to a cross-modal retrieval method and system based on pseudo-label learning and semantic consistency.

Background

[0002] With the advancement of information technology, the amount of multimodal data keeps increasing. Multimodal data is ubiquitous: people use the Internet to share personal text, audio, image, and video information. Multimodal data refers to data that describes the same object or concept in different modalities; its components for a given object or concept exist in different modalities but are related at a high semantic level. Because multimodal data is so common in daily life, its detection and analysis is an important research field. Multimodal retrieval methods differ from traditional retrieval methods, which mine information from data of a single modality. In cross-modal r...

Claims

Application Information

IPC(8): G06K9/62; G06N3/04; G06N3/08
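Inventors: 徐功文 (Xu Gongwen), 王义华 (Wang Yihua), 石林 (Shi Lin), 张志军 (Zhang Zhijun), 赵莉 (Zhao Li), 李晓梅 (Li Xiaomei), 张娟 (Zhang Juan), 吴永春 (Wu Yongchun), 胡顺泉 (Hu Shunquan)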
Owner SHANDONG JIANZHU UNIV