Image-text cross-modal retrieval method, system and device and storage medium

A cross-modal image-text retrieval technology, applied to still-image data retrieval, unstructured-text data retrieval, digital data information retrieval, etc., which achieves the effect of improving retrieval accuracy.

Inactive Publication Date: 2019-07-26
SOUTH CHINA NORMAL UNIVERSITY

AI Technical Summary

Problems solved by technology

This research area faces two main problems: first, how to effectively select features of images and texts; second, how to maximize the correlation between images and texts.
For example, given a picture, the text most relevant to that picture should be retrieved; however, irrelevant texts may also be retrieved when their distance in the common subspace is short, which weakens the learned correlation between pictures and texts and reduces the precision of cross-modal retrieval.


Examples


Embodiment 1

[0056] As shown in Figure 1, this embodiment provides a cross-modal image-text retrieval method, including the following steps:

[0057] S1. Establish a retrieval model by combining the training set with the combined loss function.

[0058] S2. After obtaining the text/picture data to be retrieved, combine the text/picture data with the retrieval model trained with the combined loss function to obtain similarity information.

[0059] S3. Obtain the corresponding picture/text data according to the similarity information.

[0060] The training set includes a text training set and a picture training set. The training set is input into a deep neural network framework for training to obtain a model; the model is then trained for classification with the combined loss function so as to capture the latent information between pictures and texts, finally generating the retrieval model. When pictures need to be retrieved from a text, the text is input into the retrieval mod...
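The three steps S1-S3 above can be sketched as follows. The projection matrices stand in for the trained retrieval model, and all dimensions and names are illustrative assumptions, not the patent's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def embed(features, projection):
    """Project raw features into the common subspace (L2-normalized)."""
    z = features @ projection
    return z / np.linalg.norm(z, axis=1, keepdims=True)

# S1 (stand-in): the "trained" model is a pair of random projections,
# for illustration only; the patent trains these with a combined loss.
W_img = rng.normal(size=(512, 128))   # image features -> common subspace
W_txt = rng.normal(size=(300, 128))   # text features  -> common subspace

# S2: given a text query, score all candidate images by similarity
# in the common subspace.
images = embed(rng.normal(size=(10, 512)), W_img)   # 10 candidate images
query  = embed(rng.normal(size=(1, 300)), W_txt)    # 1 text query

# A smaller Euclidean distance means a more similar picture-text pair,
# matching the formulation in the specific embodiment below.
distances = np.linalg.norm(images - query, axis=1)

# S3: return the indices of the most similar images.
ranking = np.argsort(distances)
print(ranking[:3])
```

The same pipeline runs in the other direction (picture query, text gallery) by swapping the roles of the two projections.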

Specific Embodiment

[0089] Below, the steps of building the retrieval model are explained in detail with reference to Figures 2 and 3.

[0090] First, assume that the set of pictures is denoted by I = {x_1, x_2, ..., x_n} and the set of texts by T = {y_1, y_2, ..., y_n}. The set of paired pictures and texts is denoted by R = {P_1, P_2, ..., P_n}, where each picture-text pair P_i = (x_i, y_i) contains the d_I-dimensional image features of picture x_i and the d_T-dimensional text features of text y_i. The Euclidean distance is then used to define the similarity between a picture and a text, with the following formula:

[0091] D(f_I(x_i), f_T(y_i)) = ||f_I(x_i) - f_T(y_i)||_2

[0092] Here f_I(.) is the embedding-layer function for pictures and f_T(.) is the embedding-layer function for texts, and D(.,.) is the distance metric in Euclidean space. The smaller D(f_I(x_i), f_T(y_i)) is, the more similar picture x_i and text y_i are. We employ a pair ranking model ...
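The distance D and a pair-ranking term can be sketched as below. Since the patent's exact combined loss is not reproduced here, the hinge form and the margin value are assumptions standing in for the pair ranking model the text mentions:

```python
import numpy as np

def euclidean_distance(img_emb, txt_emb):
    """D(f_I(x), f_T(y)) = ||f_I(x) - f_T(y)||_2 in the common subspace."""
    return np.linalg.norm(img_emb - txt_emb, axis=-1)

def pairwise_ranking_loss(anchor, positive, negative, margin=0.2):
    """Hinge-style ranking term (assumed form): pull matched pairs
    closer than mismatched pairs by at least the margin."""
    d_pos = euclidean_distance(anchor, positive)
    d_neg = euclidean_distance(anchor, negative)
    return np.maximum(0.0, margin + d_pos - d_neg).mean()

# Toy check: a well-separated triplet incurs zero loss.
img = np.array([[1.0, 0.0]])        # image embedding
txt_match = np.array([[1.0, 0.1]])  # its paired text, nearby
txt_other = np.array([[-1.0, 0.0]]) # an irrelevant text, far away
print(pairwise_ranking_loss(img, txt_match, txt_other))  # prints 0.0
```

Minimizing such a term over all pairs keeps relevant picture-text pairs close in the common subspace while pushing irrelevant data away, which is the stated goal of the combined loss.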

Embodiment 2

[0121] As shown in Figure 4, this embodiment provides an image-text cross-modal retrieval system, including:

[0122] a calculation module, used to obtain similarity information by combining the text/picture data to be retrieved with the retrieval model trained with the combined loss function; and

[0123] an obtaining module, used to obtain the corresponding picture/text data according to the similarity information.

[0124] The image-text cross-modal retrieval system of this embodiment can execute the image-text cross-modal retrieval method provided in Embodiment 1 of the present invention, can execute any combination of the implementation steps of the method embodiments, and has the corresponding functions and beneficial effects.
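The two modules of Embodiment 2 can be sketched as the classes below. The class names, the pre-embedded gallery, and the negative-distance scoring are illustrative assumptions, not the patent's implementation:

```python
import numpy as np

class CalculationModule:
    """Scores a query against a gallery of candidates that are assumed
    to already be embedded in the common subspace by the trained model."""

    def __init__(self, gallery):
        self.gallery = gallery  # shape (n_candidates, dim)

    def similarity(self, query):
        """Return negative Euclidean distance, so higher = more similar."""
        return -np.linalg.norm(self.gallery - query, axis=1)

class ObtainingModule:
    """Turns similarity information into the retrieved items."""

    @staticmethod
    def retrieve(scores, k=1):
        """Return indices of the k most similar gallery items."""
        return np.argsort(scores)[::-1][:k]

# Toy usage: a 2-D gallery of three candidates and one query.
gallery = np.array([[0.0, 1.0], [1.0, 0.0], [0.9, 0.1]])
query = np.array([1.0, 0.0])
scores = CalculationModule(gallery).similarity(query)
top = ObtainingModule.retrieve(scores, k=2)
print(top)  # indices of the two closest candidates
```

In the patent's framing, the calculation module corresponds to computing the similarity matrix from the retrieval model, and the obtaining module corresponds to returning the picture/text data ranked by that matrix.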


Abstract

The invention discloses an image-text cross-modal retrieval method, system, device and storage medium. The method comprises the following steps: obtaining the text/picture data to be retrieved, and obtaining similarity information by combining the text/picture data with a retrieval model trained with a combined loss function; and obtaining the corresponding picture/text data according to the similarity information. According to the invention, the text/picture data to be retrieved is input into the retrieval model to calculate a similarity matrix, and the corresponding picture/text data is obtained according to the similarity matrix. Because the retrieval model is trained with the combined loss function, relevant pictures and texts are kept close to each other while irrelevant data is pushed far away, so the retrieval accuracy between pictures and their relevant texts is greatly improved. The method can be widely applied in the technical field of multimedia information retrieval.

Description

Technical Field

[0001] The present invention relates to the technical field of multimedia information retrieval, and in particular to an image-text cross-modal retrieval method, system, device and storage medium.

Background

[0002] Over the past decade, with the rapid development of the Internet, social media, and other information technologies, data in various forms has exploded. Different media often use different types of data to describe the same object or subject; for example, a textual description of a dog and a picture of a dog share the same semantic meaning, "dog". In cross-modal image-text retrieval, given a text, the most relevant image should be queried, or, given an image, the most relevant list of texts describing that image should be retrieved. This research area faces two main problems: first, how to effectively select features of images and texts; second, how to maximize the correlation between images and texts.

[0003] To maximize the correl...

Claims


Application Information

IPC (IPC8): G06F16/33; G06F16/383; G06F16/35; G06F16/55; G06F16/53; G06F16/583
CPC: G06F16/334; G06F16/35; G06F16/383; G06F16/53; G06F16/55; G06F16/583
Inventors: 肖菁, 简杨沃, 李晶晶, 朱佳, 曹阳
Owner SOUTH CHINA NORMAL UNIVERSITY