Image-text cross-modal retrieval method, system, device and storage medium

A cross-modal image-text retrieval technology, applied to still image data retrieval, unstructured text data retrieval, digital data information retrieval, etc., with the effect of improving retrieval accuracy.

Inactive Publication Date: 2019-07-26

SOUTH CHINA NORMAL UNIVERSITY

7 Cites, 15 Cited by

Problems solved by technology

This research area faces two main problems: first, how to effectively select features of images and texts; second, how to maximize the correlation between images and texts.

For example, given a picture, the text most relevant to it should be retrieved. In practice, irrelevant texts are retrieved as well, because their distance to the picture in the common subspace is also among the shortest. This directly weakens the correlation between the picture and the text and reduces the precision of cross-modal retrieval.

Examples

Embodiment 1

[0056] As shown in Figure 1, this embodiment provides an image-text cross-modal retrieval method, including the following steps:

[0057] S1. Establish a retrieval model using the training set and the combined loss function.

[0058] S2. After obtaining the text / picture data to be retrieved, combine that data with the retrieval model trained using the combined loss function to obtain similarity information.

[0059] S3. Obtain the corresponding picture / text data according to the similarity information.

[0060] The training set includes a text training set and a picture training set. The training set is fed into a deep neural network framework to train a model. The model then undergoes classification training with the combined loss function to capture the latent information between pictures and texts, which finally yields the retrieval model. When it is necessary to retrieve pictures based on text, input the text into the retrieval mod...
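Steps S2 and S3 can be read as a nearest-neighbour lookup in the common subspace. The following is a minimal sketch under that reading, not the patented implementation. It assumes the trained retrieval model has already mapped the query and every candidate to vectors of equal length. The names `euclidean` and `retrieve` and the toy vectors are hypothetical.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def retrieve(query_vec, candidates, top_k=1):
    """Return the ids of the top_k candidates closest to the query embedding.

    `candidates` maps an item id to its embedding, assumed to be
    precomputed by the trained retrieval model (not shown here).
    """
    ranked = sorted(candidates, key=lambda cid: euclidean(query_vec, candidates[cid]))
    return ranked[:top_k]

# Toy embeddings standing in for model outputs (hypothetical values).
image_embeddings = {
    "dog.jpg": [0.9, 0.1],
    "cat.jpg": [0.1, 0.9],
    "car.jpg": [-0.8, -0.5],
}
text_query = [1.0, 0.0]  # embedding of a text such as "a dog"

print(retrieve(text_query, image_embeddings, top_k=2))
```

The same function covers the opposite direction (picture query, text candidates) by swapping which modality supplies the query vector.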

Specific Embodiment

[0089] The steps of building the detection model are explained in detail below with reference to Figure 2 and Figure 3.

[0090] First, let the picture set be $I = \{x_1, x_2, \ldots, x_n\}$ and the text set be $T = \{y_1, y_2, \ldots, y_n\}$. The set of paired pictures and texts is $R = \{P_1, P_2, \ldots, P_n\}$, where each picture-text pair $P_i = (x_i, y_i)$ contains the $d_I$-dimensional image feature $x_i$ and the $d_T$-dimensional text feature $y_i$. The similarity between a picture and a text is then defined using the Euclidean distance, with the following formula:

[0091]

[0092] where $f_I(\cdot)$ is the embedding-layer function for pictures, $f_T(\cdot)$ is the embedding-layer function for texts, and $D(\cdot,\cdot)$ is the distance metric in Euclidean space. The smaller the distance $D(f_I(x_i), f_T(y_i))$, the more similar the picture $x_i$ and the text $y_i$ are. We employ a pair ranking model ...
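The roles of $f_I$, $f_T$ and $D$ can be illustrated with a small sketch. Everything below is an assumption made for illustration: the linear maps stand in for the embedding layers, $D$ is taken as the plain (unsquared) Euclidean norm, and `pair_ranking_loss` is a generic hinge-style ranking loss, not the combined loss function of this patent, whose definition is not reproduced in the text above.

```python
import math

def linear_embed(weights):
    """Build a toy embedding function x -> W x (hypothetical stand-in for f_I or f_T)."""
    def embed(x):
        return [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    return embed

def distance(u, v):
    """D(u, v): Euclidean distance between two points of the common subspace."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def pair_ranking_loss(d_pos, d_neg, margin=0.2):
    """Generic hinge ranking loss (assumed form): zero once the matched pair
    is closer than the mismatched pair by at least `margin`."""
    return max(0.0, margin + d_pos - d_neg)

# f_I maps 3-dimensional image features and f_T maps 2-dimensional text
# features into the same 2-dimensional subspace (d_I = 3, d_T = 2).
f_I = linear_embed([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
f_T = linear_embed([[1.0, 0.0], [0.0, 1.0]])

x_i = [1.0, 0.0, 5.0]  # image feature
y_i = [1.0, 0.0]       # feature of the matching text
y_j = [0.0, 1.0]       # feature of a non-matching text

d_pos = distance(f_I(x_i), f_T(y_i))
d_neg = distance(f_I(x_i), f_T(y_j))
loss = pair_ranking_loss(d_pos, d_neg)
print(d_pos, d_neg, loss)
```

A margin-based loss of this kind penalises exactly the failure described earlier, where an irrelevant text sits as close to the picture as the relevant one.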

Embodiment 2

[0121] As shown in Figure 4, this embodiment provides an image-text cross-modal retrieval system, including:

[0122] A calculation module, used to obtain similarity information, once the text / picture data to be retrieved has been obtained, by combining that data with the retrieval model trained using the combined loss function;

[0123] An obtaining module, used to obtain the corresponding picture / text data according to the similarity information.

[0124] The image-text cross-modal retrieval system of this embodiment can execute the image-text cross-modal retrieval method provided in Embodiment 1, can execute any combination of the implementation steps of the method embodiments, and has the corresponding functions and beneficial effects.
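The two-module split of this embodiment could be wired together roughly as follows. This is only a sketch: the class names mirror the module names in the text, but the fields, method names, the toy `embed_query` hook and the use of negative squared distance as the similarity score are all assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]

@dataclass
class CalculationModule:
    """Produces similarity information for a query against stored items."""
    embed_query: Callable[[object], Vector]  # hypothetical hook into the trained model
    index: Dict[str, Vector]                 # item id -> precomputed embedding

    def similarity(self, query) -> Dict[str, float]:
        q = self.embed_query(query)
        # Negative squared Euclidean distance: a larger score means more similar.
        return {
            item_id: -sum((a - b) ** 2 for a, b in zip(q, vec))
            for item_id, vec in self.index.items()
        }

@dataclass
class ObtainingModule:
    """Selects the best-matching items from the similarity information."""
    top_k: int = 1

    def obtain(self, scores: Dict[str, float]) -> List[str]:
        return sorted(scores, key=scores.get, reverse=True)[: self.top_k]

# Toy wiring: the "model" here just encodes text length (illustration only).
calc = CalculationModule(
    embed_query=lambda text: [float(len(text)), 0.0],
    index={"short.png": [3.0, 0.0], "long.png": [12.0, 0.0]},
)
result = ObtainingModule(top_k=1).obtain(calc.similarity("dog"))
print(result)
```

Keeping scoring and selection in separate objects lets the selection rule (top-k, thresholding) change without touching the model-facing code.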

Abstract

The invention discloses an image-text cross-modal retrieval method, system, device and storage medium. The method comprises the following steps: obtaining the text / picture data to be retrieved, and obtaining similarity information by combining the text / picture data with a retrieval model trained using a combined loss function; and obtaining the corresponding picture / text data according to the similarity information. The text / picture data to be retrieved is input into the retrieval model to calculate a similarity matrix, and the corresponding picture / text data is obtained according to that matrix. Because the retrieval model is trained with the combined loss function, relevant pictures and texts are kept close to each other and far from irrelevant data, which greatly improves the retrieval accuracy between pictures and their related texts. The method can be widely applied in the technical field of multimedia information retrieval.

Description

Technical field

[0001] The present invention relates to the technical field of multimedia information retrieval, in particular to a graphic-text cross-modal retrieval method, system, device and storage medium.

Background technique

[0002] Over the past decade, with the rapid development of the Internet, social media, and other information technologies, data in various forms has exploded. Often, different media use different types of data to describe the same object or subject. For example, a dog described in a text and a picture of a dog have the same semantic meaning of "dog". Of course, in cross-modal image and text retrieval, given text, the most relevant image should be queried, or given an image, the most relevant text list describing the image should be retrieved. This research area faces two main problems: first, how to effectively select features of images and texts; second, how to maximize the correlation between images and texts.

[0003] To maximize the correl...

Application Information

IPC(8): G06F16/33, G06F16/383, G06F16/35, G06F16/55, G06F16/53, G06F16/583

CPC: G06F16/334, G06F16/35, G06F16/383, G06F16/53, G06F16/55, G06F16/583

Inventor: 肖菁, 简杨沃, 李晶晶, 朱佳, 曹阳

Owner: SOUTH CHINA NORMAL UNIVERSITY
