
Cross-modal image-text mutual indexing method based on self-attention reasoning

An attentional, cross-modal technology applied at the intersection of vision and language to achieve improved accuracy and stability

Pending Publication Date: 2022-05-10
CENT SOUTH UNIV
0 Cites · 3 Cited by
  • Summary
  • Abstract
  • Description
  • Claims
  • Application Information

AI Technical Summary

Problems solved by technology

But it also increases the challenge: how to achieve faster image-text retrieval from massive data containing interference and noise, and how to improve the ability to extract and measure image and text features so that the model achieves a good matching effect, are key issues that all image-text retrieval will face.

Method used


Image

  • Cross-modal image-text mutual indexing method based on self-attention reasoning

Examples

Experimental program
Comparison scheme
Effect test

Embodiment Construction

[0028] The present invention will be further described in conjunction with the accompanying drawings and specific embodiments. It should be understood that these examples are only used to illustrate the present invention and are not intended to limit the scope of the present invention. In addition, it should be understood that after reading the content taught by the present invention, those skilled in the art may make various changes or modifications to the present invention, and these equivalent forms also fall within the scope defined in the present application.

[0029] Referring to the accompanying figures 1 and 2, the specific implementation process of the self-attention reasoning-based cross-modal image-text mutual retrieval method designed by the present invention is as follows:

[0030] Step 1: Obtain the data set: acquire paired original image data and text annotation data, and divide them into a training set, a validation set, and a test set.
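The splitting step above can be sketched as follows. This is a minimal illustration, not the patent's procedure: the pair format, split ratios, and function name are hypothetical assumptions.

```python
import random

def split_pairs(pairs, ratios=(0.8, 0.1, 0.1), seed=0):
    """Shuffle paired (image, caption) samples and split them
    into training, validation, and test subsets."""
    assert abs(sum(ratios) - 1.0) < 1e-9
    rng = random.Random(seed)          # fixed seed for reproducibility
    shuffled = pairs[:]
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train = int(n * ratios[0])
    n_val = int(n * ratios[1])
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test

# Hypothetical paired data: each image with its text annotation.
pairs = [(f"img_{i}.jpg", f"caption {i}") for i in range(100)]
train, val, test = split_pairs(pairs)
print(len(train), len(val), len(test))  # 80 10 10
```

Splitting at the pair level keeps every image with its annotation, so no caption in the validation or test set describes a training image.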

[0031] The multimodal datasets used fo...


PUM

No PUM

Abstract

The invention discloses a cross-modal image-text mutual retrieval method based on self-attention reasoning, belonging to the field of cross-modal retrieval. The self-attention reasoning model provided by the invention mainly comprises three modules: in the first part, image saliency features are extracted using a pre-trained top-down backbone network, and text-branch features are obtained using word embedding and a sequence-model-style structure; in the second part, a self-attention reasoning module is designed that considers the contribution of each bounding box to the overall semantics and the cohesion between semantics, further eliminating the negative effects of irrelevant semantics; and in the third part, an interactive attention module between the two branches is designed, so that a corresponding image-text pair carries a larger weight in the subsequent similarity evaluation. Experiments show that, compared with traditional methods, the method achieves higher matching precision and faster retrieval.
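The second module reasons over bounding-box features with self-attention, so that each region's representation is re-weighted by its relation to all other regions. A minimal numpy sketch of scaled dot-product self-attention under generic assumptions (the projection matrices and dimensions here are illustrative, not the patent's configuration):

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a set of region features.

    X: (n_regions, d) matrix of bounding-box features.
    Each output row is a weighted sum of all value vectors, so a
    region's representation reflects its relation to the whole image.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # pairwise relevance
    scores -= scores.max(axis=-1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))                # 4 region features, dim 8
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (4, 8)
```

Because the softmax rows sum to one, regions whose semantics are irrelevant to the rest of the scene receive small weights, which is the intuition behind suppressing irrelevant semantics.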

Description

Technical field
[0001] The invention belongs to the intersection of vision and language and is applied to cross-modal retrieval tasks between images and texts. More specifically, it relates to a cross-modal image-text mutual retrieval method based on self-attention reasoning.
Background technique
[0002] With the rapid development of network technology, especially the application of emerging social platforms and mobile devices, the Internet is flooded with a large amount of multimodal information (text, image, video, audio, etc.). Precisely because of this, users' demands on the search function of human-computer interaction have changed. Many platforms are no longer limited to matching within a single modality but have realized cross-modal matching. When users search by submitting a query of any modality, they can obtain results in various forms, which provide more comprehensive sup...
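The cross-modal matching described in the background is commonly framed as ranking candidates of one modality by similarity to a query embedding from the other, once both are projected into a shared space. A minimal sketch of this general framing; the embeddings and function name below are hypothetical, not taken from the patent:

```python
import numpy as np

def rank_by_cosine(query, candidates):
    """Rank candidate embeddings by cosine similarity to the query."""
    q = query / np.linalg.norm(query)
    C = candidates / np.linalg.norm(candidates, axis=1, keepdims=True)
    sims = C @ q
    order = np.argsort(-sims)          # best match first
    return order, sims

# Hypothetical shared-space embeddings: one text query, three images.
text_query = np.array([1.0, 0.0, 1.0])
image_embs = np.array([
    [0.9, 0.1, 1.1],    # closely aligned with the query
    [-1.0, 0.5, 0.0],
    [0.0, 1.0, 0.0],
])
order, sims = rank_by_cosine(text_query, image_embs)
print(order[0])  # 0
```

The same ranking works in both directions (text-to-image or image-to-text), which is what makes the retrieval "mutual".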

Claims


Application Information

Patent Timeline
Patent Type & Authority: Application (China)
IPC(8): G06F16/43; G06F16/432; G06N3/04; G06N3/08
CPC: G06F16/434; G06F16/43; G06N3/08; G06N3/045
Inventor 李召
Owner CENT SOUTH UNIV