
Cross-modal image-text mutual indexing method based on self-attention reasoning

An attentional, cross-modal technology applied at the intersection of vision and language to achieve improved accuracy and stability

Pending Publication Date: 2022-05-10
CENT SOUTH UNIV
0 Cites · 3 Cited by
  • Summary
  • Abstract
  • Description
  • Claims
  • Application Information

AI Technical Summary

Problems solved by technology

But it also increases the challenge: how to achieve faster image-text retrieval from massive data containing interference and noise, and how to improve the ability to extract and measure image and text features so that the model achieves a good matching effect, are key issues that all image-text retrieval will face.

Method used


Image

  • Cross-modal image-text mutual indexing method based on self-attention reasoning

Examples

Experimental program
Comparison scheme
Effect test

Embodiment Construction

[0028] The present invention will be further described in conjunction with the accompanying drawings and specific embodiments. It should be understood that these examples are only used to illustrate the present invention and are not intended to limit the scope of the present invention. In addition, it should be understood that after reading the content taught by the present invention, those skilled in the art may make various changes or modifications to the present invention, and these equivalent forms also fall within the scope defined in the present application.

[0029] Referring to the accompanying figures 1 and 2, the specific implementation process of the self-attention reasoning-based cross-modal image-text mutual retrieval method designed by the present invention is as follows:

[0030] Step 1: Obtain the data set: acquire paired original image data and text annotation data, and divide them into a training set, a validation set, and a test set.
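The splitting step above can be sketched as follows. This is a minimal illustration, not the patent's procedure: the pair format, split ratios, and function name are hypothetical assumptions.

```python
import random

def split_pairs(pairs, ratios=(0.8, 0.1, 0.1), seed=0):
    """Shuffle paired (image, caption) samples and split them
    into training, validation, and test subsets."""
    assert abs(sum(ratios) - 1.0) < 1e-9
    rng = random.Random(seed)          # fixed seed for reproducibility
    shuffled = pairs[:]
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train = int(n * ratios[0])
    n_val = int(n * ratios[1])
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test

# Hypothetical paired data: each image with its text annotation.
pairs = [(f"img_{i}.jpg", f"caption {i}") for i in range(100)]
train, val, test = split_pairs(pairs)
print(len(train), len(val), len(test))  # 80 10 10
```

Splitting at the pair level keeps every image with its annotation, so no caption in the validation or test set describes a training image.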

[0031] The multimodal datasets used fo...


PUM

No PUM

Abstract

The invention discloses a cross-modal image-text mutual retrieval method based on self-attention reasoning, belonging to the field of cross-modal retrieval. The self-attention reasoning model provided by the invention mainly comprises three modules: in the first part, image saliency features are extracted using a pre-trained top-down backbone network, and text-branch features are obtained using word embedding and a sequence-model-style structure; in the second part, a self-attention reasoning module is designed that considers the contribution of each bounding box to the overall semantics and the cohesion between semantics, further eliminating the negative effects of irrelevant semantics; and in the third part, an interactive attention module between the two branches is designed, so that a corresponding image-text pair carries a larger weight in the subsequent similarity evaluation. Experiments show that, compared with traditional methods, the method achieves higher matching precision and faster retrieval.
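The second module reasons over bounding-box features with self-attention, so that each region's representation is re-weighted by its relation to all other regions. A minimal numpy sketch of scaled dot-product self-attention under generic assumptions (the projection matrices and dimensions here are illustrative, not the patent's configuration):

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a set of region features.

    X: (n_regions, d) matrix of bounding-box features.
    Each output row is a weighted sum of all value vectors, so a
    region's representation reflects its relation to the whole image.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # pairwise relevance
    scores -= scores.max(axis=-1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))                # 4 region features, dim 8
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (4, 8)
```

Because the softmax rows sum to one, regions whose semantics are irrelevant to the rest of the scene receive small weights, which is the intuition behind suppressing irrelevant semantics.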

Description

Technical field
[0001] The invention belongs to the intersection of vision and language and is applied to cross-modal retrieval tasks between images and texts. More specifically, it relates to a cross-modal image-text mutual retrieval method based on self-attention reasoning.
Background technique
[0002] With the rapid development of network technology, especially the application of emerging social platforms and mobile devices, the Internet is flooded with a large amount of multimodal information (text, image, video, audio, etc.). Precisely because of this, users' demands on the search function of human-computer interaction have changed. Many platforms are no longer limited to matching within a single modality but have realized cross-modal matching. When users search by submitting a query of any modality, they can obtain results in various forms, which provide more comprehensive sup...
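The cross-modal matching described in the background is commonly framed as ranking candidates of one modality by similarity to a query embedding from the other, once both are projected into a shared space. A minimal sketch of this general framing; the embeddings and function name below are hypothetical, not taken from the patent:

```python
import numpy as np

def rank_by_cosine(query, candidates):
    """Rank candidate embeddings by cosine similarity to the query."""
    q = query / np.linalg.norm(query)
    C = candidates / np.linalg.norm(candidates, axis=1, keepdims=True)
    sims = C @ q
    order = np.argsort(-sims)          # best match first
    return order, sims

# Hypothetical shared-space embeddings: one text query, three images.
text_query = np.array([1.0, 0.0, 1.0])
image_embs = np.array([
    [0.9, 0.1, 1.1],    # closely aligned with the query
    [-1.0, 0.5, 0.0],
    [0.0, 1.0, 0.0],
])
order, sims = rank_by_cosine(text_query, image_embs)
print(order[0])  # 0
```

The same ranking works in both directions (text-to-image or image-to-text), which is what makes the retrieval "mutual".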

Claims


Application Information

Patent Timeline
Patent Type & Authority: Application (China)
IPC(8): G06F16/43; G06F16/432; G06N3/04; G06N3/08
CPC: G06F16/434; G06F16/43; G06N3/08; G06N3/045
Inventor 李召
Owner CENT SOUTH UNIV