Image-text retrieval system and method based on multi-angle self-attention mechanism

A retrieval system and attention technology, applied in the field of cross-modal retrieval, that addresses insufficiently detailed features and improves retrieval performance.

Pending Publication Date: 2019-07-09

FUDAN UNIV

5 Cites, 33 Cited by

AI Technical Summary

Problems solved by technology

[0005] Existing CNN+RNN models for image-text retrieval have two shortcomings: the features they extract are not detailed enough, and their optimization method leaves room for improvement. To overcome these shortcomings, the present invention provides an image-text retrieval system based on multi-stage training and a multi-angle self-attention mechanism.

Embodiment Construction

[0038] As the background section shows, the instance features extracted by existing image-text retrieval methods are relatively coarse and do not reflect key semantic information well, and the optimization method leaves room for improvement. The applicant studied these problems and concluded that key information can be extracted from different angles. For example, given an image, different people may attend to different content, such as the dog or the grass, and the same holds for text. To this end, a self-attention mechanism is used to extract key information from different angles, and the optimization of hard examples is studied further. The study found that optimizing over all examples first and then over the hard examples allows the proposed framework to be optimized better and to learn better network parameters.
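The two-stage idea in [0038], overall optimization followed by optimization on hard examples, can be illustrated with a minimal sketch. The excerpt does not give the loss formula, so everything concrete here is an assumption: a bidirectional hinge ranking loss, a margin of 0.2, the hypothetical names `ranking_loss` and `loss_for_epoch`, and an epoch-based switch between stages.

```python
def ranking_loss(sim, margin=0.2, hardest_only=False):
    """Bidirectional hinge ranking loss over a square similarity matrix.

    sim[i][j] is the similarity between image i and sentence j; matched
    pairs sit on the diagonal. The hinge form and the margin value are
    assumptions made for this sketch, not taken from the patent text.
    """
    n = len(sim)
    total = 0.0
    for i in range(n):
        pos = sim[i][i]
        # Margin violations with image i as the query (other sentences are negatives).
        i2t = [max(0.0, margin - pos + sim[i][j]) for j in range(n) if j != i]
        # Margin violations with sentence i as the query (other images are negatives).
        t2i = [max(0.0, margin - pos + sim[j][i]) for j in range(n) if j != i]
        if hardest_only:
            # Hard-example stage: keep only the worst violation per direction.
            total += max(i2t, default=0.0) + max(t2i, default=0.0)
        else:
            # Overall stage: sum the violations over every negative.
            total += sum(i2t) + sum(t2i)
    return total


def loss_for_epoch(sim, epoch, switch_epoch=10):
    """Use the overall loss before switch_epoch and the hard-example loss after.

    switch_epoch is a hypothetical hyperparameter for this sketch.
    """
    return ranking_loss(sim, hardest_only=epoch >= switch_epoch)


# Toy batch of three image-sentence pairs (values are illustrative only).
sim = [
    [0.9, 0.8, 0.8],
    [0.2, 0.7, 0.6],
    [0.3, 0.4, 0.5],
]
overall = ranking_loss(sim)
hard = ranking_loss(sim, hardest_only=True)
```

The hard-example loss never exceeds the overall loss on the same batch, because it keeps a single term from each sum of non-negative terms. Starting with the overall loss gives every negative pair a gradient before training narrows its focus to the most confusing ones.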

[0039] In this embodiment, image region features...

Abstract

The invention belongs to the technical field of cross-modal retrieval, and particularly relates to an image-text retrieval system and method based on a multi-angle self-attention mechanism. The system comprises a deep convolutional network, a bidirectional recurrent neural network, image and text self-attention networks, a multi-modal space mapping network and a multi-stage training module. The deep convolutional network acquires embedding vectors of image region features in the image embedding space. The bidirectional recurrent neural network acquires embedding vectors of word features in the text space. The two sets of vectors are input to the image and text self-attention networks respectively, which acquire embedded representations of the key image regions and of the key words in sentences. The multi-modal space mapping network acquires the embedded representations of the image and the text in the multi-modal space. The multi-stage training module learns the parameters of the network. Good results are obtained on the common datasets Flickr30k and MSCOCO, and the performance is greatly improved.
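The self-attention step and the later comparison in a shared space can be sketched in plain Python. This is a rough illustration under stated assumptions, not the patented networks: the scoring form softmax(w2 · tanh(w1 · x)), the weight matrices `w1` and `w2`, and the functions `multi_angle_attention`, `cosine` and `top_k` are all hypothetical, and cosine similarity is assumed as the matching score.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def multi_angle_attention(features, w1, w2):
    """Pool n feature vectors into one vector per attention angle.

    features: n vectors of size d (image regions or word states).
    w1: h x d projection matrix; w2: r x h scoring matrix, one row per angle.
    The scoring form is an assumption made for this sketch.
    """
    d = len(features[0])
    # Hidden representation of every region or word.
    hidden = [[math.tanh(sum(w * x for w, x in zip(row, f))) for row in w1]
              for f in features]
    pooled = []
    for angle in w2:
        scores = [sum(a * h for a, h in zip(angle, hid)) for hid in hidden]
        weights = softmax(scores)  # attention over the n items for this angle
        pooled.append([sum(wt * f[k] for wt, f in zip(weights, features))
                       for k in range(d)])
    return pooled


def cosine(u, v):
    """Cosine similarity; assumes both vectors are non-zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def top_k(query, candidates, k):
    """Indices of the k candidates most similar to the query, best first."""
    order = sorted(range(len(candidates)),
                   key=lambda i: cosine(query, candidates[i]), reverse=True)
    return order[:k]


# Two attention angles over three 2-d region features (toy values).
regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w1 = [[0.5, -0.5], [0.3, 0.8]]
w2 = [[1.0, 0.0], [0.0, 1.0]]
angles = multi_angle_attention(regions, w1, w2)
```

Each angle produces its own attention distribution over the same regions or words, so one angle can emphasize the dog while another emphasizes the grass. In a full system the pooled vectors would be mapped into the shared multi-modal space before `top_k` ranks sentences for an image or images for a sentence.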

Description

Technical field

[0001] The invention belongs to the technical field of cross-modal retrieval, and in particular relates to an image-text retrieval system and method based on a multi-angle self-attention mechanism.

Background art

[0002] Among multimodal information processing tasks, cross-modal analysis and processing between images and text is a very important research direction. It includes tasks such as automatic generation of image descriptions and mutual retrieval between images and text. Here we focus on cross-modal retrieval, that is, the image-text mutual retrieval task: given an image, find the K sentences most semantically similar to it, or given a sentence, find the K images most semantically related to it. Image-text mutual retrieval is a very challenging task, because it involves two very important branches of pattern recognition, namely computer vision and natur...

Application Information

IPC(8): G06F16/53, G06F16/535, G06F16/33, G06N3/04

CPC: G06N3/045

Inventors: 张玥杰, 李文杰, 张涛

Owner: FUDAN UNIV
