Multi-step self-attention cross-media retrieval method and system based on limited text space

An attention-based cross-media retrieval technology in the field of computer vision and information retrieval. It addresses the uncertainty of image and text encodings, the neglect of interaction information between objects, and the inability to fix the focus information of images and texts. Its stated effects are reduced interference, fast training and good experimental results.

Active Publication Date: 2019-05-21

PEKING UNIV SHENZHEN GRADUATE SCHOOL

Cites: 6 · Cited by: 17

AI Technical Summary

Problems solved by technology

However, most existing attention-based methods consider only the object-level shared information between images and texts and ignore the interaction information between objects.

[0004] The second sub-problem is how to find a suitable isomorphic feature space.

If an additive or product (multiplicative) self-attention mechanism is used in a cross-media retrieval algorithm, the focus information of images and texts cannot be fixed. The image and text encodings therefore become uncertain, which limits the practical value of the algorithm.

Detailed Description of Embodiments

[0050] The present invention is further described below through embodiments with reference to the accompanying drawings; these embodiments do not limit the scope of the invention in any way.

[0051] The invention provides a multi-step self-attention cross-media retrieval method based on a limited text space, comprising a feature extraction network, a feature mapping network and a similarity measurement network. First, the feature extraction network extracts the global features, regional feature sets and associated features of images and texts; second, these features are sent to the feature mapping network, which extracts as much object-level shared information between images and texts as possible through a multi-step self-attention mechanism. However, such object-level information does not capture the interaction information between different objects. As shown in Figure 1, for two different image-text pairs, the object-level shared information between images and texts is similar, such as “m...
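The text does not spell out the attention equations, so the following is only a minimal sketch of how a multi-step attention loop could collect image-text evidence step by step. It assumes a Dual-Attention-Network-style design: a joint memory vector initialised from mean-pooled features, plain dot-product scoring, and a memory update by the element-wise product of the two attended contexts. Dot-product scoring is used purely for brevity; it is exactly the kind of product attention the background flags as problematic, so this should not be read as the patented mechanism. All function names and the `steps` parameter are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max score for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mean(vectors: List[Vector]) -> Vector:
    n = len(vectors)
    return [sum(v[d] for v in vectors) / n for d in range(len(vectors[0]))]


def cosine(a: Vector, b: Vector) -> float:
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    return dot(a, b) / (na * nb) if na > 0 and nb > 0 else 0.0


def attend(memory: Vector, features: List[Vector]) -> Vector:
    # Score each region/word against the memory, then take the weighted sum.
    weights = softmax([dot(memory, f) for f in features])
    dim = len(features[0])
    return [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]


def multi_step_similarity(image_regions: List[Vector],
                          text_words: List[Vector],
                          steps: int = 2) -> float:
    """Hypothetical multi-step attention similarity for one image-text pair.

    Both inputs are non-empty lists of equal-dimension vectors, assumed to be
    already mapped into the same (restricted text) space.
    """
    if not image_regions or not text_words:
        raise ValueError("need at least one region and one word")
    # Assumed initialisation: joint memory from mean-pooled features.
    memory = [a * b for a, b in zip(mean(image_regions), mean(text_words))]
    score = 0.0
    for _ in range(steps):
        v = attend(memory, image_regions)  # attended image context at this step
        u = attend(memory, text_words)     # attended text context at this step
        score += cosine(v, u)              # collect this step's evidence
        # DAN-style update: fold the joint context back into the memory.
        memory = [m + a * b for m, a, b in zip(memory, v, u)]
    return score
```

When the image and text features are identical, for example `[[1.0, 0.0], [0.0, 1.0]]` for both with `steps=3`, every step contributes a cosine of 1, so the score is approximately 3.0. Summing per-step cosines is one simple way to "collect useful information at each step"; a learned weighting across steps would be another.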

Abstract

The invention discloses a multi-step self-attention cross-media retrieval method and retrieval system based on a restricted text space. The method comprises the following steps: constructing a restricted text space with a relatively fixed vocabulary and converting the unrestricted text space into the restricted text space; extracting image features and text features in the restricted text space through a feature extraction network, where the features comprise global features, a regional feature set and associated features; sending the extracted features into a feature mapping network and extracting object-level shared information between the image and the text through a multi-step self-attention mechanism; and collecting the useful information at each step through a similarity measurement network to measure the similarity between the image and the text, then calculating a triplet loss function. Multi-step self-attention cross-media retrieval based on the restricted text space is thereby realized. Introducing the multi-step self-attention mechanism and the associated features substantially improves the cross-media retrieval recall rate.
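The abstract names a triplet loss but does not give its form. A common choice in image-text retrieval is the bidirectional hinge-based triplet ranking loss over a batch similarity matrix; the sketch below assumes that form, a default margin of 0.2, and a sum over all negatives, none of which is confirmed by the source. The name `triplet_ranking_loss` is hypothetical.

```python
from typing import List


def triplet_ranking_loss(sim: List[List[float]], margin: float = 0.2) -> float:
    """Bidirectional hinge triplet loss (assumed form, not taken from the patent).

    sim[i][j] is the similarity between image i and text j in a batch;
    matched image-text pairs are assumed to lie on the diagonal.
    """
    n = len(sim)
    loss = 0.0
    for i in range(n):
        positive = sim[i][i]
        for j in range(n):
            if j == i:
                continue
            # Image i as query: text j should score at least `margin` below text i.
            loss += max(0.0, margin - positive + sim[i][j])
            # Text i as query: image j should score at least `margin` below image i.
            loss += max(0.0, margin - positive + sim[j][i])
    return loss
```

For the 2×2 batch `[[0.5, 0.6], [0.1, 0.5]]`, the two violated terms each contribute about 0.3, giving a loss of about 0.6; a perfectly separated batch such as the identity matrix gives 0. Summing over every negative is one option; penalising only the hardest negative per query (the VSE++ variant) is another.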

Description

Technical Field

[0001] The invention relates to the technical field of computer vision and information retrieval, and in particular to a multi-step self-attention cross-media retrieval method and system based on a limited text space.

Background Art

[0002] In recent years, with the rapid development of information technology, multimedia data on the Internet has become increasingly abundant, and data of different modalities (text, image, audio, video, etc.) can be used to express similar content. To meet users' growing demand for multimedia retrieval, the cross-media retrieval task has been proposed: it seeks an isomorphic semantic space (a common space, the text space, or the image space) in which the similarity between heterogeneous multimedia data can be measured directly. More precisely, the core problem of this task can be subdivided into two sub-problems.

[0003] The first sub-problem is how to learn effective low-level fe...

Application Information

Patent Type & Authority: Applications (China)

IPC (8): G06F16/435; G06N3/04; G06N3/08

CPC: G06F16/435; G06N3/04; G06N3/08

Inventors: 王文敏 (Wang Wenmin), 余政 (Yu Zheng)

Owner: PEKING UNIV SHENZHEN GRADUATE SCHOOL
