Attention mechanism-based image target prediction method

A technology of target prediction and attention, applied in neural learning methods, computer components, instruments, etc., to achieve the effect of improving efficiency and optimizing the visual backbone

Active Publication Date: 2021-02-02

南强智视(厦门)科技有限公司



Problems solved by technology

[0005] Another issue is the perception of instance-level semantic differences



Embodiment Construction

[0060] The technical solutions and beneficial effects of the present invention will be described in detail below in conjunction with the accompanying drawings.

[0061] As shown in Figure 1, the present invention provides an attention mechanism-based image target prediction method, comprising the following steps:

[0062] 1. Model implementation process:

[0063] 1.1 Input of the model:

[0064] As shown in Figure 2, the input of the model is an RGB image of size 320×320×3 together with a language description of an object in the image. The maximum text input length of the model is set to 15.
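The input format described in [0064] can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's preprocessing code: the helper names `resize_nearest` and `encode_text`, the whitespace tokenizer, the padding id, and the reading of "15" as a token count are all hypothetical choices made for the example.

```python
import numpy as np

IMAGE_SIZE = 320    # from the text: the input image is 320x320x3
MAX_TEXT_LEN = 15   # from the text; assumed here to count tokens
PAD_ID = 0          # assumption: id 0 is reserved for padding


def resize_nearest(image: np.ndarray, size: int = IMAGE_SIZE) -> np.ndarray:
    """Resize an H x W x 3 image to size x size x 3 by nearest-neighbour sampling."""
    h, w = image.shape[:2]
    rows = np.arange(size) * h // size   # source row for each output row
    cols = np.arange(size) * w // size   # source column for each output column
    return image[rows[:, None], cols[None, :]]


def encode_text(description: str, vocab: dict, max_len: int = MAX_TEXT_LEN) -> np.ndarray:
    """Map whitespace tokens to ids, truncate to max_len, right-pad with PAD_ID."""
    unk = len(vocab) + 1                 # assumption: one shared id for unknown words
    ids = [vocab.get(tok, unk) for tok in description.lower().split()][:max_len]
    return np.array(ids + [PAD_ID] * (max_len - len(ids)), dtype=np.int64)
```

Fixing both shapes in advance lets images and descriptions be batched as dense arrays. Shorter descriptions are padded and longer ones are truncated.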

[0065] 1.2 Visual Feature Encoder:

[0066] For the input RGB image, we use a neural network pre-trained on the VOC target detection dataset (see Mark Everingham, Luc Van Gool, Christopher K. I. Williams, John Winn, and Andrew Zisserman. The PASCAL visual object classes (VOC) challenge. In IJCV, 2010.), namely DeepLab-ResNet101 (see Liangchieh Chen, George Papandreou, Iasonas K...


Abstract

The invention discloses an attention mechanism-based image target prediction method for obtaining the mask of the object that a language description refers to in an RGB image. The method comprises the following steps: extracting visual features of the RGB image at three scales; extracting language features of the description; performing multi-modal fusion of the visual features and the language features; calculating four mapping matrices based on the multi-modal features and the language features; obtaining two attention maps through matrix operations and an activation function; adding the resulting attention matrix to the original matrix to update it; and stacking the grouped attention modules, then obtaining the predicted mask through a 1×1 convolutional neural network. By introducing a supervised attention mechanism, the method enhances reasoning capability in complex scenes and improves detection precision.
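The attention update and mask prediction steps can be illustrated with a small NumPy sketch. This excerpt does not specify the four mapping matrices, the two attention maps, or the grouping scheme, so the code below swaps in a standard single-map cross-attention as a stand-in. All function names, shapes, and the softmax and sigmoid activations are assumptions for illustration, not the patented module.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention_update(fused, lang, w_q, w_k, w_v):
    """One residual cross-attention step (stand-in, not the patented module).

    fused: (N, d) multi-modal features, one row per spatial location.
    lang:  (T, d) language features, one row per token.
    """
    q = fused @ w_q                                          # (N, d)
    k = lang @ w_k                                           # (T, d)
    v = lang @ w_v                                           # (T, d)
    attn = softmax(q @ k.T / np.sqrt(q.shape[1]), axis=-1)   # (N, T) attention map
    return fused + attn @ v, attn                            # residual add updates the input


def run_stack(fused, lang, layers):
    """Apply several attention modules in sequence (the stacking step)."""
    for w_q, w_k, w_v in layers:
        fused, _ = attention_update(fused, lang, w_q, w_k, w_v)
    return fused


def predict_mask(fused, height, width, w_out, b_out=0.0):
    """A 1x1 convolution is the same linear map at every location; sigmoid gives a soft mask."""
    logits = fused @ w_out + b_out                           # (N,)
    return (1.0 / (1.0 + np.exp(-logits))).reshape(height, width)
```

Because each module adds its attention output to its input, the feature shape is preserved, which is what allows modules to be stacked before the final 1×1 projection.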

Description

Technical field

[0001] The invention belongs to the technical field of image target detection and relates to a directional visual segmentation method, in particular to a modeling method for a cascaded grouped attention mechanism based on multi-step reasoning.

Background art

[0002] Directional visual segmentation is a multimodal task based on vision and language. As shown in Figure 1, given a description of an object in an image, directional visual segmentation needs to compute the mask of the corresponding object in the image. Its advantage is that it is not limited to a fixed number of object categories and can achieve fast language-to-vision alignment, so it can be widely used in scenarios such as interactive image editing and human-computer interaction.

[0003] Most of the existing methods mainly focus on the traditional multimodal fusion problem, and common directional visual segmentation frameworks use convoluti...


Application Information


Patent Type & Authority: Applications (China)

IPC(8): G06K9/34, G06K9/62, G06N3/04, G06N3/08

CPC: G06N3/084, G06N3/049, G06V10/267, G06V2201/07, G06N3/045, G06F18/253, G06F18/214

Inventor: 许金泉, 王振宁, 王溢, 蔡碧颖

Owner: 南强智视(厦门)科技有限公司
