Attention mechanism-based image target prediction method

An attention-based target prediction technology, applied in neural learning methods, computer components, instruments, etc., that improves efficiency and optimizes the visual backbone.

Active Publication Date: 2021-02-02
南强智视(厦门)科技有限公司

AI Technical Summary

Problems solved by technology

[0005] Another issue is the perception of instance-level semantic differences.

Embodiment Construction

[0060] The technical solutions and beneficial effects of the present invention will be described in detail below in conjunction with the accompanying drawings.

[0061] As shown in Figure 1, the present invention provides an image target prediction method based on an attention mechanism, comprising the following steps:

[0062] 1. Model implementation process:

[0063] 1.1 Input of the model:

[0064] As shown in Figure 2, the input of the model is an RGB image of size 320×320×3 together with a language description of an object in the picture. The maximum length of the model's text input is set to 15.
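The text side of this input can be sketched as follows. This is a minimal illustration, not the patent's preprocessing: it assumes the limit of 15 counts whitespace-separated words and that shorter descriptions are padded with a `<pad>` token. The source specifies neither, and the name `prepare_text` is hypothetical.

```python
MAX_TEXT_LEN = 15            # longest text input, as stated in [0064]
IMAGE_SHAPE = (320, 320, 3)  # height, width, RGB channels, as stated in [0064]
PAD_TOKEN = "<pad>"          # assumption: the source does not name a padding token


def prepare_text(description, max_len=MAX_TEXT_LEN, pad=PAD_TOKEN):
    """Truncate or pad a description to exactly max_len tokens.

    Assumption: tokens are whitespace-separated words; the source only
    states that the longest text input is 15.
    """
    tokens = description.lower().split()[:max_len]
    return tokens + [pad] * (max_len - len(tokens))


tokens = prepare_text("the man in the red shirt")
print(len(tokens))  # always MAX_TEXT_LEN
```

Fixing the text length keeps every sample the same shape, so descriptions can be batched alongside the fixed-size images.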

[0065] 1.2 Visual Feature Encoder:

[0066] For the input RGB image, we use a neural network pre-trained on the VOC target detection dataset (see Mark Everingham, Luc Van Gool, Christopher K. I. Williams, John Winn, and Andrew Zisserman. The PASCAL Visual Object Classes (VOC) Challenge. In IJCV, 2010.): DeepLab-ResNet101 (see Liang-Chieh Chen, George Papandreou, Iasonas K...

Abstract

The invention discloses an attention mechanism-based image target prediction method for obtaining the mask of the object in an RGB image that a language description refers to. The method comprises the following steps: extracting visual features of the RGB image at three scales; extracting language features of the description; performing multi-modal fusion of the visual features and the language features; calculating four mapping matrices based on the multi-modal features and the language features; obtaining two attention maps through matrix operations and an activation function; adding the resulting attention matrix to the original matrix to update the original matrix; and stacking the grouped attention modules and then obtaining the predicted mask through a 1×1 convolutional neural network. By introducing a supervised attention mechanism, the method enhances reasoning capability in complex scenes and improves detection precision.
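The attention and mask-prediction steps can be illustrated with a rough NumPy sketch. The abstract does not say how the four mapping matrices are assigned or which activation function is used, so the following choices are assumptions made for illustration only: three mappings (`w_q`, `w_k`, `w_v`) act on the multi-modal features and one (`w_e`) on the language features, softmax is the activation, the mask is produced by a sigmoid, and grouping and supervision of the attention are omitted. All function and parameter names are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention_block(fused, lang, w_q, w_k, w_v, w_e):
    """One illustrative attention step with a residual update.

    fused: (N, d) multi-modal features, one row per spatial position
    lang:  (T, d) language features, one row per token
    """
    q, k, v = fused @ w_q, fused @ w_k, fused @ w_v  # three maps of fused features (assumed split)
    e = lang @ w_e                                   # one map of language features (assumed split)
    scale = np.sqrt(q.shape[-1])
    attn_visual = softmax(q @ k.T / scale)           # (N, N) attention map
    attn_lang = softmax(q @ e.T / scale)             # (N, T) attention map
    return fused + attn_visual @ v + attn_lang @ e   # add attention output to the original


def predict_mask(fused, lang, blocks, w_out, height, width):
    """Stack attention blocks, then apply a 1x1 convolution to get a mask."""
    for weights in blocks:
        fused = attention_block(fused, lang, *weights)
    logits = fused @ w_out  # a 1x1 convolution is a per-position linear map
    return 1.0 / (1.0 + np.exp(-logits.reshape(height, width)))  # sigmoid (assumed)


rng = np.random.default_rng(0)
h, w, t, d = 4, 4, 15, 8  # toy sizes, not the patent's
fused = rng.normal(size=(h * w, d))
lang = rng.normal(size=(t, d))
blocks = [tuple(rng.normal(scale=0.1, size=(d, d)) for _ in range(4)) for _ in range(2)]
mask = predict_mask(fused, lang, blocks, rng.normal(size=d), h, w)
print(mask.shape)
```

Because each block adds its attention output to its input, stacking blocks refines the same feature matrix step by step rather than replacing it.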

Description

Technical field

[0001] The invention belongs to the technical field of image target detection and relates to a directional visual segmentation method, in particular to a method for modeling a cascaded grouped attention mechanism based on multi-step reasoning.

Background technique

[0002] Directional visual segmentation is a multimodal task based on vision and language. As shown in Figure 1, given a description of an object in an image, directional visual segmentation must compute the mask of the corresponding object in the image. Its advantage is that it is not limited to a fixed number of object categories and can achieve fast language-to-vision alignment, so it can be widely used in scenarios such as interactive image editing and human-computer interaction.

[0003] Most existing methods focus on the traditional multimodal fusion problem, and common directional visual segmentation frameworks use convoluti...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06K9/34, G06K9/62, G06N3/04, G06N3/08
CPC: G06N3/084, G06N3/049, G06V10/267, G06V2201/07, G06N3/045, G06F18/253, G06F18/214
Inventor: 许金泉, 王振宁, 王溢, 蔡碧颖
Owner: 南强智视(厦门)科技有限公司