Referring expression comprehension method based on a multi-level expression attention-guiding network

A multi-level attention technology, applied in the field of referring expression comprehension, which addresses the problems that the target region cannot be distinguished from other regions and that similar objects cannot be told apart.

Active Publication Date: 2021-03-12
GUIZHOU UNIV

AI Technical Summary

Problems solved by technology

Existing methods usually use a self-attention mechanism to focus on important words or phrases in the expression, which may lead to the inability to distinguish the target region from other regions.


Examples


Example

[0160] The present invention is tested on three large-scale benchmark data sets: RefCOCO, RefCOCO+ and RefCOCOg. The experimental results show that the present invention is superior to state-of-the-art methods, as shown in Table 1.

[0161]

[0162] Table 1

  Data set    testA     testB     test
  RefCOCO     87.45%    86.93%    -
  RefCOCO+    77.05%    69.65%    -
  RefCOCOg    -         -         80.29%

[0163] It can be concluded from Table 1 that the present invention achieves the best performance on most of the subtasks: on the RefCOCO data set it achieves accuracy rates of 87.45% and 86.93% on the testA and testB test sets respectively; on the RefCOCO+ data set it achieves accuracy rates of 77.05% and 69.65% on the testA and testB test sets respectively; and on the RefCOCOg data set it achieves an accuracy rate of 80.29% on the test set.



Abstract

The invention discloses a referring expression comprehension method based on a multi-level expression attention-guiding network. It designs a new multi-level attention mechanism, the multi-level expression attention-guiding network (MEGA-Net), which comprises a three-level attention network. Under the guidance of expression representations at different levels (the sentence level, the word level and the phrase level), the mechanism generates discriminative image region representations, which helps to determine the target region accurately. In addition, existing methods generally match regions in a single stage, which cannot distinguish similar objects or targets well. To address this problem, the invention designs a two-stage structure that compares similar image regions and identifies the differences between them, so as to match the optimal image region. The method is evaluated on three popular data sets, and the experimental results show that its performance is superior to that of other state-of-the-art models.
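The PyTorch sketch below illustrates the general idea of expression-guided attention at three levels followed by a two-stage match. It is a minimal illustration only: the module names (ExpressionGuidedAttention, MegaNetSketch), the feature dimensions, the averaging of the three attention levels and the pairwise re-ranking in the second stage are assumptions made for clarity, not the patent's actual MEGA-Net implementation.

```python
# Hedged sketch: three-level expression-guided attention over image regions,
# then a two-stage match. All names and dimensions are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ExpressionGuidedAttention(nn.Module):
    """Scores image regions under the guidance of one level of the expression."""

    def __init__(self, region_dim, expr_dim, hidden_dim=512):
        super().__init__()
        self.region_proj = nn.Linear(region_dim, hidden_dim)
        self.expr_proj = nn.Linear(expr_dim, hidden_dim)
        self.score = nn.Linear(hidden_dim, 1)

    def forward(self, regions, expr):
        # regions: (N, region_dim); expr: (expr_dim,) vector for one expression level.
        h = torch.tanh(self.region_proj(regions) + self.expr_proj(expr))
        return F.softmax(self.score(h).squeeze(-1), dim=0)  # (N,) attention weights


class MegaNetSketch(nn.Module):
    """Fuses sentence-, phrase- and word-level guidance, then matches in two stages."""

    def __init__(self, region_dim=2048, expr_dim=1024, top_k=3):
        super().__init__()
        self.levels = nn.ModuleList(
            [ExpressionGuidedAttention(region_dim, expr_dim) for _ in range(3)]
        )
        self.rank = nn.Linear(2 * region_dim, 1)  # stage 2: contrast candidates
        self.top_k = top_k

    def forward(self, regions, sent_vec, phrase_vec, word_vec):
        # Stage 1: average the scores produced under the three levels of guidance.
        scores = sum(att(regions, e) for att, e in
                     zip(self.levels, (sent_vec, phrase_vec, word_vec))) / 3.0
        k = min(self.top_k, regions.size(0))
        top_scores, top_idx = scores.topk(k)
        candidates = regions[top_idx]                       # similar-looking candidates

        # Stage 2: contrast each candidate with the mean of its rivals so that
        # fine-grained differences between similar regions decide the final match.
        rivals = (candidates.sum(0, keepdim=True) - candidates) / max(k - 1, 1)
        margin = self.rank(torch.cat([candidates, candidates - rivals], -1)).squeeze(-1)
        best = top_idx[(top_scores + margin).argmax()]
        return best, scores


# Toy usage with random features standing in for real CNN region / RNN text outputs.
model = MegaNetSketch()
regions = torch.randn(10, 2048)                             # 10 candidate regions
sent, phrase, word = (torch.randn(1024) for _ in range(3))
best_region, region_scores = model(regions, sent, phrase, word)
print("best region index:", best_region.item())
```

In this reading, stage one narrows the candidates with the fused multi-level scores, and stage two contrasts each surviving candidate against its rivals, which mirrors the patent's stated goal of separating visually similar regions.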

Description

Technical field

[0001] The present invention belongs to the technical field of Referring Expression Comprehension (REC), and more specifically relates to a referring expression comprehension method based on a multi-level expression guided attention network.

Background technique

[0002] The main task of Referring Expression Comprehension (REC) is to identify the relevant target or region in a given image based on a natural language expression. A typical approach to this task is to first use a recurrent neural network (RNN) to process the expression sentence and obtain a representation of the text, and then use a convolutional neural network (CNN) to extract representations of the image regions; after that, the two representations are mapped into a common semantic space to determine the best matching image region.

[0003] Some existing methods apply a self-attention mechanism to implicitly partition the expression sentence into different phrase representations (subject, pred...
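For context, a minimal sketch of the typical single-stage pipeline described in paragraph [0002] is shown below: an RNN (here an LSTM) encodes the expression, random tensors stand in for CNN region features, both are projected into a common semantic space, and the highest-scoring region is returned. The class name BaselineREC and all sizes (joint_dim, vocab_size, etc.) are illustrative assumptions, not taken from the patent.

```python
# Hedged sketch of the conventional REC pipeline: RNN text encoding, projection of
# text and region features into a common semantic space, cosine-similarity matching.
import torch
import torch.nn as nn
import torch.nn.functional as F


class BaselineREC(nn.Module):
    def __init__(self, vocab_size=1000, embed_dim=300, hidden_dim=512,
                 region_dim=2048, joint_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.LSTM(embed_dim, hidden_dim, batch_first=True)   # text encoder
        self.text_proj = nn.Linear(hidden_dim, joint_dim)             # -> common space
        self.region_proj = nn.Linear(region_dim, joint_dim)           # -> common space

    def forward(self, token_ids, region_feats):
        # token_ids: (1, T) word indices; region_feats: (N, region_dim) CNN features.
        _, (h_n, _) = self.rnn(self.embed(token_ids))
        text = F.normalize(self.text_proj(h_n[-1]), dim=-1)           # (1, joint_dim)
        regions = F.normalize(self.region_proj(region_feats), dim=-1) # (N, joint_dim)
        scores = regions @ text.t()                                   # cosine similarity
        return scores.squeeze(-1).argmax(), scores


# Toy usage: a 12-token expression and 8 candidate regions.
model = BaselineREC()
tokens = torch.randint(0, 1000, (1, 12))
regions = torch.randn(8, 2048)
best, scores = model(tokens, regions)
print("best matching region:", best.item())
```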

Claims


Application Information

IPC(8): G06K9/32, G06K9/46, G06K9/62, G06N3/04, G06N3/08
CPC: G06N3/08, G06V10/25, G06V10/44, G06N3/045, G06F18/22, G06F18/2415
Inventor: 杨阳, 彭亮
Owner: GUIZHOU UNIV