Referring expression comprehension method based on a multi-level expression guided attention network

A multi-level attention technology, applied in the field of referring expression comprehension, that addresses two problems: target regions that are hard to distinguish from other regions, and similar objects that are hard to distinguish from one another.
CN112488111A · Active · Publication Date: 2021-03-12 · GUIZHOU UNIV

Patent Information

Authority / Receiving Office
CN · China
Current Assignee / Owner
GUIZHOU UNIV
Publication Date
2021-03-12

Figures

  • Figure 1
  • Figure 2
  • Figure 3

Abstract

The invention discloses a referring expression comprehension method based on a multi-level expression guided attention network. It introduces a new multi-level attention mechanism, the multi-level expression guided attention network (MEGA-Net), which consists of a three-level attention network. Guided by expression representations at three levels (sentence, word, and phrase), the mechanism generates discriminative image region representations, which helps locate the target region accurately. In addition, existing methods generally match regions in a single stage, which cannot reliably distinguish similar objects or targets. To address this, the invention designs a two-stage structure that compares similar image regions and identifies the differences between them in order to select the best-matching region. The method is evaluated on three popular datasets, and the experimental results show that it outperforms other state-of-the-art models.
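The abstract names two components without giving their equations: guided attention that builds region representations under sentence-, word-, and phrase-level guidance, and a two-stage matcher that compares similar candidates. A minimal pure-Python sketch of both ideas follows, assuming each region is a list of local feature vectors and each level's guidance is a single vector. The names `attend`, `multi_level_region_repr`, and `two_stage_match`, the averaging over levels, and the difference-from-mean comparison are illustrative assumptions, not MEGA-Net's actual formulation.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attend(features, guide):
    """Soft attention over one region's local features, guided by one expression vector."""
    weights = softmax([dot(f, guide) for f in features])
    dim = len(features[0])
    return [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]

def multi_level_region_repr(features, guides):
    """Average the attended features obtained under each level's guidance vector
    (hypothetical fusion rule; guides maps a level name to a vector)."""
    attended = [attend(features, g) for g in guides.values()]
    dim = len(features[0])
    return [sum(a[d] for a in attended) / len(attended) for d in range(dim)]

def two_stage_match(reprs, coarse_query, fine_query, k=2):
    """Stage 1: keep the top-k regions by a coarse score.
    Stage 2: compare each candidate with the mean of the other candidates
    and pick the one whose difference best matches fine_query."""
    coarse = [dot(r, coarse_query) for r in reprs]
    # sorted() is stable even with reverse=True, so ties keep their original order
    candidates = sorted(range(len(reprs)), key=lambda i: coarse[i], reverse=True)[:k]
    if len(candidates) == 1:
        return candidates[0]
    dim = len(reprs[0])
    best, best_score = candidates[0], -math.inf
    for i in candidates:
        others = [reprs[j] for j in candidates if j != i]
        mean_other = [sum(o[d] for o in others) / len(others) for d in range(dim)]
        diff = [a - b for a, b in zip(reprs[i], mean_other)]
        score = dot(diff, fine_query)
        if score > best_score:
            best, best_score = i, score
    return best

# Toy 3-d features: [is_person, on_left, on_right]; all values are made up.
guides = {"sentence": [1.0, 0.5, 0.0], "word": [0.0, 1.0, 0.0], "phrase": [1.0, 0.0, 0.0]}
regions = [
    [[1.0, 0.0, 1.0]],  # a person on the right
    [[1.0, 1.0, 0.0]],  # a person on the left
    [[0.0, 0.0, 0.2]],  # background
]
reprs = [multi_level_region_repr(r, guides) for r in regions]
print(two_stage_match(reprs, coarse_query=[1.0, 0.0, 0.0], fine_query=[0.0, 1.0, 0.0]))
```

In the toy data both people tie on the coarse score, so a single-stage matcher (here `k=1`) would return region 0, while the second stage, comparing the two candidates along the `on_left` direction, selects region 1. Because this sketch is linear, the comparison reduces to reranking the shortlist with a second scorer; a trained network would typically learn the comparison instead.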

Description

Technical Field

[0001] The present invention belongs to the technical field of Referring Expression Comprehension (REC), and more specifically relates to a referring expression comprehension method based on a multi-level expression guided attention network.

Background Art

[0002] The main task of Referring Expression Comprehension (REC) is to identify the target object or region in a given image that a natural language expression refers to. A typical approach first uses a recurrent neural network (RNN) to process the expression sentence into a text representation, and a convolutional neural network (CNN) to extract representations of image regions; the two representations are then mapped into a common semantic space, where the best-matching image region is determined.
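The matching step described in [0002] can be sketched as follows, assuming the RNN sentence vector and the CNN region vectors have already been computed. The projection matrices `W_text` and `W_region` are hypothetical stand-ins for learned parameters, and cosine similarity is one common choice of matching score, not necessarily the one used in this patent.

```python
import math

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def l2_normalize(v):
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v] if norm > 0 else list(v)

def match_expression(text_vec, region_vecs, W_text, W_region):
    """Project text and regions into a shared space; return (best index, cosine scores)."""
    t = l2_normalize(matvec(W_text, text_vec))
    scores = []
    for r in region_vecs:
        v = l2_normalize(matvec(W_region, r))
        scores.append(sum(a * b for a, b in zip(t, v)))  # cosine similarity of unit vectors
    best = max(range(len(scores)), key=lambda i: scores[i])
    return best, scores

# Toy example: a 3-d "RNN" sentence vector and 2-d "CNN" region vectors,
# both mapped into a 2-d common space by hypothetical projection matrices.
W_text = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
W_region = [[0.0, 1.0], [1.0, 0.0]]
best, scores = match_expression(
    [2.0, 0.0, 5.0], [[0.0, 3.0], [3.0, 0.0], [1.0, 1.0]], W_text, W_region
)
print(best, scores)
```

Normalizing both sides makes the score depend only on direction in the shared space, so regions with large feature magnitudes are not favored automatically.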

[0003] Some existing methods apply a self-attention mechanism to implicitly partition the expression sentence into different phrase representations (subject, pred...
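Because the passage is truncated, only the general mechanism can be illustrated: each phrase module computes attention weights over the words of the expression and pools them into one phrase vector. In this sketch the module query vectors are hypothetical stand-ins for learned parameters, and `module_2` is a placeholder name, since the remaining module names are cut off in the source.

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def phrase_representations(word_vecs, module_queries):
    """For each phrase module, weight the words by attention and pool them into one vector."""
    dim = len(word_vecs[0])
    phrases, weights = {}, {}
    for name, q in module_queries.items():
        w = softmax([sum(a * b for a, b in zip(q, x)) for x in word_vecs])
        weights[name] = w
        phrases[name] = [sum(wi * x[d] for wi, x in zip(w, word_vecs)) for d in range(dim)]
    return phrases, weights

# Toy word embeddings for a three-word expression (values are made up).
words = [[5.0, 0.0], [0.0, 5.0], [0.0, 0.0]]
queries = {"subject": [1.0, 0.0], "module_2": [0.0, 1.0]}
phrases, weights = phrase_representations(words, queries)
print({name: [round(x, 3) for x in w] for name, w in weights.items()})
```

The partition is "implicit" in the sense that no word is hard-assigned to a phrase: every word contributes to every module, with its share set by the softmax weights.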

Claims
