
Natural language visual reasoning method based on attention mechanism

A natural language visual reasoning method based on an attention mechanism. It addresses the insufficient or inaccurate understanding of contextual information in prior approaches, and is described as understanding context more accurately while placing fewer restrictions on the expressions it can handle.

Pending Publication Date: 2022-03-25
NORTHWESTERN POLYTECHNICAL UNIV

AI Technical Summary

Problems solved by technology

[0004] To overcome the insufficient or inaccurate understanding of the context information of referring expressions in the prior art, the present invention provides a natural language visual reasoning method based on an attention mechanism.




Embodiment Construction

[0056] The present invention is further described below with reference to the accompanying drawings and embodiments. The invention includes, but is not limited to, the following embodiments.

[0057] As shown in Figure 1, the present invention provides a natural language visual reasoning method based on the attention mechanism. It mainly comprises a language attention network module and three visual processing modules, and is implemented as follows:

[0058] 1. Language attention network module

[0059] (1) Use one-hot encoding to encode each word in the input language expression into an embedded representation vector e_t, then use a BiLSTM to encode the context of each word, and concatenate the hidden vectors obtained in the forward and backward directions to get the hidden representation vector h_t for each word, where t is the index of the word in the expression, t = 1, 2, ..., T, and T is the number of words that the expression ...
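
The encoding step in [0059] can be illustrated with a minimal NumPy sketch. It is not the patented implementation: the weights are random and untrained, and the function names (`one_hot`, `lstm_pass`, `encode_expression`), the dimensions, and the gate ordering are assumptions made only for illustration. It shows one-hot vectors being mapped to embeddings e_t, an LSTM run in each direction, and the two hidden vectors concatenated into h_t.

```python
import numpy as np

def one_hot(indices, vocab_size):
    """Return a (T, vocab_size) matrix with a single 1 in each row."""
    out = np.zeros((len(indices), vocab_size))
    out[np.arange(len(indices)), indices] = 1.0
    return out

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_pass(inputs, W, U, b, hidden):
    """Run one LSTM direction over inputs of shape (T, d); return (T, hidden)."""
    h = np.zeros(hidden)
    c = np.zeros(hidden)
    outputs = []
    for x in inputs:
        z = W @ x + U @ h + b            # stacked gate pre-activations, shape (4 * hidden,)
        i, f, o, g = np.split(z, 4)      # input, forget, output, candidate (assumed order)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        outputs.append(h)
    return np.stack(outputs)

def encode_expression(words, vocab, embed_dim=8, hidden=6, seed=0):
    """Return (e, h): embeddings e_t and concatenated BiLSTM states h_t."""
    rng = np.random.default_rng(seed)
    vocab_size = len(vocab)
    # Hypothetical embedding matrix; a real system would learn it.
    E = rng.normal(scale=0.1, size=(vocab_size, embed_dim))
    e = one_hot([vocab[w] for w in words], vocab_size) @ E   # shape (T, embed_dim)

    def params():
        return (rng.normal(scale=0.1, size=(4 * hidden, embed_dim)),
                rng.normal(scale=0.1, size=(4 * hidden, hidden)),
                np.zeros(4 * hidden))

    forward = lstm_pass(e, *params(), hidden)
    # Run the backward direction on the reversed sequence, then restore word order.
    backward = lstm_pass(e[::-1], *params(), hidden)[::-1]
    h = np.concatenate([forward, backward], axis=1)          # shape (T, 2 * hidden)
    return e, h

words = ["the", "red", "cup"]
vocab = {w: i for i, w in enumerate(words)}
e, h = encode_expression(words, vocab)
print(e.shape, h.shape)
```

Each row of `h` corresponds to one word and has twice the hidden size, because the forward and backward states are concatenated.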


Abstract

The invention provides a natural language visual reasoning method based on an attention mechanism. The method mainly comprises a language analysis processing module and three visual processing modules, and proceeds as follows. First, a language expression is input and processed with one-hot encoding, BiLSTM encoding and the like, and phrase embeddings and weights for the three visual processing modules are calculated from the expression. Then, a Mask R-CNN detector performs object detection on the input image, the detection results are fed to a subject module, a position module and a relation module, and each module calculates its own matching score. Finally, the weighted sum of the three modules' matching scores is taken as the overall matching score, the candidate object with the highest overall matching score is taken as the object described by the language expression, and its bounding box is output to complete visual reasoning on the image. The method understands contextual information better and can process expressions of various structures.
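
The final selection step in the abstract can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the patent's code: the `Candidate` class, the box layout `(x, y, width, height)`, and the example scores and weights are all hypothetical. In the method itself the scores come from the three visual modules and the weights from the language module.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    box: tuple        # hypothetical layout: (x, y, width, height)
    subject: float    # matching score from the subject module
    location: float   # matching score from the position module
    relation: float   # matching score from the relation module

def overall_score(candidate, weights):
    """Weighted sum of the three module scores."""
    w_subject, w_location, w_relation = weights
    return (w_subject * candidate.subject
            + w_location * candidate.location
            + w_relation * candidate.relation)

def select_referent(candidates, weights):
    """Return the candidate with the highest overall matching score."""
    return max(candidates, key=lambda c: overall_score(c, weights))

# Made-up example values, for illustration only.
candidates = [
    Candidate(box=(10, 20, 40, 60), subject=0.9, location=0.2, relation=0.1),
    Candidate(box=(50, 40, 30, 30), subject=0.6, location=0.8, relation=0.7),
]
weights = (0.5, 0.3, 0.2)

best = select_referent(candidates, weights)
print(best.box)
```

With these example numbers the second candidate wins, because its position and relation scores outweigh the first candidate's higher subject score.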

Description

Technical field

[0001] The invention belongs to the technical fields of computer vision and natural language processing, and in particular relates to a natural language visual reasoning method based on an attention mechanism.

Background

[0002] Referring expression comprehension means locating the object region described by natural language in an image. That is, given a picture (containing people or other objects) and a natural language description (a referring expression) that identifies a specific object in the picture, where the description is an English word, phrase or sentence and can include the object's category, position, color, size, and relationship to surrounding objects, the task is to locate the region of the described object in the picture (frame the object with a bounding box and segment it). Referring expression comprehension is a meaningful task that can be applied to image retrieval, such as finding objects with specific attributes in an ...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F40/30, G06N3/04, G06N3/08, G06N5/04
CPC: G06F40/30, G06N3/08, G06N5/041, G06N3/044, G06N3/045, Y02D10/00
Inventor: 王琦, 许杰, 袁媛
Owner: NORTHWESTERN POLYTECHNICAL UNIV