Visual question answering method based on a module routing network model

A network model and routing technology, applied in biological neural network models, neural learning methods, character and pattern recognition, etc.

Pending Publication Date: 2022-03-04
FUDAN UNIV

AI Technical Summary

Problems solved by technology

[0016] The present invention was made to solve the above problems. It provides a visual question answering method based on a module routing network model that can fuse the visual and textual modalities at multiple semantic levels and can reason about complex questions. Specifically, the following technical solutions are adopted:

Examples

Embodiment 1

[0072] Function and effect of Embodiment 1

[0073] In the visual question answering method based on the module routing network proposed by the present invention, the text network processes the natural language question text to obtain question features; the routing network generates a routing path based on the question features; and the visual network processes the picture along that routing path to obtain the corresponding image features. The routing network thus links the features of the text network and the visual network, yielding the final features, from which the answerer, using a predetermined trained model, selects the final answer from a set of predetermined answers. The method can therefore fuse the text and vision modalities at multiple levels, so that the features of the text and the picture can be fully analyzed and processed,...
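The data flow described in [0073] can be sketched roughly as follows. This is a minimal toy illustration, not the patented model: every name here (`text_network`, `routing_network`, `visual_network`, `answerer`, `MODULES`, `ANSWERS`) is hypothetical, the "networks" are hand-written functions instead of trained neural networks, and the router uses fixed pseudo-weights where a real system would use learned parameters.

```python
import math

# Hypothetical toy stand-ins for illustration; none of these names come from the patent.
ANSWERS = ["yes", "no", "red", "blue"]  # predetermined candidate answers

# Toy visual modules: each one transforms a flat list of image features.
MODULES = [
    lambda x: [max(v, 0.0) for v in x],  # module 0: rectify
    lambda x: [0.5 * v for v in x],      # module 1: scale
    lambda x: [v + 1.0 for v in x],      # module 2: shift
]

def text_network(question, dim=4):
    """Toy question encoder: hashes words into a fixed-size count vector."""
    feats = [0.0] * dim
    for word in question.lower().split():
        feats[sum(map(ord, word)) % dim] += 1.0
    return feats

def routing_network(question_feats, num_layers=2):
    """Toy router: selects one module index per layer from the question features."""
    path = []
    for layer in range(num_layers):
        scores = []
        for m in range(len(MODULES)):
            # Fixed pseudo-weights stand in for learned router parameters.
            scores.append(sum(q * math.sin((layer + 1) * (m + 1) * (i + 1))
                              for i, q in enumerate(question_feats)))
        path.append(scores.index(max(scores)))
    return path

def visual_network(image_feats, path):
    """Runs the image features through the modules activated by the routing path."""
    feats = list(image_feats)
    for module_index in path:
        feats = MODULES[module_index](feats)
    return feats

def answerer(final_feats):
    """Toy answerer: scores answer k by feature k and returns the best answer."""
    scores = [final_feats[k] if k < len(final_feats) else 0.0
              for k in range(len(ANSWERS))]
    return ANSWERS[scores.index(max(scores))]

def answer_question(question, image_feats):
    question_feats = text_network(question)          # question text -> question features
    path = routing_network(question_feats)           # question features -> routing path
    final_feats = visual_network(image_feats, path)  # picture + path -> final features
    return answerer(final_feats), path
```

Calling `answer_question("what color is the cube", [0.2, -0.1, 0.7, 0.3])` returns one of the predetermined answers together with the routing path that was used; the point of the sketch is only the order of the three stages, not the quality of the answer.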

Embodiment 2

[0092] Function and effect of Embodiment 2

[0093] In this embodiment, the routing network processes the image features and question features of each module layer to obtain the routing path for the next layer. This combines the image features and question features more accurately and yields more useful final features, so the answer obtained by inputting the final features into the answerer also addresses the question more effectively.
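One way to picture the layer-wise routing in [0093] is a router that scores each module of the next layer from the current image features and the question features. The sketch below assumes a linear scoring function followed by a softmax; the function name `route_next_layer` and the `router_weights` parameter are hypothetical, and the source does not state which scoring function the routing network actually uses.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route_next_layer(image_feats, question_feats, router_weights):
    """Return one routing probability per module of the next layer.

    router_weights[m] is a hypothetical weight vector for module m over the
    concatenation of the current image features and the question features.
    """
    joint = list(image_feats) + list(question_feats)
    scores = [sum(w * x for w, x in zip(weights, joint))
              for weights in router_weights]
    return softmax(scores)
```

For example, with `image_feats=[1.0, 0.0]`, `question_feats=[0.0, 1.0]` and three weight vectors `[[1, 0, 0, 0], [0, 0, 0, 2], [0, 0, 0, 0]]`, the raw scores are 1, 2 and 0, so the second module receives the largest routing probability.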

[0094] The routing network aggregates the features processed by the activation modules with the residual input to form the image features output by each module layer. Each layer's image features thus take the residual input into account on top of the image features of the previous module layer, so that useful content that might have been lost in the previous step is combined with the processed image features, which avoids inconsistency between the picture and the answer to the question text.
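The aggregation in [0094] can be illustrated with a small sketch. It assumes, purely for illustration, that aggregation is a gate-weighted sum of the activated modules' outputs added to the unchanged input; the source says only that processed features and the residual input are aggregated. The names `module_layer` and `gates` are hypothetical.

```python
def module_layer(image_feats, modules, gates):
    """Aggregate the outputs of activated modules with the residual input.

    gates[m] is the routing weight of module m; a gate of 0.0 means the
    module is not activated and is skipped entirely.
    """
    out = list(image_feats)  # residual input: the layer input carried through unchanged
    for module, gate in zip(modules, gates):
        if gate == 0.0:
            continue  # inactive module contributes nothing
        processed = module(image_feats)
        out = [o + gate * p for o, p in zip(out, processed)]
    return out
```

With `modules = [double, add_one]` and `gates = [1.0, 0.0]`, an input of `[1.0, -2.0]` yields `[3.0, -6.0]`: the doubled features plus the residual input. If every gate is zero, the layer returns its input unchanged, which is how the residual path preserves content from the previous layer.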

Embodiment 3

[0096] In the third embodiment, an attention module is added to the visual network of the first embodiment. It concatenates the image features with the question features and uses a spatial self-attention mechanism to model the relations between pairs of positions or objects in the image features, thereby obtaining the final features.

[0097] The attention modules in the visual network include a first attention module and a second attention module.

[0098] The first attention module concatenates the image features with the question features and then uses a spatial self-attention mechanism to model the relations between pairs of positions or objects in the feature map, obtaining final features with stronger expressive power. The spatial self-attention mechanism can be implemented in different ways, for example with the encoder of a Transformer.
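The two operations in [0098], concatenation followed by spatial self-attention, can be sketched as below. This is a deliberate simplification: it uses unparameterized scaled dot-product self-attention in which queries, keys and values are all the concatenated features, whereas a Transformer encoder would add learned projections, multiple heads, feed-forward layers and normalization. The function names are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def concat_question(image_feats, question_feats):
    """Append the question vector to the feature vector of every spatial position."""
    return [list(position) + list(question_feats) for position in image_feats]

def spatial_self_attention(tokens):
    """Scaled dot-product self-attention across spatial positions.

    Each output vector is a weighted average of all position vectors, with
    weights given by the softmax of pairwise dot products, so every position
    is related to every other position.
    """
    dim = len(tokens[0])
    attended = []
    for query in tokens:
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                  for key in tokens]
        weights = softmax(scores)
        attended.append([sum(w * tok[d] for w, tok in zip(weights, tokens))
                         for d in range(dim)])
    return attended
```

A usage example: `spatial_self_attention(concat_question([[1.0, 0.0], [0.0, 1.0]], [0.5]))` returns two three-dimensional vectors, one per spatial position, each mixing information from both positions.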

[0099] The second attention module uses a spatial attention mechanism to compute a weighted average of the image features to obtain a nor...
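The weighted averaging mentioned in [0099] can be sketched as attention pooling over spatial positions. The sketch assumes that the attention weights come from a softmax over dot products between each position's features and a query vector; the source text is truncated and does not say how the weights are computed or what the pooled output is used for, so `spatial_attention_pool` and its `query` parameter are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def spatial_attention_pool(image_feats, query):
    """Weighted average of per-position feature vectors.

    image_feats is a list of feature vectors, one per spatial position.
    query is a hypothetical vector (for example, derived from the question)
    used to score how relevant each position is.
    """
    scores = [sum(f * q for f, q in zip(position, query))
              for position in image_feats]
    weights = softmax(scores)
    dim = len(image_feats[0])
    pooled = [sum(w * position[d] for w, position in zip(weights, image_feats))
              for d in range(dim)]
    return pooled, weights
```

When the query scores every position equally, the result reduces to the plain mean of the position vectors; when one position scores much higher than the rest, the pooled vector approaches that position's features.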

Abstract

The invention provides a visual question answering method based on a module routing network, which processes a natural language question text and an input question picture according to a module routing network model and generates an answer to the question. The module routing network model has a text network, a routing network and a visual network, and the method comprises the following steps: step 1, inputting the natural language question text into the text network to extract question features; step 2, activating the corresponding modules in the visual network as activation modules according to a routing path that the routing network generates based at least on the question features, inputting the question picture into the visual network, and extracting image features from the question picture with the activation modules to form the corresponding final features; and step 3, inputting the final features into the answerer to generate the answer to the question. The method fuses the text and vision modalities at multiple levels, needs no expert knowledge or supervision information when answering complex questions, and can be widely applied wherever multiple modalities need to be combined.

Description

Technical field

[0001] The invention relates to a visual question answering method based on a module routing network model. It belongs to the field of artificial intelligence and is used for solving visual question answering tasks.

Background art

[0002] Historically, computer vision and natural language processing developed as separate research directions. With the revival of neural networks, both fields have given rise to new research tasks, including tasks that connect the two fields. The present invention relates to one of them, the visual question answering task [1].

[0003] Visual question answering means that, given an image and question pair, the model must answer the question based on the content of the image. Compared with classic computer vision tasks such as image recognition, detection and segmentation, a visual question answering method based on a module routing network model provided by the present ...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/332, G06F16/35, G06F16/583, G06V10/774, G06V10/82, G06K9/62, G06N3/04, G06N3/08
CPC: G06F16/3329, G06F16/353, G06F16/583, G06N3/08, G06N3/048, G06N3/045, G06F18/214
Inventor: 吴彦泽, 薛向阳, 李斌
Owner: FUDAN UNIV