Method, device and medium for visual question answering

A visual question answering technology, applied in the field of deep learning, that addresses problems such as insufficient interaction between image and text features, reduced accuracy of visual question answering results, and reduced feature representation ability and feature extraction speed.

Active Publication Date: 2022-02-11

合肥名龙电子科技有限公司



AI Technical Summary

Problems solved by technology

[0003] Feature extraction in existing VQA models comprises three modules: text feature extraction, image feature extraction, and feature fusion. Text feature extraction commonly uses a recurrent neural network (RNN), a long short-term memory network (LSTM), or a gated recurrent unit (GRU); image feature extraction commonly uses a convolutional neural network (CNN) or a multi-layer perceptron (MLP). These extractors differ in feature extraction accuracy, some higher and some lower. Moreover, during feature fusion the information from the image features and the text features does not interact well. This greatly reduces both the feature representation ability and the running speed of feature extraction, and in turn lowers the accuracy of visual question answering results.
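The patent does not say which fusion operator existing models use. As an illustrative assumption, the sketch below shows one common "late" fusion pattern: each modality is first collapsed into a single summary vector (here by mean pooling hypothetical per-word and per-region features) and the two summaries are then combined by an element-wise product. Because individual words and individual image regions never meet before pooling, the fused vector carries only coarse cross-modal interaction, which is one way to picture the limitation described in paragraph [0003]. The function names and feature values are hypothetical.

```python
"""Illustrative late-fusion step (an assumption, not taken from the patent)."""
from typing import List

Vector = List[float]


def mean_pool(tokens: List[Vector]) -> Vector:
    """Average a sequence of feature vectors into a single summary vector."""
    n = len(tokens)
    return [sum(col) / n for col in zip(*tokens)]


def late_fusion(text_vec: Vector, image_vec: Vector) -> Vector:
    """Combine the two modality summaries with an element-wise (Hadamard) product."""
    if len(text_vec) != len(image_vec):
        raise ValueError("feature dimensions must match")
    return [t * v for t, v in zip(text_vec, image_vec)]


# Hypothetical per-word (e.g. LSTM) and per-region (e.g. CNN) features, dimension 3.
question_words = [[1.0, 0.0, 2.0], [3.0, 2.0, 0.0]]
image_regions = [[0.5, 1.0, 1.0], [1.5, 1.0, 3.0]]

# Each modality is summarized on its own before the two ever interact.
fused = late_fusion(mean_pool(question_words), mean_pool(image_regions))
print(fused)  # -> [2.0, 1.0, 2.0]
```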


Embodiment Construction

[0038] The following will clearly and completely describe the technical solutions in the embodiments of the present invention in conjunction with the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. Based on the embodiments of the present invention, all other embodiments obtained by persons of ordinary skill in the art without making creative efforts belong to the protection scope of the present invention.

[0039] The core of the present invention is to provide a method, device and medium for visual question answering that improve the running speed of feature extraction and the answer accuracy of visual question answering.

[0040] In order to enable those skilled in the art to better understand the solution of the present invention, the present invention will be further described in detail below in conjunction with the accompanying dra...


Abstract

The invention discloses a visual question answering method, device and medium, applied in the field of deep learning. A target text and a target image are obtained and converted into text data and image data, respectively. The text data and the image data are each input to a Transformer-based model to extract text features and image features; the text features and image features are then input to a Transformer model to obtain fusion features; finally, the fusion features are input to a classifier to obtain the answer to the visual question answering task. Because feature extraction relies entirely on Transformer models, the method speeds up extraction, saves computing cost, and reduces the number of operational parameters. Aggregating the text features and image features achieves better interaction between them, which improves feature representation ability, increases the running speed of feature extraction, and improves the answer accuracy of visual question answering.
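The data flow in the abstract can be pictured with the toy sketch below. It is a minimal illustration under stated assumptions, not the patented implementation: each "Transformer-based model" is reduced to a single untrained self-attention head (no feed-forward layers, normalization, or positional encoding), the image data is assumed to be ViT-style flattened patches, the text data is assumed to be word indices, and the classifier is assumed to be a linear layer over mean-pooled fusion features. The vocabulary, answer list, and helper names (`patchify`, `SelfAttention`, `answer`) are hypothetical. The fusion step concatenates the text and image token sequences before attention, so every word token can attend to every image patch.

```python
"""Toy pure-Python sketch of the abstract's pipeline (all components are assumptions)."""
import math
import random
from typing import List

Matrix = List[List[float]]
rng = random.Random(0)  # fixed seed so runs are reproducible


def rand_matrix(rows: int, cols: int) -> Matrix:
    return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def softmax(row: List[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


class SelfAttention:
    """One untrained head of scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""

    def __init__(self, d: int):
        self.wq, self.wk, self.wv = rand_matrix(d, d), rand_matrix(d, d), rand_matrix(d, d)

    def __call__(self, x: Matrix) -> Matrix:
        q, k, v = matmul(x, self.wq), matmul(x, self.wk), matmul(x, self.wv)
        scale = math.sqrt(len(q[0]))
        scores = matmul(q, [list(c) for c in zip(*k)])  # Q K^T
        weights = [softmax([s / scale for s in row]) for row in scores]
        return matmul(weights, v)


def patchify(image: Matrix, p: int) -> Matrix:
    """Cut an H x W grayscale image into flattened, non-overlapping p x p patches."""
    h, w = len(image), len(image[0])
    if h % p or w % p:
        raise ValueError("image size must be a multiple of the patch size")
    return [[image[r + i][c + j] for i in range(p) for j in range(p)]
            for r in range(0, h, p) for c in range(0, w, p)]


D, PATCH = 8, 2
VOCAB = {"what": 0, "color": 1, "is": 2, "the": 3, "cat": 4}  # hypothetical
ANSWERS = ["black", "white", "orange"]  # hypothetical answer classes

word_embedding = rand_matrix(len(VOCAB), D)       # text data (word ids) -> text tokens
patch_projection = rand_matrix(PATCH * PATCH, D)  # image data (patches) -> image tokens
text_encoder, image_encoder, fusion = SelfAttention(D), SelfAttention(D), SelfAttention(D)
classifier = rand_matrix(D, len(ANSWERS))


def answer(question: str, image: Matrix) -> str:
    ids = [VOCAB[w] for w in question.lower().rstrip("?").split()]
    text_feats = text_encoder([word_embedding[i] for i in ids])
    image_feats = image_encoder(matmul(patchify(image, PATCH), patch_projection))
    # Fusion: attention over the concatenated sequence lets words and patches interact.
    fused = fusion(text_feats + image_feats)
    pooled = [sum(col) / len(fused) for col in zip(*fused)]
    logits = matmul([pooled], classifier)[0]
    return ANSWERS[max(range(len(ANSWERS)), key=logits.__getitem__)]


image = [[(r * 4 + c) / 16 for c in range(4)] for r in range(4)]
print(answer("What color is the cat?", image))
```

Because the weights are random, the printed answer is arbitrary; the sketch is only meant to show how the text data, image data, encoders, fusion model, and classifier connect.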

Description

Technical field

[0001] The invention relates to the field of deep learning, and in particular to a method, device and medium for visual question answering.

Background art

[0002] With the rapid development of artificial intelligence, machine learning is typically used to produce the outputs people want. However, within deep learning, computer vision (CV) and natural language processing (NLP) have each reached a bottleneck, and multi-modal deep learning has gradually become a research hotspot. Visual question answering (VQA), which combines CV and NLP, takes as input an image and a natural-language question about that image; the machine must understand and fuse the information contained in the image and language modalities in order to output the answer.

[0003] The feature extraction of the existing VQA model includes three modules, text feature extraction, image featu...
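The patent only states that the answer is obtained from a classifier. A widespread convention in VQA work (an assumption here, not something the patent specifies) is to treat answering as classification over a fixed vocabulary built from the most frequent training answers. The sketch below shows that step with hypothetical data; the names `build_answer_vocab` and `encode_answer` are illustrative.

```python
"""Answer vocabulary for classification-style VQA (assumed convention, hypothetical data)."""
from collections import Counter
from typing import Dict, List


def build_answer_vocab(train_answers: List[str], top_k: int) -> Dict[str, int]:
    """Map the top_k most frequent normalized training answers to class indices."""
    counts = Counter(a.strip().lower() for a in train_answers)
    # most_common keeps first-seen order among answers with equal counts.
    return {ans: idx for idx, (ans, _) in enumerate(counts.most_common(top_k))}


def encode_answer(answer: str, vocab: Dict[str, int]) -> int:
    """Return the class index of an answer, or -1 if it falls outside the vocabulary."""
    return vocab.get(answer.strip().lower(), -1)


train_answers = ["Yes", "no", "yes", "2", "red", "yes", "no", "2"]
vocab = build_answer_vocab(train_answers, top_k=3)
print(vocab)                        # -> {'yes': 0, 'no': 1, '2': 2}
print(encode_answer("Red", vocab))  # -> -1 (rare answer dropped from the classes)
```

Capping the vocabulary keeps the classifier's output layer small at the cost of never predicting rare answers.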


Application Information


Patent Type & Authority: Patents (China)

IPC (8): G06K9/62, G06V10/80, G06V10/774

CPC: G06F18/254, G06F18/253, G06F18/214

Inventor: 王润民, 徐尉翔, 朱桂林, 刘莹莹, 刘明昊, 朱祯琳, 朱姿諭, 丁亚军, 戴颖龙, 代建华

Owner: 合肥名龙电子科技有限公司
