Text visual question-answering system and method based on concept interaction and associated semantics

A text and visual technology, applied in the field of visual question answering, that addresses prior-art problems such as ignoring objects and visual relationships.

Active Publication Date: 2020-10-30

GUIZHOU UNIV +1



Problems solved by technology

[0004] In view of the above shortcomings of the prior art, the present invention provides a text visual question answering system and method based on concept interaction and associated semantics, which solves the prior art's problem of ignoring objects and visual relationships.


Examples


Embodiment 1

[0066] As shown in Figure 1, the present invention provides a text visual question answering system based on concept interaction and associated semantics. The system includes an object position extraction module, a first fully connected layer connected to the object position extraction module, a text information extraction module, a second fully connected layer connected to the text information extraction module, an OCR-object graph convolutional network connected to both the first and the second fully connected layer, a gate-step mechanism graph convolutional network connected to the OCR-object graph convolutional network, and a Transformer network connected to the gate-step mechanism graph convolutional network. The Transformer network is in turn connected to the Bidirectional Encoder Representations from Transformers (BERT) encoder. The object position extraction module uses the pre-trained Faster-RCNN object detector model to extract the visual featu...
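The connections listed in [0066] can be read as a small directed graph. The following is a minimal sketch of that wiring, not the patented implementation: the module names are paraphrased identifiers, and the direction of the BERT edge (BERT feeding the Transformer network) is an assumption, since the excerpt only says the two are connected.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each key maps a module to the set of modules whose output it consumes.
# Names are illustrative; the BERT -> Transformer direction is an assumption.
WIRING = {
    "first_fc": {"object_position_extraction"},
    "second_fc": {"text_information_extraction"},
    "ocr_object_gcn": {"first_fc", "second_fc"},
    "gated_gcn": {"ocr_object_gcn"},
    "transformer": {"gated_gcn", "bert_question_encoder"},
}


def processing_order(wiring):
    """Return one valid evaluation order for the wired modules."""
    return list(TopologicalSorter(wiring).static_order())


order = processing_order(WIRING)
print(order)
```

Any order returned this way evaluates both fully connected layers before the OCR-object graph convolutional network and leaves the Transformer network for last, which matches the data flow described in the text.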

Embodiment 2

[0069] Based on the above system, the present invention also provides a text visual question answering method based on concept interaction and associated semantics. The basic idea is to model the positional relationships between the objects and the text information in the image, then model the text information and object information with the OCR-object graph convolutional network, learn richer and more directional features by encoding relations with the gate mechanism, and finally use the Transformer network, guided by the question information, to attend precisely to the objects and text in the image and obtain a more accurate answer. As shown in Figures 2 and 3, the method is implemented as follows:

[0070] S1. Use the pre-trained Faster-RCNN object detector model to extract the visual features in the image and their corresponding location information, and use the first fully connected layer to fu...
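The basic idea in [0069], linking text and objects by position and then passing gated messages along those links, can be illustrated with a toy example. This is a sketch under stated assumptions rather than the method claimed in the patent: the distance threshold `max_dist`, the center-distance rule in `build_edges`, and the unlearned dot-product gate in `gated_update` are all hypothetical stand-ins for details the excerpt does not give.

```python
import math


def box_center(box):
    """Center (cx, cy) of a box given as (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)


def build_edges(ocr_boxes, object_boxes, max_dist):
    """Link OCR token i to object j when their centers lie within max_dist."""
    edges = []
    for i, ocr_box in enumerate(ocr_boxes):
        cx, cy = box_center(ocr_box)
        for j, obj_box in enumerate(object_boxes):
            ox, oy = box_center(obj_box)
            if math.hypot(cx - ox, cy - oy) <= max_dist:
                edges.append((i, j))
    return edges


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gated_update(ocr_feats, object_feats, edges):
    """One gated message-passing step from objects to OCR tokens.

    The gate is sigmoid(dot(token, object)); a trained model would use
    learned weights here (assumption made for illustration only).
    """
    updated = [list(f) for f in ocr_feats]
    for i, j in edges:
        gate = sigmoid(sum(a * b for a, b in zip(ocr_feats[i], object_feats[j])))
        for k, value in enumerate(object_feats[j]):
            updated[i][k] += gate * value
    return updated


# Toy data: two OCR tokens and two detected objects.
ocr_boxes = [(0, 0, 10, 10), (100, 100, 110, 110)]
object_boxes = [(5, 5, 15, 15), (200, 200, 220, 220)]
ocr_feats = [[1.0, 0.0], [0.0, 1.0]]
object_feats = [[0.0, 2.0], [1.0, 1.0]]

edges = build_edges(ocr_boxes, object_boxes, max_dist=20.0)
updated = gated_update(ocr_feats, object_feats, edges)
print(edges, updated)
```

Only the first OCR token sits close enough to an object to receive a message, so its feature changes while the second token's feature is left untouched. The gate scales how much of the neighboring object's feature is mixed in.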


Abstract

The invention provides a text visual question-answering system and method based on concept interaction and associated semantics. The system comprises an object position extraction module, a first fully connected layer, a text information extraction module, a second fully connected layer, an OCR-object graph convolutional network, a multi-gate-step mechanism graph convolutional network, a Transformer network and a Bidirectional Encoder Representations from Transformers (BERT) encoder. The invention first models the positional relationships between objects and text information in an image, then models the text information and object information with the OCR-object graph convolutional network, learns rich, directional features for relation encoding through a gate mechanism, and finally attends precisely to the objects and text in the image through the Transformer network, thereby obtaining a more accurate answer.
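The final step of the abstract, attending to objects and text in light of the question, can be illustrated with plain scaled dot-product attention, the core operation of a Transformer. This is a generic sketch, not the patent's network: the single question vector, the node features, and the function names `softmax` and `attend` are assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(question, nodes):
    """Scaled dot-product attention of one question vector over node features."""
    scale = math.sqrt(len(question))
    scores = [sum(q * n for q, n in zip(question, node)) / scale for node in nodes]
    weights = softmax(scores)
    dim = len(nodes[0])
    context = [sum(w * node[k] for w, node in zip(weights, nodes)) for k in range(dim)]
    return weights, context


# Toy data: one question embedding and three node features (objects or OCR tokens).
question = [1.0, 0.0]
nodes = [[2.0, 0.0], [0.0, 2.0], [0.0, 0.0]]
weights, context = attend(question, nodes)
print(weights, context)
```

The node whose feature aligns with the question vector receives the largest weight, so the resulting context vector is dominated by that node.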

Description

Technical field

[0001] The invention belongs to the technical field of visual question answering, and in particular relates to a text visual question answering system and method based on concept interaction and associated semantics.

Background

[0002] With the development of vision-language interaction, text visual question answering algorithms have made great progress in recent years. As a branch of visual question answering, text visual question answering focuses on how to mine the relationships between text and objects in pictures and use them to support question answering. It is also widely used in real applications, such as visual assistants for people with disabilities and educational assistants for young children. Compared with traditional visual question answering algorithms, a text visual question answering algorithm requires the model to understand the visual information and the text information in the image at the same tim...


Application Information


IPC(8): G06F16/332, G06F16/583, G06K9/00, G06K9/62, G06N3/04, G06N3/08

CPC: G06F16/3329, G06F16/5846, G06N3/08, G06V30/40, G06V30/10, G06N3/045, G06F18/25

Inventor: 高联丽, 李向鹏, 宋井宽

Owner: GUIZHOU UNIV
