Visual dialogue generation method based on double visual attention network

A dual-vision attention technology, applied in the field of computer vision, which addresses the problems that existing methods consider only global visual features, produce visual semantic information that is not accurate enough, and ignore word-level semantics.

Inactive Publication Date: 2020-01-03
HEFEI UNIV OF TECH


Problems solved by technology

[0005] For example, in 2017, Jiasen Lu and co-authors proposed a history-conditioned image attention method in the article "Best of Both Worlds: Transferring Knowledge from Discriminative Learning to a Generative Visual Dialog Model", published at the top international conference on Neural Information Processing Systems (NIPS 2017). This method first applies sentence-level attention to the dialogue history, and then performs attention learning over the image features based on the processed text features. However, when processing the text of the current question, this method considers only sentence-level semantics and ignores word-level semantics, even though in an actual question usually only a few keywords are most relevant.
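The contrast drawn above can be illustrated with a minimal sketch of word-level attention, in which each word of the question is scored individually (here against a pooled visual feature) so that keywords can dominate the question encoding. All names, dimensions, and the example question are illustrative assumptions, not the patent's actual model:

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(x - x.max())
    return e / e.sum()

# Hypothetical embeddings for the question "what color is the dog ?"
rng = np.random.default_rng(1)
words = ["what", "color", "is", "the", "dog", "?"]
word_feats = rng.normal(size=(len(words), 16))  # one vector per word
image_feat = rng.normal(size=16)                # pooled visual feature

# Word-level attention: score each word separately, so keywords like
# "color" and "dog" can receive larger weights than function words.
weights = softmax(word_feats @ image_feat)      # shape (6,), sums to 1
question_vec = weights @ word_feats             # weighted question encoding
```

A sentence-level method, by contrast, would collapse the whole question into a single vector before attending, losing the per-word weighting shown here.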




Embodiment Construction

[0091] In this example, as shown in Figure 1, a visual dialogue generation method based on a dual visual attention network proceeds as follows:

[0092] Step 1. Preprocess the text input of the visual dialogue and construct the word list:

[0093] Step 1.1. Obtain a visual dialogue dataset from the Internet. The main publicly available dataset is VisDialDataset, collected by researchers at the Georgia Institute of Technology; this visual dialogue dataset contains sentence texts and images;

[0094] Perform word segmentation on all sentence texts in the visual dialogue dataset to obtain the segmented words;

[0095] Step 1.2. From the segmented words, select all words whose frequency exceeds a threshold (the threshold can be set to 4), and build the word index table Voc. To create the word index table Voc: the word table can contain words and punctuation marks; count the occurrences of each word and sor...
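Step 1.2 can be sketched as follows. The reserved `<pad>`/`<unk>` slots and the helper name `build_vocab` are common-practice assumptions not stated in the patent; only the frequency threshold and the count-and-sort construction come from the text above:

```python
from collections import Counter

def build_vocab(sentences, min_freq=4):
    """Build a word index table (Voc) from tokenized sentences.

    Tokens (words and punctuation marks) whose frequency exceeds
    `min_freq` are kept, sorted by descending count, and assigned
    integer indices.
    """
    counts = Counter(tok for sent in sentences for tok in sent)
    kept = [w for w, c in counts.most_common() if c > min_freq]
    # Reserve index 0 for padding and 1 for out-of-vocabulary tokens
    # (a common convention; the patent does not specify these slots).
    voc = {"<pad>": 0, "<unk>": 1}
    for w in kept:
        voc[w] = len(voc)
    return voc

# Tiny illustrative corpus (already word-segmented, as in Step 1.1).
corpus = [["is", "there", "a", "dog", "?"],
          ["yes", ",", "a", "brown", "dog", "."],
          ["is", "the", "dog", "running", "?"]]
voc = build_vocab(corpus, min_freq=1)  # low threshold for the tiny corpus
```

With the patent's threshold of 4, only tokens occurring at least 5 times in the full dataset would enter Voc.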



Abstract

The invention discloses a visual dialogue generation method based on a dual visual attention network. The method comprises the following steps: 1, preprocess the text input of the visual dialogue and construct a word list; 2, extract features of the dialogue image and of the dialogue text; 3, apply attention to the historical dialogue information based on the current question information; 4, apply independent attention to each of the dual visual features; 5, apply cross attention between the dual visual features; 6, optimize the visual features; 7, perform multi-modal semantic fusion and decode to generate an answer feature sequence; 8, optimize the parameters of the visual dialogue generation network model based on the dual visual attention network; 9, generate the predicted answer. The invention provides the intelligent agent with more complete and reasonable visual semantic information and finer-grained text semantic information, thereby improving the reasonableness and accuracy of the answers the agent predicts and generates for questions.
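Steps 3–7 of the abstract can be sketched with plain dot-product attention. Everything here is an illustrative assumption: the feature dimensions, the use of CNN grid features and detected-object features as the two visual streams, and the specific way the streams cue each other are guesses at the general shape of the method, not the patent's actual architecture:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attend(query, keys, values):
    """Dot-product attention: weight `values` by similarity of `keys` to `query`."""
    scores = keys @ query               # (n,)
    weights = softmax(scores)           # (n,), sums to 1
    return weights @ values             # (d,)

d = 8
rng = np.random.default_rng(0)
question = rng.normal(size=d)               # current question feature
history = rng.normal(size=(5, d))           # 5 past QA-pair features
global_feats = rng.normal(size=(49, d))     # e.g. CNN grid features (assumed)
object_feats = rng.normal(size=(36, d))     # e.g. detected-object features (assumed)

# Step 3: question-guided attention over the dialogue history
h_att = attend(question, history, history)

# Step 4: independent attention of the two visual streams,
# each guided by a fused question+history cue
cue = question + h_att
v_global = attend(cue, global_feats, global_feats)
v_object = attend(cue, object_feats, object_feats)

# Step 5: cross attention -- each stream re-attends the other stream's
# features, letting the two visual views refine one another
v_global_x = attend(v_object, global_feats, global_feats)
v_object_x = attend(v_global, object_feats, object_feats)

# Steps 6-7 (sketch): concatenate for multi-modal fusion before decoding
fused = np.concatenate([cue, v_global_x, v_object_x])   # shape (3*d,)
```

In the real method, `fused` would feed a decoder that emits the answer feature sequence (step 7), and the whole network would be trained end to end (step 8).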

Description

technical field

[0001] The invention belongs to the technical field of computer vision and relates to technologies such as pattern recognition, natural language processing and artificial intelligence; it specifically relates to a visual dialogue generation method based on a dual visual attention network.

Background technique

[0002] Visual dialogue is a form of human-computer interaction whose purpose is to enable a machine agent to hold reasonable and correct natural dialogues with humans, in question-and-answer form, about a given everyday scene image. The key to visual dialogue is therefore how to make the agent correctly understand the multi-modal semantic information composed of images and texts so as to give reasonable answers to the questions raised by humans. Visual dialogue is currently one of the hot research topics in the field of computer vision, and its application scenarios are very broad, including: helping visually impaired people ...


Application Information

IPC(8): G06F16/31, G06F16/332, G06F16/583
CPC: G06F16/316, G06F16/3329, G06F16/583
Inventor: 郭丹, 王辉, 汪萌
Owner HEFEI UNIV OF TECH