Visual dialogue generation method based on double visual attention network

A dual visual attention technique, applied in the field of computer vision, that addresses three problems of existing methods: only global visual features are considered, the visual semantic information is not accurate enough, and word-level semantics are ignored.

Inactive Publication Date: 2020-01-03
HEFEI UNIV OF TECH
Cites: 5. Cited by: 19.

AI Technical Summary

Problems solved by technology

[0005] For example, in 2017 Jiasen Lu et al. proposed an image attention method based on the dialogue history in the paper "Best of Both Worlds: Transferring Knowledge from Discriminative Learning to a Generative Visual Dialog Model", published at the Conference and Workshop on Neural Information Processing Systems (NIPS 2017). The method first applies sentence-level attention to the dialogue history and then learns attention over the image features conditioned on the processed text features. However, when processing the text of the current question it considers only sentence-level semantics and ignores word-level semantics, whereas in a real question usually only a few keywords are most relevant to the predicted answer. The method therefore has certain limitations in practical applications.
[0006] 2. Existing methods are based on global image feature extraction, resulting in inaccurate visual semantic information.
The method in question applies a series of mutual attention and fusion operations to global visual features, question features, and dialogue-history text features to obtain multi-modal semantic features. It effectively learns the semantic relationships between the different features, but it considers only global visual features. As a result, after attention is applied to the image, visual information irrelevant to the question often still receives attention, and this redundant information interferes with the agent's answer prediction.

Examples

Embodiment Construction

[0091] In this example, as shown in Figure 1, a visual dialogue generation method based on a dual visual attention network proceeds as follows:

[0092] Step 1. Preprocessing of text input in visual dialogue and construction of word list:

[0093] Step 1.1. Obtain a visual dialogue dataset from the Internet. The main publicly available dataset at present is the VisDial dataset, collected by researchers at the Georgia Institute of Technology. The visual dialogue dataset contains sentence text and images;

[0094] Perform word segmentation processing on all sentence texts in the visual dialogue dataset to obtain segmented words;

[0095] Step 1.2. From the segmented words, select all words whose frequency is greater than a threshold (the threshold can be set to 4) and build the word index table Voc. The word index table Voc is created as follows: the table can contain words and punctuation marks; count the number of words and sor...
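
Steps 1.1 and 1.2 can be illustrated with a minimal Python sketch. It is not the patent's implementation: the regex tokenizer, the special tokens `<pad>` and `<unk>`, the function names `tokenize`, `build_voc`, and `encode`, and the ordering of the table (descending frequency, ties broken alphabetically) are all assumptions made for illustration, since the source text is cut off before it describes how Voc is ordered.

```python
import re
from collections import Counter

# Hypothetical special tokens; the source does not name any.
PAD, UNK = "<pad>", "<unk>"


def tokenize(sentence):
    """Assumed tokenizer: lowercase, then split into words and punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", sentence.lower())


def build_voc(sentences, threshold=4):
    """Build a word index table from tokens whose frequency exceeds the threshold."""
    counts = Counter(tok for s in sentences for tok in tokenize(s))
    kept = [w for w, c in counts.items() if c > threshold]
    # Assumed ordering: most frequent first, ties broken alphabetically.
    kept.sort(key=lambda w: (-counts[w], w))
    voc = {PAD: 0, UNK: 1}
    for w in kept:
        voc[w] = len(voc)
    return voc


def encode(sentence, voc):
    """Map a sentence to indices; tokens missing from the table map to UNK."""
    return [voc.get(tok, voc[UNK]) for tok in tokenize(sentence)]


if __name__ == "__main__":
    corpus = ["Is the cat black?"] * 5 + ["A rare word"]
    voc = build_voc(corpus, threshold=4)
    print(voc)
    print(encode("Is the dog black?", voc))
```

With a threshold of 4, a token must occur at least five times to enter the table, so the words of the single rare sentence above are all mapped to the unknown-word index.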

Abstract

The invention discloses a visual dialogue generation method based on a double visual attention network. The method comprises the following steps: 1, preprocessing the text input of the visual dialogue and constructing a word list; 2, extracting features from the dialogue image and from the dialogue text; 3, applying attention to the dialogue history based on the current question; 4, applying attention independently to each of the two visual features; 5, applying cross-attention between the two visual features; 6, optimizing the visual features; 7, performing multi-modal semantic fusion and decoding to generate an answer feature sequence; 8, optimizing the parameters of the visual dialogue generation network model based on the double visual attention network; 9, generating the predicted answer. The invention can provide the agent with more complete and reasonable visual semantic information and finer-grained text semantic information, thereby improving the reasonableness and accuracy of the answers the agent predicts and generates.
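
The idea behind step 3, attending to the dialogue history based on the current question, can be sketched as follows. This is a generic scaled dot-product attention written in plain Python, not the patent's actual formulation, which this excerpt does not give; the function names `softmax` and `attend` and the toy two-dimensional feature vectors are assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, items):
    """Weight each item vector by its scaled dot product with the query.

    Returns the attention weights and the weighted sum of the items.
    """
    d = len(query)
    scores = [
        sum(q * x for q, x in zip(query, item)) / math.sqrt(d)
        for item in items
    ]
    weights = softmax(scores)
    context = [
        sum(w * item[i] for w, item in zip(weights, items))
        for i in range(d)
    ]
    return weights, context


if __name__ == "__main__":
    # Toy features: one question vector and three history-round vectors.
    question = [1.0, 0.0]
    history = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
    weights, context = attend(question, history)
    print(weights)
    print(context)
```

In this toy input the first history round points in the same direction as the question, so it receives the largest weight. The same routine could in principle be applied to image features with a text-derived query, which is the general pattern behind the visual attention steps, although the patent's own equations may differ.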

Description

Technical field

[0001] The invention belongs to the technical field of computer vision and relates to technologies such as pattern recognition, natural language processing, and artificial intelligence. Specifically, it relates to a visual dialogue generation method based on a dual visual attention network.

Background

[0002] Visual dialogue is a form of human-computer interaction whose purpose is to enable a machine agent and a human to hold a reasonable and correct natural dialogue, in question-and-answer form, about a given image of an everyday scene. The key to visual dialogue is therefore how to make the agent correctly understand the multi-modal semantic information composed of images and text so that it can give reasonable answers to the questions raised by the human. Visual dialogue is currently one of the hot research topics in computer vision, and its application scenarios are very extensive, including: helping visually impaired people ...

Application Information

IPC(8): G06F16/31, G06F16/332, G06F16/583
CPC: G06F16/316, G06F16/3329, G06F16/583
Inventors: 郭丹, 王辉, 汪萌
Owner: HEFEI UNIV OF TECH