Visual dialogue generation system based on semantic alignment

A technology for generation systems and semantic alignment, applied in biological neural network models, natural language data processing, instruments, etc., that can address the problems of neglecting the textual quality of visual dialogue, interference from dialogue history, and overlooked effects.

Pending Publication Date: 2020-11-20
HEFEI UNIV OF TECH

AI Technical Summary

Problems solved by technology

This processing ignores the gap between the representations of different modalities. If the image features and the semantic information are not well aligned, it remains doubtful whether the extracted image features can provide sufficient information to generate a reply.
[0008] 2. Too much reliance on conversation history instead of image information to generate responses
However, although many current models try to obtain more, and more targeted, information from images, they overlook whether the observed improvement is in fact caused by the interference introduced by adding too much historical information.
[0009] 3. Not considering the textual quality of generative visual dialogue
[0010] From the above analysis, we can see that the traditional visual dialogue generation system needs to be improved.


Examples

Embodiment Construction

[0076] The following will clearly and completely describe the technical solutions in the embodiments of the present invention with reference to the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only some, not all, embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by persons of ordinary skill in the art without making creative efforts belong to the protection scope of the present invention.

[0077] The task of visual dialogue generation is defined as follows: given an image I, an image description C, a dialogue history of t-1 rounds H_t = {C, (Q_1, A_1), ..., (Q_{t-1}, A_{t-1})}, and the current-round question Q, generate the answer A to the current-round question Q.
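The inputs named in this task definition can be pictured as a small data structure. The following is only a minimal sketch: the class name `VisualDialogueInput`, its field names, and the choice to store the image as a file path are illustrative assumptions and do not come from the patent text.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class VisualDialogueInput:
    """One visual-dialogue example at round t (all names here are hypothetical)."""
    image_path: str  # image I, stored as a path purely for illustration
    caption: str     # image description C
    # Completed rounds (Q_1, A_1), ..., (Q_{t-1}, A_{t-1})
    qa_rounds: List[Tuple[str, str]] = field(default_factory=list)
    question: str = ""  # current-round question Q

    @property
    def round_index(self) -> int:
        """Current round t: one more than the number of completed rounds."""
        return len(self.qa_rounds) + 1

    def history(self) -> List[str]:
        """Flatten H_t = {C, (Q_1, A_1), ..., (Q_{t-1}, A_{t-1})} into a list of strings."""
        flat = [self.caption]
        for q, a in self.qa_rounds:
            flat.extend([q, a])
        return flat


example = VisualDialogueInput(
    image_path="image.jpg",
    caption="A dog lies on a sofa.",
    qa_rounds=[("What color is the dog?", "Brown."), ("Is it asleep?", "Yes.")],
    question="Is anyone else in the room?",
)
```

A generation model would take `example.history()`, `example.question`, and the image as input and produce the answer A for round `example.round_index`.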

[0078] The embodiment of the present invention finds that the problems of the traditional visual dialogue generation system include at least: the...


Abstract

The invention relates to a visual dialogue generation system based on semantic alignment. Image information is extracted at two levels: global and local. A semantics-based global image representation is obtained through semantic alignment, while local dense image descriptions are obtained through dense captioning; the high-level semantics of a text representation help the system acquire information more effectively. Together, the two provide image-information cues for generating replies. In addition, reply generation is guided by comprehensive constraints on text fluency, text coherence, and correctness. The embodiment of the invention also provides a keyword constraint method that constrains the correctness of the reply while enriching the forms of expression of the generated reply.
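The abstract does not say how the keyword constraint or the three text-quality aspects are computed. As a purely illustrative sketch, one could score correctness by keyword coverage (so that differently worded replies containing the key terms still count as correct) and combine the three aspects with a weighted average. The function names, the tokenization, and the weighting scheme below are all assumptions, not the patented method.

```python
from typing import Iterable, Sequence


def keyword_coverage(reply: str, keywords: Iterable[str]) -> float:
    """Fraction of keywords appearing as whole tokens in the reply (hypothetical metric)."""
    # Crude tokenization: lowercase, drop periods and commas, split on whitespace.
    tokens = set(reply.lower().replace(".", " ").replace(",", " ").split())
    wanted = {k.lower() for k in keywords}
    if not wanted:
        return 1.0  # nothing to constrain
    return len(wanted & tokens) / len(wanted)


def combined_score(fluency: float, coherence: float, correctness: float,
                   weights: Sequence[float] = (1.0, 1.0, 1.0)) -> float:
    """Weighted average of the three aspects; the weights are an assumption."""
    w_f, w_c, w_k = weights
    return (w_f * fluency + w_c * coherence + w_k * correctness) / (w_f + w_c + w_k)
```

Under this sketch, "The dog is brown." and "A brown dog." both fully cover the keywords `["brown", "dog"]`, which illustrates how a keyword-level check can accept varied phrasings of a correct reply.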

Description

Technical field

[0001] The embodiments of the present invention relate to the technical field of language processing, and in particular to a visual dialogue generation system based on semantic alignment.

Background technique

[0002] In recent years, with the rapid development of artificial intelligence and robotics, the multimodal semantic understanding of vision and language has received more and more attention in the fields of computer vision and natural language processing. Human-computer interaction cannot consider only a single modality. In real life, interaction between people is often not limited to text, vision, or hearing alone. Multimodal natural interaction can not only provide a friendlier interface between machines and humans, but is also the only way to achieve strong artificial intelligence.

[0003] Understanding the real world by analyzing vision and language is the primary task of artificial intelligence to achieve human-lik...


Application Information

IPC(8): G06F40/35, G06K9/62, G06N3/04, G06N3/08
CPC: G06F40/35, G06N3/049, G06N3/08, G06N3/047, G06N3/045, G06F18/253, Y02D10/00
Inventor: 孙晓, 王佳敏, 汪萌
Owner: HEFEI UNIV OF TECH