Multi-modal dialogue system and method guided by user attention

A multi-modal dialogue system and method, applicable to character and pattern recognition, biological neural network models, special data processing applications, etc. It addresses the problems that existing dialogue systems focus narrowly on text and handle user attention at the attribute level only to a limited extent.

Active Publication Date: 2019-09-06
SHANDONG UNIV

AI Technical Summary

Problems solved by technology

[0004] (1) Most existing dialogue systems focus only on text, ignoring the fact that people tend to communicate using multimodal information;
[0005] (2) To find a desired product, the user may pay special attention to certain aspects or attributes of the product when interacting with the chatbot, while existing dialogue systems handle user attention at the attribute level only to a very limited extent.



Examples


Embodiment 1

[0055] This embodiment provides a multi-modal dialogue system guided by user attention. Referring to Figure 1, the dialogue system includes a data acquisition module 101, a text feature extraction module 102, a multimodal encoder 103, and a multimodal decoder 104.

[0056] Specifically, the data acquisition module 101 is configured to acquire the text information of the interaction between the user and the chatbot, and the visual image information of the product desired by the user.

[0057] The text feature extraction module 102 is configured to train on the text information using an attention-mechanism-based bidirectional recurrent neural network and to generate attention-weighted text features.
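The attention-weighted text features described above can be sketched as follows. This is a minimal numpy illustration, not the patent's actual implementation: a vanilla tanh bidirectional RNN produces per-token hidden states, and a dot-product soft attention (the specific scoring function is an assumption) weights them into a single feature vector. All weight matrices and dimensions here are illustrative.

```python
import numpy as np

def attention_weighted_features(hidden_states, query):
    """Soft attention over a sequence of hidden states.

    hidden_states: (T, d) array, e.g. concatenated forward/backward RNN states.
    query: (d,) vector used to score each time step.
    Returns the attention-weighted feature vector (d,) and the weights (T,).
    """
    scores = hidden_states @ query            # (T,) dot-product scores
    scores -= scores.max()                    # numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()
    context = weights @ hidden_states         # (d,) weighted sum
    return context, weights

def bidirectional_rnn(x_seq, W_f, W_b, U_f, U_b):
    """Minimal vanilla bidirectional RNN (tanh cells, no biases).

    x_seq: (T, n) inputs; W_*: (h, n) input weights; U_*: (h, h) recurrent
    weights. Returns (T, 2h) concatenated forward/backward hidden states.
    """
    T, _ = x_seq.shape
    h = W_f.shape[0]
    hf, hb = np.zeros((T, h)), np.zeros((T, h))
    state = np.zeros(h)
    for t in range(T):                        # forward pass
        state = np.tanh(W_f @ x_seq[t] + U_f @ state)
        hf[t] = state
    state = np.zeros(h)
    for t in reversed(range(T)):              # backward pass
        state = np.tanh(W_b @ x_seq[t] + U_b @ state)
        hb[t] = state
    return np.concatenate([hf, hb], axis=1)

rng = np.random.default_rng(0)
T, n, h = 5, 8, 4
x = rng.normal(size=(T, n))                   # stand-in token embeddings
H = bidirectional_rnn(x, rng.normal(size=(h, n)), rng.normal(size=(h, n)),
                      rng.normal(size=(h, h)), rng.normal(size=(h, h)))
feat, w = attention_weighted_features(H, rng.normal(size=2 * h))
print(H.shape, feat.shape)
```

In practice the query vector would be learned (or derived from the dialogue state) rather than random, and the RNN cell would typically be a GRU or LSTM.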

[0058] The multimodal encoder 103 is configured to extract visual features from the visual image using a convolutional neural network model, and to input the visual features into the classification-attribute combination tree for traversal to obtain more representative attribute-level visual features.
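The patent does not spell out the traversal rule for the classification-attribute combination tree, so the following is a hedged sketch under stated assumptions: the tree is a nested dict of hypothetical category names, each node carries per-attribute projection vectors, and traversal descends into the best-responding child category while collecting the strongest attribute response at each level as an attribute-level feature. The random vector standing in for the CNN visual feature is illustrative only.

```python
import numpy as np

rng = np.random.default_rng(1)
D = 6  # dimensionality of the (stand-in) CNN visual feature

def node(attrs, children=None):
    """A taxonomy node: per-attribute projection vectors plus child categories."""
    return {"attrs": {a: rng.normal(size=D) for a in attrs},
            "children": children or {}}

# Hypothetical classification-attribute tree (category/attribute names invented).
tree = node(["material"], {
    "clothing": node(["color", "style"], {
        "dress": node(["length", "sleeve"]),
        "coat":  node(["thickness"]),
    }),
    "shoes": node(["color", "heel"]),
})

def traverse(tree_node, visual_feat, path=()):
    """Walk the tree, keeping the best-responding attribute at each level."""
    # score each attribute at this node by its response to the visual feature
    name, vec = max(tree_node["attrs"].items(),
                    key=lambda kv: float(visual_feat @ kv[1]))
    picks = [(path, name, float(visual_feat @ vec) * vec)]  # attribute-level feature
    if tree_node["children"]:
        # descend into the child category whose attributes respond most strongly
        child_name, child = max(
            tree_node["children"].items(),
            key=lambda kv: max(float(visual_feat @ v)
                               for v in kv[1]["attrs"].values()))
        picks += traverse(child, visual_feat, path + (child_name,))
    return picks

visual_feature = rng.normal(size=D)  # stand-in for a CNN image feature
result = traverse(tree, visual_feature)
for path, attr, feat in result:
    print("/".join(path) or "(root)", attr, feat.shape)
```

A real system would use learned attribute embeddings and a taxonomy mined from the product catalogue; the greedy max-response descent here is one plausible reading of "traversal".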

Embodiment 2

[0074] This embodiment provides a user-attention-guided multi-modal dialogue method. At a high level, the method uses an attention-mechanism-based bidirectional recurrent neural network (RNN) to generate attention-weighted text features; at a low level, a multimodal encoder and decoder are employed to encode multimodal utterance vectors and to generate multimodal text responses, respectively.
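The decoder side described above selects a number of product images and text attributes based on a context vector (per the abstract). As a minimal sketch, assuming the selection is a nearest-neighbour ranking of candidate multimodal product representations against the context vector (the patent does not state the scoring function; cosine similarity is an assumption):

```python
import numpy as np

def select_products(context, product_reps, n):
    """Rank candidate multimodal product representations against the decoder's
    context vector by cosine similarity and return the top-n indices."""
    C = context / np.linalg.norm(context)
    P = product_reps / np.linalg.norm(product_reps, axis=1, keepdims=True)
    scores = P @ C                         # (num_candidates,) cosine scores
    return np.argsort(scores)[::-1][:n]   # indices of the n best matches

rng = np.random.default_rng(3)
context_vec = rng.normal(size=16)          # stand-in for the decoder context vector
candidates = rng.normal(size=(20, 16))     # stand-in multimodal product representations
top = select_products(context_vec, candidates, n=3)
print(top)
```

The selected indices would then be decoded into the image-plus-attribute commodity representation returned to the user.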

[0075] Referring to Figure 2, the user-attention-guided multimodal dialogue method includes the following steps:

[0076] S201. Acquire text information and visual image information of commodities.

[0077] Specifically, obtain the text information of the interaction between the user and the chatbot, as well as the visual image information of the desired product, as shown in Figure 3.

[0078] S202. Use an attention-mechanism-based bidirectional recurrent neural network to train on the text information and generate attention-weighted text features.

[...



Abstract

The invention discloses a multi-modal dialogue system and method guided by user attention, which encode multi-modal utterances and generate multi-modal replies by adopting a multi-modal encoder and a multi-modal decoder, respectively. The system comprises a data acquisition module, a text feature extraction module, a multi-modal encoder and a multi-modal decoder. The data acquisition module acquires text information and visual image information of the commodity; the text feature extraction module generates attention-weighted text features; the multi-modal encoder extracts visual features of the visual image by adopting a convolutional neural network model, and inputs the visual features into the classification-attribute combination tree for traversal to obtain attribute-level visual features; multi-modal factorized bilinear pooling is performed on the visual features and the text features to generate multi-modal utterance vectors; the multi-modal decoder generates a context vector; and based on the context vector, a certain number of visual images and text attributes of the required commodity are selected and decoded, generating a multi-modal commodity representation.
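The fusion step in the abstract, multi-modal factorized bilinear pooling, can be sketched as below. This follows the standard MFB formulation (low-rank projections of each modality, element-wise product, sum-pooling over groups of k, then signed-square-root and L2 normalization); whether the patent uses exactly this variant is an assumption, and all dimensions and weights are illustrative.

```python
import numpy as np

def mfb_pool(x, y, U, V, k):
    """Multi-modal factorized bilinear (MFB) pooling.

    x: (dx,) text feature; y: (dy,) visual feature;
    U: (k*o, dx) and V: (k*o, dy) low-rank factor matrices.
    Returns the fused (o,) multimodal utterance vector.
    """
    joint = (U @ x) * (V @ y)                 # (k*o,) element-wise interaction
    o = joint.size // k
    z = joint.reshape(o, k).sum(axis=1)       # sum-pool every k consecutive entries
    z = np.sign(z) * np.sqrt(np.abs(z))       # signed square-root (power norm)
    norm = np.linalg.norm(z)
    return z / norm if norm > 0 else z        # L2 normalization

rng = np.random.default_rng(2)
dx, dy, o, k = 10, 7, 4, 3
text_feat = rng.normal(size=dx)               # stand-in attention-weighted text feature
visual_feat = rng.normal(size=dy)             # stand-in attribute-level visual feature
fused = mfb_pool(text_feat, visual_feat,
                 rng.normal(size=(k * o, dx)), rng.normal(size=(k * o, dy)), k)
print(fused.shape)
```

The low-rank factorization keeps the parameter count linear in the feature dimensions, which is the usual motivation for MFB over a full bilinear product.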

Description

Technical Field

[0001] The present disclosure relates to the field of language processing, and in particular to a multimodal dialogue system and method guided by user attention.

Background

[0002] Dialogue systems have received increasing attention as an intelligent way to interact with computers. However, most current approaches only focus on text-based dialogue systems, completely ignoring the rich semantics conveyed by vision. In fact, with the rapid development of many fields such as online retail and tourism, the demand for multimodal task-oriented dialogue systems is also growing. Furthermore, few methods explicitly consider the hierarchical structure of item taxonomy and users' attention to items. In fact, as the conversation progresses, users tend to focus on the semantic attributes of items, such as color and style.

[0003] During the research and development process, the inventor found that existing task-oriented dialogue systems have the following problems:

Claims


Application Information

Patent Type & Authority: Application (China)
IPC(8): G06F16/332; G06F16/583; G06K9/62; G06N3/04
CPC: G06F16/3329; G06F16/5846; G06N3/045; G06F18/22
Inventors: 王文杰, 聂礼强, 崔晨, 尹建华, 程志勇, 胡琳梅
Owner SHANDONG UNIV