Multi-modal dialogue system and method guided by user attention

A multi-modal dialogue system and method, applicable to character and pattern recognition, biological neural network models, special data processing applications, etc. It addresses the problems that existing dialogue systems focus narrowly on text and handle user attention at the attribute level only to a limited extent.

Active Publication Date: 2019-09-06
SHANDONG UNIV

AI Technical Summary

Problems solved by technology

[0004] (1) Most existing dialogue systems focus only on text, ignoring the fact that people tend to communicate using multimodal information;
[0005] (2) To find a desired product, the user may pay special attention to certain aspects or attributes of the product when interacting with the chatbot, while existing dialogue systems handle user attention at the attribute level only to a very limited extent.



Examples


Embodiment 1

[0055] This embodiment provides a multi-modal dialogue system guided by user attention. Referring to Figure 1, the dialogue system includes a data acquisition module 101, a text feature extraction module 102, a multimodal encoder 103, and a multimodal decoder 104.

[0056] Specifically, the data acquisition module 101 is configured to acquire the text information of the interaction between the user and the chatbot, and the visual image information of the product desired by the user.

[0057] The text feature extraction module 102 is configured to train on the text information using an attention-mechanism-based bidirectional recurrent neural network and to generate attention-weighted text features.
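The attention-weighted text features described above can be sketched as follows. This is a minimal numpy illustration, not the patent's actual implementation: a vanilla tanh bidirectional RNN produces per-token hidden states, and a dot-product soft attention (the specific scoring function is an assumption) weights them into a single feature vector. All weight matrices and dimensions here are illustrative.

```python
import numpy as np

def attention_weighted_features(hidden_states, query):
    """Soft attention over a sequence of hidden states.

    hidden_states: (T, d) array, e.g. concatenated forward/backward RNN states.
    query: (d,) vector used to score each time step.
    Returns the attention-weighted feature vector (d,) and the weights (T,).
    """
    scores = hidden_states @ query            # (T,) dot-product scores
    scores -= scores.max()                    # numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()
    context = weights @ hidden_states         # (d,) weighted sum
    return context, weights

def bidirectional_rnn(x_seq, W_f, W_b, U_f, U_b):
    """Minimal vanilla bidirectional RNN (tanh cells, no biases).

    x_seq: (T, n) inputs; W_*: (h, n) input weights; U_*: (h, h) recurrent
    weights. Returns (T, 2h) concatenated forward/backward hidden states.
    """
    T, _ = x_seq.shape
    h = W_f.shape[0]
    hf, hb = np.zeros((T, h)), np.zeros((T, h))
    state = np.zeros(h)
    for t in range(T):                        # forward pass
        state = np.tanh(W_f @ x_seq[t] + U_f @ state)
        hf[t] = state
    state = np.zeros(h)
    for t in reversed(range(T)):              # backward pass
        state = np.tanh(W_b @ x_seq[t] + U_b @ state)
        hb[t] = state
    return np.concatenate([hf, hb], axis=1)

rng = np.random.default_rng(0)
T, n, h = 5, 8, 4
x = rng.normal(size=(T, n))                   # stand-in token embeddings
H = bidirectional_rnn(x, rng.normal(size=(h, n)), rng.normal(size=(h, n)),
                      rng.normal(size=(h, h)), rng.normal(size=(h, h)))
feat, w = attention_weighted_features(H, rng.normal(size=2 * h))
print(H.shape, feat.shape)
```

In practice the query vector would be learned (or derived from the dialogue state) rather than random, and the RNN cell would typically be a GRU or LSTM.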

[0058] The multimodal encoder 103 is configured to extract visual features from the visual image using a convolutional neural network model, and to input the visual features into the classification-attribute combination tree for traversal to obtain more representative attribute-level visual features.
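The patent does not spell out the traversal rule for the classification-attribute combination tree, so the following is a hedged sketch under stated assumptions: the tree is a nested dict of hypothetical category names, each node carries per-attribute projection vectors, and traversal descends into the best-responding child category while collecting the strongest attribute response at each level as an attribute-level feature. The random vector standing in for the CNN visual feature is illustrative only.

```python
import numpy as np

rng = np.random.default_rng(1)
D = 6  # dimensionality of the (stand-in) CNN visual feature

def node(attrs, children=None):
    """A taxonomy node: per-attribute projection vectors plus child categories."""
    return {"attrs": {a: rng.normal(size=D) for a in attrs},
            "children": children or {}}

# Hypothetical classification-attribute tree (category/attribute names invented).
tree = node(["material"], {
    "clothing": node(["color", "style"], {
        "dress": node(["length", "sleeve"]),
        "coat":  node(["thickness"]),
    }),
    "shoes": node(["color", "heel"]),
})

def traverse(tree_node, visual_feat, path=()):
    """Walk the tree, keeping the best-responding attribute at each level."""
    # score each attribute at this node by its response to the visual feature
    name, vec = max(tree_node["attrs"].items(),
                    key=lambda kv: float(visual_feat @ kv[1]))
    picks = [(path, name, float(visual_feat @ vec) * vec)]  # attribute-level feature
    if tree_node["children"]:
        # descend into the child category whose attributes respond most strongly
        child_name, child = max(
            tree_node["children"].items(),
            key=lambda kv: max(float(visual_feat @ v)
                               for v in kv[1]["attrs"].values()))
        picks += traverse(child, visual_feat, path + (child_name,))
    return picks

visual_feature = rng.normal(size=D)  # stand-in for a CNN image feature
result = traverse(tree, visual_feature)
for path, attr, feat in result:
    print("/".join(path) or "(root)", attr, feat.shape)
```

A real system would use learned attribute embeddings and a taxonomy mined from the product catalogue; the greedy max-response descent here is one plausible reading of "traversal".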

Embodiment 2

[0074] This embodiment provides a user-attention-guided multi-modal dialogue method. At a high level, the method uses an attention-mechanism-based bidirectional recurrent neural network (RNN) to generate attention-weighted text features; at a low level, a multimodal encoder and decoder are employed to encode multimodal utterance vectors and to generate multimodal text responses, respectively.
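The decoder side described above selects a number of product images and text attributes based on a context vector (per the abstract). As a minimal sketch, assuming the selection is a nearest-neighbour ranking of candidate multimodal product representations against the context vector (the patent does not state the scoring function; cosine similarity is an assumption):

```python
import numpy as np

def select_products(context, product_reps, n):
    """Rank candidate multimodal product representations against the decoder's
    context vector by cosine similarity and return the top-n indices."""
    C = context / np.linalg.norm(context)
    P = product_reps / np.linalg.norm(product_reps, axis=1, keepdims=True)
    scores = P @ C                         # (num_candidates,) cosine scores
    return np.argsort(scores)[::-1][:n]   # indices of the n best matches

rng = np.random.default_rng(3)
context_vec = rng.normal(size=16)          # stand-in for the decoder context vector
candidates = rng.normal(size=(20, 16))     # stand-in multimodal product representations
top = select_products(context_vec, candidates, n=3)
print(top)
```

The selected indices would then be decoded into the image-plus-attribute commodity representation returned to the user.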

[0075] Referring to Figure 2, the user-attention-guided multimodal dialogue method includes the following steps:

[0076] S201. Acquire text information and visual image information of commodities.

[0077] Specifically, obtain the text information of the interaction between the user and the chatbot, as well as the visual image information of the desired product, as shown in Figure 3.

[0078] S202. Use an attention-mechanism-based bidirectional recurrent neural network to train on the text information and generate attention-weighted text features.

[...



Abstract

The invention discloses a multi-modal dialogue system and method guided by user attention, which encode multi-modal utterances and generate multi-modal replies by adopting a multi-modal encoder and a multi-modal decoder, respectively. The system comprises a data acquisition module, a text feature extraction module, a multi-modal encoder and a multi-modal decoder. The data acquisition module acquires text information and visual image information of the commodity; the text feature extraction module generates attention-weighted text features; the multi-modal encoder extracts visual features of the visual image by adopting a convolutional neural network model, and inputs the visual features into the classification-attribute combination tree for traversal to obtain attribute-level visual features; multi-modal factorized bilinear pooling is performed on the visual features and the text features to generate multi-modal utterance vectors; the multi-modal decoder generates a context vector; and based on the context vector, a certain number of visual images and text attributes of the required commodity are selected and decoded, generating a multi-modal commodity representation.
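The fusion step in the abstract, multi-modal factorized bilinear pooling, can be sketched as below. This follows the standard MFB formulation (low-rank projections of each modality, element-wise product, sum-pooling over groups of k, then signed-square-root and L2 normalization); whether the patent uses exactly this variant is an assumption, and all dimensions and weights are illustrative.

```python
import numpy as np

def mfb_pool(x, y, U, V, k):
    """Multi-modal factorized bilinear (MFB) pooling.

    x: (dx,) text feature; y: (dy,) visual feature;
    U: (k*o, dx) and V: (k*o, dy) low-rank factor matrices.
    Returns the fused (o,) multimodal utterance vector.
    """
    joint = (U @ x) * (V @ y)                 # (k*o,) element-wise interaction
    o = joint.size // k
    z = joint.reshape(o, k).sum(axis=1)       # sum-pool every k consecutive entries
    z = np.sign(z) * np.sqrt(np.abs(z))       # signed square-root (power norm)
    norm = np.linalg.norm(z)
    return z / norm if norm > 0 else z        # L2 normalization

rng = np.random.default_rng(2)
dx, dy, o, k = 10, 7, 4, 3
text_feat = rng.normal(size=dx)               # stand-in attention-weighted text feature
visual_feat = rng.normal(size=dy)             # stand-in attribute-level visual feature
fused = mfb_pool(text_feat, visual_feat,
                 rng.normal(size=(k * o, dx)), rng.normal(size=(k * o, dy)), k)
print(fused.shape)
```

The low-rank factorization keeps the parameter count linear in the feature dimensions, which is the usual motivation for MFB over a full bilinear product.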

Description

Technical Field

[0001] The present disclosure relates to the field of language processing, and in particular to a multimodal dialogue system and method guided by user attention.

Background

[0002] Dialogue systems have received increasing attention as an intelligent way to interact with computers. However, most current approaches only focus on text-based dialogue systems, completely ignoring the rich semantics conveyed by vision. In fact, with the rapid development of many fields such as online retail and tourism, the demand for multimodal task-oriented dialogue systems is also growing. Furthermore, few methods explicitly consider the hierarchical structure of item taxonomy and users' attention to items. In fact, as the conversation progresses, users tend to focus on the semantic attributes of items, such as color and style.

[0003] During the research and development process, the inventor found that existing task-oriented dialogue systems have the following problems:

Claims


Application Information

Patent Type & Authority: Application (China)
IPC(8): G06F16/332; G06F16/583; G06K9/62; G06N3/04
CPC: G06F16/3329; G06F16/5846; G06N3/045; G06F18/22
Inventors: 王文杰, 聂礼强, 崔晨, 尹建华, 程志勇, 胡琳梅
Owner SHANDONG UNIV