A multi-modal dialogue system and method guided by user attention
A multi-modal dialogue system and method, applied in character and pattern recognition, biological neural network models, special data processing applications, etc.; it addresses problems such as existing dialogue systems ignoring multimodal information and paying only limited attention to the attribute level of user interest.
Active Publication Date: 2021-08-24
SHANDONG UNIV
AI Technical Summary
Problems solved by technology
[0004] (1) Most existing dialogue systems focus only on text, ignoring the fact that people tend to communicate with multimodal information;
[0005] (2) To find a desired product, a user may pay special attention to particular aspects or attributes of the product while interacting with the chatbot, yet existing dialogue systems model this attribute-level attention only to a very limited degree.
Method used
Examples
Embodiment 1
[0055] This embodiment provides a user attention-guided multi-modal dialogue system. Referring to Figure 1, the dialogue system includes a data acquisition module 101, a text feature extraction module 102, a multimodal encoder 103 and a multimodal decoder 104.
[0056] Specifically, the data acquisition module 101 is configured to acquire the text information of the interaction between the user and the chatbot, and the visual image information of the product the user desires.
[0057] The text feature extraction module 102 is used to train on the text information with an attention-based bidirectional recurrent neural network and generate attention-weighted text features.
[0058] The multimodal encoder 103 is used to extract the visual features of the visual image with a convolutional neural network model, and to input the visual features into the classification-attribute combination tree for traversal to obtain more representative attribute-level visual features.
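The classification-attribute combination tree above can be pictured as a product taxonomy whose reached node holds attribute-level feature vectors. A minimal traversal sketch follows; the category names, tree layout, and toy feature vectors are illustrative assumptions, not the patent's actual data structure:

```python
import numpy as np

# Toy classification-attribute combination tree: internal keys are product
# categories, the reached node maps attribute names to feature vectors.
tree = {
    "clothing": {
        "dress": {"color": np.array([0.9, 0.1]), "style": np.array([0.2, 0.8])},
        "shirt": {"color": np.array([0.5, 0.5]), "style": np.array([0.7, 0.3])},
    }
}

def traverse(tree, path):
    """Walk the category path and return the attribute-level features
    found at the reached node, keyed by attribute name."""
    node = tree
    for category in path:
        node = node[category]
    return node

attrs = traverse(tree, ["clothing", "dress"])
print(sorted(attrs))  # ['color', 'style']
```

In the patent's pipeline these attribute vectors would come from a CNN rather than being stored constants; the tree only organizes which attribute-level features are retrieved for a given product category.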
Embodiment 2
[0074] This embodiment provides a user attention-guided multi-modal dialogue method. At a high level, the method uses an attention-based bidirectional recurrent neural network (RNN) to generate attention-weighted text features; at a low level, a multimodal encoder and decoder are employed to encode multimodal utterance vectors and generate multimodal text responses, respectively.
[0075] Referring to Figure 2, the user attention-guided multimodal dialogue method includes the following steps:
[0076] S201. Acquire text information and visual image information of commodities.
[0077] Specifically, obtain the text information of the interaction between the user and the chatbot, as well as the visual image information of the desired product, as shown in Figure 3.
[0078] S202. Train on the text information with an attention-based bidirectional recurrent neural network to generate attention-weighted text features.
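One minimal way to sketch the attention weighting in S202 is to score each bidirectional-RNN hidden state against a learned query and take the softmax-weighted sum. The dot-product scoring, the query vector, and all dimensions below are assumptions for illustration; the patent does not specify the exact scoring function:

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over the last axis."""
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention_weighted_text_feature(hidden_states, query):
    """Collapse BiRNN hidden states of shape (T, 2H) into a single
    attention-weighted feature vector of shape (2H,)."""
    scores = hidden_states @ query   # (T,) dot-product relevance scores
    weights = softmax(scores)        # attention distribution over time steps
    return weights @ hidden_states   # weighted sum of hidden states

rng = np.random.default_rng(0)
T, H2 = 5, 8                         # 5 time steps, BiRNN output width 2H = 8
h = rng.normal(size=(T, H2))         # stand-in for BiRNN outputs
q = rng.normal(size=(H2,))           # stand-in for a learned query vector
feat = attention_weighted_text_feature(h, q)
print(feat.shape)                    # (8,)
```

In practice the hidden states would come from a trained bidirectional RNN over the dialogue text rather than random draws.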
[...
Abstract
The invention discloses a user attention-guided multi-modal dialogue system and method, in which a multi-modal encoder and decoder encode multi-modal utterances and generate multi-modal replies, respectively. The system includes a data acquisition module, a text feature extraction module, a multimodal encoder and a multimodal decoder. The data acquisition module acquires text information and visual image information of commodities; the text feature extraction module generates attention-weighted text features; the multimodal encoder extracts visual features from the visual images with a convolutional neural network model and inputs them into the classification-attribute combination tree for traversal to obtain attribute-level visual features; the visual and text features are then fused by multimodal factorized bilinear pooling to generate multimodal utterance vectors; the multimodal decoder generates context vectors and, based on them, selects a number of desired commodity visual images together with their text attributes and decodes them to generate a multimodal representation of the commodity.
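The fusion step in the abstract can be sketched as multimodal factorized bilinear pooling in its common formulation: project both modalities, multiply elementwise, sum-pool over factor groups, then power- and L2-normalize. The dimensions and random projections below are illustrative assumptions, not the patent's trained parameters:

```python
import numpy as np

rng = np.random.default_rng(1)
dv, dt, k, o = 6, 5, 3, 4         # visual dim, text dim, factor size, output dim

U = rng.normal(size=(dv, k * o))  # stand-in for learned visual projection
V = rng.normal(size=(dt, k * o))  # stand-in for learned text projection

def mfb(x, y):
    """Multimodal factorized bilinear pooling of a visual vector x and a
    text vector y into a fused utterance vector of shape (o,)."""
    joint = (x @ U) * (y @ V)                 # (k*o,) elementwise product
    pooled = joint.reshape(o, k).sum(axis=1)  # sum-pool each factor group
    pooled = np.sign(pooled) * np.sqrt(np.abs(pooled))  # power normalization
    return pooled / (np.linalg.norm(pooled) + 1e-8)     # L2 normalization

utterance = mfb(rng.normal(size=dv), rng.normal(size=dt))
print(utterance.shape)  # (4,)
```

The factorization keeps the parameter count at (dv + dt) x k x o instead of the dv x dt x o a full bilinear interaction would need, which is the usual motivation for this form of pooling.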
Description
Technical field
[0001] The present disclosure relates to the field of language processing, and in particular to a multimodal dialogue system and method guided by user attention.

Background technique
[0002] Dialogue systems have received increasing attention as an intelligent way to interact with computers. However, most current approaches focus only on text-based dialogue systems, completely ignoring the rich semantics conveyed by vision. In fact, with the rapid development of many fields such as online retail and tourism, the demand for multimodal task-oriented dialogue systems is also growing. Furthermore, few methods explicitly consider the hierarchical structure of item taxonomies and users' attention to items. In fact, as the conversation progresses, users tend to focus on the semantic attributes of items, such as color and style.
[0003] During the research and development process, the inventors found that the existing task-oriented dialogue system has the foll...
Claims