A multi-modal dialogue system and method guided by user attention
A multi-modal dialogue system and method, applied in character and pattern recognition, biological neural network models, special data processing applications, etc.; it addresses problems such as existing dialogue systems ignoring multimodal information and paying only limited attention to the attribute level of user interest.
Active Publication Date: 2021-08-24
SHANDONG UNIV
AI Technical Summary
Problems solved by technology
[0004] (1) Most existing dialogue systems focus only on text, ignoring the fact that people tend to communicate with multimodal information;
[0005] (2) To find a desired product, a user may pay special attention to particular aspects or attributes of the product while interacting with the chatbot, yet existing dialogue systems model this attribute-level attention only to a very limited degree.
Method used
Examples
Embodiment 1
[0055] This embodiment provides a user attention-guided multi-modal dialogue system. Referring to Figure 1, the dialogue system includes a data acquisition module 101, a text feature extraction module 102, a multimodal encoder 103 and a multimodal decoder 104.
[0056] Specifically, the data acquisition module 101 is configured to acquire the text information of the interaction between the user and the chatbot, and the visual image information of the product the user desires.
[0057] The text feature extraction module 102 is used to train on the text information with an attention-based bidirectional recurrent neural network and generate attention-weighted text features.
[0058] The multimodal encoder 103 is used to extract the visual features of the visual image with a convolutional neural network model, and to input the visual features into the classification-attribute combination tree for traversal to obtain more representative attribute-level visual features.
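The classification-attribute combination tree above can be pictured as a product taxonomy whose reached node holds attribute-level feature vectors. A minimal traversal sketch follows; the category names, tree layout, and toy feature vectors are illustrative assumptions, not the patent's actual data structure:

```python
import numpy as np

# Toy classification-attribute combination tree: internal keys are product
# categories, the reached node maps attribute names to feature vectors.
tree = {
    "clothing": {
        "dress": {"color": np.array([0.9, 0.1]), "style": np.array([0.2, 0.8])},
        "shirt": {"color": np.array([0.5, 0.5]), "style": np.array([0.7, 0.3])},
    }
}

def traverse(tree, path):
    """Walk the category path and return the attribute-level features
    found at the reached node, keyed by attribute name."""
    node = tree
    for category in path:
        node = node[category]
    return node

attrs = traverse(tree, ["clothing", "dress"])
print(sorted(attrs))  # ['color', 'style']
```

In the patent's pipeline these attribute vectors would come from a CNN rather than being stored constants; the tree only organizes which attribute-level features are retrieved for a given product category.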
Embodiment 2
[0074] This embodiment provides a user attention-guided multi-modal dialogue method. At a high level, the method uses an attention-based bidirectional recurrent neural network (RNN) to generate attention-weighted text features; at a low level, a multimodal encoder and decoder are employed to encode multimodal utterance vectors and generate multimodal text responses, respectively.
[0075] Referring to Figure 2, the user attention-guided multimodal dialogue method includes the following steps:
[0076] S201. Acquire text information and visual image information of commodities.
[0077] Specifically, obtain the text information of the interaction between the user and the chatbot, as well as the visual image information of the desired product, as shown in Figure 3.
[0078] S202. Train on the text information with an attention-based bidirectional recurrent neural network to generate attention-weighted text features.
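One minimal way to sketch the attention weighting in S202 is to score each bidirectional-RNN hidden state against a learned query and take the softmax-weighted sum. The dot-product scoring, the query vector, and all dimensions below are assumptions for illustration; the patent does not specify the exact scoring function:

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over the last axis."""
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention_weighted_text_feature(hidden_states, query):
    """Collapse BiRNN hidden states of shape (T, 2H) into a single
    attention-weighted feature vector of shape (2H,)."""
    scores = hidden_states @ query   # (T,) dot-product relevance scores
    weights = softmax(scores)        # attention distribution over time steps
    return weights @ hidden_states   # weighted sum of hidden states

rng = np.random.default_rng(0)
T, H2 = 5, 8                         # 5 time steps, BiRNN output width 2H = 8
h = rng.normal(size=(T, H2))         # stand-in for BiRNN outputs
q = rng.normal(size=(H2,))           # stand-in for a learned query vector
feat = attention_weighted_text_feature(h, q)
print(feat.shape)                    # (8,)
```

In practice the hidden states would come from a trained bidirectional RNN over the dialogue text rather than random draws.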
[...
Abstract
The invention discloses a user attention-guided multi-modal dialogue system and method, in which a multi-modal encoder and decoder encode multi-modal utterances and generate multi-modal replies, respectively. The system includes a data acquisition module, a text feature extraction module, a multimodal encoder and a multimodal decoder. The data acquisition module acquires text information and visual image information of commodities; the text feature extraction module generates attention-weighted text features; the multimodal encoder extracts visual features from the visual images with a convolutional neural network model and inputs them into the classification-attribute combination tree for traversal to obtain attribute-level visual features; the visual and text features are then fused by multimodal factorized bilinear pooling to generate multimodal utterance vectors; the multimodal decoder generates context vectors and, based on them, selects a number of desired commodity visual images together with their text attributes and decodes them to generate a multimodal representation of the commodity.
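The fusion step in the abstract can be sketched as multimodal factorized bilinear pooling in its common formulation: project both modalities, multiply elementwise, sum-pool over factor groups, then power- and L2-normalize. The dimensions and random projections below are illustrative assumptions, not the patent's trained parameters:

```python
import numpy as np

rng = np.random.default_rng(1)
dv, dt, k, o = 6, 5, 3, 4         # visual dim, text dim, factor size, output dim

U = rng.normal(size=(dv, k * o))  # stand-in for learned visual projection
V = rng.normal(size=(dt, k * o))  # stand-in for learned text projection

def mfb(x, y):
    """Multimodal factorized bilinear pooling of a visual vector x and a
    text vector y into a fused utterance vector of shape (o,)."""
    joint = (x @ U) * (y @ V)                 # (k*o,) elementwise product
    pooled = joint.reshape(o, k).sum(axis=1)  # sum-pool each factor group
    pooled = np.sign(pooled) * np.sqrt(np.abs(pooled))  # power normalization
    return pooled / (np.linalg.norm(pooled) + 1e-8)     # L2 normalization

utterance = mfb(rng.normal(size=dv), rng.normal(size=dt))
print(utterance.shape)  # (4,)
```

The factorization keeps the parameter count at (dv + dt) x k x o instead of the dv x dt x o a full bilinear interaction would need, which is the usual motivation for this form of pooling.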
Description
Technical field
[0001] The present disclosure relates to the field of language processing, and in particular to a multimodal dialogue system and method guided by user attention.

Background technique
[0002] Dialogue systems have received increasing attention as an intelligent way to interact with computers. However, most current approaches focus only on text-based dialogue systems, completely ignoring the rich semantics conveyed by vision. In fact, with the rapid development of many fields such as online retail and tourism, the demand for multimodal task-oriented dialogue systems is also growing. Furthermore, few methods explicitly consider the hierarchical structure of item taxonomies and users' attention to items. In fact, as the conversation progresses, users tend to focus on the semantic attributes of items, such as color and style.
[0003] During the research and development process, the inventors found that the existing task-oriented dialogue system has the foll...
Claims