Dialogue abstract generation system using DialoGPT as a feature annotator

This technology, applied in the field of dialogue summary generation, addresses the time-consuming and labor-intensive acquisition of dialogue summaries, inaccurate annotations, and low accuracy.

Active Publication Date: 2021-08-03
HARBIN INST OF TECH
7 Cites, 4 Cited by
AI Technical Summary

Problems solved by technology

[0003] The present invention aims to solve two problems of existing dialogue summary generation methods: annotations are either added to the dialogue manually, which is time-consuming and labor-intensive, or obtained through open-domain toolkits that are not suited to dialogue, which gives low accuracy. To this end, it proposes a dialogue summary generation system using DialoGPT as a feature annotator.

Examples

Specific Embodiment 1

[0020] Specific embodiment 1: In this embodiment, the dialogue summary generation system using DialoGPT as a feature annotator includes:

[0021] A data acquisition module, a dialogue pre-training module, a dialogue preprocessing module, a prediction loss and dialogue context representation module, a labeling module, and a summary generation module;

[0022] The data acquisition module is used to obtain the SAMSum dataset and the AMI dataset;

[0023] The dialogue pre-training module is used to obtain the pre-trained dialogue model DialoGPT;

[0024] The dialogue preprocessing module takes the datasets obtained by the data acquisition module and processes each dialogue into context-reply pairs and into a dialogue sequence;

[0025] The prediction loss and dialogue context representation module is used to input the dialogues processed by the dialogue preprocessing module into the pre-trained dialogue model DialoGPT obtained by the dialog...
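The context-reply pair step in [0024] can be illustrated with a minimal Python sketch. The source does not specify how much history forms the context, so this sketch assumes the context is all utterances preceding the reply. The function name `build_context_reply_pairs` and the sample dialogue are hypothetical and are not taken from the patent.

```python
from typing import List, Tuple


def build_context_reply_pairs(utterances: List[str]) -> List[Tuple[List[str], str]]:
    """Pair every utterance after the first with the utterances that precede it."""
    pairs = []
    for i in range(1, len(utterances)):
        context = utterances[:i]  # assumption: context = all earlier utterances
        reply = utterances[i]     # the utterance the model would be asked to predict
        pairs.append((context, reply))
    return pairs


# Hypothetical three-turn dialogue, for illustration only.
dialogue = [
    "Ann: are you coming tonight?",
    "Bob: yes, see you at eight",
    "Ann: great, bring snacks",
]
pairs = build_context_reply_pairs(dialogue)
for context, reply in pairs:
    print(context, "->", reply)
```

A dialogue with n utterances yields n - 1 pairs under this assumption, since the first utterance has no context.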

Specific Embodiment 2

[0030] Specific embodiment 2: This embodiment differs from specific embodiment 1 in that the data acquisition module is used to obtain the SAMSum and AMI datasets; the specific process is:

[0031] Experiments are conducted on two datasets, SAMSum and AMI;

[0032] SAMSum is a human-generated dialogue summarization dataset that contains dialogues in various real-life scenarios;

[0033] AMI is a meeting summarization dataset; each meeting has four participants, and the discussion centers on the design of a remote control;

[0034] The SAMSum dataset is obtained from https://arxiv.org/abs/1911.12237;

[0035] The AMI dataset is obtained from https://groups.inf.ed.ac.uk/ami/corpus/.

[0036] SAMSum [4] (Title: SAMSum Corpus: A human-annotated dialogue dataset for abstractive summarization, Authors: Bogdan Gliwa, Iwona Mochol, Maciej Biesek, and Aleksander Wawer, Year: 2019, published in Proceedings of the 2nd Workshop on New Frontiers in Summarization);

[0037] AMI [5] (Ti...

Specific Embodiment 3

[0039] Specific embodiment 3: This embodiment differs from specific embodiment 1 or 2 in that the dialogues in the SAMSum and AMI datasets are formalized as follows:

[0040] Each dialogue $D$ contains $|D|$ sentences $[u_1, u_2, \ldots, u_i, \ldots, u_{|D|}]$;

[0041] Each sentence $u_i = [u_{i,1}, u_{i,2}, \ldots, u_{i,|u_i|}, \mathrm{EOS}_i]$,

[0042] where $i \in [1, 2, 3, \ldots, |D|]$, $\mathrm{EOS}_i$ represents the end symbol of the sentence, $u_{i,1}$ represents the first word of the $i$-th sentence, and so on;

[0043] For each dialogue $D$ there is a corresponding summary $S = [s_1, s_2, \ldots, s_{|S|}]$, where $s_1$ is the first word of $S$ and $s_{|S|}$ is the $|S|$-th word of $S$;

[0044] In a dialogue, each sentence $u_i$ corresponds to a speaker $p_i$;

[0045] So the final dialogue is $D = [p_1, u_{1,1}, \ldots, \mathrm{EOS}_1, \ldots, p_{|D|}, u_{|D|,1}, \ldots, \mathrm{EOS}_{|D|}]$.

[0046] Other steps and parameters are the same as those in Embodiment 1 or Embodiment 2.
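As a rough illustration of the flattened form of D in [0045], the following Python sketch interleaves each speaker, the words of that speaker's sentence, and an end symbol. It assumes whitespace tokenization and a single placeholder string `<EOS>` for every end symbol; both are assumptions made for illustration, as are the function name `flatten_dialogue` and the sample turns.

```python
from typing import List, Tuple


def flatten_dialogue(turns: List[Tuple[str, str]], eos: str = "<EOS>") -> List[str]:
    """Flatten (speaker, sentence) turns into [p_1, u_1,1, ..., EOS_1, ..., p_|D|, ..., EOS_|D|]."""
    sequence: List[str] = []
    for speaker, sentence in turns:
        sequence.append(speaker)           # p_i
        sequence.extend(sentence.split())  # u_i,1 ... (assumption: whitespace tokenization)
        sequence.append(eos)               # EOS_i (assumption: same placeholder for every i)
    return sequence


# Hypothetical two-turn dialogue, for illustration only.
turns = [("Ann", "are you coming tonight"), ("Bob", "yes see you at eight")]
flat = flatten_dialogue(turns)
print(flat)
```

The number of end symbols in the result equals |D|, the number of sentences, which matches the one-EOS-per-sentence structure of the formalization.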

Abstract

The invention relates to a dialogue summary generation system, in particular to a dialogue summary generation system using DialoGPT as a feature annotator. It aims to solve the problems of existing dialogue summary generation methods: obtaining dialogue summaries is time-consuming and labor-intensive, inefficient, and of low accuracy. The system comprises: a data acquisition module, which acquires the datasets; a dialogue pre-training module, which obtains DialoGPT; a dialogue preprocessing module, which processes each dialogue into context-reply pairs and a dialogue sequence; a prediction loss and dialogue context representation module, which obtains the prediction loss and the dialogue context representation; a labeling module, which labels the dialogue; and a summary generation module, which generates a target summary. When the generated target summary meets the requirement, the dataset to be processed is processed, and the result is input into the summary generator to generate the target summary for that dataset. The invention is applied to the field of natural language processing.

Description

Technical field

[0001] The invention relates to the field of natural language processing, in particular to a dialogue summary generation system.

Background

[0002] Dialogue summarization aims to generate a concise overview of a dialogue [1] (Title: Semantic similarity applied to spoken dialogue summarization, Authors: Iryna Gurevych and Michael Strube, Year: 2004, published in Proceedings of the 20th International Conference on Computational Linguistics). In theory, Peyrard [2] (Title: A simple theoretical model of importance for summarization, Author: Maxime Peyrard, Year: 2019, published in Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics) pointed out that the evaluation of summaries is related to three aspects: informativeness, redundancy, and relevance. A good summary should have high informativeness, low redundancy, and high relevance. Aiming at the above three aspects, the previous work used the "...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/332, G06F16/34
CPC: G06F16/3329, G06F16/345
Inventors: 冯骁骋, 冯夏冲, 秦兵, 刘挺, 朱坤
Owner: HARBIN INST OF TECH