Dialogue abstract generation system using DialoGPT as a feature annotator
A dialogue summary generation system, applied in the field of dialogue summarization, that addresses the low accuracy of existing approaches, the time-consuming and labor-intensive acquisition of dialogue summaries, and inaccurate annotations.
Specific Embodiment 1
[0020] Specific embodiment one: in this embodiment, the dialogue summary generation system using DialoGPT as a feature annotator includes:
[0021] A data acquisition module, a dialogue pre-training module, a dialogue preprocessing module, a prediction loss and dialogue context representation module, a labeling module, and a summary generation module;
[0022] The data acquisition module is used to obtain the SAMSum dataset and the AMI dataset;
[0023] The dialogue pre-training module is used to obtain the pre-trained dialogue model DialoGPT;
[0024] The dialogue preprocessing module takes the datasets obtained by the data acquisition module and processes each dialogue into context-reply pairs and into a dialogue sequence;
[0025] The prediction loss and dialogue context representation module inputs the dialogue processed by the dialogue preprocessing module into the pre-trained dialogue model DialoGPT obtained by the dialog...
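The preprocessing described in [0024] can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes that each context-reply pair consists of all preceding sentences (the context) and the sentence that follows them (the reply), and that the dialogue sequence is formed by joining sentences with DialoGPT's end-of-text token. The function names `to_context_reply_pairs` and `to_dialogue_sequence` are hypothetical.

```python
from typing import List, Tuple

# DialoGPT reuses the GPT-2 tokenizer, whose end-of-text token is "<|endoftext|>".
# Using it as the sentence delimiter here is an assumption for illustration.
EOS = "<|endoftext|>"


def to_context_reply_pairs(utterances: List[str]) -> List[Tuple[List[str], str]]:
    """Pair every sentence after the first with all sentences that precede it."""
    return [(utterances[:i], utterances[i]) for i in range(1, len(utterances))]


def to_dialogue_sequence(utterances: List[str], eos: str = EOS) -> str:
    """Concatenate all sentences into one sequence, ending each with the EOS token."""
    return "".join(u + eos for u in utterances)


if __name__ == "__main__":
    dialogue = ["Hi", "Hello", "Bye"]
    for context, reply in to_context_reply_pairs(dialogue):
        print(context, "->", reply)
    print(to_dialogue_sequence(dialogue))
```

A dialogue of n sentences yields n - 1 context-reply pairs under this assumption, because the first sentence has no preceding context.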
Specific Embodiment 2
[0030] Specific embodiment two: this embodiment differs from specific embodiment one in that the data acquisition module obtains the SAMSum and AMI datasets as follows:
[0031] Experiments are conducted on two datasets, SAMSum and AMI;
[0032] SAMSum is a human-annotated dialogue summarization dataset that contains dialogues from various real-life scenarios;
[0033] AMI is a meeting summarization dataset; each meeting has four participants, and the discussion centers on the design of a remote control;
[0034] The SAMSum dataset is obtained from https://arxiv.org/abs/1911.12237;
[0035] The AMI dataset is obtained from https://groups.inf.ed.ac.uk/ami/corpus/.
[0036] SAMSum [4] (Title: SAMSum Corpus: A Human-annotated Dialogue Dataset for Abstractive Summarization, Authors: Bogdan Gliwa, Iwona Mochol, Maciej Biesek, and Aleksander Wawer, Year: 2019, published in Proceedings of the 2nd Workshop on New Frontiers in Summarization);
[0037] AMI [5] (Ti...
Specific Embodiment 3
[0039] Specific embodiment three: this embodiment differs from specific embodiment one or two in that the dialogues in the SAMSum and AMI datasets are formalized as follows:
[0040] Each dialogue $D$ contains $|D|$ sentences $[u_1, u_2, \ldots, u_i, \ldots, u_{|D|}]$;
[0041] Each sentence $u_i = [u_{i,1}, u_{i,2}, \ldots, u_{i,|u_i|}, \mathrm{EOS}_i]$,
[0042] where $i \in [1, 2, 3, \ldots, |D|]$, $\mathrm{EOS}_i$ denotes the end-of-sentence symbol, $u_{i,1}$ denotes the first word of the $i$-th sentence, and so on;
[0043] For each dialogue $D$ there is a corresponding summary $S = [s_1, s_2, \ldots, s_{|S|}]$, where $s_1$ denotes the first word of the summary $S$ and $s_{|S|}$ denotes its $|S|$-th word;
[0044] In a dialogue, each sentence $u_i$ corresponds to a speaker $p_i$;
[0045] The final dialogue is therefore $D = [p_1, u_{1,1}, \ldots, \mathrm{EOS}_1, \ldots, p_{|D|}, u_{|D|,1}, \ldots, \mathrm{EOS}_{|D|}]$.
[0046] Other steps and parameters are the same as those in Embodiment 1 or Embodiment 2.
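The flattened form of D given in [0045] can be illustrated with a short sketch. It assumes whitespace tokenization and a single literal "EOS" marker standing in for each per-sentence end symbol; both are simplifications chosen for illustration, and the function name `flatten_dialogue` is hypothetical.

```python
from typing import List, Tuple


def flatten_dialogue(turns: List[Tuple[str, str]], eos: str = "EOS") -> List[str]:
    """Flatten (speaker, sentence) turns into [p_1, u_1,1, ..., EOS, ..., p_|D|, ..., EOS]."""
    tokens: List[str] = []
    for speaker, sentence in turns:
        tokens.append(speaker)            # p_i: the speaker of the i-th sentence
        tokens.extend(sentence.split())   # u_i,1 ... u_i,|u_i| (whitespace split is an assumption)
        tokens.append(eos)                # EOS_i: end of the i-th sentence
    return tokens


if __name__ == "__main__":
    turns = [("Amanda", "I baked cookies"), ("Jerry", "Sure")]
    print(flatten_dialogue(turns))
```

Each sentence contributes its speaker, its words, and one end symbol, so the flattened length is the total word count plus 2|D|.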


