Chinese text abstract generation method based on sequence-to-sequence model

A sequence-to-sequence model technology, applied in the field of Chinese text summary generation, that can solve problems such as loss of important information, long training time, and semantic incoherence, with the effects of enhancing encoding ability, improving accuracy, and avoiding an oversized vocabulary.

Active Publication Date: 2020-04-28
SOUTH CHINA UNIV OF TECH

AI Technical Summary

Problems solved by technology

[0008] Generative methods based on deep learning models produce better final summaries, but they also suffer from problems such as missing important information and semantic incoherence. Most current improvement schemes start from the decoder, improving the decoding method and adjusting the attention mechanism, but the effect is still very limited and the training time is long.


Examples


Embodiment Construction

[0049] The present invention will be further described below in conjunction with specific embodiments.

[0050] The method for generating a Chinese text abstract based on the sequence-to-sequence model provided by this embodiment includes the following steps:

[0051] 1) On the large-scale Chinese microblog data, after separating the original texts from the abstract texts, segment both character by character (English words and numbers are kept whole) and pad them to fixed lengths: the original text is set to 150 here and the abstract text to 30, and the pairs are used as training samples in one-to-one correspondence. Construct a vocabulary from the data obtained above: first determine the dimension of the vocabulary's word vectors (set to 256 dimensions in this method), then initialize them randomly with a Gaussian distribution and set them as trainable, and, according to the vocabulary, represent the abstract text as one-hot vecto...
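As a rough illustration of step 1), the following Python sketch shows character-level segmentation that keeps English words and numbers whole, padding to the fixed lengths of 150 and 30, vocabulary construction, and Gaussian initialization of a 256-dimensional word-vector table. The helper names, the sample sentences, and the standard deviation of 0.1 are illustrative assumptions, not values specified in the patent.

```python
import re
import numpy as np

# Constants taken from the embodiment: source padded to 150, abstract to 30,
# word vectors of 256 dimensions.
SRC_LEN, TGT_LEN, EMB_DIM = 150, 30, 256
PAD, UNK = "<pad>", "<unk>"

def segment(text):
    """Split into single Chinese characters, keeping English words and numbers whole."""
    return re.findall(r"[A-Za-z]+|\d+|[^\sA-Za-z\d]", text)

def pad(tokens, length):
    """Truncate or right-pad a token list to a fixed length."""
    return tokens[:length] + [PAD] * max(0, length - len(tokens))

def build_vocab(token_lists):
    """Collect all tokens into an index table."""
    vocab = {PAD: 0, UNK: 1}
    for tokens in token_lists:
        for tok in tokens:
            vocab.setdefault(tok, len(vocab))
    return vocab

# One hypothetical source/abstract pair.
src = pad(segment("华南理工大学提出一种中文文本摘要生成方法"), SRC_LEN)
tgt = pad(segment("中文摘要生成方法"), TGT_LEN)
vocab = build_vocab([src, tgt])

# Gaussian-initialized word-vector table (a plain NumPy array here; in a
# deep-learning framework this would be a trainable embedding parameter).
embedding = np.random.normal(0.0, 0.1, size=(len(vocab), EMB_DIM))
src_ids = [vocab.get(t, vocab[UNK]) for t in src]
src_vectors = embedding[src_ids]                 # (150, 256) input word vectors

# Abstract text represented as one-hot vectors according to the vocabulary.
tgt_ids = [vocab.get(t, vocab[UNK]) for t in tgt]
tgt_onehot = np.eye(len(vocab))[tgt_ids]         # (30, |vocab|) training targets
```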



Abstract

The invention discloses a Chinese text abstract generation method based on a sequence-to-sequence model. The method comprises the steps of: first segmenting the text character by character, padding it to a fixed length, and randomly initializing the word vectors with a Gaussian distribution; encoding the text by inputting the word vectors into a bidirectional long short-term memory (LSTM) network and taking the final output state as the precoding; applying a convolutional neural network (CNN) to the word vectors with different window sizes and outputting the results as window word vectors; constructing an encoder as a bidirectional LSTM, taking the precoding as its initialization parameter and the window word vectors from the previous step as its input; and constructing a decoder that generates text with a unidirectional LSTM combined with an attention mechanism. The method improves the traditional encoder of the sequence-to-sequence model, so that the model obtains more information from the original text in the encoding stage and finally decodes a better text abstract, and the use of word vectors with smaller, finer granularity makes the method more suitable for Chinese text.
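The abstract above outlines the full pipeline. The PyTorch sketch below is one possible reading of that architecture, assuming illustrative hidden sizes, window sizes of 3 and 5, a simple concatenation-based attention, and teacher forcing during training; none of these concrete choices are stated in the patent, and the class and parameter names are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Seq2SeqSummarizer(nn.Module):
    """Sketch of the abstract's pipeline: precoding bi-LSTM, window CNNs,
    main bi-LSTM encoder, and a unidirectional LSTM decoder with attention."""

    def __init__(self, vocab_size, emb_dim=256, hidden=256, windows=(3, 5)):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        nn.init.normal_(self.embed.weight, mean=0.0, std=0.1)   # Gaussian random init, trainable
        # Pre-encoder: bidirectional LSTM whose final states serve as the "precoding".
        self.pre_encoder = nn.LSTM(emb_dim, hidden, bidirectional=True, batch_first=True)
        # Convolutions with different window sizes produce the "window word vectors".
        self.convs = nn.ModuleList(
            [nn.Conv1d(emb_dim, emb_dim, k, padding=k // 2) for k in windows])
        # Main encoder: bidirectional LSTM initialized with the precoding state,
        # fed with the concatenated window word vectors.
        self.encoder = nn.LSTM(emb_dim * len(windows), hidden,
                               bidirectional=True, batch_first=True)
        # Decoder: unidirectional LSTM cell with attention over encoder outputs.
        self.decoder = nn.LSTMCell(emb_dim + 2 * hidden, 2 * hidden)
        self.attn = nn.Linear(4 * hidden, 1)
        self.out = nn.Linear(2 * hidden, vocab_size)

    def forward(self, src_ids, tgt_ids):
        emb = self.embed(src_ids)                                # (B, 150, E)
        _, (h0, c0) = self.pre_encoder(emb)                      # precoding: final states
        # Conv1d expects (B, E, T); crop back to the source length.
        windows = [conv(emb.transpose(1, 2)).transpose(1, 2)[:, :emb.size(1)]
                   for conv in self.convs]
        window_vecs = torch.cat(windows, dim=-1)                 # (B, 150, E * n_windows)
        enc_out, _ = self.encoder(window_vecs, (h0, c0))         # (B, 150, 2H)

        h = enc_out.new_zeros(enc_out.size(0), enc_out.size(-1))
        c = torch.zeros_like(h)
        logits = []
        for t in range(tgt_ids.size(1)):                         # teacher forcing over 30 steps
            # Score each encoder position against the current decoder state.
            scores = self.attn(torch.cat(
                [enc_out, h.unsqueeze(1).expand_as(enc_out)], dim=-1)).squeeze(-1)
            context = (F.softmax(scores, dim=-1).unsqueeze(-1) * enc_out).sum(dim=1)
            h, c = self.decoder(
                torch.cat([self.embed(tgt_ids[:, t]), context], dim=-1), (h, c))
            logits.append(self.out(h))
        return torch.stack(logits, dim=1)                        # (B, 30, vocab_size)
```

A training loop would apply cross-entropy between the returned logits and the abstract's target indices; at inference the decoder would feed back its own predictions instead of tgt_ids.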

Description

Technical field

[0001] The invention relates to the technical fields of deep learning and natural language processing, and in particular to a method for generating Chinese text abstracts based on a sequence-to-sequence model.

Background technique

[0002] Automatic text summarization works as follows: given a long text, the model finally generates a short text that summarizes the main content of the source text.

[0003] There are currently two mainstream text summarization approaches, extractive and generative. The extractive approach uses an algorithm to find the one or several sentences closest to the main idea of the original text. It is a relatively mature solution, but since all the content of an extractive summary is taken from the original text, the readability and fluency of the generated summary are unsatisfactory, and it is still far from practical application.

[0004] The gener...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/34; G06F16/35; G06N3/04
CPC: G06F16/345; G06F16/35; G06N3/044; G06N3/045
Inventor: 尹叶龙, 邓辉舫
Owner: SOUTH CHINA UNIV OF TECH