Short text automatic abstracting method and system based on double encoders

An automatic summarization technique based on double encoders, applied in the field of information processing, that addresses problems such as insufficient summary precision and insufficient use of semantic information.

CN110390103A | Active | Publication Date: 2019-10-29 | CIVIL AVIATION UNIV OF CHINA


Examples


Embodiment

[0090] To verify the effect of the present invention, experiments were carried out according to the steps described above; the results are shown in Figure 4.

[0091] Step 1: The experiments use the news corpus data set provided by Sogou Labs, which contains a total of 679,978 news-headline pairs from categories such as entertainment, culture, education, military, society, and finance. Preprocessing removes texts with a length of less than 5 and replaces noisy content such as English text, special characters, and emoticons. To select high-quality experimental pairs, the data is divided into three levels according to the semantic similarity between the abstract and the original text, where 1 is the least relevant and 3 is the most relevant: level 1 for a similarity in the interval (0, 0.4), level 2 for [0.4, 0.65), and level 3 for [0.65, 1). The semantic correlation formula is designed as follows:

[0092] ...
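The filtering and three-level grading described in Step 1 can be sketched roughly as follows. This is only an illustrative sketch: the similarity score is taken as a precomputed input because the semantic correlation formula is not reproduced here, the function names `relevance_level` and `select_pairs` are hypothetical, and the `min_level` cutoff is an assumption, since the text does not state which levels are kept.

```python
from typing import List, Optional, Tuple

MIN_LENGTH = 5  # texts shorter than this are removed, per Step 1


def relevance_level(similarity: float) -> Optional[int]:
    """Map a text-abstract similarity score to level 1, 2 or 3.

    Intervals follow Step 1: (0, 0.4) -> 1, [0.4, 0.65) -> 2, [0.65, 1) -> 3.
    Scores outside (0, 1) have no level and return None.
    """
    if 0.0 < similarity < 0.4:
        return 1
    if 0.4 <= similarity < 0.65:
        return 2
    if 0.65 <= similarity < 1.0:
        return 3
    return None


def select_pairs(
    scored_pairs: List[Tuple[str, str, float]],
    min_level: int = 3,  # assumption: the cutoff is not given in the text
) -> List[Tuple[str, str]]:
    """Keep (text, headline) pairs that pass the length and level filters."""
    selected = []
    for text, headline, similarity in scored_pairs:
        if len(text) < MIN_LENGTH:
            continue  # drop texts that are too short
        level = relevance_level(similarity)
        if level is not None and level >= min_level:
            selected.append((text, headline))
    return selected
```

Calling `select_pairs(pairs, min_level=2)` would keep both level 2 and level 3 pairs, which makes the trade-off between data volume and pair quality explicit.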



Abstract

The invention discloses a short text automatic abstracting method and system based on double encoders, and belongs to the technical field of information processing. The method comprises the following steps: 1, preprocessing the data; 2, designing a double encoder with a bidirectional recurrent neural network; 3, arranging an attention mechanism that fuses global and local semantics; 4, arranging a decoder with an empirical probability distribution, the decoder being designed with a double-layer unidirectional neural network; 5, adding word embedding features; 6, optimizing the word embedding dimensions; and 7, preprocessing and testing the news corpus data from Sogou Labs, feeding it into the Seq2Seq model with double encoders and the accompanying empirical probability distribution for calculation, and evaluating the results with the text abstract quality evaluation system Rouge. The invention carries out optimization research on the traditional encoding and decoding framework, so that the model can fully understand text semantics and the fluency and precision of the text abstracts are improved.
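The abstract names Rouge as the evaluation system without giving its settings. As a rough illustration of what such a score measures, the sketch below computes ROUGE-N recall, that is, the share of reference n-grams that also appear in the generated abstract. Tokenizing by character is an assumption made here for Chinese text, and the function names are hypothetical; this is not the evaluation code used in the patent.

```python
from collections import Counter
from typing import Sequence


def ngram_counts(tokens: Sequence[str], n: int) -> Counter:
    """Count the n-grams in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate: str, reference: str, n: int = 1) -> float:
    """ROUGE-N recall: clipped n-gram overlap divided by reference n-grams.

    Assumption: both strings are tokenized character by character.
    """
    cand = ngram_counts(list(candidate), n)
    ref = ngram_counts(list(reference), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0  # the reference has no n-grams of this size
    # Each reference n-gram is matched at most as often as it occurs
    # in the candidate.
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total
```

For example, `rouge_n_recall("abcd", "abce", n=1)` compares four reference characters against the candidate and finds three of them.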

Description

Technical field

[0001] The invention belongs to the technical field of information processing, and in particular relates to a short text automatic summarization method and system based on a double encoder.

Background art

[0002] The rapid development of the Internet has made network platforms an important way for people to exchange information and communicate with each other, and it has also made it easier for people to browse and publish information. The explosive growth of online information has made information overload a serious problem. Faced with massive amounts of information, how to obtain the useful parts has become an urgent problem in the field of information processing.

[0003] Automatic text summarization is an important branch in the field of natural language processing. Text summarization refers to the extraction of key information from a large amount of text by computer, and automatic text summarization is the key technology of information extract...


Application Information

Patent Timeline
29 Oct 2019: Publication of CN110390103A
IPC: G06F17/27; G06F16/34; G06N3/04
CPC: G06F16/345; G06F40/30; G06N3/045
Inventors: 丁建立; 李洋