Short text automatic abstracting method and system based on double encoders

An automatic summarization technology based on dual encoders, applied in the field of information processing, that addresses problems such as insufficient summarization precision and insufficient use of semantic information.

Active Publication Date: 2019-10-29
CIVIL AVIATION UNIV OF CHINA
4 cites, 31 cited by

AI Technical Summary

Problems solved by technology

[0009] To address the defects of the prior art, in particular the insufficient use of semantic information and the insufficient summarization accuracy of current generative text summarization methods, the present invention provides a short text automatic summarization method and system based on dual encoders. The proposed dual-encoder text summarization model supplies richer semantic information to the Seq2Seq architecture, fuses the dual-channel semantics of the encoders through an improved attention mechanism combined with multi-layer recurrent neural networks, and uses a decoder with an empirical distribution to speed up model convergence. In addition, an embedding method that integrates position embedding and word embedding incorporates term frequency-inverse document frequency (TF-IDF), part of speech (POS), and key features into the word vector, which optimizes the word embedding dimension, enhances the model's understanding of word meaning, and improves the quality of the summary.
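The feature-augmented word vector described in [0009] can be illustrated with a minimal sketch. It is not the patented implementation: the TF-IDF variant (relative term frequency times the natural log of inverse document frequency), the three-tag POS set, the use of concatenation, and the names `tf_idf`, `augment`, and `POS_TAGS` are all assumptions made for illustration, and position embedding is left out.

```python
import math
from collections import Counter

# Hypothetical POS tag set, for illustration only.
POS_TAGS = ["NOUN", "VERB", "OTHER"]


def tf_idf(docs):
    """Return one {token: tf-idf score} dict per tokenized document.

    Assumed variant: tf = count / document length, idf = ln(N / df).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each token.
    df = Counter(tok for doc in docs for tok in set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        scores.append({
            tok: (count / len(doc)) * math.log(n_docs / df[tok])
            for tok, count in tf.items()
        })
    return scores


def augment(word_vec, tfidf_value, pos_tag):
    """Append a TF-IDF scalar and a one-hot POS vector to a word vector.

    Concatenation is an assumption; the source does not say how the
    features are merged into the word vector.
    """
    one_hot = [1.0 if tag == pos_tag else 0.0 for tag in POS_TAGS]
    return list(word_vec) + [tfidf_value] + one_hot
```

With this layout, a d-dimensional word vector grows to d + 1 + len(POS_TAGS) dimensions, which is one reason the embedding dimension would need to be tuned alongside the added features.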



Examples


Embodiment

[0090] To verify the effect of the present invention, experiments were carried out according to the steps described above. The results are shown in Figure 4.

[0091] Step 1: The experiments use the news corpus data set provided by Sogou Labs, which contains a total of 679,978 news-headline pairs covering entertainment, culture, education, military, society, finance, and other topics. Preprocessing removes texts with a length of less than 5 and replaces noisy characters such as English letters, special characters, and emoticons. To select high-quality experimental pairs, the data is divided into three levels according to the semantic similarity between the abstract and the original text, where 1 is the least relevant and 3 is the most relevant: level 1 for similarity in the interval (0, 0.4), level 2 for [0.4, 0.65), and level 3 for [0.65, 1). The semantic correlation formula is designed as follows:

[0092] ...
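The length filter and the three similarity levels from Step 1 can be sketched as follows. This is an illustration under stated assumptions: the similarity score is taken as already computed (the patent's formula is not reproduced here), length is measured with `len` on the text string, and the names `relevance_level`, `select_pairs`, and the `min_level` threshold are hypothetical, since the source does not say which levels are kept.

```python
def relevance_level(similarity):
    """Map a text-abstract similarity score to a level from 1 to 3.

    Intervals follow the text: (0, 0.4) -> 1, [0.4, 0.65) -> 2,
    [0.65, 1) -> 3. Scores outside (0, 1) return None.
    """
    if 0.0 < similarity < 0.4:
        return 1
    if 0.4 <= similarity < 0.65:
        return 2
    if 0.65 <= similarity < 1.0:
        return 3
    return None


def select_pairs(scored_pairs, min_length=5, min_level=3):
    """Keep (text, headline) pairs that pass both filters.

    scored_pairs: iterable of (text, headline, similarity) tuples.
    min_level is an assumed cut-off, not a value given in the source.
    """
    selected = []
    for text, headline, similarity in scored_pairs:
        if len(text) < min_length:
            continue  # drop texts shorter than the minimum length
        level = relevance_level(similarity)
        if level is not None and level >= min_level:
            selected.append((text, headline))
    return selected
```

For example, `select_pairs([("a long enough text", "headline", 0.7)])` keeps the pair, while the same pair with a similarity of 0.5 is dropped unless `min_level` is lowered to 2.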



Abstract

The invention discloses a short text automatic abstracting method and system based on double encoders, and belongs to the technical field of information processing. The method comprises the following steps: 1, preprocessing the data; 2, designing a double encoder with a bidirectional recurrent neural network; 3, arranging an attention mechanism that fuses global and local semantics; 4, arranging a decoder with an empirical probability distribution, designed using a double-layer unidirectional neural network; 5, adding word embedding features; 6, optimizing the word embedding dimension; and 7, preprocessing and testing the news corpus data from the Sogou laboratory, feeding the data into a Seq2Seq model with double encoders and an accompanying empirical probability distribution for calculation, and carrying out experimental evaluation with the text abstract quality evaluation system Rouge. According to the invention, the traditional encoding and decoding framework is optimized so that the model can fully understand text semantics, and the fluency and precision of text abstracts are improved.
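The abstract names Rouge as the evaluation system without giving details. As a rough illustration of what such a metric measures, the sketch below computes ROUGE-N recall on pre-tokenized text: the share of reference n-grams that also appear in the generated summary, with clipped counts. The function names are hypothetical, and which Rouge variants and settings the inventors used is not stated in the source.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, reference, n=1):
    """ROUGE-N recall: matched reference n-grams / total reference n-grams.

    Each reference n-gram is matched at most as many times as it occurs
    in the candidate (clipped counts).
    """
    ref_counts = ngrams(reference, n)
    cand_counts = ngrams(candidate, n)
    total = sum(ref_counts.values())
    if total == 0:
        return 0.0  # reference is shorter than n tokens
    overlap = sum(min(count, cand_counts[gram]) for gram, count in ref_counts.items())
    return overlap / total
```

For Chinese news headlines the tokens would typically be characters or segmented words; that choice is left to the caller here.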

Description

Technical field

[0001] The invention belongs to the technical field of information processing, and in particular relates to a short text automatic summarization method and system based on a double encoder.

Background technique

[0002] The rapid development of the Internet has made the network platform an important way for people to exchange information and communicate with each other, and it also makes it easier for people to browse and publish information. The explosive growth of online information has made information overload a serious problem. In the face of massive information, how to obtain useful information from it has become an urgent problem in the field of information processing.

[0003] Automatic text summarization is an important branch in the field of natural language processing. Text summarization refers to the extraction of key information from a large amount of text by computer, and automatic text summarization is the key technology of information extract...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/27, G06F16/34, G06N3/04
CPC: G06F16/345, G06F40/30, G06N3/045
Inventor: 丁建立, 李洋, 王怀超
Owner: CIVIL AVIATION UNIV OF CHINA