Automatic text summarization method based on enhanced semantics

An automatic text summarization technology, applied in natural language data processing, special data processing applications, instruments, etc., that addresses problems such as information loss, improving summary quality and avoiding deviation.

Inactive Publication Date: 2018-11-13
SOUTH CHINA UNIV OF TECH
4 Cites, 42 Cited by

AI Technical Summary

Problems solved by technology

Traditional extractive summarization usually causes a lot of information loss, especially in long texts. Therefore, in-depth research on generative automatic summarization is of great significance for truly solving information overload.


Examples


Embodiment

[0039] As shown in Figure 1, the automatic text summarization method based on enhanced semantics includes a text preprocessing step, an encoding step, a decoding step, an attention step, and a summary generation step, as follows:

[0040] Text preprocessing step. The text data can be a corpus collected by a web crawler or an open-source corpus. Taking CNN / Daily Mail as an example, the corpus consists of article-abstract pairs, with articles averaging 780 words and abstracts averaging 56 words. The source text is tokenized, lemmatized, and coreference-resolved. The 200k most frequent words are taken as the basic vocabulary, and an extended vocabulary is formed for each text from its words. The special marks [PAD], [UNK], [START], and [STOP] are also added to the vocabulary, and the words of the text are converted into ids, so that each article corresponds to one id sequence. Abstracts are processed in the same way. The training set contains 287,226 samples, th...
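The vocabulary and id conversion described in this step can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the function names `build_vocab` and `encode` are hypothetical, and the way out-of-vocabulary words receive per-text extended ids (numbered after the basic vocabulary) is an assumption, since the text only states that an extended vocabulary is formed for each text.

```python
from collections import Counter

# Special marks named in the text; they take the first ids.
SPECIAL_TOKENS = ["[PAD]", "[UNK]", "[START]", "[STOP]"]

def build_vocab(tokenized_texts, max_size=200_000):
    """Map the special marks, then the most frequent words, to integer ids."""
    counts = Counter(tok for text in tokenized_texts for tok in text)
    vocab = {tok: i for i, tok in enumerate(SPECIAL_TOKENS)}
    for word, _ in counts.most_common(max_size):
        if word not in vocab:
            vocab[word] = len(vocab)
    return vocab

def encode(tokens, vocab):
    """Convert tokens to ids.

    Assumption: a word missing from the basic vocabulary gets a temporary,
    per-text id placed after the basic vocabulary (the "extended" part).
    """
    ids, oov = [], []
    for tok in tokens:
        if tok in vocab:
            ids.append(vocab[tok])
        else:
            if tok not in oov:
                oov.append(tok)
            ids.append(len(vocab) + oov.index(tok))
    return ids, oov

# Toy corpus, already tokenized; max_size=2 keeps only the two top words.
articles = [["the", "cat", "sat"], ["the", "dog", "sat", "down"]]
vocab = build_vocab(articles, max_size=2)
ids, oov = encode(["the", "dog", "sat"], vocab)
```

In this toy run the basic vocabulary holds the four special marks plus "the" and "sat", so "dog" falls into the per-text extension. A real pipeline would use 200k words and run tokenization, lemmatization, and coreference resolution beforehand.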


Abstract

The invention discloses an automatic text summarization method based on enhanced semantics. The method comprises the following steps: preprocessing the text, ranking words from high to low by word frequency, and converting the words to ids; encoding the input sequence with a single-layer bidirectional LSTM to extract text information features; decoding the encoded text semantic vector with a single-layer unidirectional LSTM to obtain the hidden state; calculating a context vector to extract from the input sequence the information most useful for the current output; after decoding, obtaining a probability distribution over the vocabulary and adopting a strategy to select the summary words; and, in the training phase, fusing the semantic similarity between the generated summary and the source text into the loss calculation, so as to improve the semantic similarity between the summary and the source text. The invention uses an LSTM deep learning model to represent the text, integrates the semantic relations of the context, strengthens the semantic relation between the summary and the source text, and generates a summary that better matches the main idea of the text. It has broad application prospects.
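The attention and training-loss steps summarized above can be illustrated with a small sketch in plain Python. Several details are assumptions because this excerpt does not specify them: the scoring function (dot product here), the similarity measure (cosine similarity between two fixed-size vectors), the way the similarity is fused into the loss (a weighted `1 - similarity` term added to the negative log-likelihood), and the weight `lam`. All function names are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def context_vector(encoder_states, decoder_state):
    """Weight encoder states by their relevance to the current decoder state.

    Assumption: dot-product scoring; the patent may use another function.
    """
    scores = [sum(h * s for h, s in zip(enc, decoder_state))
              for enc in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    context = [sum(w * enc[d] for w, enc in zip(weights, encoder_states))
               for d in range(dim)]
    return weights, context

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def fused_loss(target_probs, source_vec, summary_vec, lam=0.5):
    """Average negative log-likelihood plus a semantic dissimilarity penalty.

    target_probs: model probability of each reference summary word.
    lam: assumed weight of the similarity term (hypothetical).
    """
    nll = -sum(math.log(p) for p in target_probs) / len(target_probs)
    return nll + lam * (1.0 - cosine(source_vec, summary_vec))

# Two encoder states; the decoder state aligns with the first one.
weights, context = context_vector([[1.0, 0.0], [0.0, 1.0]], [1.0, 0.0])
loss = fused_loss([0.5], [1.0, 0.0], [0.0, 1.0], lam=1.0)
```

Under this formulation a summary whose vector points away from the source vector is penalized even when its word-level likelihood is high, which is one plausible reading of "fusing the semantic similarity" into the loss.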

Description

Technical field

[0001] The invention relates to the technical field of natural language processing, in particular to an automatic text summarization method based on enhanced semantics.

Background

[0002] With the rapid development of technology and the Internet and the arrival of the big data era, the volume of online information grows day by day. The explosive growth of text information in particular, such as news, blogs, chats, reports, and microblogs, causes information overload, and people spend a great deal of time browsing and reading. Quickly extracting the key content from large amounts of text and relieving information overload has therefore become an urgent need, and automatic text summarization technology has emerged in response.

[0003] Automatic text summarization technology can be divided into extractive summarization and generative summarization accord...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30, G06F17/27
CPC: G06F40/289
Inventor: 史景伦, 洪冬梅, 宁培阳, 王桂鸿
Owner: SOUTH CHINA UNIV OF TECH