Autoabstract method for multi-document

A multi-document automatic summarization technique, applied in the field of information processing, that addresses the difficulty of guaranteeing summary accuracy and achieves the effect of improving accuracy.

Active Publication Date: 2008-07-30
INST OF COMPUTING TECH CHINESE ACAD OF SCI

AI Technical Summary

Problems solved by technology

[0007] In summary, the ability of existing techniques to extract summaries automatically is limited by various factors, and accuracy is difficult to guarantee.


Embodiment Construction

[0040] The core idea of the present invention is to introduce a sentence relationship graph model, together with mining of the implicit topic-subtopic logical structure, into multi-document summarization, converting the summarization problem into an iterative process of subtopic search and subgraph partitioning. Fig. 1 shows a flowchart of a multi-document summarization method according to a preferred embodiment of the present invention.
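The excerpt does not spell out how the subtopic search and subgraph partitioning are carried out, so the following Python sketch is only an illustration of the general idea, not the patented procedure. It assumes a greedy strategy: repeatedly pick the remaining sentence with the largest total edge weight as a subtopic centre, then split off that centre and its neighbours as one subgraph. The function name `partition_subtopics` and the parameter `max_subtopics` are hypothetical.

```python
def partition_subtopics(adjacency, max_subtopics):
    """Greedy illustration of iterative subtopic search and subgraph division.

    adjacency: symmetric n x n list of lists of edge weights (0 means no edge).
    Returns a list of (centre_index, sorted_member_indices) pairs.
    NOTE: the greedy rule below is an assumption made for illustration only.
    """
    remaining = set(range(len(adjacency)))
    subtopics = []
    while remaining and len(subtopics) < max_subtopics:
        # Subtopic search: the centre is the remaining sentence most strongly
        # connected to the other remaining sentences (ties go to the lowest index).
        centre = max(
            sorted(remaining),
            key=lambda i: sum(adjacency[i][j] for j in remaining if j != i),
        )
        # Subgraph division: the centre and its remaining neighbours form one subtopic.
        members = {centre} | {j for j in remaining if adjacency[centre][j] > 0}
        subtopics.append((centre, sorted(members)))
        remaining -= members
    return subtopics
```

Taking one sentence per subgraph (for example, each centre) would give a summary with one representative per subtopic, which is one simple way to keep redundancy between subtopics low.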

[0041] A specific embodiment of the present invention is described in detail below with reference to Fig. 1.

[0042] As shown in step 101 of Fig. 1, the required document set is read in, and sentence boundary detection is used to represent each document as a set of segmented sentences. Chinese text then undergoes word segmentation and stop-word removal, while Western-language text undergoes restoration of words to their base forms and stop-word removal. Each sentence is then represented with a vector space model.
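As a rough illustration of step 101, the Python sketch below (Python 3.7 or later) splits documents into sentences and turns each sentence into a sparse term-frequency vector. It is a minimal sketch under several assumptions: the punctuation-based splitter, the tiny `STOP_WORDS` list, and the ASCII-only tokenizer are all hypothetical stand-ins. Chinese word segmentation and base-form restoration are not implemented here, and the source does not say which term weighting the vector space model uses.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real system would use a fuller one.
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "in", "to"}

def split_sentences(text):
    """Naive sentence boundary detection: split after ., !, ? or 。！？."""
    parts = re.split(r"(?<=[.!?。！？])\s*", text)
    return [p.strip() for p in parts if p.strip()]

def to_vector(sentence):
    """Represent a sentence as a sparse term-frequency vector (term -> count)."""
    # ASCII-only tokenizer; Chinese text would need a word segmenter instead.
    tokens = re.findall(r"[a-z0-9]+", sentence.lower())
    return Counter(t for t in tokens if t not in STOP_WORDS)

def vectorize_documents(documents):
    """Return (sentences, vectors) for a whole document set."""
    sentences = [s for doc in documents for s in split_sentences(doc)]
    return sentences, [to_vector(s) for s in sentences]
```

For example, `vectorize_documents(["The cat sat. The dog ran!"])` yields two sentences, each mapped to a dictionary of its non-stop-word counts.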

[0043] As shown in step 102 of attached drawing ...

Abstract

The invention discloses a method that uses graph partitioning to automatically extract a multi-document summary. The method comprises the following steps: sentence boundaries are detected, and each document is represented by its segmented sentences; the sentences are represented as vectors, and the similarity between every pair of sentences is calculated to form a sentence relation matrix, which is pruned according to a specified threshold and normalized; mining of the implicit topic-subtopic logical structure is introduced into multi-document summarization, and the document set is divided into different implicit subtopics according to the topic, so that the summarization task becomes a process of selecting and extracting subtopics. By applying graph partitioning, the importance of each sentence's subtopic is ensured from global characteristics, and low content redundancy among different subtopics is ensured from local characteristics, thereby effectively improving the quality of the summary.
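The matrix-building step in the abstract can be sketched in Python as follows. This is a minimal sketch with assumptions: the abstract does not name the similarity measure or the normalization, so cosine similarity and row normalization are used here as common choices, and entries below the threshold are set to zero. The function names and the default `threshold=0.1` are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse vectors given as term -> weight dicts."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def relation_matrix(vectors, threshold=0.1):
    """Build a pruned, row-normalized sentence relation matrix.

    Assumptions (not stated in the source): cosine similarity, pruning of
    entries below `threshold`, a zero diagonal, and row normalization.
    """
    n = len(vectors)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            sim = cosine(vectors[i], vectors[j])
            if sim >= threshold:
                matrix[i][j] = matrix[j][i] = sim
    # Row normalization: each non-empty row sums to 1.
    for row in matrix:
        total = sum(row)
        if total > 0:
            row[:] = [x / total for x in row]
    return matrix
```

Rows that keep no edge after pruning stay all zero, which marks sentences that are isolated in the relation graph.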

Description

Technical field

[0001] The present invention relates to the field of information processing, and more particularly to a multi-document automatic summarization method.

Background art

[0002] With the progress of the times and the development of the economy, people's demand for information in daily life keeps growing. In particular, as the Internet becomes increasingly popular, a large amount of information is released and disseminated online every day. Taking the development of China's Internet as an example, according to search results provided by Peking University Skynet, the total number of web pages in China was about 1.08 billion at the end of 2005. According to CNNIC statistics, as of the end of March 2007 the number of WAP web pages in China was about 260 million, with a total page size of about 800 GB. The Internet, with its rapidly increasing number of web pages, provides people with richer information services, but also brings confusion to people on how to obta...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/27, G06F17/30
Inventors: 张瑾, 许洪波, 王小磊
Owner: INST OF COMPUTING TECH CHINESE ACAD OF SCI