Method and system for batch single-document summarization of a document set

A document summarization technique for document sets, applied in fields such as instruments, computing, and electrical digital data processing. It addresses the problem that existing single-document summarization methods do not use information from other related documents.

Active Publication Date: 2008-05-28
PEKING UNIV

AI Technical Summary

Problems solved by technology

[0005] The single-document automatic summarization methods described above use only the information in the single document itself and ignore information from other related documents. In addition, every calculation step must be repeated for each document to obtain its summary.

Examples

Embodiment Construction

[0060] The method of the present invention will be further illustrated below in conjunction with the examples and accompanying drawings.

[0061] The main idea of the present invention is as follows: exploit the information redundancy among similar documents to better measure the importance of the sentences in the document to be summarized, and thereby generate a better single-document summary for it. Clustering a given document set yields several document clusters, each reflecting a single theme and containing similar documents. The method performs batch single-document summarization on all documents in a cluster: the information richness of every sentence in the cluster's documents is obtained in one calculation, with no need for a separate calculation for each document. On the one hand, this method can extract the really important sentences to form a high-quality summary, and on the other ...
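The "one calculation per cluster" idea in [0061] can be illustrated with a minimal sketch. The excerpt does not give the scoring formula, so everything specific here is an assumption made for illustration: bag-of-words cosine similarity between sentences, a PageRank-style power iteration with damping 0.85, and the hypothetical names `cosine` and `score_cluster`. It is not the patent's actual computation.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def score_cluster(cluster, damping=0.85, iters=50):
    """Score every sentence of every document in one cluster in a single pass.

    cluster: {doc_id: [sentence, ...]} -> returns {doc_id: [score, ...]}.
    The similarity measure and damping value are illustrative assumptions.
    """
    # Flatten all sentences of the cluster, remembering the owning document.
    owners, bags = [], []
    for doc_id, sentences in cluster.items():
        for sentence in sentences:
            owners.append(doc_id)
            bags.append(Counter(sentence.lower().split()))
    n = len(bags)
    result = {doc_id: [] for doc_id in cluster}
    if n == 0:
        return result

    # Affinity matrix over ALL sentences in the cluster (no self-links).
    w = [[cosine(bags[i], bags[j]) if i != j else 0.0 for j in range(n)]
         for i in range(n)]
    row_sums = [sum(row) for row in w]

    # Power iteration: a sentence is important if important sentences resemble it.
    scores = [1.0 / n] * n
    for _ in range(iters):
        scores = [
            (1 - damping) / n + damping * sum(
                scores[i] * w[i][j] / row_sums[i]
                for i in range(n) if row_sums[i] > 0)
            for j in range(n)
        ]

    # Hand the cluster-wide scores back to each document, in sentence order.
    for owner, score in zip(owners, scores):
        result[owner].append(score)
    return result


if __name__ == "__main__":
    demo = {
        "d1": ["the quake hit the city", "stocks rose today"],
        "d2": ["the quake damaged the city",
               "rescue teams reached the city after the quake"],
    }
    for doc_id, doc_scores in score_cluster(demo).items():
        print(doc_id, [round(s, 3) for s in doc_scores])
```

In this sketch the scoring loop runs once for the whole cluster, and each document then reads its own sentence scores from the shared result, which is the redundancy-sharing behavior the paragraph describes.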

Abstract

The invention relates to a method and a system for batch single-document summarization of a document set, and belongs to the technical field of language and text processing. Almost all current single-document automatic summarization methods use only the information in the single document being summarized. The method of the invention generates single-document summaries in batches for all documents in a specified document set. First, the method clusters the specified document set into several document clusters, where documents in the same cluster share a similar theme. For each document cluster, the overall importance of all sentences in the cluster is estimated; then, for each document in the cluster, a within-document diversity penalty is applied to its sentences; finally, the truly important and novel sentences are selected from the document to form its summary. The method improves on existing graph-ranking-based single-document automatic summarization methods, achieves better results in actual evaluation, and improves summarization efficiency through batch processing.
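The last two steps of the abstract, penalizing redundancy within a document and then choosing sentences, can be sketched as a greedy selection. The abstract does not state the penalty formula, so the rule below is an assumption for illustration only: after each pick, every remaining sentence loses `penalty * similarity * score_of_picked_sentence`. The function name `select_with_diversity_penalty` and its parameters are hypothetical.

```python
def select_with_diversity_penalty(scores, sim, k, penalty=1.0):
    """Greedily pick k sentences of ONE document, discouraging redundancy.

    scores: overall importance score of each sentence in the document
            (for example, cluster-wide scores computed beforehand).
    sim:    square matrix, sim[i][j] = similarity of sentences i and j.
    Returns the chosen sentence indices in document order.
    The penalty rule is an illustrative assumption, not the patent's formula.
    """
    remaining = dict(enumerate(scores))  # index -> current (penalized) score
    chosen = []
    while remaining and len(chosen) < k:
        # Take the best remaining sentence; ties go to the earlier sentence.
        best = max(remaining, key=lambda i: (remaining[i], -i))
        del remaining[best]
        chosen.append(best)
        # Penalize sentences that overlap with the one just chosen.
        for i in remaining:
            remaining[i] -= penalty * sim[i][best] * scores[best]
    return sorted(chosen)


if __name__ == "__main__":
    scores = [0.5, 0.45, 0.2]      # sentences 0 and 1 are near-duplicates
    sim = [[1.0, 0.9, 0.1],
           [0.9, 1.0, 0.1],
           [0.1, 0.1, 1.0]]
    print(select_with_diversity_penalty(scores, sim, k=2))
```

With `penalty=0` the selection reduces to plain top-k by score, so the parameter makes it easy to compare summaries with and without the diversity step.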

Description

Technical field

[0001] The invention belongs to the technical field of language and text processing and information retrieval, and in particular relates to a method and a system for summarizing a document set in batches.

Background art

[0002] Single-document automatic summarization refers to automatically extracting the essence or key points of a given document. Its purpose is to give users a concise description of the content by compressing and refining the original text. Single-document automatic summarization is one of the core problems in natural language processing, and it is widely used in document and Web search engines, enterprise content management systems, and knowledge management systems (such as Founder Bosi and Founder Zhisi).

[0003] In a nutshell, methods for multi-document summarization can be divided into methods based on sentence generation (Abstraction) and methods based on sentence extraction (Extraction). The method based on sentence ge...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/27, G06F17/30
Inventor: 万小军, 杨建武, 吴於茜, 陈晓鸥
Owner: PEKING UNIV