
Method and system for batch single-document summarization of a document set

A document summarization and document set technology, applied in instrumentation, computing, electrical digital data processing, etc., that addresses the problem of summarization methods not using other related documents.

Active Publication Date: 2009-07-08
PEKING UNIV

AI Technical Summary

Problems solved by technology

[0005] The above single-document automatic summarization methods use only the information in the single document itself and ignore other related documents; in addition, every calculation step must be repeated for each document to obtain its summary.



Examples


Detailed Description of the Embodiments

[0059] The method described in the present invention will be further explained below in conjunction with the embodiments and the drawings.

[0060] The main idea of the present invention is as follows: exploit the information redundancy among similar documents to better measure the importance of the sentences in the document to be summarized, thereby generating a better single-document summary for that document. Clustering a given document collection yields several document clusters, each of which reflects one topic and contains similar documents. The method performs batch single-document summarization on all documents in a single cluster; that is, the information richness of all sentences in the cluster's documents is obtained in one calculation, without a separate calculation for each document. On the one hand, this method can extract really important sentences to form a high-quality summary, and on the other hand, it saves th...
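The "one calculation per cluster" idea in [0060] can be illustrated with a minimal sketch. It is not the patented procedure: the whitespace tokenizer, the cosine similarity, the PageRank-style iteration, and the names score_cluster, damping and iterations are all assumptions chosen for illustration. The point it shows is that every sentence of every document in the cluster is scored in a single pass, and each document then reads off its own scores.

```python
import math
from collections import Counter


def bag_of_words(sentence):
    """Lowercased token counts; a stand-in for real tokenization."""
    return Counter(sentence.lower().split())


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values()) * sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def score_cluster(cluster, damping=0.85, iterations=50):
    """Score all sentences of all documents in one cluster in a single pass.

    cluster: dict mapping doc_id -> list of sentences.
    Returns a dict mapping doc_id -> list of scores aligned with the sentences.
    The graph-ranking scorer is an assumed choice, not taken from the source.
    """
    index = [(doc_id, i) for doc_id, sents in cluster.items() for i in range(len(sents))]
    n = len(index)
    if n == 0:
        return {doc_id: [] for doc_id in cluster}
    vectors = [bag_of_words(cluster[d][i]) for d, i in index]
    # Similarity graph over every sentence in the cluster, rows normalised to sum to 1.
    weights = [[cosine(vectors[i], vectors[j]) if i != j else 0.0 for j in range(n)]
               for i in range(n)]
    for row in weights:
        total = sum(row)
        if total:
            row[:] = [w / total for w in row]
    # Fixed number of power-iteration steps over the whole cluster.
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [(1 - damping) / n
                  + damping * sum(scores[j] * weights[j][i] for j in range(n))
                  for i in range(n)]
    # Hand each document its own slice of the shared result.
    result = {doc_id: [0.0] * len(sents) for doc_id, sents in cluster.items()}
    for (doc_id, i), score in zip(index, scores):
        result[doc_id][i] = score
    return result
```

In this sketch a sentence that is echoed by sentences in other documents of the same cluster receives a higher score than one that shares no vocabulary with the rest of the cluster, which is how the redundancy among similar documents enters the ranking.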


Abstract

The invention relates to a method and a system for performing batch single-document summarization on a document set, and belongs to the technical field of language and text processing. Almost all existing single-document automatic summarization methods use only the information in the single document itself. The method of the invention can generate single-document summaries in batch for all documents in a given document set. First, the method clusters the given document set into several document clusters, such that documents in the same cluster share a similar topic. For each document cluster, the overall importance of all sentences in the cluster is estimated; then, for each document in the cluster, a within-document diversity penalty is applied to its sentences; finally, sentences that are both truly important and novel are selected from the document to form its summary. The method improves on the existing graph-ranking-based single-document automatic summarization methods, achieving better results in practical evaluation and improving summarization efficiency through batch processing.
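The last two steps of the abstract, the within-document diversity penalty and the final sentence selection, can be sketched as follows. This is a hedged illustration only: the greedy loop, the specific penalty formula (subtract the chosen sentence's score weighted by similarity), and the names summarize_document, max_sentences and penalty are assumptions, and the input scores are taken as already computed for the whole cluster.

```python
import math
from collections import Counter


def similarity(s1, s2):
    """Cosine similarity over lowercased whitespace tokens (illustrative only)."""
    a, b = Counter(s1.lower().split()), Counter(s2.lower().split())
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values()) * sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def summarize_document(sentences, scores, max_sentences=2, penalty=1.0):
    """Greedily pick important, mutually dissimilar sentences from one document.

    sentences: the sentences of a single document.
    scores: cluster-wide importance scores aligned with `sentences`.
    The penalty rule below is an assumed form, not quoted from the source.
    """
    remaining = dict(enumerate(scores))
    chosen = []
    while remaining and len(chosen) < max_sentences:
        best = max(remaining, key=remaining.get)
        best_score = remaining.pop(best)
        chosen.append(best)
        # Penalise the sentences that overlap with the one just selected.
        for j in remaining:
            remaining[j] -= penalty * similarity(sentences[j], sentences[best]) * best_score
    # Present the summary in original document order.
    return [sentences[i] for i in sorted(chosen)]
```

With penalty set to 0 the sketch degenerates to plain top-k selection, which makes it easy to see what the diversity step changes: near-duplicate sentences stop crowding out novel ones.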

Description

Technical field

[0001] The invention belongs to the technical field of language and text processing and information retrieval, and specifically relates to a method and system for performing batch single-document summarization on a document set.

Background art

[0002] Single-document automatic summarization refers to automatically extracting the essential points from a given document. Its purpose is to provide users with a concise description of the content by compressing and refining the original text. Single-document automatic summarization is one of the core problems in the field of natural language processing, and is widely used in document/Web search engines, enterprise content management systems, and knowledge management systems (such as Founder Bosi and Founder Zhisi).

[0003] In general, the multi-document summarization method can be divided into a method based on sentence extraction (Extraction) and a method based on sentence generation (Abstraction). The method based o...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F17/27, G06F17/30
Inventor: 万小军, 杨建武, 吴於茜, 陈晓鸥
Owner: PEKING UNIV