Method for modeling dynamic multi-document abstracts

A modeling method for multi-document summarization, applied in fields such as special data processing applications, instruments, and electrical digital data processing. It addresses problems such as abstract fragments being drawn from the same subtopic, which reduces the comprehensiveness of the abstract.

Status: Inactive
Publication Date: 2011-11-23
HARBIN INST OF TECH

AI Technical Summary

Problems solved by technology

[0006] The present invention addresses the problem that traditional multi-document summarization methods have difficulty grasping globally the content, distribution, and association of each information aspect under the cu



Examples


Specific Embodiment 1

[0037] Specific embodiment 1: This embodiment is described with reference to Figure 1. The specific steps of the dynamic multi-document summarization modeling method of this embodiment are:

[0038] Step 1: establish a feature extraction module and calculate the feature values of the sentences in the document collection that contain topic words. The feature values of a sentence are its historical redundancy feature value, its significance feature value, its time feature value, its length feature value, and its position feature value. The document collection consists of the current document collection and the historical document collection;

[0039] Step 2: establish an information filtering module and perform information filtering on the document collection to obtain a dynamic sentence set;

[0040] Step 3: establish a sentence weighting module to calculate the weight of s...

Specific Embodiment 2

[0044] Specific embodiment 2: This embodiment further describes step 1 of the dynamic multi-document summarization modeling method of specific embodiment 1. In step 1, the feature extraction module is established, and the feature values of the sentences in the document collection that contain topic words are calculated as follows:

[0045] Step 11: calculate the weight Wgt(w) of the topic word w: Wgt(w) = TF(w) * IDF(w) * ISF(w), where TF(w) is the term frequency of the topic word w, IDF(w) is the inverse document frequency of w, and ISF(w) is the inverse sentence frequency of w;

[0046] Step 12: calculate the historical redundancy feature value NWgt(s) of the sentence s:

[0047] NWgt(s) = \sum_{i=1}^{m} ( \sum ...
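
The topic-word weight of step 11, Wgt(w) = TF(w) * IDF(w) * ISF(w), can be illustrated with a minimal Python sketch. The source names the three factors but does not give their exact definitions, so the choices below are assumptions: TF is a raw count over the whole collection, and IDF and ISF use a smoothed logarithmic form, 1 + ln(N / n_w). The function name `topic_word_weights` and the nested-list input layout are also hypothetical.

```python
import math
from collections import Counter


def topic_word_weights(documents):
    """Compute Wgt(w) = TF(w) * IDF(w) * ISF(w) for every word.

    documents: list of documents; each document is a list of sentences;
    each sentence is a list of word tokens.
    ASSUMPTION: TF is a raw count, and IDF/ISF use 1 + ln(N / n_w).
    The patent text does not specify these definitions.
    """
    sentences = [sent for doc in documents for sent in doc]
    n_docs = len(documents)
    n_sents = len(sentences)

    # TF(w): number of occurrences of w in the whole collection.
    tf = Counter(word for sent in sentences for word in sent)
    # Number of documents that contain w at least once.
    doc_freq = Counter(
        word for doc in documents for word in {t for sent in doc for t in sent}
    )
    # Number of sentences that contain w at least once.
    sent_freq = Counter(word for sent in sentences for word in set(sent))

    weights = {}
    for word, count in tf.items():
        idf = 1.0 + math.log(n_docs / doc_freq[word])    # assumed form
        isf = 1.0 + math.log(n_sents / sent_freq[word])  # assumed form
        weights[word] = count * idf * isf
    return weights


docs = [
    [["flood", "river"], ["flood", "rescue"]],
    [["river", "bridge"]],
]
weights = topic_word_weights(docs)
```

In this toy collection, "flood" and "river" both occur twice, but "flood" is confined to one document and therefore receives the larger weight under the assumed IDF.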

Specific Embodiment 3

[0056] Specific embodiment 3: This embodiment further describes step 2 of the dynamic multi-document summarization modeling method of specific embodiment 1. In step 2, the information filtering module is established and filters the candidate document set. The dynamic sentence set is obtained as follows: first, all sentences in the sentence set of the current document collection are sorted from high to low by their historical redundancy feature value; then the top 50 sentences in that ranking are deleted, and the remaining sentences form the dynamic sentence set.
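
The filtering rule in [0056] can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the patent's implementation: the function name `filter_dynamic_sentences` and its parameters are hypothetical, the redundancy scores are taken as already computed, and keeping the surviving sentences in their original order is a design choice the source does not specify.

```python
def filter_dynamic_sentences(sentences, redundancy_scores, n_drop=50):
    """Drop the n_drop sentences with the highest historical redundancy.

    sentences: sentences of the current document collection.
    redundancy_scores: one historical redundancy value per sentence
        (assumed to be precomputed, e.g. NWgt(s)).
    n_drop: number of top-ranked sentences to delete; 50 in the source.
    ASSUMPTION: survivors are returned in their original order.
    """
    if len(sentences) != len(redundancy_scores):
        raise ValueError("need exactly one score per sentence")

    # Rank sentence indices from highest to lowest redundancy.
    ranked = sorted(
        range(len(sentences)),
        key=lambda i: redundancy_scores[i],
        reverse=True,
    )
    dropped = set(ranked[:n_drop])

    # The remaining sentences form the dynamic sentence set.
    return [s for i, s in enumerate(sentences) if i not in dropped]


example = filter_dynamic_sentences(
    ["a", "b", "c", "d"], [0.1, 0.9, 0.5, 0.7], n_drop=2
)
```

Here the two most redundant sentences, "b" and "d", are removed. If the collection holds no more than `n_drop` sentences, the result is empty, so a real system would likely need a guard for small inputs.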

[0057] The information filtering module of this embodiment processes the original sentence set and filters out the historical information in the current document collection. A schematic diagram of the document collection after filtering is shown in Figure 8: the sentences carrying a large amount of historical information in the original sentence set are filtered out, so that the remaining s...


Abstract

The invention relates to a method for modeling dynamic multi-document abstracts. It aims to solve a problem of traditional multi-document abstract methods: the content, distribution, and association of the various information aspects under the current subject are difficult to grasp globally, so a large number of abstract fragments come from the same subject and the comprehensiveness of the abstract is seriously affected. The method comprises the following steps: preprocessing a document collection; building a feature extraction module; building an information filtering module; building a sentence weighting module; building an abstract generation module to generate the best abstract; and outputting the best abstract through an output module to complete the modeling of the dynamic multi-document abstract. With this method, the dynamically evolving abstract has relatively high information novelty and reflects the evolution of historical information, which improves the performance of the dynamic abstract. The abstract obtained by the method is more comprehensive. The method is applied in the field of abstract extraction.

Description

Technical field

[0001] The invention relates to a dynamic multi-document abstract modeling method.

Background technique

[0002] With the rapid development of the Internet, network information is increasing rapidly. Facing more than 90% of the text information on the Internet, the questions of how to organize and analyze information effectively, meet people's needs, and improve the efficiency with which people access information have made technologies such as information filtering, information retrieval, and automatic summarization research hotspots.

[0003] An abstract is a short text that briefly and accurately describes the main content of the original in order to provide an outline of it. The abstract should reflect the content of the original objectively and truthfully while being more concise than the original. Abstracts enable people to judge quickly whether the original contains content of interest, allowing people...


Application Information

IPC(8): G06F17/30
Inventor: 赵铁军 (Zhao Tiejun), 郑德权 (Zheng Dequan), 刘美玲 (Liu Meiling)
Owner: HARBIN INST OF TECH