Shallow-analysis automatic document summarization method based on concept abstraction degree

A concept-abstraction technology applied in the fields of information science and information retrieval. It addresses three problems of existing methods: generated summaries contain a large amount of redundancy, the summaries cannot be adjusted, and inclusion relations between concepts are not accounted for when concepts are generalized and processed.

Active Publication Date: 2009-03-11
NANTONG ZHONGBANG TEXTILE +1
Cites: 0; cited by: 17

AI Technical Summary

Problems solved by technology

However, existing shallow-analysis automatic summarization methods do not have such adjustment capability either.
[0004] From the above background introduction, it can be seen that the existing shallow analysis automatic document summarizati

Method used

the structure of the environmentally friendly knitted fabric provided by the present invention; figure 2 Flow chart of the yarn wrapping machine for environmentally friendly knitted fabrics and storage devices; image 3 Is the parameter map of the yarn covering machine

Examples

Embodiment Construction

[0023] The present invention is further described below with reference to the accompanying drawings:

[0024] In this implementation example, the query Q = "fruit" is submitted to the Google search engine, and the first 50 Web documents returned form the document collection D. WordNet 2.1 is used as the ontology. The hardware environment is a P4 3.0 GHz CPU, 512 MB of memory, and an 80 GB hard disk, running the Windows XP Professional operating system with the NTFS file system. The main program is implemented in VC++ 6.0.

[0025] 1. Obtain data and set the abstraction value. Read in the Web documents returned by the search engine, remove webpage tags and non-text noise, remove stop words, and perform stemming. With the sentence as the basic unit, the text documents form a collection R, which is the object of automatic summarization. The user sets the summary abstraction value θ = 0.5.
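
The preprocessing in step 1 can be sketched roughly as follows. This is a minimal illustration in Python, not the patent's VC++ 6.0 program: the stop-word list, the suffix-stripping `crude_stem` function (a stand-in for a real stemmer), the sentence-splitting rule, and the sample page are all assumptions made for the example.

```python
import re
from html import unescape

# Hypothetical, deliberately tiny stop-word list; a real system would use a fuller one.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "and", "in", "to", "it"}

def strip_tags(html):
    """Drop script/style blocks and markup, leaving whitespace-normalized text."""
    html = re.sub(r"(?is)<(script|style)\b.*?</\1>", " ", html)
    text = re.sub(r"(?s)<[^>]+>", " ", html)
    return re.sub(r"\s+", " ", unescape(text)).strip()

def crude_stem(word):
    """Naive suffix stripping; only a stand-in for a real stemmer."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def preprocess(html):
    """Return one list of stemmed, stop-word-free tokens per sentence."""
    sentences = re.split(r"(?<=[.!?])\s+", strip_tags(html))
    result = []
    for sentence in sentences:
        tokens = re.findall(r"[a-z]+", sentence.lower())
        kept = [crude_stem(t) for t in tokens if t not in STOP_WORDS]
        if kept:
            result.append(kept)
    return result

THETA = 0.5  # user-chosen abstraction value, as in the embodiment

# Made-up sample page, used only to exercise the functions above.
page = "<html><body><p>Apples are fruits. The tree is growing apples!</p></body></html>"
sentences = preprocess(page)
print(sentences)
```

Each inner list corresponds to one sentence of the collection R. The crude stemmer produces truncated forms such as `appl`, which is acceptable here because only consistency between occurrences matters for later matching.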

[0026] 2. Represent text document sentences as co...

Abstract

The invention relates to a shallow-analysis automatic document summarization method that takes the abstraction degree of concepts into account, and belongs to the fields of information retrieval and information science. The method is characterized by the following steps: first, the documents are preprocessed and an abstraction value is set; second, after word sense disambiguation, the sentences in the documents are represented with a concept vector model; third, the sentences are clustered into several clusters of similar topics; fourth, the compression ratio of the summary is determined according to the number of keywords extracted from the document collection; fifth, the abstraction degree of each sentence is determined; sixth, the number of summary sentences required by the compression ratio is picked from the clusters in turn according to their IMMRA values; and seventh, the extracted summary sentences are ordered and the summary document is output. The beneficial effects are as follows: automatic document summarization that takes abstraction degree into account is realized, and the information redundancy and omissions that concept inclusion relations cause in automatic summaries are reduced; the method adapts the length of the summary to the number of topics and adjusts the degree of generalization of the summary to the abstraction degree the user requires, and therefore has good adaptability.
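
To make the selection step more concrete, here is a hedged Python sketch of redundancy-aware sentence selection over sparse concept vectors. The text available here does not define the IMMRA value, so the sketch substitutes standard Maximal Marginal Relevance (MMR) and does not model abstraction degree or clustering at all. The function names, the weight `lam`, the toy vectors, and the choice to output sentences in their original order are assumptions for illustration only.

```python
import math

def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as {concept: weight} dicts."""
    dot = sum(w * v.get(c, 0.0) for c, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def mmr_select(vectors, query, k, lam=0.7):
    """Greedily pick k sentence indices with standard MMR (not the patent's IMMRA)."""
    selected = []
    candidates = list(range(len(vectors)))
    while candidates and len(selected) < k:
        def score(i):
            relevance = cosine(vectors[i], query)
            # Similarity to the closest already-selected sentence (0 if none yet).
            redundancy = max(
                (cosine(vectors[i], vectors[j]) for j in selected), default=0.0
            )
            return lam * relevance - (1 - lam) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return sorted(selected)  # assumed ordering: original sentence order

# Toy concept vectors: sentences 0 and 1 are near duplicates.
vectors = [
    {"fruit": 1.0, "apple": 1.0},
    {"fruit": 1.0, "apple": 0.9},
    {"fruit": 1.0, "banana": 1.0},
]
query = {"fruit": 1.0}
print(mmr_select(vectors, query, k=2))  # prints [1, 2]
```

With these toy vectors the near-duplicate sentence 0 is skipped in favor of sentence 2, which is the kind of redundancy reduction the abstract describes. How the patent's own criterion weighs abstraction degree against redundancy cannot be inferred from the text shown here.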

Description

Technical field

[0001] The invention relates to a shallow-analysis automatic document summarization method that takes the abstraction degree of concepts into account, and belongs to the fields of information retrieval and information science.

Background

[0002] Automatic document summarization is a technology that uses computers to compile summaries of multiple documents. It removes redundant information from multiple documents on the same topic and integrates the main content into a short summary document according to a given compression ratio, so that users can quickly and accurately grasp the content of the collection. With the development and popularization of the Internet, automatic document summarization is used as a post-processing step for search engines. It can condense the large number of retrieval results returned by a search engine into summaries and present them to users, which can significantly improve the efficiency with which users acquire information. Automatic docu...

Application Information

IPC(8): G06F17/30
Inventor: 郭雷, 王晓东, 方俊
Owner: NANTONG ZHONGBANG TEXTILE