
Successive principal axes filter method of multi-document automatic summarization

A multi-document automatic summarization technology, applied in the field of text information, that addresses the low precision, high cost, and limited applicability of existing methods and achieves high precision.

Status: Inactive. Publication Date: 2007-08-01
FUDAN UNIV

AI Technical Summary

Problems solved by technology

The advantage of this method is its relatively high accuracy; the disadvantage is its limited applicability. It generally summarizes only documents in a specific field, and producing reference summaries manually is very expensive.
[0004] 2. Unsupervised summarization algorithm
The advantage of the unsupervised summarization algorithm is that it is fast, does not require a manually labeled training set, and is not limited to a particular field; the disadvantage is that its accuracy is not very high.



Examples


Embodiment Construction

[0015] The basic process is as follows: represent each sentence as a vector in a vector space; calculate the similarity between every pair of sentences to obtain a similarity matrix; compute the principal eigenvector of that matrix to obtain the importance of each sentence; extract the most important sentence; and then remove from the remaining sentences the information that is redundant with the extracted sentence.
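The process in [0015] can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented procedure: the text shown here does not say which similarity measure is used or how redundancy is removed, so the sketch assumes cosine similarity, uses power iteration as one common way to approximate the principal eigenvector, and removes redundancy by projecting the remaining sentence vectors away from the extracted one. The function names (`cosine`, `principal_eigenvector`, `summarize`) are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity; near-zero vectors are treated as dissimilar to everything."""
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu < 1e-12 or nv < 1e-12:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def principal_eigenvector(S, iters=100):
    """Approximate the principal eigenvector of S by power iteration."""
    n = len(S)
    x = [1.0 / n] * n
    for _ in range(iters):
        y = [sum(S[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(v * v for v in y))
        if norm == 0.0:
            return x  # all similarities are zero; keep the previous estimate
        x = [v / norm for v in y]
    return x


def summarize(vectors, k):
    """Return the indices of up to k sentences, chosen one at a time."""
    vecs = [list(v) for v in vectors]
    remaining = list(range(len(vecs)))
    chosen = []
    while remaining and len(chosen) < k:
        # Similarity matrix over the sentences that are still candidates.
        S = [[cosine(vecs[a], vecs[b]) for b in remaining] for a in remaining]
        scores = principal_eigenvector(S)
        # Importance is read from the magnitude of each eigenvector component.
        t = max(range(len(remaining)), key=lambda i: abs(scores[i]))
        best = remaining.pop(t)
        chosen.append(best)
        # Assumed redundancy filter: subtract from every remaining vector
        # its projection onto the extracted sentence's vector.
        q = vecs[best]
        qq = sum(a * a for a in q)
        if qq > 0.0:
            for r in remaining:
                c = sum(a * b for a, b in zip(vecs[r], q)) / qq
                vecs[r] = [a - c * b for a, b in zip(vecs[r], q)]
    return chosen
```

With two identical sentence vectors and one different vector, the projection step reduces the duplicate to a zero vector, so the second pick is the different sentence rather than the duplicate.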

[0016] 1. The vector space representation of sentences.

[0017] Suppose there are n sentences in which a total of m distinct words appear. Each sentence is then represented by an m-dimensional vector, and the n sentences form an m×n matrix, denoted M. The entry $M_{ij}$ is the tf-idf value of the i-th word in the j-th sentence: $M_{ij} = tf_{ij} \times \log \frac{n}{df_i}$, where $tf_{ij}$ is the frequency of the i-th word in the j-th sentence, and $df_i$ indica...
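As an illustration of the matrix in [0017], the following Python sketch builds M from tokenized sentences. Because the definition of df_i is cut off in the text, the sketch assumes the usual meaning, the number of sentences that contain word i; the natural logarithm is also an assumption, since the base is not stated. The function name `tfidf_matrix` is hypothetical.

```python
import math
from collections import Counter


def tfidf_matrix(sentences):
    """Build the m x n matrix with M[i][j] = tf_ij * log(n / df_i).

    `sentences` is a list of token lists, one per sentence.
    Assumption: df_i counts the sentences that contain word i.
    """
    n = len(sentences)
    vocab = sorted({w for s in sentences for w in s})
    # Document frequency: each sentence contributes at most 1 per word.
    df = Counter(w for s in sentences for w in set(s))
    # Term frequency: raw count of each word within each sentence.
    counts = [Counter(s) for s in sentences]
    M = [[counts[j][w] * math.log(n / df[w]) for j in range(n)] for w in vocab]
    return vocab, M


vocab, M = tfidf_matrix([["a", "b"], ["a", "c", "c"]])
```

In this small example the word "a" occurs in both sentences, so its row is all zeros (log(2/2) = 0), while "c" occurs twice in the second sentence only and gets the weight 2·log 2 there. Column j of M is the vector for sentence j.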


Abstract

This invention relates to a successive principal axes filter method for multi-document automatic summarization in the field of text information technology. Based on the OR rotation axis method, it comprises the steps of computing sentence similarity, analyzing the principal axis, and removing the redundant parts of the extracted sentences.

Description

Technical field

[0001] The invention belongs to the technical field of text information, and in particular relates to a multi-document automatic summarization method.

Background technique

[0002] With the rapid development of communication, people increasingly enjoy the convenience brought by information, especially text information such as emails, web pages, and short messages. The accompanying problem is that the sheer volume of information often makes it difficult to grasp the key points. How to use computers to help people analyze this information and pick out what is important has become a very important issue. Automatic summarization arose to meet this need. It is divided into single-document summarization and multi-document summarization. Because of the large volume of information in practice, multi-document summarization is more widely used: it generates one summary for multiple documents and presents it to users. The current summarization technolo...


Application Information

IPC(8): G06F17/27, G06F17/28
Inventor: 黄萱菁, 赵林, 吴中勤, 刘菲
Owner: FUDAN UNIV