Topic- or query-oriented multi-document summarization method based on manifold ranking

A topic-oriented multi-document summarization technology, applied in fields such as instruments, computing, and electrical digital data processing, that addresses the failure of earlier methods to comprehensively consider the richness and novelty of information

Active Publication Date: 2006-09-06
PEKING UNIV
0 Cites, 13 Cited by

AI Technical Summary

Problems solved by technology

However, the existing methods still have deficiencies. They fail to comprehensively consider the topic-oriented or query-oriented information richness and information novelty of each sentence, so they cannot accurately return summaries of relevant news and other information tailored to user-defined interests and other user attributes.


Examples

Detailed Description of Embodiments

[0051] The present invention is further described below with reference to the accompanying drawings and embodiments:

[0052] As shown in Figure 1, a topic- or query-oriented multi-document summarization method based on manifold ranking includes the following steps:

[0053] (1) Read in the documents, treat the topic or query information as a sentence, split each document and the topic or query information into sentences and words, compute the similarity between sentences, and construct a sentence relationship graph;

[0054] In this embodiment, the topic includes user attributes, user questions, user queries, and other personalized descriptions related to a specific user. These descriptions may be provided directly by the user, or may be obtained by analyzing user behavior. If the topic is too long, it can be broken into multiple sentences, ideally between 1 and 5. Since the topic in this embodiment is relatively short, the topic is rega...
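Step (1) can be illustrated with a minimal sketch. The excerpt does not say which segmenter or similarity measure is used, so the regex-based sentence splitter, the lowercase word tokenizer, and cosine similarity over term-frequency vectors below are all assumptions, and the function names (`split_sentences`, `tokenize`, `cosine`, `build_sentence_graph`) are hypothetical. The topic is treated as a single sentence and placed at node 0 of the graph.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    """Naive splitter on '.', '!' and '?' (an assumption; no segmenter is named in the source)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercased alphanumeric word tokens (an assumption)."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters (assumed measure)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_sentence_graph(topic, documents):
    """Return (sentences, W). Node 0 is the topic; W[i][j] is the similarity
    between sentences i and j, with a zero diagonal (no self-loops)."""
    sentences = [topic]
    for doc in documents:
        sentences.extend(split_sentences(doc))
    vectors = [Counter(tokenize(s)) for s in sentences]
    n = len(sentences)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            W[i][j] = W[j][i] = cosine(vectors[i], vectors[j])
    return sentences, W


if __name__ == "__main__":
    docs = [
        "The flood destroyed homes. Relief efforts began quickly.",
        "Stock markets rose today.",
    ]
    sentences, W = build_sentence_graph("flood relief efforts", docs)
    for sentence, weight in zip(sentences[1:], W[0][1:]):
        print(f"{weight:.3f}  {sentence}")
```

Representing the graph as a dense symmetric matrix keeps the sketch short; a sparse structure would be a more natural choice for large document sets.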

Abstract

To overcome the shortcomings of the prior art, the method fully considers both the relationships among sentences and the relationships between sentences and the user's query, so that the generated summary conveys the main information of the documents while also explaining the topic or answering the query. A difference penalty algorithm is applied to ensure the novelty of the summary. The invention can thus meet users' individual needs.
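The two ideas in the abstract, ranking sentences by their relationship to each other and to the query, and penalizing redundancy, can be sketched as follows. The excerpt gives neither the ranking formula nor the penalty rule, so this is only an illustration under stated assumptions: it uses the commonly published manifold-ranking iteration f ← αSf + (1 − α)y with S = D^(-1/2) W D^(-1/2), and a simple greedy penalty that lowers the score of sentences similar to one already selected. The function names and the values of `alpha` and `omega` are hypothetical, and `W` is assumed to be a symmetric similarity matrix with the topic at node 0.

```python
import math


def manifold_rank(W, alpha=0.6, iterations=100):
    """Assumed ranking step: iterate f <- alpha * S f + (1 - alpha) * y,
    where y marks node 0 (the topic) as the only labeled node."""
    n = len(W)
    degree = [sum(row) for row in W]
    # Symmetric normalization S = D^(-1/2) W D^(-1/2); isolated nodes keep zero rows.
    S = [
        [
            W[i][j] / math.sqrt(degree[i] * degree[j]) if degree[i] and degree[j] else 0.0
            for j in range(n)
        ]
        for i in range(n)
    ]
    y = [1.0] + [0.0] * (n - 1)
    f = y[:]
    for _ in range(iterations):
        f = [
            alpha * sum(S[i][j] * f[j] for j in range(n)) + (1 - alpha) * y[i]
            for i in range(n)
        ]
    return f


def select_with_penalty(W, scores, k, omega=8.0):
    """Assumed difference penalty: greedily pick the best-scoring sentence, then
    reduce each remaining score in proportion to its similarity to the pick."""
    n = len(W)
    degree = [sum(row) for row in W]
    remaining = {i: scores[i] for i in range(1, n)}  # node 0 is the topic, never selected
    chosen = []
    while remaining and len(chosen) < k:
        best = max(remaining, key=remaining.get)
        chosen.append(best)
        del remaining[best]
        for j in remaining:
            # Row-normalized similarity of sentence j to the sentence just chosen.
            overlap = W[j][best] / degree[j] if degree[j] else 0.0
            remaining[j] -= omega * overlap * scores[best]
    return chosen


if __name__ == "__main__":
    # Toy graph: sentences 1 and 2 are near-duplicates; sentence 3 is distinct.
    W = [
        [0.0, 0.6, 0.6, 0.4],
        [0.6, 0.0, 1.0, 0.0],
        [0.6, 1.0, 0.0, 0.0],
        [0.4, 0.0, 0.0, 0.0],
    ]
    scores = manifold_rank(W)
    print("with penalty:   ", select_with_penalty(W, scores, k=2))
    print("without penalty:", select_with_penalty(W, scores, k=2, omega=0.0))
```

In the toy graph, setting `omega` to zero lets both near-duplicate sentences into the summary, while a positive `omega` pushes the second pick toward the sentence that adds different information.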

Description

Technical field

[0001] The invention belongs to the technical field of language and text processing, and in particular relates to a topic- or query-oriented multi-document summarization method based on manifold ranking.

Background art

[0002] Multi-document summarization is a core problem in natural language processing and has been widely used in recent years in applications such as text/Web retrieval. For example, search engines such as Google and Baidu provide news services that collect news from the Internet, group it into multiple news topics, and give each topic a brief and concise summary. Topic- or query-oriented multi-document summarization can be regarded as a special multi-document summarization task: the summary it generates needs to reflect a topic or query (or user attribute) specified by the user, that is, the generated summary should explain or answer the user's focus or information need....

Application Information

IPC(8): G06F17/30, G06F17/27
Inventor: 万小军, 杨建武, 吴於茜, 陈晓鸥, 肖建国
Owner PEKING UNIV