A Query-Oriented Multi-Document Automatic Summarization Method

An automatic multi-document summarization technology, applied in special data processing applications, instruments, electronic digital data processing, and related fields; it concerns, among other things, topic distribution.

Inactive Publication Date: 2012-02-15
NORTHEASTERN UNIV LIAONING
2 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

Existing methods mainly generate summaries based on the similarity between sentences and the query, without considering how subtopics are distributed in the summary. As a result, many summary sentences often come from the same subtopic. Measuring text repetition to keep out sentences with duplicate content alleviates this problem to some extent, but it still cannot guarantee that the information in the summary is comprehensive.

Examples

Embodiment Construction

[0109] As shown in Figure 1, the query-oriented multi-document automatic summarization method of the present invention comprises the following steps:

[0110] Preprocess the query and the documents;

[0111] Perform topic segmentation and semantic paragraph clustering on the preprocessed documents to obtain subtopics;

[0112] Represent the query and the sentences in each subtopic as word frequency vectors, and calculate the relevance between the query and each subtopic;

[0113] Screen the subtopics according to their relevance to the query, sort them by importance, and select the top T most important subtopics to obtain an ordered sequence of subtopics related to the query;

[0114] Cyclically extract representative sentences from the subtopic sequence and connect them to generate the summary.
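
The steps in [0112] to [0114] can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: subtopics are assumed to be already segmented and clustered into lists of sentences, relevance is assumed to be cosine similarity between word frequency vectors, subtopic importance is assumed to equal that relevance, and representative sentences are assumed to be those most similar to the query. The names `tf_vector`, `cosine`, `summarize`, `min_relevance`, and `max_sentences` are hypothetical.

```python
import math
import re
from collections import Counter


def tf_vector(text):
    """Word frequency vector: counts of lowercase alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word frequency vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def summarize(query, subtopics, top_t=2, min_relevance=0.0, max_sentences=4):
    """Return summary sentences drawn in turn from the top T subtopics.

    subtopics: list of subtopics, each a list of sentence strings.
    """
    query_vec = tf_vector(query)
    scored = []
    for sentences in subtopics:
        # Assumption: subtopic relevance is the cosine similarity between
        # the query and the pooled text of the subtopic.
        relevance = cosine(query_vec, tf_vector(" ".join(sentences)))
        if relevance > min_relevance:  # screening step
            # Assumption: sentences closest to the query are "representative".
            ranked = sorted(
                sentences,
                key=lambda s: cosine(query_vec, tf_vector(s)),
                reverse=True,
            )
            scored.append((relevance, ranked))

    # Assumption: importance is taken to be the relevance score itself.
    scored.sort(key=lambda item: item[0], reverse=True)
    queues = [ranked for _, ranked in scored[:top_t]]

    # Cyclic extraction: one sentence per subtopic per round, so no single
    # subtopic dominates the summary.
    summary = []
    round_index = 0
    while len(summary) < max_sentences and any(
        round_index < len(queue) for queue in queues
    ):
        for queue in queues:
            if round_index < len(queue) and len(summary) < max_sentences:
                summary.append(queue[round_index])
        round_index += 1
    return summary
```

Joining the returned list with spaces gives the final summary text. Taking one sentence per subtopic per round is what spreads the summary across subtopics instead of filling it from the single most query-similar one.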

[0115] The process of preproc...

Abstract

The invention relates to a query-oriented multi-document automatic summarization method comprising the following steps: preprocessing the query and the documents; performing topic segmentation and semantic paragraph clustering on the preprocessed documents to obtain subtopics; representing the query and the sentences in each subtopic as word frequency vectors and calculating the relevance between the query and each subtopic; screening the subtopics according to their relevance to the query, sorting them by importance, and selecting the top T most important subtopics to obtain an ordered sequence of subtopics related to the query; and cyclically extracting representative sentences from the subtopic sequence and connecting them to generate a summary. By using topic segmentation, the method lets a summary of limited length include as much of the more important information in the document set as possible, provides a more targeted service, and adjusts the summary content according to the user's query topic, thereby enabling interaction with the user.

Description

Technical Field

[0001] The invention relates to natural language automatic summarization technology, in particular to a query-oriented multi-document automatic summarization method.

Background

[0002] With the rapid change and development of human society, a large amount of new information is produced every day, and the popularity of Internet technology makes information sharing ever more widespread. People can easily publish information on the Internet, so there are too many information sources online and much of the information is repeated. For example, for the same news event, different news organizations may publish different reports, but the main content of the reports is similar or even completely repeated, and the only difference lies in the way it is expressed. This repetitive information wastes readers' time. On the other hand, different articles on the same topic will also cover some different information. Fo...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F17/30, G06F17/27
Inventors: 朱靖波, 叶娜, 王会珍, 郑妍
Owner: NORTHEASTERN UNIV LIAONING