Query-oriented multi-document automatic summarization method

A multi-document automatic summarization technology, applied in special data processing applications, instruments, electronic digital data processing, etc. It addresses the problems that existing methods cannot guarantee the comprehensiveness of the information in the summary, do not consider the distribution of subtopics in the summary, and often draw many summary sentences from the same subtopic.

Inactive Publication Date: 2010-01-06
NORTHEASTERN UNIV
Cites: 0 | Cited by: 60

AI Technical Summary

Problems solved by technology

However, existing methods mainly generate summaries based on the similarity between sentences and the query, without considering how subtopics are distributed in the summary. As a result, a large number of summary sentences often come from the same subtopic. Measuring text repetition to filter out summary sentences with duplicate content alleviates this problem to some extent, but it still cannot guarantee the comprehensiveness of the information in the summary.

Embodiment Construction

[0109] As shown in Figure 1, the query-oriented multi-document automatic summarization method of the present invention comprises the following steps:

[0110] Preprocess the query and the documents;

[0111] Perform topic segmentation and semantic paragraph clustering on the preprocessed documents to obtain subtopics;

[0112] Express the query and the sentences in each subtopic as word frequency vectors, and calculate the relevance between the query and each subtopic;

[0113] Screen the subtopics according to their relevance to the query, sort them by importance, and select the top T most important subtopics to obtain an ordered sequence of subtopics related to the query;

[0114] Cyclically extract representative sentences from the subtopic sequence in turn, and concatenate them to generate the summary.
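
The relevance and selection steps ([0112] and [0113]) can be illustrated with a minimal Python sketch. Several details here are assumptions, since the visible text does not specify them: cosine similarity is used as the relevance measure, all sentences of a subtopic are merged into one word frequency vector, subtopic importance is taken to be the relevance score itself, and the names `tf_vector`, `cosine`, `rank_subtopics`, and the `min_relevance` threshold are hypothetical.

```python
import math
import re
from collections import Counter


def tf_vector(text):
    """Build a lowercased word-frequency vector for a piece of text."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse frequency vectors (assumed measure)."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_subtopics(query, subtopics, top_t, min_relevance=0.0):
    """Return the indices of up to top_t subtopics, most relevant first.

    subtopics is a list of sentence lists. Subtopics whose relevance does
    not exceed min_relevance (a hypothetical threshold) are screened out.
    Importance is assumed to equal relevance; ties keep document order.
    """
    q = tf_vector(query)
    scored = []
    for idx, sentences in enumerate(subtopics):
        relevance = cosine(q, tf_vector(" ".join(sentences)))
        if relevance > min_relevance:
            scored.append((relevance, idx))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [idx for _, idx in scored[:top_t]]
```

For example, with the query "earthquake rescue", a subtopic about stock markets shares no words with the query and is screened out, while subtopics mentioning the earthquake are ranked by how strongly their vocabulary overlaps with the query.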

[0115] The process of preprocess...

Abstract

The invention relates to a query-oriented multi-document automatic summarization method, which comprises the following steps: preprocessing the query and the documents; performing topic segmentation and semantic paragraph clustering on the preprocessed documents to obtain subtopics; expressing the query and the sentences in each subtopic as word frequency vectors, and calculating the relevance between the query and each subtopic; screening the subtopics according to their relevance to the query, sorting them by importance, and selecting the top T most important subtopics to obtain an ordered sequence of subtopics related to the query; and cyclically extracting representative sentences from the subtopic sequence in turn, and concatenating them to generate the summary. By using topic segmentation, the method allows a summary of limited length to cover as much of the important information in the document set as possible. It also provides a more targeted service: the content of the summary can be adjusted to the topic of the user's query, which enables interaction with users.
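
The final step, cyclic extraction of representative sentences under a length limit, might look like the following Python sketch. It assumes that each subtopic's sentences are already ordered from most to least representative and that the length limit is counted in whitespace-separated words; neither detail is stated in the abstract, and the name `extract_summary` and the parameter `max_words` are hypothetical.

```python
def extract_summary(ordered_subtopics, max_words):
    """Visit subtopics in round-robin order, taking one sentence per visit.

    ordered_subtopics: list of sentence lists in subtopic-importance order,
    each list assumed to be sorted with its most representative sentence
    first. Extraction stops at the first sentence that would push the
    summary past max_words (assumed word-count limit).
    """
    queues = [list(sentences) for sentences in ordered_subtopics]
    summary = []
    used = 0
    while any(queues):
        for queue in queues:
            if not queue:
                continue  # this subtopic has no sentences left
            sentence = queue.pop(0)
            length = len(sentence.split())
            if used + length > max_words:
                return " ".join(summary)
            summary.append(sentence)
            used += length
    return " ".join(summary)
```

Because each pass takes at most one sentence from every subtopic before returning to the first, no single subtopic can dominate the summary, which is the comprehensiveness property the method aims for.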

Description

Technical field

[0001] The invention relates to automatic summarization technology in natural language processing, in particular to a query-oriented multi-document automatic summarization method.

Background

[0002] With the rapid change and development of human society, a large amount of new information is produced every day, and the popularity of Internet technology makes the degree of information sharing higher and higher. People can easily publish information on the Internet, resulting in too many information sources on the Internet, and much of the information is repeated. For example, for the same news event, different news organizations may publish different reports, but the main content of the reports is similar or even completely repeated, and the only difference lies in the way of expression. This repetitive information wastes readers' reading time. On the other hand, different articles under the same topic will also cover some different information. Fo...

Application Information

IPC(8): G06F17/30, G06F17/27
Inventors: 朱靖波, 叶娜, 王会珍, 郑妍
Owner NORTHEASTERN UNIV