Method and apparatus for generating summaries for subject document collections

A technology for abstracting subject document collections, applied in special data processing applications, instruments, electronic digital data processing, etc. It addresses the problems of abstract sentences that do not read smoothly and of reduced abstract quality and readability, so as to achieve good readability and improved quality.

Active Publication Date: 2018-03-16
BAIDU ONLINE NETWORK TECH (BEIJING) CO LTD

AI Technical Summary

Problems solved by technology

[0004] The aforementioned method of generating summaries has at least the following problem: when the summary of a subject document set is generated by the word frequency method, sentences differ in importance, so two adjacent sentences in the generated summary are often not adjacent in the subject document set. The generated summary therefore tends not to read smoothly, which reduces its quality and readability.

Examples

Embodiment 1

[0034] Figure 1 is a flowchart of a method for generating an abstract for a subject document set provided by an embodiment of the present invention. The method is executed by a computer system that includes the device shown in Figure 5.

[0035] As shown in Figure 1, in step 101 (the candidate abstract selection step), one or more sentences are selected from each article in the subject document set as candidate abstracts.

[0036] The subject document set may be a collection of multiple articles about a certain event, and the articles in the collection may come from various channels, such as Weibo, news, post bars, and forums. A sentence may be the characters (such as text) between two adjacent punctuation marks in the article (such as two adjacent commas, or an adjacent comma and period).
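
The sentence definition in [0036] can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `split_sentences` and the exact set of boundary punctuation are assumptions, since the source only names commas and periods as examples.

```python
import re

# Punctuation treated as sentence boundaries. This set is an assumption;
# the source only gives commas and periods as examples.
BOUNDARY = re.compile(r"[,.;!?，。；！？]")


def split_sentences(article: str) -> list[str]:
    """Return the non-empty text spans between adjacent punctuation marks."""
    return [span.strip() for span in BOUNDARY.split(article) if span.strip()]


if __name__ == "__main__":
    text = "A storm hit the coast, roads were closed. Schools reopened Monday."
    for sentence in split_sentences(text):
        print(sentence)
```

Splitting on every period also breaks decimal numbers and abbreviations apart, so a production system would likely need a more careful boundary rule.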

[0037] Usually, articles of various events can be obtained through various channels, and these articles can be preprocessed in some ways, that is, these articles ar...

Embodiment 2

[0045] Figure 2 is a flowchart of another embodiment of the method for generating a summary for a subject document set provided by the present invention. This embodiment can be regarded as another specific implementation of the method of Figure 1.

[0046] As shown in Figure 2, in step 201, for any article in the subject document set, at least one group of consecutively arranged sentences is obtained in sequence from the body text of the article by sliding a window, and each group is used as a first long abstract candidate.

[0047] The window can slide in several ways. For example, the number of characters that the window can hold can be set in advance. Then, each time, the window can slide by that number of characters to obtain the next group of that many characters, or the window can slide by a preset number of characters each time, after which the above-mentioned number of characters after the preset number of characte...
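
The two sliding modes described in [0047] can be sketched roughly as below. The function name `slide_window` and its parameters are hypothetical; the sketch assumes the window either advances by its own size (no overlap) or by a separately preset step, and it keeps a shorter trailing window rather than dropping it, which the source does not specify.

```python
from typing import List, Optional


def slide_window(text: str, window_size: int, step: Optional[int] = None) -> List[str]:
    """Collect successive character windows over `text`.

    step=None slides by the window size itself (windows do not overlap);
    otherwise the window slides by the preset `step` characters each time.
    """
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    if step is None:
        step = window_size
    if step <= 0:
        raise ValueError("step must be positive")
    # Trailing windows may be shorter than window_size (an assumption).
    return [text[i:i + window_size] for i in range(0, len(text), step)]


if __name__ == "__main__":
    print(slide_window("abcdefgh", 4))     # slide by the window size
    print(slide_window("abcdefgh", 4, 2))  # slide by a preset step of 2
```

A smaller step yields overlapping candidates, which gives more long abstract candidates per article at the cost of more scoring work.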

Embodiment 3

[0074] Figure 3 is a flowchart of another embodiment of the method for generating a summary for a subject document set provided by the present invention. This embodiment can be regarded as another specific implementation of the method of Figure 1.

[0075] As shown in Figure 3, in step 301, the title of each article in the subject document set is extracted as a first candidate short abstract.

[0076] Since the title of an article can usually best reflect the gist of its content, the title of each article in the subject document set is used as a short abstract candidate.

[0077] The word count of the short abstract can be preset, such as 20 words. Specifically, in step 301, any article in the subject document collection is obtained, and the title of the article is extracted therefrom as the first candidate short abstract. Using the same method, the above processing is performed on each of the remaining articles in the subject document set to obtain a plurality of firs...

Abstract

The embodiments of the invention disclose a method and a device for generating an abstract for a subject document set. The method includes: selecting one or more sentences from each article in the subject document set as candidate abstracts; performing word segmentation on each candidate abstract according to a preset syntactic analysis algorithm, and scoring the candidate abstracts based on the word segmentation results; and using the candidate abstract with the highest score as the abstract for the subject document set. The embodiments of the present invention can improve the quality of the abstract of the subject document set and ensure that the generated abstract is more readable.
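
The select, segment, score, and pick flow of the abstract can be sketched as follows. The patent does not disclose its syntactic analysis algorithm or its scoring formula in this excerpt, so both are stand-ins here: `tokenize` is a plain whitespace split, and `overlap_score` is a hypothetical score that favors candidates whose words also appear in many other candidates.

```python
from collections import Counter
from typing import List


def tokenize(candidate: str) -> List[str]:
    # Stand-in for the "preset syntactic analysis algorithm" (assumption):
    # lowercase and split on whitespace.
    return candidate.lower().split()


def overlap_score(tokens: List[str], candidate_freq: Counter) -> float:
    # Hypothetical scoring rule, not taken from the patent: the mean number
    # of candidates that contain each token of this candidate.
    if not tokens:
        return 0.0
    return sum(candidate_freq[t] for t in tokens) / len(tokens)


def pick_summary(candidates: List[str]) -> str:
    """Score every candidate abstract and return the highest-scoring one."""
    if not candidates:
        raise ValueError("no candidate abstracts")
    segmented = [tokenize(c) for c in candidates]
    # Count, for each token, how many candidates contain it.
    candidate_freq = Counter(t for tokens in segmented for t in set(tokens))
    scores = [overlap_score(tokens, candidate_freq) for tokens in segmented]
    best = max(range(len(candidates)), key=scores.__getitem__)
    return candidates[best]


if __name__ == "__main__":
    print(pick_summary([
        "storm closes coastal roads",
        "storm closes roads and schools",
        "weather was mild last year",
    ]))
```

Because the chosen abstract is always a whole candidate taken from one article, its sentences stay in their original order, which is the readability property the abstract claims.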

Description

Technical field

[0001] The invention relates to the field of computer data processing, and in particular to a method and device for generating abstracts for subject document sets.

Background

[0002] With the continuous development of information technology and Internet technology, the volume of information keeps growing and its sources keep widening. How to quickly obtain a summary of a subject document set from a large number of subject document sets with multiple information sources has become one of the important issues that people care about.

[0003] Word frequency is usually used to generate summaries of subject document sets. First, the content of each document in a subject document set is segmented into multiple words, and the resulting words are filtered to remove stop words, yielding multiple segmented words; then, the word frequency of each segmented word determ...
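
The first steps of the word frequency approach in [0003] (segment, remove stop words, count) can be sketched as below. The stop-word list and the regex-based segmentation are illustrative assumptions; Chinese text would need a proper word segmenter, and how the frequencies are then used is cut off in the source, so the sketch stops at counting.

```python
import re
from collections import Counter
from typing import Iterable

# Illustrative stop-word list (assumption); a real list would be far larger.
STOP_WORDS = {"the", "a", "of", "and", "in", "to", "was", "were"}


def word_frequencies(documents: Iterable[str]) -> Counter:
    """Segment each document, drop stop words, and count the remaining words."""
    counts: Counter = Counter()
    for doc in documents:
        words = re.findall(r"\w+", doc.lower())  # naive segmentation
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts


if __name__ == "__main__":
    freqs = word_frequencies([
        "The storm closed the roads.",
        "Roads were closed in the storm.",
    ])
    print(freqs.most_common(3))
```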

Application Information

Patent Type & Authority: Patents (China)
IPC: IPC(8): G06F17/27
Inventor: 李炫, 沈剑平, 莫洋, 宋元峰, 郑楚煜, 车丽美, 齐沁芳
Owner: BAIDU ONLINE NETWORK TECH (BEIJING) CO LTD