Method and apparatus for generating summaries for subject document collections
A technology for generating abstracts for document collections, applied in special data processing applications, instruments, electronic digital data processing, etc. It addresses the problems of reduced abstract quality and readability and of abstract sentences that do not read smoothly, so as to achieve good readability and improved quality.
Examples
Embodiment 1
[0034] Figure 1 is a flowchart of a method for generating an abstract for a subject document set provided by an embodiment of the present invention. The method is executed by a computer system that includes the device shown in Figure 5.
[0035] As shown in Figure 1, in step 101 (the candidate abstract selection step), one or more sentences are selected from each article in the subject document set as candidate abstracts.
[0036] Here, the subject document set may be a collection of multiple articles about a certain event, and the articles in the collection may come from various channels, such as Weibo, news, post bars, and forums. A sentence may be the characters (such as text) between two adjacent punctuation marks in the article (such as two adjacent commas, or an adjacent comma and period).
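The sentence definition in [0036] can be illustrated with a minimal Python sketch. It is not the patent's implementation: the function name `split_into_sentences` and the delimiter set are assumptions made for illustration, since the source only names commas and periods as examples of punctuation marks.

```python
import re
from typing import List

# Assumed delimiter set (hypothetical): the source only gives commas and
# periods as examples; other marks are included here for illustration.
DELIMITERS = ",.!?;，。！？；"


def split_into_sentences(text: str) -> List[str]:
    """Return the runs of characters found between adjacent punctuation marks."""
    pattern = "[" + re.escape(DELIMITERS) + "]"
    pieces = re.split(pattern, text)
    # Drop empty pieces produced by consecutive or trailing punctuation.
    return [piece.strip() for piece in pieces if piece.strip()]


if __name__ == "__main__":
    article = "Floods hit the city, roads closed. Rescue began"
    for sentence in split_into_sentences(article):
        print(sentence)
```

Under this definition a "sentence" is closer to a clause than to a grammatical sentence, because commas also act as boundaries.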
[0037] Usually, articles about various events can be obtained through various channels, and these articles can be preprocessed in some way, that is, these articles ar...
Embodiment 2
[0045] Figure 2 is a flowchart of another embodiment of the method for generating a summary for a subject document set provided by the present invention. This embodiment can be regarded as another specific implementation of the method shown in Figure 1.
[0046] As shown in Figure 2, in step 201, for any article in the subject document set, at least one group of consecutively arranged sentences is sequentially obtained from the body text of the article by sliding a window, and these groups serve as the first long abstract candidates.
[0047] The window may slide in multiple ways. For example, the number of characters that the window can hold may be set in advance. Then, each time, the window may slide by that number of characters and obtain the next characters of that number; or the window may slide by a preset number of characters each time, after which the above-mentioned number of characters after the preset number of characte...
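The two sliding manners mentioned in [0047] can be sketched as follows. This is only an illustration under stated assumptions: the function name `slide_window` and the parameters `window_chars` and `step_chars` are hypothetical, the sketch works on raw characters rather than on whole sentences, and the source paragraph is truncated, so details such as how the final, shorter window is handled are guesses rather than the patent's method.

```python
from typing import List, Optional


def slide_window(text: str, window_chars: int,
                 step_chars: Optional[int] = None) -> List[str]:
    """Collect the contents of a fixed-capacity window as it slides over text.

    window_chars: number of characters the window can hold (set in advance).
    step_chars:   number of characters the window moves each time; when
                  omitted, the window moves by its own capacity, so the
                  windows do not overlap.
    """
    if step_chars is None:
        step_chars = window_chars
    if window_chars <= 0 or step_chars <= 0:
        raise ValueError("window_chars and step_chars must be positive")
    # Assumption: trailing windows shorter than window_chars are kept.
    return [text[start:start + window_chars]
            for start in range(0, len(text), step_chars)]


if __name__ == "__main__":
    body = "abcdefgh"
    print(slide_window(body, 4))     # window moves by its own capacity
    print(slide_window(body, 4, 2))  # window moves by a smaller preset step
```

A step smaller than the window capacity produces overlapping windows and therefore more candidates per article; a step equal to the capacity partitions the text.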
Embodiment 3
[0074] Figure 3 is a flowchart of another embodiment of the method for generating a summary for a subject document set provided by the present invention, which can be regarded as another specific implementation of the method shown in Figure 1.
[0075] As shown in Figure 3, in step 301, the title of each article in the subject document set is extracted as a first candidate short abstract.
[0076] Since the title of an article can usually best reflect the gist of its content, the title of each article in the subject document set is used as a short abstract candidate.
[0077] The word count of the short abstract can be preset, for example to 20 words. Specifically, in step 301, an article in the subject document set is obtained, and its title is extracted as a first candidate short abstract. The same processing is then performed on each of the remaining articles in the subject document set to obtain a plurality of firs...