Method and device for generating document summaries

The document summarization technology, applied in word processing, instrumentation, computing, etc., addresses the problem of high redundancy in document summaries.

Active Publication Date: 2021-05-04
SHENZHEN RAISOUND TECH

AI Technical Summary

Problems solved by technology

[0004] However, in the traditional multi-document summarization process, all sentences in the document set are often scored according to preset importance features, and only the internal information of each sentence is considered, which ultimately leads to excessive redundancy in the generated document summary.



Examples


Embodiment Construction

[0060] As shown in Figure 1, in one embodiment, a method for generating a document summary includes the following steps:

[0061] S110, performing sentence segmentation on the document set to obtain a sentence set corresponding to the document set, and representing each sentence in the sentence set with a vector space model.

[0062] Specifically, the entire document set belonging to the same topic is traversed and segmented into sentences to obtain a sentence set. Word segmentation is then performed on the English or Chinese document set. For an English document set, words are segmented using spaces, symbols, and paragraphs. For a Chinese document set, word segmentation may use a method based on string matching, a method based on understanding, or a method based on word frequency statistics, but is not limited to these. For each word in each sentence, judge whet...
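Step S110 can be illustrated with a minimal sketch for English text. It rests on several assumptions that are not taken from the patent: sentences are split with a simple regular expression, tokens are lowercase alphanumeric runs, and each sentence vector holds raw term counts over the vocabulary. The function names `split_sentences`, `tokenize`, and `build_vectors` are hypothetical.

```python
import re
from collections import Counter


def split_sentences(document: str) -> list[str]:
    """Split after '.', '!' or '?' followed by whitespace (a simplification)."""
    parts = re.split(r"(?<=[.!?])\s+", document.strip())
    return [p for p in parts if p]


def tokenize(sentence: str) -> list[str]:
    """Lowercase word tokens; spaces and symbols act as separators."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def build_vectors(documents: list[str]):
    """Return (sentences, vocabulary, vectors) for a set of documents.

    Each vector holds raw term counts (assumed weighting), with one
    dimension per vocabulary term in sorted order.
    """
    sentences = [s for doc in documents for s in split_sentences(doc)]
    tokenized = [tokenize(s) for s in sentences]
    vocabulary = sorted({t for tokens in tokenized for t in tokens})
    index = {term: i for i, term in enumerate(vocabulary)}
    vectors = []
    for tokens in tokenized:
        vec = [0] * len(vocabulary)
        for term, count in Counter(tokens).items():
            vec[index[term]] = count
        vectors.append(vec)
    return sentences, vocabulary, vectors


sentences, vocabulary, vectors = build_vectors(
    ["Cats purr. Dogs bark!", "Cats sleep."]
)
```

A Chinese document set would need a dedicated word segmenter in place of `tokenize`; the rest of the sketch would stay the same.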


Abstract

The present invention relates to a method for generating document summaries. The method includes: performing sentence segmentation on a document set to obtain a sentence set and representing it with a vector space model; determining, according to a preset similarity threshold, the similar sentences and the number of similar sentences corresponding to each sentence, and calculating the corresponding importance score; taking each sentence in the sentence set in turn as the current sentence, comparing the number of similar sentences of the current sentence with the numbers of similar sentences of all of its similar sentences, and adding the sentence with the maximum value to the diversity reference set; calculating the diversity score and comprehensive score of each sentence; and finally sorting and screening all sentences in the sentence set to form a document summary. A device for generating document summaries is also provided. The method and device consider both the internal information of each sentence and the global information in the document set, reducing the redundancy of the document summary as a whole.
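The similar-sentence counting and the construction of the diversity reference set can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: cosine similarity is assumed as the similarity measure, a pair counts as similar when its similarity is greater than or equal to the threshold, and ties for the maximum are resolved in favor of the lowest sentence index. The importance, diversity, and comprehensive score formulas are not given in this excerpt, so they are left out. The function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def similar_sentences(vectors, threshold):
    """For each sentence index, list the indices of its similar sentences."""
    n = len(vectors)
    neighbors = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Assumption: "similar" means similarity >= threshold.
            if cosine(vectors[i], vectors[j]) >= threshold:
                neighbors[i].append(j)
                neighbors[j].append(i)
    return neighbors


def diversity_reference_set(neighbors):
    """Return (counts, reference): similar-sentence counts and the reference set.

    For each current sentence, the sentence with the largest count among
    the current sentence and its similar sentences joins the reference set.
    """
    counts = [len(ns) for ns in neighbors]
    reference = set()
    for current, ns in enumerate(neighbors):
        candidates = [current] + ns
        # Assumption: ties go to the lowest index.
        best = max(candidates, key=lambda k: (counts[k], -k))
        reference.add(best)
    return counts, reference


vectors = [[1, 1, 0, 0], [1, 1, 1, 0], [0, 1, 1, 0], [0, 0, 0, 1]]
neighbors = similar_sentences(vectors, threshold=0.6)
counts, reference = diversity_reference_set(neighbors)
```

In this toy input, sentence 1 is similar to both sentence 0 and sentence 2, so it represents that group in the reference set, while the isolated sentence 3 represents itself.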

Description

Technical field

[0001] The invention relates to the field of language and word processing, in particular to a method and device for generating document summaries.

Background

[0002] With the rapid development of Internet technology, data in computer networks is growing explosively, and the serious problem of information overload cannot be ignored. Among web pages belonging to the same topic, many pages share a large amount of identical information and contain relatively little distinct information. A tool for summarizing information is therefore needed so that it can be browsed quickly. It is necessary to condense the content of these pages into a document summary to improve the efficiency of information acquisition.

[0003] Text data makes up a very important part of network data. Text summarization is a technology that uses computers to automatically perform text analysis, content induction, and automatic summary generation. Text...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F40/10, G06F40/253
CPC: G06F40/10, G06F40/211
Inventors: 张剑, 黄石磊
Owner: SHENZHEN RAISOUND TECH