Method and device for generating document summary

A document summarization technology for document sets, applied in word processing, special data processing applications, instruments, and similar fields, that addresses the problem of high redundancy in generated document summaries.

Active Publication Date: 2018-05-08
SHENZHEN RAISOUND TECH
Cites: 5. Cited by: 13.

AI Technical Summary

Problems solved by technology

[0004] However, traditional multi-document summarization typically scores every sentence in the document set against preset importance features and considers only the information internal to each sentence, which leads to excessive redundancy in the generated document summary.

Examples

Embodiment Construction

[0060] As shown in Figure 1, in one embodiment, a method for generating a document summary includes the following steps:

[0061] S110, performing sentence segmentation on the document set to obtain a sentence set corresponding to the document set, and representing each sentence in the sentence set with a vector space model.
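Step S110 can be illustrated with a minimal sketch. It assumes a plain term-count vector space model and regex-based sentence splitting; the source does not specify the term weighting, and the names `split_sentences`, `tokenize`, and `to_vectors` are hypothetical helpers introduced only for this example.

```python
import re
from collections import Counter

def split_sentences(text):
    """Split text into sentences on common English and Chinese terminators."""
    parts = re.findall(r"[^.!?。！？]+[.!?。！？]*", text)
    return [p.strip() for p in parts if p.strip()]

def tokenize(sentence):
    """Lowercase the sentence and keep alphanumeric runs as words (assumption)."""
    return re.findall(r"[a-z0-9]+", sentence.lower())

def to_vectors(sentences):
    """Represent each sentence as a term-count vector over a shared vocabulary."""
    vocab = sorted({w for s in sentences for w in tokenize(s)})
    vectors = []
    for s in sentences:
        counts = Counter(tokenize(s))
        vectors.append([counts[w] for w in vocab])
    return vocab, vectors

# A toy document set on one topic (illustrative data only).
docs = ["The cat sat. The dog ran!", "A cat ran."]
sentences = [s for d in docs for s in split_sentences(d)]
vocab, vectors = to_vectors(sentences)
```

Each row of `vectors` has one entry per vocabulary word, so any two sentences from the document set can be compared directly in the same space.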

[0062] Specifically, the method traverses the entire document set belonging to the same topic and performs sentence segmentation on it to obtain a sentence set. It then performs word segmentation on the English or Chinese document set. For an English document set, words are segmented using spaces, symbols, and paragraph breaks. For a Chinese document set, applicable approaches include, but are not limited to, word segmentation based on string matching, word segmentation based on understanding, and word segmentation based on word frequency statistics. For each word in each sentence, judge whether...
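The two segmentation styles named in this paragraph can be sketched as follows. For Chinese, the sketch uses forward maximum matching, which is one common string-matching technique; the source does not say which variant is used, so this choice is an assumption. The function names, the `max_len` parameter, and the toy dictionary are hypothetical.

```python
import re

def segment_english(text):
    """Split English text on spaces and symbols."""
    return [t for t in re.split(r"[^A-Za-z0-9']+", text) if t]

def segment_chinese_fmm(text, dictionary, max_len=4):
    """Forward maximum matching: at each position take the longest
    dictionary word; fall back to a single character if none matches."""
    tokens = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens

# Toy dictionary for illustration only; a real system would load a full lexicon.
toy_dictionary = {"文档", "摘要", "生成", "方法"}
english_tokens = segment_english("Summaries, in short: less redundancy.")
chinese_tokens = segment_chinese_fmm("文档摘要的生成方法", toy_dictionary)
```

The single-character fallback guarantees the loop always advances, so characters missing from the dictionary come out as one-character tokens rather than stalling the scan.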

Abstract

The invention discloses a method and device for generating a document summary. The method comprises: performing sentence segmentation on a document set to obtain a sentence set, and representing the sentences with a vector space model; determining, according to a preset similarity threshold, the similar sentences corresponding to each sentence and the number of those similar sentences; calculating a corresponding importance score; taking each sentence in the sentence set in turn as the current sentence, comparing its number of similar sentences with the number of similar sentences of each of its similar sentences to find the maximum, and adding the corresponding sentence to a diversity reference set; calculating a diversity score and a comprehensive score for each sentence; and sorting and filtering all sentences in the sentence set to form the document summary. The invention further provides a device for generating the document summary. By jointly considering the internal information of each sentence and the global information in the document set, the method and device reduce the redundancy of the document summary as a whole.
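The similar-sentence counting and the diversity reference set described in the abstract can be sketched as below. This is one reading of the abstract, not the patent's definitive procedure: cosine similarity is an assumed measure, ties are resolved in favor of the current sentence, and the importance, diversity, and comprehensive score formulas are not given in this excerpt, so they are left out. All function names are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def similar_sets(vectors, threshold):
    """For each sentence, collect the indices of the other sentences whose
    similarity to it reaches the preset threshold."""
    n = len(vectors)
    return [
        {j for j in range(n)
         if j != i and cosine(vectors[i], vectors[j]) >= threshold}
        for i in range(n)
    ]

def diversity_reference_set(similar):
    """For each current sentence, compare its similar-sentence count with the
    counts of its similar sentences and keep the sentence with the maximum."""
    counts = [len(s) for s in similar]
    reference = set()
    for i, neighbours in enumerate(similar):
        # The current sentence is listed first, so it wins ties (assumption).
        candidates = [i] + sorted(neighbours)
        reference.add(max(candidates, key=lambda k: counts[k]))
    return reference

# Toy sentence vectors for illustration only.
vectors = [[1, 1, 0, 0], [1, 1, 1, 0], [0, 1, 1, 0], [0, 0, 0, 1]]
similar = similar_sets(vectors, threshold=0.6)
reference = diversity_reference_set(similar)
```

In this toy data, sentence 1 is similar to both sentence 0 and sentence 2, so it represents that cluster, while sentence 3 has no similar sentences and represents itself.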

Description

Technical field

[0001] The invention relates to the field of language and word processing, in particular to a method and device for generating document abstracts.

Background technique

[0002] With the rapid development of Internet technology, the data in the computer network shows an explosive growth trend, and the serious problem of information overload cannot be ignored. When browsing web pages belonging to the same topic, some web pages have a lot of the same information but contain relatively little different information. At this time, a tool for summarizing information is needed to quickly browse information. Therefore, it is necessary to form the content of these pages into a document summary to improve the efficiency of information acquisition.

[0003] In network data, text data occupies a very important part. Text summarization is a technology that uses computers to automatically implement text analysis, content induction, and automatic abstract generation. Text...

Application Information

IPC(8): G06F17/21, G06F17/27
CPC: G06F40/10, G06F40/211
Inventor: 张剑, 黄石磊
Owner: SHENZHEN RAISOUND TECH