Automatic multi-document abstract extraction method and automatic multi-document abstract extraction system based on sentence vectors

A sentence-vector-based automatic extraction technology, applied in natural language data processing, special data processing applications, instruments, etc., that addresses problems such as redundant sentences and poor coherence between sentences.

Active Publication Date: 2018-05-29
SHANDONG INST OF BUSINESS & TECH

AI Technical Summary

Problems solved by technology

Commonly used extractive summarization methods suffer mainly from redundant sentences and poor coherence between sentences.


Embodiment Construction

[0064] It should be pointed out that the following detailed description is exemplary and intended to provide further explanation to the present application. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.

[0065] Figure 1 is a flowchart of the present invention, which comprises the following steps:

[0066] S1: Preprocess the document set;

[0067] S2: Train a doc2vec model to generate sentence vectors;

[0068] S3: Cluster the sentence vectors and save the corresponding sentences as subtopic documents;

[0069] S4: Establish a sentence relationship graph model in each subtopic document;

[0070] S5: Calculate sentence weights in each subtopic document;

[0071] S6: Extract sentences and sort them to form a summary.
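
Steps S4 and S5 can be illustrated with a minimal sketch. The listing does not say how the sentence relationship graph is built or how the weights are computed, so everything here is an assumption: edges are weighted by cosine similarity between sentence vectors, and weights come from a PageRank-style iteration (as in TextRank). The function names `build_graph` and `sentence_weights` and the parameters `threshold`, `damping`, and `iterations` are hypothetical, and the vectors are assumed to come from step S2.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def build_graph(vectors, threshold=0.0):
    """S4 (assumed form): symmetric adjacency matrix for one subtopic document.

    An edge links two sentences when their cosine similarity exceeds
    `threshold`; the similarity is used as the edge weight.
    """
    n = len(vectors)
    graph = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            sim = cosine(vectors[i], vectors[j])
            if sim > threshold:
                graph[i][j] = graph[j][i] = sim
    return graph


def sentence_weights(graph, damping=0.85, iterations=50):
    """S5 (assumed form): PageRank-style weights over the weighted graph."""
    n = len(graph)
    if n == 0:
        return []
    out_sum = [sum(row) for row in graph]  # total edge weight leaving each node
    weights = [1.0 / n] * n
    for _ in range(iterations):
        updated = []
        for i in range(n):
            # Each neighbour j passes on a share of its weight proportional
            # to the strength of the edge j -> i.
            rank = sum(
                graph[j][i] / out_sum[j] * weights[j]
                for j in range(n)
                if out_sum[j] > 0
            )
            updated.append((1 - damping) / n + damping * rank)
        weights = updated
    return weights
```

In this sketch a sentence that is similar to many other sentences in its subtopic accumulates a higher weight, which is one plausible reading of "sentence weight"; the patent's own formula may differ.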

[0072] Specifically, step S1 is implemented as shown in Figure 2. It incl...


Abstract

The invention discloses a method and a system for automatic multi-document abstract extraction based on sentence vectors. The method includes: S1, preprocessing the document collection; S2, generating sentence vectors through doc2vec model training; S3, clustering the sentence vectors into sub-theme documents; S4, creating a sentence relation graph model in each sub-theme document; S5, calculating sentence weights; and S6, extracting and ordering sentences to form the abstract. The method and system have the following advantages: all sentences in the target document collection are represented as vectors by a doc2vec model trained on a large corpus; sub-themes are obtained through spectral clustering and one sentence is extracted from each sub-theme, which avoids sentence redundancy; and the sentences are ordered according to their positions in the original documents to form the abstract, which improves the coherence of the abstract sentences.
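
The final step described in the abstract (one sentence per sub-theme, ordered by position in the original documents) can be sketched as follows. This is only an illustration under stated assumptions: the `Sentence` record and its fields, the function name `extract_summary`, and the tie-breaking rule are hypothetical, and the abstract does not say how positions from different source documents are compared. Here sentences are ordered by their position within their source document, then by document index.

```python
from collections import namedtuple

# Hypothetical record: `cluster` is the sub-theme label from S3,
# `weight` is the sentence weight from S5.
Sentence = namedtuple("Sentence", "text doc_id position cluster weight")


def extract_summary(sentences):
    """Pick the highest-weight sentence of each sub-theme, then order the picks.

    Ordering key (an assumption): position inside the source document first,
    document index second.
    """
    best = {}
    for s in sentences:
        current = best.get(s.cluster)
        if current is None or s.weight > current.weight:
            best[s.cluster] = s
    ordered = sorted(best.values(), key=lambda s: (s.position, s.doc_id))
    return [s.text for s in ordered]
```

Taking exactly one sentence per sub-theme is what keeps near-duplicate sentences out of the summary, since similar sentences are expected to fall into the same cluster.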

Description

Technical field

[0001] The invention relates to the field of computer text mining, in particular to a method and system for automatically extracting multi-document summaries based on sentence vectors.

Background technique

[0002] Automatic document summarization technology uses computers to summarize and refine texts for users and provides the general information of those texts. Users only need to read the summary briefly to get a first view of the key content of the full text, which greatly improves the efficiency with which they obtain or understand information. In single-document automatic summarization, a computer automatically generates a summary of the main content of one document through an algorithm. Since Luhn proposed a method for automatically generating document summaries in 1958, research on single-document automatic summarization has been in full swing, and its results have been generally accepted so far...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/27
CPC: G06F40/258
Inventors: 窦全胜, 朱翔
Owner: SHANDONG INST OF BUSINESS & TECH