Automatic document summarization extraction method based on term vectors

An automatic document summary extraction technology, applied in special data processing applications, instruments, electrical digital data processing, etc. It addresses the problem that ignoring the semantic similarity between sentences reduces the accuracy of node weights and degrades summarization performance.

Active Publication Date: 2015-08-12

DALIAN UNIV OF TECH

3 Cites, 77 Cited by


AI Technical Summary

Problems solved by technology

The values in the sentence similarity matrix represent the jump probabilities from one sentence to the other sentences, so their accuracy directly affects the calculation of node weights. However, when traditional graph-based methods calculate the similarity between sentences, they mostly rely on the co-occurrence of the feature words contained in the sentences. This ignores the semantic similarity between sentences, reduces the accuracy of the node weight calculation, and degrades summarization performance.
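To make the contrast concrete, the following minimal sketch compares a co-occurrence (word overlap) similarity with a similarity computed from word vectors, and then row-normalizes a similarity matrix into jump probabilities. The toy two-dimensional vectors, the averaging of word vectors into a sentence vector, the clipping of negative similarities, and all function names are illustrative assumptions; the patent text shown here does not specify them.

```python
import math

# Hypothetical toy word vectors; real ones would come from a trained model.
TOY_VECTORS = {
    "tumor": [0.9, 0.1],
    "neoplasm": [0.85, 0.15],
    "grows": [0.2, 0.8],
    "enlarges": [0.25, 0.75],
    "aspirin": [-0.7, 0.3],
    "relieves": [-0.2, -0.9],
    "pain": [-0.6, -0.5],
}


def overlap_similarity(a, b):
    """Co-occurrence style similarity: Jaccard overlap of the word sets."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def sentence_vector(words):
    """Assumed sentence representation: mean of the known word vectors."""
    vecs = [TOY_VECTORS[w] for w in words if w in TOY_VECTORS]
    if not vecs:
        return [0.0, 0.0]
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(len(vecs[0]))]


def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def vector_similarity(a, b):
    return cosine(sentence_vector(a), sentence_vector(b))


def to_jump_probabilities(sim):
    """Zero the diagonal, clip negatives, and make each row sum to 1."""
    n = len(sim)
    rows = []
    for i in range(n):
        row = [max(sim[i][j], 0.0) if i != j else 0.0 for j in range(n)]
        total = sum(row)
        # A sentence with no positive neighbor jumps uniformly (assumption).
        rows.append([x / total for x in row] if total > 0 else [1.0 / n] * n)
    return rows


sentences = [
    ["tumor", "grows"],
    ["neoplasm", "enlarges"],
    ["aspirin", "relieves", "pain"],
]
similarity = [[vector_similarity(a, b) for b in sentences] for a in sentences]
jump = to_jump_probabilities(similarity)

print("overlap:", overlap_similarity(sentences[0], sentences[1]))
print("vector: ", vector_similarity(sentences[0], sentences[1]))
print("jump probabilities:", jump)
```

In this toy setup the first two sentences share no words, so the overlap measure treats them as unrelated, while the vector-based measure rates them as close because their words have similar vectors.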

Embodiment

[0085] In order to make the purpose, technical solutions and beneficial effects of the present invention clearer and easier to implement, the present invention will be further described in detail in combination with the following specific embodiments and with reference to the accompanying drawings. In this embodiment, the length of the generated summary is preset to be 150 words.

[0086] S1. Use the deep neural network model to train on the corpus to obtain the word vector representation of the feature words:

[0087] To obtain the vector representation of the feature words, this embodiment collects the experimental corpus from MEDLINE, the biomedical literature database maintained by the United States National Library of Medicine. The sentences in the citations are preprocessed, that is, stop words (checked against a stop word list), special characters, and punctuation marks are removed, finally yielding a 1.2G training corpus.
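The preprocessing step described in [0087] could look roughly like the sketch below. The stop word list, the lowercasing, the regular expression, and the function name `preprocess` are assumptions made for illustration; the patent's actual stop word list and tokenization rules are not given in this text.

```python
import re

# Hypothetical, deliberately tiny stop word list.
STOP_WORDS = {"the", "of", "in", "a", "an", "and", "is", "are", "to", "with"}


def preprocess(sentence, stop_words=STOP_WORDS):
    """Return the feature words of a sentence as a list of tokens."""
    lowered = sentence.lower()  # lowercasing is an assumption
    # Replace punctuation and special characters with spaces.
    cleaned = re.sub(r"[^a-z0-9\s]", " ", lowered)
    # Split on whitespace and drop stop words.
    return [token for token in cleaned.split() if token not in stop_words]


example = "The expression of p53 is altered in breast-cancer cells!"
tokens = preprocess(example)
print(tokens)
```

The resulting token lists, one per sentence, are the kind of input a word vector training tool typically expects.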

[0088] In the training process of this e...

Abstract

Provided is an automatic document summarization extraction method based on term vectors. The method includes the following steps: S1, a deep neural network model is trained on a corpus to obtain the term vector representation of feature terms; S2, a sentence graph model is constructed; S3, the weights of the sentences are calculated; S4, a maximum marginal relevance algorithm is used to generate the summary. According to the method, a corpus is collected and preprocessed to obtain a training feature corpus, and the deep neural network model is trained on this corpus to obtain the term vectors of the feature terms. A candidate document set and a candidate sentence set are obtained from the corpus through preset search terms, the semantic similarity between the sentences is obtained from the term vectors of the feature terms, and the semantic relation between every two sentences is thereby obtained. This avoids the calculation errors that traditional methods based on term co-occurrence make when two sentences have the same meaning but use different terms, and therefore improves the accuracy of the similarity calculation and the performance of the summarization.
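Steps S3 and S4 can be sketched as follows, starting from an already computed sentence similarity matrix. This is a minimal illustration under stated assumptions, not the patented procedure: the PageRank-style iteration with a damping factor of 0.85, the trade-off parameter `lam`, the use of the graph weight as the relevance term in maximum marginal relevance, the greedy word budget, and the toy sentences and similarity values are all hypothetical.

```python
def sentence_weights(sim, damping=0.85, iterations=50):
    """S3 sketch: PageRank-style weights over the sentence graph."""
    n = len(sim)
    transition = []
    for i in range(n):
        row = [sim[i][j] if i != j else 0.0 for j in range(n)]
        total = sum(row)
        transition.append(
            [x / total for x in row] if total > 0 else [1.0 / n] * n
        )
    weights = [1.0 / n] * n
    for _ in range(iterations):
        weights = [
            (1.0 - damping) / n
            + damping * sum(weights[j] * transition[j][i] for j in range(n))
            for i in range(n)
        ]
    return weights


def mmr_select(sentences, weights, sim, max_words, lam=0.5):
    """S4 sketch: greedy maximum marginal relevance under a word budget."""
    top = max(weights)
    selected, used = [], 0
    remaining = list(range(len(sentences)))

    def score(i):
        # Penalize similarity to the sentences already in the summary.
        redundancy = max((sim[i][j] for j in selected), default=0.0)
        return lam * weights[i] / top - (1.0 - lam) * redundancy

    while remaining:
        best = max(remaining, key=score)
        remaining.remove(best)
        length = len(sentences[best].split())
        if used + length <= max_words:  # skip sentences that do not fit
            selected.append(best)
            used += length
    return [sentences[i] for i in selected]


# Toy candidate sentences and a made-up symmetric similarity matrix.
docs = [
    "Gene X is overexpressed in tumor tissue.",
    "Tumor tissue shows overexpression of gene X.",
    "Overexpression of gene X correlates with poor survival.",
    "Samples were collected from three hospitals.",
]
sim = [
    [1.0, 0.9, 0.4, 0.1],
    [0.9, 1.0, 0.4, 0.1],
    [0.4, 0.4, 1.0, 0.2],
    [0.1, 0.1, 0.2, 1.0],
]

weights = sentence_weights(sim)
summary = mmr_select(docs, weights, sim, max_words=16)
print(weights)
print(summary)
```

The redundancy term is what keeps two near-duplicate sentences from both entering the summary even when both have high graph weights. In the embodiment the summary length is preset to 150 words, which would correspond to `max_words=150` in this sketch.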

Description

Technical field

[0001] The invention relates to the fields of computer information retrieval and text mining, in particular to a method for automatically extracting document summaries based on word vectors.

Background technique

[0002] Text summarization technology is an important part of the text mining research field. This technology can find out the most important information in a document or document set and express it in a concise and coherent short text. With the advancement of science and technology and the development of network technology, there is a large amount of available information on the Internet. Faced with a large amount of data, this research can assist users to quickly understand the required information, save users' reading time, and improve work efficiency.

[0003] The current text summarization technology is mainly extractive summarization, that is, extracting the most important sentences from the original text to form a summarization. The generation...

Application Information

IPC(8): G06F17/30

CPC: G06F16/345

Inventor: 林鸿飞, 郝辉辉

Owner: DALIAN UNIV OF TECH
