Algorithm-based text summary automatic extraction method and system

An automatic text summary extraction technology, applied in computing, natural language data processing, and special data processing applications. It addresses the lack of an effective method or system for automatic text summary extraction, and achieves fast calculation speed and improved accuracy.

Status: Inactive
Publication Date: 2017-09-05
GUANGDONG PHARMA UNIV

AI Technical Summary

Problems solved by technology

The approach can reduce data dimensionality, but there is currently no method or system that can be effectively applied to the automatic extraction of text summaries.

Examples

Embodiment 1

[0043] As shown in Figure 1, an algorithm-based automatic text summary extraction method includes the following steps:

[0044] S1. Preprocess the text. As shown in Figure 3, preprocessing includes: numbering each sentence in the text; splitting the text into paragraphs and sentences according to punctuation marks; segmenting the text into words according to its encoding, using a word segmentation tool; extracting the chapter structure information of the text; and filtering punctuation, completing abbreviations, stemming words, and removing whitespace.
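The preprocessing in S1 can be illustrated with a minimal sketch. It is not the patent's implementation: the abbreviation table, the blank-line paragraph rule, and the regex tokenizer are all assumptions made for illustration, the function name `preprocess` is hypothetical, and stemming and chapter structure extraction are left out because the text does not say which tools are used.

```python
import re

# Hypothetical abbreviation table; the patent does not list its entries.
ABBREVIATIONS = {"e.g.": "for example", "i.e.": "that is"}


def preprocess(text):
    """Split text into numbered, tokenized sentences.

    Returns a list of dicts holding the sentence number, the index of the
    paragraph it came from, the original sentence, and its cleaned tokens.
    """
    # Complete abbreviations first so their periods are not mistaken
    # for sentence boundaries.
    for short, full in ABBREVIATIONS.items():
        text = text.replace(short, full)

    records = []
    number = 0
    # Assumption: paragraphs are separated by blank lines.
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    for para_idx, para in enumerate(paragraphs):
        # Split into sentences after '.', '!' or '?' followed by whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", para.strip()):
            sentence = sentence.strip()
            if not sentence:
                continue
            # Lowercase word tokens; punctuation and whitespace drop out.
            tokens = re.findall(r"[a-z0-9]+", sentence.lower())
            records.append({"id": number, "paragraph": para_idx,
                            "sentence": sentence, "tokens": tokens})
            number += 1
    return records
```

Keeping the sentence number and paragraph index on every record means later steps can still use position information when choosing summary sentences.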

[0045] S2. Perform feature extraction on the preprocessed text. Specifically, the Doc2Vec algorithm and its corresponding neural network model are used to learn the word vector of each word in each sentence and the paragraph vector, so that each sentence corresponds to a continuous and dense real-number paragraph word vector with a specified dimension, and the real-number paragraph word vect...

Embodiment 2

[0055] As shown in Figures 2 and 3, an algorithm-based automatic text summary extraction system includes a preprocessing module, a Doc2Vec-based feature extraction module, a similarity calculation module, a TextRank-based weight value calculation module, and an abstract extraction module. The preprocessing module splits the text into paragraphs, sentences, and words, extracts the structure information of the text, and also filters punctuation marks, completes abbreviations, stems words, and deletes spaces. Paragraph, sentence, and word segmentation in the preprocessing module works as follows: number each sentence in the text, split the text into paragraphs and sentences according to punctuation marks, and segment the text into words according to its encoding, using a word segmentation tool. The Doc2Vec-based feature extraction module uses the Doc2Vec algorithm and its corre...
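The last module in this list, the abstract extraction module, could be sketched roughly as follows. This is an illustration under stated assumptions rather than the patented procedure: the function name `select_summary`, the parameter `k`, and the `lead_bonus` position adjustment are hypothetical, and returning the chosen sentences in their original document order is one plausible reading of the sorting step.

```python
def select_summary(sentences, scores, k=2, lead_bonus=0.1):
    """Pick the k highest-scoring sentences and return them in document order.

    lead_bonus is a hypothetical position adjustment: the score of the
    first sentence is raised by this fraction before ranking.
    """
    adjusted = [s * (1 + lead_bonus) if i == 0 else s
                for i, s in enumerate(scores)]
    # Rank sentence indices by adjusted score, highest first.
    # sorted() is stable, so ties keep their document order.
    ranked = sorted(range(len(sentences)), key=lambda i: -adjusted[i])
    chosen = sorted(ranked[:k])  # restore original sentence order
    return [sentences[i] for i in chosen]
```

For example, `select_summary(["A", "B", "C", "D"], [0.2, 0.9, 0.8, 0.1])` returns `["B", "C"]`. Emitting the sentences in document order, rather than score order, tends to keep the summary readable.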

Abstract

The invention discloses an algorithm-based automatic text summary extraction method and relates to the technical field of text extraction. The method comprises the following steps: S1, preprocessing the text; S2, extracting the features of the text; S3, calculating the similarity between sentences with an existing similarity calculation method, applying weighting during the calculation; S4, constructing an undirected weighted TextRank network graph, with the sentences in the text as the nodes, the similarity relations between sentences as the edges, and the similarity values as the edge weights, and iterating until convergence to obtain a weight for each node; S5, selecting core sentences according to the weight of the sentence corresponding to each node, the structure of the text, and the position information of the sentences, sorting the core sentences, and outputting the sorted sentences as the extraction result. The invention also discloses a summary extraction system. The method and system can increase the accuracy of automatic text summary extraction.
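Steps S3 and S4 can be illustrated with a short sketch. It assumes each sentence has already been mapped to a numeric vector, uses cosine similarity as the "existing similarity calculation method", and applies the standard weighted TextRank update with a damping factor of 0.85. Those choices, the function names, and the clipping of negative similarities are assumptions of this example; the weighting the abstract mentions in S3 is not specified there and is not reproduced.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def textrank(vectors, damping=0.85, tol=1e-6, max_iter=200):
    """Score sentences on an undirected graph whose edges carry similarity."""
    n = len(vectors)
    # Symmetric weight matrix. Negative similarities are clipped to 0 and
    # self-loops are excluded (both are assumptions of this sketch).
    w = [[max(cosine(vectors[i], vectors[j]), 0.0) if i != j else 0.0
          for j in range(n)] for i in range(n)]
    out = [sum(row) for row in w]  # total edge weight at each node
    scores = [1.0] * n
    for _ in range(max_iter):
        new = []
        for i in range(n):
            # Each neighbour j passes on a share of its score that is
            # proportional to the weight of the edge between j and i.
            rank = sum(w[j][i] / out[j] * scores[j]
                       for j in range(n) if out[j] > 0)
            new.append((1 - damping) + damping * rank)
        converged = max(abs(a - b) for a, b in zip(new, scores)) < tol
        scores = new
        if converged:
            break
    return scores
```

Calling `textrank([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])` gives the middle vector the highest score, since it is the only one with a nonzero similarity to both of the others.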

Description

Technical field

[0001] The invention relates to the technical field of text extraction, in particular to an algorithm-based automatic text summary extraction method and system.

Background

[0002] Automatic extraction of text summaries based on machine learning has been a hot topic in text mining research in recent years, and has very broad application prospects in search engines, portal websites, the mobile Internet, information retrieval systems, and other fields. Using computer technology to automatically extract text summaries can effectively mine and condense text information, reduce users' reading time, and improve the user experience.

[0003] Early automatic extraction of text summaries was mainly based on rules or statistical machine learning. In recent years, many researchers have begun to use various machine learning algorithms to study the automatic extraction of text summaries, such as regression models (including linear regression or ELM regress...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/27
CPC: G06F40/211, G06F40/284, G06F40/289, G06F40/30
Inventors: 余珊珊, 苏锦钿, 连俊玮
Owner: GUANGDONG PHARMA UNIV