A method and system for automatically extracting text summaries based on algorithms

A technology for the automatic extraction of text summaries, applied in the fields of unstructured text data retrieval, text database browsing/visualization, and computing. It aims to improve the accuracy of the extracted summaries.

Inactive Publication Date: 2020-09-25
GUANGDONG PHARMA UNIV

AI Technical Summary

Problems solved by technology

Existing techniques can reduce the dimensionality of the data, but there is currently no method or system that can be applied effectively to the automatic extraction of text summaries.



Examples


Embodiment 1

[0043] As shown in Figure 1, an algorithm-based method for automatically extracting text summaries includes the following steps:

[0044] S1. Preprocess the text, as shown in Figure 3. Preprocessing includes: numbering each sentence in the text; splitting the text into paragraphs and sentences according to punctuation marks; segmenting the text into words according to its encoding and a word segmentation tool; extracting the chapter structure information of the text; and filtering out punctuation, expanding abbreviations, stemming words, and removing whitespace;
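Step S1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names, the punctuation set, and the record layout are assumptions made for illustration, and a plain regular-expression tokenizer stands in for the word segmentation tool. Abbreviation expansion and stemming are left out because the text does not say how they are done.

```python
import re

# Assumed sentence-ending punctuation (Western and CJK); the patent does not list it.
_SENTENCE = re.compile(r"[^.!?。！？]+[.!?。！？]*")
_PUNCTUATION = re.compile(r"[^\w\s]", re.UNICODE)


def split_paragraphs(text):
    """Split text into paragraphs on blank lines."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def split_sentences(paragraph):
    """Split a paragraph into sentences at sentence-ending punctuation.

    Limitation: a decimal such as "3.14" is also split at the dot.
    """
    return [s.strip() for s in _SENTENCE.findall(paragraph) if s.strip()]


def tokenize(sentence):
    """Lower-case, filter punctuation, and split on whitespace.

    Stand-in for a real word segmentation tool (hypothetical simplification).
    """
    return _PUNCTUATION.sub(" ", sentence.lower()).split()


def preprocess(text):
    """Number every sentence and record its paragraph and in-paragraph position."""
    records = []
    number = 0
    for p_idx, paragraph in enumerate(split_paragraphs(text)):
        for s_idx, sentence in enumerate(split_sentences(paragraph)):
            records.append({
                "id": number,           # global sentence number
                "paragraph": p_idx,     # coarse structure information
                "position": s_idx,      # position inside the paragraph
                "text": sentence,
                "tokens": tokenize(sentence),
            })
            number += 1
    return records
```

The paragraph index and in-paragraph position are kept with each sentence so that a later selection step can take structure and position into account.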

[0045] S2. Perform feature extraction on the preprocessed text. Specifically, the Doc2Vec algorithm and its corresponding neural network model are used to learn the word vector of each word in each sentence and the paragraph vector, so that each sentence corresponds to a continuous, dense, real-valued paragraph word vector of a specified dimension, and the real-number paragraph word vect...
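One way to obtain such sentence vectors is the Doc2Vec implementation in the gensim library; the patent text shown here does not name a library, so that choice is an assumption. In the sketch below each tokenized sentence is treated as its own "document" tagged with its sentence number. The function names and hyperparameter values (`vector_size`, `window`, `epochs`, `seed`) are illustrative, not taken from the patent.

```python
def build_tagged_corpus(tokenized_sentences):
    """Pair each token list with its sentence number, used as the Doc2Vec tag."""
    return [(tokens, [idx]) for idx, tokens in enumerate(tokenized_sentences)]


def train_sentence_vectors(tokenized_sentences, vector_size=50, epochs=40, seed=1):
    """Learn one dense real-valued vector per sentence with gensim's Doc2Vec.

    Assumes gensim 4.x is installed. The import is local so that the helper
    above remains usable without gensim.
    """
    from gensim.models.doc2vec import Doc2Vec, TaggedDocument

    corpus = [
        TaggedDocument(words=words, tags=tags)
        for words, tags in build_tagged_corpus(tokenized_sentences)
    ]
    model = Doc2Vec(
        vector_size=vector_size,  # the "specified dimension" of each vector
        window=5,
        min_count=1,              # keep rare words; a single text is a small corpus
        dm=1,                     # distributed-memory variant (assumed)
        epochs=epochs,
        seed=seed,
        workers=1,                # a single worker keeps runs more repeatable
    )
    model.build_vocab(corpus)
    model.train(corpus, total_examples=model.corpus_count, epochs=model.epochs)
    # model.dv holds the learned document (here: sentence) vectors, indexed by tag.
    return [model.dv[idx] for idx in range(len(corpus))]
```

A single text yields very little training data, so in practice the model would probably be trained on a larger corpus; how the patent handles this is not visible in the truncated paragraph.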

Embodiment 2

[0055] As shown in Figures 2 and 3, an algorithm-based system for automatically extracting text summaries includes a preprocessing module, a Doc2Vec-based feature extraction module, a similarity calculation module, a TextRank-based weight value calculation module, and an abstract extraction module. The preprocessing module segments the text into paragraphs, sentences, and words, extracts the structure information of the text, and also filters out punctuation marks, expands abbreviations, stems words, and removes whitespace. Paragraph, sentence, and word segmentation in the preprocessing module proceeds as follows: number each sentence in the text, split the text into paragraphs and sentences according to punctuation marks, and segment the text into words according to its encoding and a word segmentation tool. The Doc2Vec-based feature extraction module uses the Doc2Vec algorithm and its corre...
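The role of the similarity calculation module can be sketched as follows. The visible text only refers to an existing similarity calculation method, so cosine similarity is an assumption here, as are the function names and the choice to clamp negative values to zero so that every edge weight is non-negative.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_matrix(vectors):
    """Symmetric sentence-to-sentence similarity matrix with a zero diagonal."""
    n = len(vectors)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Clamp negatives to zero (assumption) so the values can serve as edge weights.
            s = max(0.0, cosine_similarity(vectors[i], vectors[j]))
            sim[i][j] = sim[j][i] = s
    return sim
```

Because the matrix is symmetric, it can be read directly as the weighted adjacency matrix of an undirected sentence graph.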



Abstract

The invention discloses an algorithm-based method for automatically extracting text summaries, which relates to the technical field of text extraction and includes the following steps: S1, preprocessing the text; S2, extracting features from the text; S3, calculating the similarity between sentences using an existing similarity calculation method, with weighting applied during the calculation; S4, taking each sentence in the text as a node and the similarity relationship between sentences as an edge, with the similarity as the edge weight, constructing an undirected weighted TextRank network graph, and iterating to convergence to obtain the weight value of each node; S5, selecting the core sentences according to the weight value of the sentence corresponding to each node, the discourse structure of the text, and the position information of the sentences, then sorting the core sentences and outputting them as the extraction result. The invention also discloses an abstract extraction system. The invention helps improve the accuracy of automatic text summary extraction.
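Steps S4 and S5 can be sketched in Python with the standard weighted TextRank update on a symmetric similarity matrix. This is a minimal illustration, not the patented procedure: the damping factor, the tolerance, and the function names are assumptions. The selection step uses only the node weights, since the abstract does not say how discourse structure and sentence position enter the choice. Returning the selected sentences in document order is likewise an assumption.

```python
def textrank(sim, damping=0.85, tol=1e-6, max_iter=200):
    """Iterate weighted TextRank scores over an undirected similarity graph.

    sim[i][j] is the non-negative edge weight between sentences i and j.
    """
    n = len(sim)
    if n == 0:
        return []
    # Total edge weight attached to each node, ignoring any self-loop.
    out_weight = [
        sum(w for k, w in enumerate(row) if k != j) for j, row in enumerate(sim)
    ]
    scores = [1.0] * n
    for _ in range(max_iter):
        new_scores = []
        for i in range(n):
            rank = 0.0
            for j in range(n):
                if j != i and sim[j][i] > 0.0 and out_weight[j] > 0.0:
                    # Neighbour j passes on its score in proportion to the edge weight.
                    rank += sim[j][i] / out_weight[j] * scores[j]
            new_scores.append((1.0 - damping) + damping * rank)
        delta = max(abs(a - b) for a, b in zip(new_scores, scores))
        scores = new_scores
        if delta < tol:  # converged
            break
    return scores


def select_core_sentences(scores, k):
    """Return the indices of the k highest-scoring sentences, in document order."""
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return sorted(top)
```

For a star-shaped graph in which one sentence is similar to all the others, the central sentence receives the highest weight and is selected first.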

Description

Technical field

[0001] The invention relates to the technical field of text extraction, in particular to an algorithm-based method and system for automatically extracting text summaries.

Background

[0002] Automatic extraction of text summaries based on machine learning has been a hot topic in text mining research in recent years, and it has broad application prospects in search engines, portal websites, the mobile Internet, information retrieval systems, and other fields. Using computer technology to extract text summaries automatically can effectively mine and condense text information, reduce users' reading time, and improve the user experience.

[0003] Early automatic extraction of text summaries was mainly based on rules or statistical machine learning. In recent years, many researchers have begun to use various machine learning algorithms to study the automatic extraction of text summaries, such as regression models (including linear regression or ELM regress...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/34, G06F40/284, G06F40/289, G06F40/211, G06F40/30
CPC: G06F40/211, G06F40/284, G06F40/289, G06F40/30
Inventor: 余珊珊, 苏锦钿, 连俊玮
Owner: GUANGDONG PHARMA UNIV