A method and system for automatically extracting text summaries based on algorithms
An automatic text summary extraction technology, applied in the fields of unstructured text data retrieval, text database browsing/visualization, and computing. It achieves improved, high-accuracy results.
Examples
Embodiment 1
[0043] As shown in Figure 1, an algorithm-based automatic text summary extraction method includes the following steps:
[0044] S1. Preprocess the text. As shown in Figure 3, preprocessing includes: numbering each sentence in the text; splitting the text into paragraphs and sentences according to punctuation marks; segmenting the text into words according to its encoding, using a word segmentation tool; extracting the chapter structure information of the text; and filtering out punctuation, expanding abbreviations, stemming words, and removing whitespace.
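The preprocessing in S1 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the abbreviation table, the `naive_stem` suffix stripper, the regex-based tokenizer (standing in for a real word segmentation tool), and the use of a paragraph index as the "structure information" are all hypothetical choices made for the example.

```python
import re

# Hypothetical abbreviation table; the source does not say which
# abbreviations are expanded.
ABBREVIATIONS = {"e.g.": "for example", "i.e.": "that is", "etc.": "and so on"}


def split_paragraphs(text):
    """Split text into paragraphs on blank lines."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def split_sentences(paragraph):
    """Split a paragraph into sentences at sentence-ending punctuation."""
    parts = re.split(r"(?<=[.!?])\s+|(?<=[。！？])", paragraph)
    return [s.strip() for s in parts if s.strip()]


def naive_stem(word):
    """Very rough suffix stripping; a stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Number sentences, record their paragraph, and produce cleaned tokens."""
    # Expand abbreviations first so their periods do not split sentences.
    for short, full in ABBREVIATIONS.items():
        text = text.replace(short, full)

    records = []
    sentence_id = 0
    for paragraph_id, paragraph in enumerate(split_paragraphs(text)):
        for sentence in split_sentences(paragraph):
            # \w+ drops punctuation and whitespace; lowercasing is an assumption.
            tokens = [naive_stem(t) for t in re.findall(r"\w+", sentence.lower())]
            records.append(
                {
                    "id": sentence_id,          # sentence number
                    "paragraph": paragraph_id,  # simple structure information
                    "text": sentence,
                    "tokens": tokens,
                }
            )
            sentence_id += 1
    return records
```

Each returned record keeps the original sentence text next to its cleaned tokens, so a later step can rank sentences by their features and still output the unmodified sentences in the summary.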
[0045] S2. Perform feature extraction on the preprocessed text. Specifically: learn the word vector of each word and the paragraph vector of each sentence through the Doc2Vec algorithm and its corresponding neural network model, so that each sentence corresponds to a continuous, dense, real-number paragraph word vector of a specified dimension, and the real-number paragraph word vect...
Embodiment 2
[0055] As shown in Figure 2 and Figure 3, an algorithm-based automatic text summary extraction system includes a preprocessing module, a Doc2Vec-based feature extraction module, a similarity calculation module, a TextRank-based weight value calculation module, and a summary extraction module. The preprocessing module segments the text into paragraphs, sentences, and words; extracts the text structure information; and filters out punctuation, expands abbreviations, stems words, and removes whitespace. Paragraph, sentence, and word segmentation in the preprocessing module works as follows: number each sentence in the text, split the text into paragraphs and sentences according to punctuation marks, and segment the text into words according to its encoding, using a word segmentation tool. The Doc2Vec-based feature extraction module uses the Doc2Vec algorithm and its corre...
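The way the similarity calculation, TextRank-based weight calculation, and summary extraction modules could fit together is sketched below. The source text is truncated before it describes these modules, so the details are assumptions: cosine similarity between sentence vectors, the standard weighted TextRank iteration with a damping factor of 0.85, and selection of the `top_k` highest-weighted sentences in their original order. The input `vectors` stands in for the per-sentence vectors that the Doc2Vec-based module would produce, and all function names are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_matrix(vectors):
    """Symmetric sentence-to-sentence similarity matrix with a zero diagonal."""
    n = len(vectors)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Negative similarities are clipped to 0 (an assumption) so that
            # edge weights stay non-negative.
            s = max(0.0, cosine_similarity(vectors[i], vectors[j]))
            sim[i][j] = sim[j][i] = s
    return sim


def textrank(sim, damping=0.85, iterations=100, tol=1e-6):
    """Iterate the weighted TextRank update until the scores stabilize."""
    n = len(sim)
    if n == 0:
        return []
    scores = [1.0] * n
    out_weight = [sum(row) for row in sim]
    for _ in range(iterations):
        new_scores = []
        for i in range(n):
            rank = sum(
                sim[j][i] / out_weight[j] * scores[j]
                for j in range(n)
                if out_weight[j] > 0
            )
            new_scores.append((1 - damping) + damping * rank)
        converged = max(abs(a - b) for a, b in zip(new_scores, scores)) < tol
        scores = new_scores
        if converged:
            break
    return scores


def extract_summary(sentences, vectors, top_k=2):
    """Return the top_k highest-weighted sentences in document order."""
    scores = textrank(similarity_matrix(vectors))
    best = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:top_k]
    return [sentences[i] for i in sorted(best)]
```

For example, calling `extract_summary(sentences, vectors, top_k=3)` on a document in which one sentence vector points away from all the others leaves that outlier out of the summary, because it receives little weight from its neighbors.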